Working with Files – Part 1
Sequential File Processing Statements and Functions
Processing a Comma-Delimited File
Visual Basic provides the capability of processing three types of files:
sequential files Files that must be read in the same order in which they were written – one after the other with no skipping around
binary files "unstructured" files which are read from or written to as series of bytes, where it is up to the programmer to specify the format of the file
random files files which support "direct access" by record number
These three file types are "native" to Visual Basic and its predecessors (QBasic, GW-BASIC, etc.). The next several topics address VB's sequential file processing capabilities. Binary and Random files will be covered in later topics.
The following sequential file-related statements and functions will be discussed:
Open |
Prepares a file to be processed by the VB program. |
App.Path |
Supplies the path of your application |
FreeFile |
Supplies a file number that is not already in use |
Input # |
Reads fields from a comma-delimited sequential file |
Line Input # |
Reads a line (up to the carriage return) from a sequential file |
EOF |
Tests for end-of-file |
Write # |
Writes fields to a sequential file in comma-delimited format |
Print # |
Writes a formatted line of output to a sequential file |
Close # |
Closes a file |
As you know, a data file consists of records, which consist of fields. The file that will be used for all examples in this section is a simplified employee file, which consists of the following fields:
Field |
Data Type |
Employee Name |
String |
Department Number |
Integer |
Job Title |
String |
Hire Date |
Date |
Hourly Rate |
Single |
Suppose there were five records in the file. A graphic representation of the file populated with the five data records follows (the field names are not stored in the file):
Employee Name |
Dept # |
Job Title |
Hire Date |
Hourly Rate |
ANDY ANDERSON |
100 |
PROGRAMMER |
3/4/1997 |
25.00 |
BILLY BABCOCK |
110 |
SYSTEMS ANALYST |
2/16/1996 |
33.50 |
CHARLIE CHEESEMAN |
100 |
COMPUTER OPERATOR |
3/1/1996 |
15.00 |
DARLENE DUNCAN |
200 |
RECEPTIONIST |
10/11/1998 |
12.75 |
ERNIE EACHUS |
300 |
MAIL ROOM CLERK |
8/19/1997 |
10.00 |
Please note that the data types for these fields are the data types of the variables into which these fields will be stored. On the sequential file, all fields will be represented as a string of characters.
Following are three different ways that the data in this sequential file might be stored; for example, if you opened up a sequential data file in a text editor such as Notepad, this is what you might see.
Each field is separated by a comma. Both string and numeric fields are "trimmed" (contain no extraneous spaces or zeroes). String fields are enclosed in quotes (Note: The quotes enclosing the string fields are optional, VB and other applications that can read comma-delimited files will access the string fields properly with or without the quotes. The only time a string field MUST be enclosed in quotes is when it contains an embedded comma.) If Date fields are enclosed in pound signs (#), VB will automatically recognize the field as the Date data type. If the Date fields are enclosed in quotes instead, you need to use the CDate function to convert the date from string format to the Date data type.
"ANDY ANDERSON",100,"PROGRAMMER",#3/4/1997#,25
"BILLY BABCOCK",110,"SYSTEMS ANALYST",#2/16/1996#,33.5
"CHARLIE CHEESEMAN",100,"COMPUTER OPERATOR",#3/1/1996#,15
"DARLENE DUNCAN",200,"RECEPTIONIST",#10/11/1998#,12.75
"ERNIE EACHUS",300,"MAIL ROOM CLERK",#8/19/1997#,10
In some sequential data files, fields are stored in a fixed position. On each record, a particular field starts and ends in the same position and occupies the same amount of space. In a "print" format file, each line (record) of the file consists of a formatted "detail line" containing each field (as if the lines were intended to be printed on a hard-copy report).
In the example below, a column position guide is shown above the records. From the example, it should be clear that the employee name occupies positions 1 through 20 of each record (note that names shorter than 20 characters are padded with blank spaces); the department number occupies positions 21 through 24; the job title occupies positions 30 through 50; the hire date occupies positions 51 through 60; and the hourly rate occupies positions 61 through 65.
1 1 2 2 3 3 4 4 5 5 6 6
1...5....0....5....0....5...0....5....0....5....0....5....0....5.
ANDY ANDERSON 100 PROGRAMMER 3/4/1997 25.00
BILLY BABCOCK 110 SYSTEMS ANALYST 2/16/1996 33.50
CHARLIE CHEESEMAN 100 COMPUTER OPERATOR 3/1/1996 15.00
DARLENE DUNCAN 200 RECEPTIONIST 10/11/199812.75
ERNIE EACHUS 300 MAIL ROOM CLERK 8/19/1997 10.00
Typical of sequential files originating on mainframe computers and processed by languages such as COBOL, fields are stored one after the other in a continuous string with no distinguishing marks or white space between them. Although some of the character-string fields can be picked out easily, the numbers are run together and are difficult to interpret unless you know something about the record. Also, numeric fields containing a decimal portion are typically stored without the decimal point (they have an implied decimal point). For example, the employee file might look something like this:
1 1 2 2 3 3 4 4 5 5 6 6
1...5....0....5....0....5...0....5....0....5....0....5....0....5.
ANDY ANDERSON 0100PROGRAMMER 030419972500
BILLY BABCOCK 0110SYSTEMS ANALYST 021619963350
CHARLIE CHEESEMAN 0100COMPUTER OPERATOR030119961500
DARLENE DUNCAN 0200RECEPTIONIST 101119981275
ERNIE EACHUS 0300MAIL ROOM CLERK 081919971000
In the example above, the employee name occupies the first 20 positions of each record; the department number occupies the next four bytes (note that it contains a leading zero); the job title occupies the next 17 bytes; the hire date (stored in MMDDYYYY format with no slashes) occupies the next 10 bytes; and finally, the hourly rate occupies the last four bytes of the record. Note that the hourly rate does not contain a physical decimal point; however, the program that processes this file must "know" that the decimal point is implied (i.e., "2500" means "25.00"). Given the proper data definition, COBOL can interpret the implied decimal point just fine; in VB, we have to convert the string "2500" to a number and then divide it by 100. This technique is shown further below.
The Open Statement
The Open statement prepares a file to be processed in the VB program. It identifies the Windows-system file that will be used in the program and assigns the file a file number that will be used to reference that file for the remainder of the program. The general format is:
Open <filename> [For mode] As [#] <filenumber>
· filename is a legal Windows-system filename, which may include a drive and path; the filename can be specified in the Open statement as either a string constant or a string variable
· mode is one of the following three keywords: Input, Output, or Append.
When a file is opened for Input, that file must already exist.
When a file is opened for Output, if it does not exist, it will be created; if it does exist, its previous contents will be overwritten.
When a file is opened for Append, if it does not exist, it will be created, if it does exist, records will be added to the file after the last record in the file (the previous contents of the file will not be overwritten).
The Input # and Line Input # statements may only be used on files opened in the Input mode; the Write # and Print # may only be used on files open in the Output or Append modes.
· filenumber is an integer from 1 to 511 which is used to associate the Windows-system filename with a number; this number will be used to reference the opened file in all further VB file processing statements in the program.
Examples:
Open "C:\Program Files\EmpMaint\EMPLOYEE.DAT" For Input As #1
Open "A:\EMPLOYEE.DAT" For Input As #1
In order to avoid "hard-coding" the path of a file in your VB program, it is recommended that you use App.Path to reference the path of the file. This way, as long as the file resides in the same directory in which your program is running, the correct path will always be referenced.
For example, if both your program and the data file reside in C:\Program Files\EmpMaint, then that is what App.Path would refer to. So if you concatenate App.Path with a backslash and the name of your data file, then you have a complete reference for your file, which can be used in the Open statement.
Examples:
Open App.Path & "\EMPLOYEE.DAT" For Input As #1
You could also use a string variable to hold the filename, as in the following example:
Dim strEmpFileName As String
strEmpFileName = App.Path & "\EMPLOYEE.DAT"
Open strEmpFileName For Input As #1
A special situation comes up if your program and the data file reside in the root directory of a drive (for example A:\). If you concatenate App.Path with a backslash and the filename, you'll come up with an invalid file reference, such as:
A:\\EMPLOYEE.DAT
To cover both situations and alleviate the pesky "extra backslash" problem, you can use code like the following (the new statements are shown in bold):
Dim strEmpFileName As String
Dim strBackSlash As String
strBackSlash = IIf (Right$(App.Path, 1) = "\", "", "\")
strEmpFileName = App.Path & strBackSlash & "EMPLOYEE.DAT"
Open strEmpFileName For Input As #1
Instead of hard-coding the file number, you can use the VB function FreeFile to supply you with a file number that is not already in use by the system. The FreeFile function takes no arguments and returns an integer. To use it, declare an integer variable, then assign FreeFile to it, as follows:
Dim intEmpFileNbr As Integer
intEmpFileNbr = FreeFile
In the Open statement (and any other statement that refers to this file), use the integer variable rather than the hard-coded number. For example:
Open strEmpFileName For Input As #intEmpFileNbr
Thus, a "full-blown" procedure to open a sequential file for input might look like this:
Dim strEmpFileName As String
Dim strBackSlash As String
Dim intEmpFileNbr As Integer
strBackSlash = IIf (Right$(App.Path, 1) = "\", "", "\")
strEmpFileName = App.Path & strBackSlash & "EMPLOYEE.DAT"
intEmpFileNbr = FreeFile
Open strEmpFileName For Input As #intEmpFileNbr
The Input # Statement
The Input # statement reads a series of fields (usually one "record's worth") from a comma-delimited sequential file, and stores the contents of those fields into the specified variables. The general format is:
Input #<filenumber>, <variable list>
· filenumber refers to the file that was Opened As that number (for Input) in the Open statement
· variable list is a list of variables, separated by commas, into which the data fields from the file will be stored
Example:
Recall the comma-delimited version of the employee file shown earlier:
"ANDY ANDERSON",100, "PROGRAMMER",#3/4/1997#,25
"BILLY BABCOCK",110,"SYSTEMS ANALYST",#2/16/1996#,33.5
"CHARLIE CHEESEMAN",100,"COMPUTER OPERATOR",#3/1/1996#,15
"DARLENE DUNCAN",200,"RECEPTIONIST",#10/11/1998#,12.75
"ERNIE EACHUS",300,"MAIL ROOM CLERK",#8/19/1997#,10
Assume you declare the following variables in your program:
Dim strEmpName As String
Dim intDeptNbr As Integer
Dim strJobTitle As String
Dim dtmHireDate As Date
Dim sngHrlyRate As Single
the statement
Input #intEmpFileNbr, strEmpName, intDeptNbr, strJobTitle, dtmHireDate, sngHrlyRate
would cause ANDY ANDERSON to be stored in strEmpName, 100 to be stored in intDeptNbr, PROGRAMMER to be stored in strJobTitle, 3/4/1997 to be stored in dtmHireDate, and 25 to be stored in sngHrlyRate the first time that the statement was executed. The second time the statement was executed, BILLY BABCOCK, 110, SYSTEMS ANALYST, 2/16/1996, and 33.5 would be stored respectively in strEmpName, intDeptNbr, strJobTitle, dtmHireDate, sngHrlyRate; and so on. As VB reads each field into its respective variable, it automatically performs the conversion to the correct data type (Integer, Date, Single, etc.). As mentioned earlier, VB will only convert an incoming field to the Date data type if that field is enclosed in pound signs (#) – if the field was enclosed in quotes, it would be treated as a string and the CDate function would have to be used to convert it to a Date.
VB "knows" that the data is to be read from the "EMPLOYEE.DAT" file because the Input # statement is referring to file #intEmpFileNbr, and file #intEmpFileNbr was associated with "EMPLOYEE.DAT" in the Open statement.
The EOF function
The operating system automatically appends a special character, called the end-of-file marker, to the end of a sequential file. VB can sense the presence of this end-of-file marker with the EOF function.
A programming language will generally recognize EOF at either one of two times: (1) after the last record has been read – OR – (2) at the same time that the last record has been read. COBOL falls into the first category, VB falls into the second.
FYI: This discussion applies only to how VB processes sequential files – because when VB processes the rows of a database table, it actually handles EOF "the COBOL way".
In a language that recognizes EOF after the last record in the file has been read (such as COBOL), the "input" or "read" loop is set up similar like a prompted dialog loop: with a priming read outside the loop; all subsequent reads occur at the bottom of the loop. The pseudocode might be written as follows:
READ (first) RECORD
DO UNTIL EOF
PROCESS THE RECORD
READ (next) RECORD
LOOP
In a language that recognizes EOF when the last record is read (such as VB), the "input" or "read" loop must be modified so that there is NO PRIMING READ and the read occurs as the FIRST statement in the body of the processing loop. The pseudocode might be written as follows:
DO UNTIL EOF
READ A RECORD
PROCESS THE RECORD
LOOP
The syntax of the EOF function is EOF(n) where n is a number corresponding to the file number of the file from which you want to read data. n can either be a hard-coded number or an integer variable, depending on whether or not you used FreeFile in the Open statement.
The EOF function can be used anywhere that a conditional expression can be used; as such, it must always follow keywords such as UNTIL, WHILE, and IF. The EOF function can also be preceded by the keyword NOT: for example, Do Until EOF(1) is equivalent to Do While Not EOF(1).
The main loop to process the employee file might look like this (note that there is no "priming" read and that the input is done at the top of the loop):
Do Until EOF(intEmpFileNbr)
Input #intEmpFileNbr, strEmpName, intDeptNbr, strJobTitle, dtmHireDate, sngHrlyRate
' Processing for the record would go here – for example, load some of these
' fields into an element of an array or list box, print a line of a report, etc...
Loop
Building on what has been discussed thus far, the "full-blown" procedure to process a comma-delimited sequential file for input might look like this:
Dim strEmpFileName As String
Dim strBackSlash As String
Dim intEmpFileNbr As Integer
Dim strEmpName As String
Dim intDeptNbr As Integer
Dim strJobTitle As String
Dim dtmHireDate As Date
Dim sngHrlyRate As Single
strBackSlash = IIf (Right$(App.Path, 1) = "\", "", "\")
strEmpFileName = App.Path & strBackSlash & "EMPLOYEE.DAT"
intEmpFileNbr = FreeFile
Open strEmpFileName For Input As #intEmpFileNbr
Do Until EOF(intEmpFileNbr)
Input #intEmpFileNbr, strEmpName, intDeptNbr, strJobTitle, dtmHireDate, sngHrlyRate
Print strEmpName; _
Tab(25); Format$(intDeptNbr, "@@@@"); _
Tab(35); strJobTitle; _
Tab(55); Format$(dtmHireDate, "mm/dd/yyyy"); _
Tab(70); Format$(Format$(sngHrlyRate, "Standard"), "@@@@@@@")
Loop
When you are finished using a file in your program, you should Close that file. The Close statement tells VB that you are done using a file, and frees up the system resources needed to process that file.
The statement
Close #1
frees the resources used by the file referenced as #1, and also terminates the association between the Windows-system file and the file number – so at this point, if you wanted to, you could Open some other file AS #1.
If you have more than one file open in a program, you can close multiple files with one Close statement by separating the file numbers with commas:
Close #1, #2, #68
The statement
Close
by itself on one line closes all files that are currently open.
In the "full-blown" example above, the following line should be added to the end (after the "Loop" statement):
Close #intEmpFileNbr
To demonstrate the code above
Download the VB project code for the example above here.