Working with Files – Part 1

Sequential File Processing Statements and Functions

Processing a Comma-Delimited File

 

Visual Basic provides the capability of processing three types of files:

 

sequential files            Files that must be read in the same order in which they were written – one after the other with no skipping around

 

binary files                   "unstructured" files which are read from or written to as series of bytes, where it is up to the programmer to specify the format of the file

 

random files                files which support "direct access" by record number

 

These three file types are "native" to Visual Basic and its predecessors (QBasic, GW-BASIC, etc.).  The next several topics address VB's sequential file processing capabilities. Binary and Random files will be covered in later topics.

 

The following sequential file-related statements and functions will be discussed:

 

Open

Prepares a file to be processed by the VB program.

App.Path

Supplies the path of your application

FreeFile

Supplies a file number that is not already in use

Input #

Reads fields from a comma-delimited sequential file

Line Input #

Reads a line (up to the carriage return) from a sequential file

EOF

Tests for end-of-file

Write #

Writes fields to a sequential file in comma-delimited format

Print #

Writes a formatted line of output to a sequential file

Close #

Closes a file

 

As you know, a data file consists of records, which consist of fields.  The file that will be used for all examples in this section is a simplified employee file, which consists of the following fields:

 

Field

Data Type

Employee Name

String

Department Number

Integer

Job Title

String

Hire Date

Date

Hourly Rate

Single

 

Suppose there were five records in the file.  A graphic representation of the file populated with the five data records follows (the field names are not stored in the file):

 

Employee Name

Dept #

Job Title

Hire Date

Hourly Rate

ANDY ANDERSON

100

PROGRAMMER

3/4/1997

25.00

BILLY BABCOCK

110

SYSTEMS ANALYST

2/16/1996

33.50

CHARLIE CHEESEMAN

100

COMPUTER OPERATOR

3/1/1996

15.00

DARLENE DUNCAN

200

RECEPTIONIST

10/11/1998

12.75

ERNIE EACHUS

300

MAIL ROOM CLERK

8/19/1997

10.00

 

Please note that the data types for these fields are the data types of the variables into which these fields will be stored.  On the sequential file, all fields will be represented as a string of characters.

 

Following are three different ways that the data in this sequential file might be stored; for example, if you opened up a sequential data file in a text editor such as Notepad, this is what you might see.

 

Scenario 1: Comma-Delimited Format

 

Each field is separated by a comma.  Both string and numeric fields are "trimmed" (contain no extraneous spaces or zeroes).  String fields are enclosed in quotes (Note: The quotes enclosing the string fields are optional, VB and other applications that can read comma-delimited files will access the string fields properly with or without the quotes.  The only time a string field MUST be enclosed in quotes is when it contains an embedded comma.) If Date fields are enclosed in pound signs (#), VB will automatically recognize the field as the Date data type.  If the Date fields are enclosed in quotes instead, you need to use the CDate function to convert the date from string format to the Date data type.

 

"ANDY ANDERSON",100,"PROGRAMMER",#3/4/1997#,25

"BILLY BABCOCK",110,"SYSTEMS ANALYST",#2/16/1996#,33.5

"CHARLIE CHEESEMAN",100,"COMPUTER OPERATOR",#3/1/1996#,15

"DARLENE DUNCAN",200,"RECEPTIONIST",#10/11/1998#,12.75

"ERNIE EACHUS",300,"MAIL ROOM CLERK",#8/19/1997#,10

 

Scenario 2: Fixed-Width ("Print" Format)

 

In some sequential data files, fields are stored in a fixed position.  On each record, a particular field starts and ends in the same position and occupies the same amount of space.  In a "print" format file, each line (record) of the file consists of a formatted "detail line" containing each field (as if the lines were intended to be printed on a hard-copy report).

 

In the example below, a column position guide is shown above the records.  From the example, it should be clear that the employee name occupies positions 1 through 20 of each record (note that names shorter than 20 characters are padded with blank spaces); the department number occupies positions 21 through 24; the job title occupies positions 30 through 50; the hire date occupies positions 51 through 60; and the hourly rate occupies positions 61 through 65.

 

            1    1    2    2   3    3    4    4    5    5    6    6

   1...5....0....5....0....5...0....5....0....5....0....5....0....5.    

ANDY ANDERSON        100    PROGRAMMER           3/4/1997  25.00

BILLY BABCOCK        110    SYSTEMS ANALYST      2/16/1996 33.50

CHARLIE CHEESEMAN    100    COMPUTER OPERATOR    3/1/1996  15.00

DARLENE DUNCAN       200    RECEPTIONIST         10/11/199812.75

ERNIE EACHUS         300    MAIL ROOM CLERK      8/19/1997 10.00

 

Scenario 3: Fixed-Width ("COBOL-Style" Format)

 

Typical of sequential files originating on mainframe computers and processed by languages such as COBOL, fields are stored one after the other in a continuous string with no distinguishing marks or white space between them.  Although some of the character-string fields can be picked out easily, the numbers are run together and are difficult to interpret unless you know something about the record.  Also, numeric fields containing a decimal portion are typically stored without the decimal point (they have an implied decimal point).  For example, the employee file might look something like this:

 

            1    1    2    2   3    3    4    4    5    5    6    6

   1...5....0....5....0....5...0....5....0....5....0....5....0....5.   

ANDY ANDERSON       0100PROGRAMMER       030419972500

BILLY BABCOCK       0110SYSTEMS ANALYST  021619963350

CHARLIE CHEESEMAN   0100COMPUTER OPERATOR030119961500

DARLENE DUNCAN      0200RECEPTIONIST     101119981275

ERNIE EACHUS        0300MAIL ROOM CLERK  081919971000

 

In the example above, the employee name occupies the first 20 positions of each record; the department number occupies the next four bytes (note that it contains a leading zero); the job title occupies the next 17 bytes; the hire date (stored in MMDDYYYY format with no slashes) occupies the next 10 bytes; and finally, the hourly rate occupies the last four bytes of the record.  Note that the hourly rate does not contain a physical decimal point; however, the program that processes this file must "know" that the decimal point is implied (i.e., "2500" means "25.00").  Given the proper data definition, COBOL can interpret the implied decimal point just fine; in VB, we have to convert the string "2500" to a number and then divide it by 100.  This technique is shown further below.

 

VB Statements and Functions for Sequential File Processing

 

The Open Statement

 

The Open statement prepares a file to be processed in the VB program.  It identifies the Windows-system file that will be used in the program and assigns the file a file number that will be used to reference that file for the remainder of the program.  The general format is:

                        Open <filename> [For mode] As [#] <filenumber>

 

·         filename is a legal Windows-system filename, which may include a drive and path; the filename can be specified in the Open statement as either a string constant or a string variable

 

·         mode is one of the following three keywords: Input, Output, or Append.

 

When a file is opened for Input, that file must already exist.

 

When a file is opened for Output, if it does not exist, it will be created; if it does exist, its previous contents will be overwritten.

 

When a file is opened for Append, if it does not exist, it will be created, if it does exist, records will be added to the file after the last record in the file (the previous contents of the file will not be overwritten).

 

The Input # and Line Input # statements may only be used on files opened in the Input mode; the Write # and Print # may only be used on files open in the Output or Append modes.

 

·         filenumber is an integer from 1 to 511 which is used to associate the Windows-system filename with a number; this number will be used to reference the opened file in all further VB file processing statements in the program.

 

Examples:

            Open "C:\Program Files\EmpMaint\EMPLOYEE.DAT" For Input As #1

Open "A:\EMPLOYEE.DAT" For Input As #1

 

 

Using App.Path

 

In order to avoid "hard-coding" the path of a file in your VB program, it is recommended that you use App.Path to reference the path of the file.  This way, as long as the file resides in the same directory in which your program is running, the correct path will always be referenced. 

 

For example, if both your program and the data file reside in C:\Program Files\EmpMaint, then that is what App.Path would refer to.  So if you concatenate App.Path with a backslash and the name of your data file, then you have a complete reference for your file, which can be used in the Open statement.

 

Examples:

            Open App.Path & "\EMPLOYEE.DAT" For Input As #1

 

You could also use a string variable to hold the filename, as in the following example:

 

            Dim strEmpFileName As String

strEmpFileName = App.Path & "\EMPLOYEE.DAT"

Open strEmpFileName For Input As #1

 

A special situation comes up if your program and the data file reside in the root directory of a drive (for example A:\).  If you concatenate App.Path with a backslash and the filename, you'll come up with an invalid file reference, such as:

 

A:\\EMPLOYEE.DAT

 

To cover both situations and alleviate the pesky "extra backslash" problem, you can use code like the following (the new statements are shown in bold):

 

            Dim strEmpFileName      As String

      Dim strBackSlash        As String

      strBackSlash = IIf (Right$(App.Path, 1) = "\", "", "\")                      

      strEmpFileName = App.Path & strBackSlash & "EMPLOYEE.DAT"

      Open strEmpFileName For Input As #1

 

 

Using FreeFile

 

Instead of hard-coding the file number, you can use the VB function FreeFile to supply you with a file number that is not already in use by the system.  The FreeFile function takes no arguments and returns an integer.  To use it, declare an integer variable, then assign FreeFile to it, as follows:

 

      Dim intEmpFileNbr As Integer

      intEmpFileNbr = FreeFile

 

In the Open statement (and any other statement that refers to this file), use the integer variable rather than the hard-coded number.  For example:

 

            Open strEmpFileName For Input As #intEmpFileNbr

 

Thus, a "full-blown" procedure to open a sequential file for input might look like this:

 

            Dim strEmpFileName      As String

      Dim strBackSlash  As String

      Dim intEmpFileNbr As Integer

 

      strBackSlash = IIf (Right$(App.Path, 1) = "\", "", "\")                      

      strEmpFileName = App.Path & strBackSlash & "EMPLOYEE.DAT"

      intEmpFileNbr = FreeFile

 

Open strEmpFileName For Input As #intEmpFileNbr

 

 

The Input # Statement

 

The Input # statement reads a series of fields (usually one "record's worth") from a comma-delimited sequential file, and stores the contents of those fields into the specified variables.  The general format is:

 

            Input #<filenumber>, <variable list>

 

·         filenumber refers to the file that was Opened As that number (for Input) in the Open statement

 

·         variable list is a list of variables, separated by commas, into which the data fields from the file will be stored

 

Example:

Recall the comma-delimited version of the employee file shown earlier:

 

"ANDY ANDERSON",100, "PROGRAMMER",#3/4/1997#,25

"BILLY BABCOCK",110,"SYSTEMS ANALYST",#2/16/1996#,33.5

"CHARLIE CHEESEMAN",100,"COMPUTER OPERATOR",#3/1/1996#,15

"DARLENE DUNCAN",200,"RECEPTIONIST",#10/11/1998#,12.75

"ERNIE EACHUS",300,"MAIL ROOM CLERK",#8/19/1997#,10

 

Assume you declare the following variables in your program:

     

      Dim strEmpName    As String

      Dim intDeptNbr    As Integer

      Dim strJobTitle   As String

      Dim dtmHireDate   As Date

      Dim sngHrlyRate   As Single

 

the statement

 

      Input #intEmpFileNbr, strEmpName, intDeptNbr, strJobTitle, dtmHireDate, sngHrlyRate

 

would cause ANDY ANDERSON to be stored in strEmpName, 100 to be stored in intDeptNbr, PROGRAMMER to be stored in strJobTitle, 3/4/1997 to be stored in dtmHireDate, and 25 to be stored in sngHrlyRate the first time that the statement was executed.  The second time the statement was executed, BILLY BABCOCK, 110, SYSTEMS ANALYST, 2/16/1996, and 33.5 would be stored respectively in strEmpName, intDeptNbr, strJobTitle, dtmHireDate, sngHrlyRate; and so on.  As VB reads each field into its respective variable, it automatically performs the conversion to the correct data type (Integer, Date, Single, etc.).  As mentioned earlier, VB will only convert an incoming field to the Date data type if that field is enclosed in pound signs (#) – if the field was enclosed in quotes, it would be treated as a string and the CDate function would have to be used to convert it to a Date.

 

VB "knows" that the data is to be read from the "EMPLOYEE.DAT" file because the Input # statement is referring to file #intEmpFileNbr, and file #intEmpFileNbr was associated with "EMPLOYEE.DAT" in the Open statement.

 

 

The EOF function

 

The operating system automatically appends a special character, called the end-of-file marker, to the end of a sequential file.  VB can sense the presence of this end-of-file marker with the EOF function.

 

A programming language will generally recognize EOF at either one of two times: (1) after the last record has been read – OR – (2) at the same time that the last record has been read.  COBOL falls into the first category, VB falls into the second.

 

FYI: This discussion applies only to how VB processes sequential files – because when VB processes the rows of a database table, it actually handles EOF "the COBOL way".

 

In a language that recognizes EOF after the last record in the file has been read (such as COBOL), the "input" or "read" loop is set up similar like a prompted dialog loop: with a priming read outside the loop; all subsequent reads occur at the bottom of the loop.  The pseudocode might be written as follows:

 

            READ (first) RECORD

            DO UNTIL EOF

                PROCESS THE RECORD

                READ (next) RECORD

            LOOP

 

In a language that recognizes EOF when the last record is read (such as VB), the "input" or "read" loop must be modified so that there is NO PRIMING READ and the read occurs as the FIRST statement in the body of the processing loop. The pseudocode might be written as follows:

 

            DO UNTIL EOF

                READ A RECORD

                PROCESS THE RECORD

            LOOP

 

The syntax of the EOF function is EOF(n) where n is a number corresponding to the file number of the file from which you want to read data. n can either be a hard-coded number or an integer variable, depending on whether or not you used FreeFile in the Open statement.

 

The EOF function can be used anywhere that a conditional expression can be used; as such, it must always follow keywords such as UNTIL, WHILE, and IF.  The EOF function can also be preceded by the keyword NOT: for example, Do Until EOF(1) is equivalent to Do While Not EOF(1).

 

The main loop to process the employee file might look like this (note that there is no "priming" read and that the input is done at the top of the loop):

 

    Do Until EOF(intEmpFileNbr)

       Input #intEmpFileNbr, strEmpName, intDeptNbr, strJobTitle, dtmHireDate, sngHrlyRate

       ' Processing for the record would go here – for example, load some of these

       ' fields into an element of an array or list box, print a line of a report, etc...

    Loop

 

Building on what has been discussed thus far, the "full-blown" procedure to process a comma-delimited sequential file for input might look like this:

 

   Dim strEmpFileName     As String

  Dim strBackSlash      As String

  Dim intEmpFileNbr     As Integer

 

  Dim strEmpName  As String

  Dim intDeptNbr  As Integer

  Dim strJobTitle As String

  Dim dtmHireDate As Date

  Dim sngHrlyRate As Single

 

  strBackSlash = IIf (Right$(App.Path, 1) = "\", "", "\")                    

  strEmpFileName = App.Path & strBackSlash & "EMPLOYEE.DAT"

  intEmpFileNbr = FreeFile

 

  Open strEmpFileName For Input As #intEmpFileNbr

 

  Do Until EOF(intEmpFileNbr)

      Input #intEmpFileNbr, strEmpName, intDeptNbr, strJobTitle, dtmHireDate, sngHrlyRate

      Print strEmpName; _

              Tab(25); Format$(intDeptNbr, "@@@@"); _

              Tab(35); strJobTitle; _

              Tab(55); Format$(dtmHireDate, "mm/dd/yyyy"); _

              Tab(70); Format$(Format$(sngHrlyRate, "Standard"), "@@@@@@@") 

  Loop

 

The Close statement

 

When you are finished using a file in your program, you should Close that file.  The Close statement tells VB that you are done using a file, and frees up the system resources needed to process that file.

 

The statement

            Close #1

frees the resources used by the file referenced as #1, and also terminates the association between the Windows-system file and the file number – so at this point, if you wanted to, you could Open some other file AS #1.

 

If you have more than one file open in a program, you can close multiple files with one Close statement by separating the file numbers with commas:

            Close #1, #2, #68

 

The statement

            Close

by itself on one line closes all files that are currently open.

 

In the "full-blown" example above, the following line should be added to the end (after the "Loop" statement):

 

            Close #intEmpFileNbr

 

To demonstrate the code above

 

 

 

Download the VB project code for the example above here.