Binary Files
In a sense, all files are "binary" in that they are just a collection of bytes stored in an operating system construct called a file. However, when we talk about binary files, we are really referring to the way VB.NET opens and processes the file. The previous topics on text files demonstrated techniques for reading and writing those files on a field-by-field or line-by-line basis. The upcoming topic on random files will demonstrate the record-oriented techniques for processing those types of files. Binary files, on the other hand, typically do not have a simple line-based or record-based structure, but rather have complex internal structures that require special programs to process them. A typical example is an image file, which requires a program such as MS-Paint or Adobe Photoshop to do something useful with it.
However, any file can be processed in binary mode; the key is that you must traverse or parse through the file to get at the data that you need.
In this topic we will look at the techniques for processing an "unstructured" binary file (using functions such as "FileGet" and "FilePut", which are retooled versions of "Get" and "Put" from pre-.NET versions of VB), as well as "new-in-.NET" features for processing record-oriented binary files (using BinaryReader and BinaryWriter).
In the first set of sample programs, the following functions will be used:
FileOpen
Description:
Opens a file for input or output. You must open a file before any I/O operation can be performed on it. FileOpen allocates a buffer for I/O to the file and determines the mode of access to use with the buffer. If the file specified by FileName doesn't exist, it is created when the file is opened in Append, Binary, Output, or Random mode. An available file number can be obtained with the FreeFile() function.
Syntax:
FileOpen(FileNumber, FileName, Mode, Access)
The parameters are explained as follows:
FileNumber: Any valid file number. Use FreeFile() to obtain the next available one.
FileName: A string expression that specifies the file name; it may include a drive and folder.
Mode: A member of the OpenMode enumeration: Append, Binary, Input, Output, or Random.
Access: Optional. A member of the OpenAccess enumeration: Read, Write, or ReadWrite.
Example:
This example opens the file in Binary mode for writing operations only.
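A minimal sketch of the kind of call that sentence describes. The file name TESTFILE.bin and its location in the temporary folder are assumptions chosen for illustration, and the file number comes from FreeFile() rather than being hard-coded.

```vbnet
Imports Microsoft.VisualBasic
Imports System.IO

Module FileOpenSketch
    Public Sub Main()
        ' Hypothetical path, used only for this illustration.
        Dim strFileName As String = Path.Combine(Path.GetTempPath(), "TESTFILE.bin")

        ' Ask the runtime for an unused file number.
        Dim intFileNbr As Integer = FreeFile()

        ' Binary mode, write-only access. The file is created if it does not exist.
        FileOpen(intFileNbr, strFileName, OpenMode.Binary, OpenAccess.Write)

        ' ... FilePut calls would go here ...

        ' Always close the file number when finished with it.
        FileClose(intFileNbr)
        Console.WriteLine("Opened and closed " & strFileName)
    End Sub
End Module
```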
LOF
Description:
Gets the size, in bytes, of a file opened using the FileOpen function. ("LOF" = "Length Of File")
Syntax:
LOF(FileNumber)
where FileNumber is any valid file number.
Example:
Dim lngFileSize As Long
FileOpen(1, "C:\TESTFILE.TXT", OpenMode.Input)    ' Open file.
lngFileSize = LOF(1)                              ' Get length of file.
Console.WriteLine("File Size is {0} bytes.", lngFileSize)
FileClose(1)                                      ' Close file.
FileGet
Description:
Reads data from an open disk file into a variable. Valid only for files opened in Random or Binary mode.
Syntax:
FileGet(FileNumber, Value [, RecordNumber])
The parameters are explained as follows:
FileNumber: Any valid file number.
Value: The variable into which the data is read.
RecordNumber: Optional. The record number (Random mode) or byte position (Binary mode) at which reading starts.
Example:
The following statements read 20 bytes from file number 1. (The number of bytes read equals the number of characters already in the string; the current length of strSomeData is 20.)
Dim strSomeData As New String(" "c, 20)
FileOpen(1, "C:\TESTFILE.txt", OpenMode.Binary, OpenAccess.Read)
FileGet(1, strSomeData)
Console.WriteLine(strSomeData)
FileClose(1)
FilePut
Description:
Writes data from a variable to a disk file. Valid only for files opened in Random or Binary mode.
Syntax:
FilePut(FileNumber, Value [, RecordNumber])
The parameters are explained as follows:
FileNumber: Any valid file number.
Value: The variable that contains the data to be written.
RecordNumber: Optional. The record number (Random mode) or byte position (Binary mode) at which writing starts.
Example:
The following statements write 7 bytes to file number 1. (The number of bytes written equals the number of characters already in the string; the current length of strSomeData is 7 because it contains the string "Hey now".)
Dim strSomeData As String = "Hey now"
FileOpen(1, "C:\TESTFILE.txt", OpenMode.Binary, OpenAccess.Write)
FilePut(1, strSomeData)
FileClose(1)
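FilePut and FileGet can be exercised together in a short round trip. The sketch below is illustrative only: the RoundTrip function name and the use of a temporary file are assumptions, and it presumes single-byte characters so that the byte count reported by LOF matches the number of characters to read.

```vbnet
Imports Microsoft.VisualBasic
Imports System.IO

Module PutGetRoundTrip
    ' Hypothetical helper: writes strText with FilePut, then reads it back with FileGet.
    Public Function RoundTrip(ByVal strText As String) As String
        ' GetTempFileName creates an empty file and returns its path.
        Dim strFileName As String = Path.GetTempFileName()

        ' Write the string. In Binary mode no length prefix is written.
        Dim intFileNbr As Integer = FreeFile()
        FileOpen(intFileNbr, strFileName, OpenMode.Binary, OpenAccess.Write)
        FilePut(intFileNbr, strText)
        FileClose(intFileNbr)

        ' Read it back. The buffer is pre-sized because FileGet reads
        ' as many bytes as the string already holds.
        intFileNbr = FreeFile()
        FileOpen(intFileNbr, strFileName, OpenMode.Binary, OpenAccess.Read)
        Dim strBuffer As New String(" "c, CInt(LOF(intFileNbr)))
        FileGet(intFileNbr, strBuffer)
        FileClose(intFileNbr)

        ' Remove the temporary file.
        File.Delete(strFileName)
        Return strBuffer
    End Function

    Public Sub Main()
        Console.WriteLine(RoundTrip("Hey now"))
    End Sub
End Module
```

Pre-sizing the buffer from LOF mirrors what the sample programs later in this topic do when they read a whole file at once.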
InputString
Description:
Returns a String value containing characters from a file opened in Input or Binary mode.
Syntax:
InputString(FileNumber, CharCount)
The parameters are explained as follows:
FileNumber: Any valid file number.
CharCount: The number of characters to read.
Example:
The following statements store the entire contents of TESTFILE.txt in the variable strSomeData. Note that the second argument of InputString is "LOF(1)", meaning that the number of bytes to read is the size of the file.
Dim strSomeData As String
FileOpen(1, "C:\TESTFILE.txt", OpenMode.Binary, OpenAccess.Read)
strSomeData = InputString(1, LOF(1))
FileClose(1)
Three sample programs will now be presented, using the functions described above. All three read in the same input file and write out the same output file; the difference is in how the input file is read. The first sample program uses the FileGet function to process the file in "chunks", the second uses the FileGet function to process the file all at once, and the third uses the InputString function to process the file all at once.
The job of the sample programs is to read in an HTML file, strip out all tags (i.e., everything between the "less than" and "greater than" angle brackets as well as the brackets themselves), and write out the remaining text.
The figure below shows excerpts of both the HTML input file and the plain text output file.
HTML Input File (excerpt):
<title>Working with Files</title>
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Author>Harry Dodson</o:Author>
<o:LastAuthor>UNI</o:LastAuthor>
<o:Revision>2</o:Revision>
<o:TotalTime>42</o:TotalTime>
<o:Created>2005-08-14T13:20:00Z</o:Created>
<o:LastSaved>2005-08-14T13:20:00Z</o:LastSaved>
<o:Pages>1</o:Pages>
<o:Words>2393</o:Words>
<o:Characters>13644</o:Characters>
<o:Company>Logical Decisions</o:Company>
<o:Lines>113</o:Lines>
<o:Paragraphs>32</o:Paragraphs>
<o:CharactersWithSpaces>16005</o:CharactersWithSpaces>
<o:Version>10.3311</o:Version>
</o:DocumentProperties>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:WordDocument>
<w:DoNotHyphenateCaps/>
<w:PunctuationKerning/>
<w:Compatibility>
<w:BreakWrappedTables/>
<w:SnapToGridInCell/>
<w:WrapTextWithPunct/>
<w:UseAsianBreakRules/>
</w:Compatibility>
<w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
</w:WordDocument>
</xml>
. . .

Plain Text Output File (excerpt):
Working with Files
Harry Dodson UNI 2 42 2005-08-14T13:20:00Z 2005-08-14T13:20:00Z 1 2393 13644 Logical Decisions 113 32 16005 10.3311
MicrosoftInternetExplorer4
. . .
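The tag-stripping rule described above (drop everything from "<" through ">", keep the rest) can be sketched as a small function that works on a string already in memory. The name StripTags and the use of a StringBuilder are assumptions made for illustration; the sample programs that follow apply the same rule but write each character straight to the output file.

```vbnet
Imports System.Text

Module TagStripper
    ' Hypothetical helper: returns strHtml with every "<...>" run
    ' (brackets included) removed.
    Public Function StripTags(ByVal strHtml As String) As String
        Dim sbOutput As New StringBuilder()
        Dim blnTagPending As Boolean = False

        For Each chrCurrent As Char In strHtml
            Select Case chrCurrent
                Case "<"c
                    ' Entering a tag: stop copying characters.
                    blnTagPending = True
                Case ">"c
                    ' Leaving a tag: resume copying with the next character.
                    blnTagPending = False
                Case Else
                    If Not blnTagPending Then
                        sbOutput.Append(chrCurrent)
                    End If
            End Select
        Next

        Return sbOutput.ToString()
    End Function

    Public Sub Main()
        Console.WriteLine(StripTags("<title>Working with Files</title>"))
        ' Prints: Working with Files
    End Sub
End Module
```

Like the sample programs, this treats every ">" as the end of a tag, so it is a simple filter rather than a real HTML parser.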
Sample Program 1 – Using the FileGet Function to Read a Binary File In "Chunks"
The first sample program reads and processes a binary file one "chunk" at a time (in this case 10,000 bytes at a time) using the FileGet function. Since the file size is a little over 80,000 bytes, it will take nine passes to read through the file: eight full chunks plus one final, smaller chunk. The code listed below is heavily commented to aid in the understanding of how the program works.
Code:
Module Module1

    Public Sub Main()

        Dim strHTMFileName As String
        Dim strTextFileName As String
        Dim intHTMFileNbr As Integer
        Dim intTextFileNbr As Integer
        Dim strBuffer As String = ""
        Dim strCurrentChar As String
        Dim blnTagPending As Boolean
        Dim intX As Integer
        Dim intBytesRemaining As Integer
        Dim intCurrentBufferSize As Integer
        Const intMAX_BUFFER_SIZE As Integer = 10000

        strHTMFileName = My.Application.Info.DirectoryPath & "\Files_Lesson1.htm"
        strTextFileName = My.Application.Info.DirectoryPath & "\TestOut.txt"

        Console.WriteLine("Opening files ...")

        ' Open the input file ...
        intHTMFileNbr = FreeFile()
        FileOpen(intHTMFileNbr, strHTMFileName, OpenMode.Binary, OpenAccess.Read)

        ' If the file we want to open for output already exists, delete it ...
        If Dir(strTextFileName) <> "" Then
            Kill(strTextFileName)
        End If

        ' Open the output file ...
        intTextFileNbr = FreeFile()
        FileOpen(intTextFileNbr, strTextFileName, OpenMode.Binary, OpenAccess.Write)

        ' Initialize the "bytes remaining" variable to the length of the input file.
        ' LOF returns a Long, so it is converted explicitly ...
        intBytesRemaining = CInt(LOF(intHTMFileNbr))

        ' Set up a loop which will process the file in "chunks" of 10,000 bytes at a time.
        ' We will keep track of how many bytes we have remaining to process, and
        ' the loop will continue as long as there are bytes remaining.
        Do While intBytesRemaining > 0

            Console.WriteLine("Processing 'chunk' ...")

            ' Note: The "buffer" is simply a string variable into which the "current
            ' chunk" of the file will be read.

            ' Set the current buffer size to the maximum size (10,000) as long as
            ' there are at least 10,000 bytes remaining. If there are fewer (as
            ' there would be the last time through the loop), set the buffer size
            ' equal to the number of bytes remaining.
            If intBytesRemaining >= intMAX_BUFFER_SIZE Then
                intCurrentBufferSize = intMAX_BUFFER_SIZE
            Else
                intCurrentBufferSize = intBytesRemaining
            End If

            ' Because the FileGet function relies on the size of the string variable (the
            ' "buffer") into which the data will be read to know how many bytes to read
            ' from the file, we fill the buffer string variable with a number of blank
            ' spaces - where the number of blank spaces was determined in the statement
            ' above.
            strBuffer = New String(" "c, intCurrentBufferSize)

            ' The FileGet function now reads the next chunk of data from the input file
            ' and stores it in the strBuffer variable.
            FileGet(intHTMFileNbr, strBuffer)

            ' The For loop below now processes the current chunk of data character by
            ' character, writing out only the characters that are NOT enclosed in the
            ' HTML tags (i.e., it is skipping every character between a pair of angle
            ' brackets "<" and ">"). Because blnTagPending keeps its value between
            ' passes, a tag that straddles two chunks is still skipped correctly ...
            For intX = 1 To intCurrentBufferSize
                strCurrentChar = Mid(strBuffer, intX, 1)
                Select Case strCurrentChar
                    Case "<"
                        blnTagPending = True
                    Case ">"
                        blnTagPending = False
                    Case Else
                        If Not blnTagPending Then
                            ' The current character is outside of the tag brackets, so
                            ' write it out ...
                            FilePut(intTextFileNbr, strCurrentChar)
                        End If
                End Select
            Next

            ' Adjust the "bytes remaining" variable by subtracting the current buffer size
            ' from it ...
            intBytesRemaining = intBytesRemaining - intCurrentBufferSize

        Loop

        Console.WriteLine("Closing files ...")

        ' Close the input and output files ...
        FileClose(intHTMFileNbr)
        FileClose(intTextFileNbr)

        Console.WriteLine("Done.")
        Console.ReadLine()

    End Sub

End Module
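The "nine passes" figure follows directly from the buffer-size logic in the loop. The sketch below isolates that arithmetic; the ChunkSizes function name is hypothetical, and 80,500 is a made-up size standing in for "a little over 80,000 bytes".

```vbnet
Imports System.Collections.Generic

Module ChunkMath
    ' Hypothetical helper: returns the size of each chunk needed to cover
    ' lngFileSize bytes when at most intMaxBufferSize bytes are read per pass.
    Public Function ChunkSizes(ByVal lngFileSize As Long, ByVal intMaxBufferSize As Integer) As List(Of Integer)
        Dim lstSizes As New List(Of Integer)
        Dim lngBytesRemaining As Long = lngFileSize

        Do While lngBytesRemaining > 0
            ' Full-size chunk while enough bytes remain, otherwise whatever is left.
            Dim intCurrent As Integer = CInt(Math.Min(lngBytesRemaining, CLng(intMaxBufferSize)))
            lstSizes.Add(intCurrent)
            lngBytesRemaining -= intCurrent
        Loop

        Return lstSizes
    End Function

    Public Sub Main()
        ' Assumed file size, for illustration only.
        Dim lstSizes As List(Of Integer) = ChunkSizes(80500, 10000)
        Console.WriteLine("Passes: " & lstSizes.Count)                      ' Passes: 9
        Console.WriteLine("Last chunk: " & lstSizes(lstSizes.Count - 1))    ' Last chunk: 500
    End Sub
End Module
```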
Sample Program 2 – Using the FileGet Function to Read a Binary File All At Once
The second sample program reads and processes a binary file all at once, using the FileGet function in conjunction with the LOF function. The code listed below is heavily commented to aid in the understanding of how the program works.
Code:
Module Module1

    Public Sub Main()

        Dim strHTMFileName As String
        Dim strTextFileName As String
        Dim intHTMFileNbr As Integer
        Dim intTextFileNbr As Integer
        Dim strBuffer As String
        Dim strCurrentChar As String
        Dim blnTagPending As Boolean
        Dim intX As Integer

        strHTMFileName = My.Application.Info.DirectoryPath & "\Files_Lesson1.htm"
        strTextFileName = My.Application.Info.DirectoryPath & "\TestOut.txt"

        Console.WriteLine("Opening files ...")

        ' Open the input file ...
        intHTMFileNbr = FreeFile()
        FileOpen(intHTMFileNbr, strHTMFileName, OpenMode.Binary, OpenAccess.Read)

        ' If the file we want to open for output already exists, delete it ...
        If Dir(strTextFileName) <> "" Then
            Kill(strTextFileName)
        End If

        ' Open the output file ...
        intTextFileNbr = FreeFile()
        FileOpen(intTextFileNbr, strTextFileName, OpenMode.Binary, OpenAccess.Write)

        Console.WriteLine("Reading input file ...")

        ' Note: The "buffer" is simply a string variable into which the entire
        ' contents of the file will be read.

        ' Because the FileGet function relies on the size of the string variable (the
        ' "buffer") into which the data will be read to know how many bytes to read
        ' from the file, we fill the buffer string variable with a number of blank
        ' spaces - where the number of blank spaces is equal to the size of the
        ' entire file (as determined by the LOF function, which returns a Long) ...
        strBuffer = New String(" "c, CInt(LOF(intHTMFileNbr)))

        ' The FileGet function now reads the entire contents of the input file
        ' and stores it in the strBuffer variable.
        FileGet(intHTMFileNbr, strBuffer)

        Console.WriteLine("Generating output file ...")

        ' The For loop below now processes the contents of the file character by
        ' character, writing out only the characters that are NOT enclosed in the
        ' HTML tags (i.e., it is skipping every character between a pair of angle
        ' brackets "<" and ">") ...
        For intX = 1 To Len(strBuffer)
            strCurrentChar = Mid(strBuffer, intX, 1)
            Select Case strCurrentChar
                Case "<"
                    blnTagPending = True
                Case ">"
                    blnTagPending = False
                Case Else
                    If Not blnTagPending Then
                        ' The current character is outside of the tags, so write it out ...
                        FilePut(intTextFileNbr, strCurrentChar)
                    End If
            End Select
        Next

        Console.WriteLine("Closing files ...")

        ' Close the input and output files ...
        FileClose(intHTMFileNbr)
        FileClose(intTextFileNbr)

        Console.WriteLine("Done.")
        Console.ReadLine()

    End Sub

End Module
Sample Program 3 – Using the InputString Function to Read a Binary File All At Once
The third sample program reads and processes a binary file all at once, using the InputString function in conjunction with the LOF function. The code listed below is heavily commented to aid in the understanding of how the program works.
Code:
Module Module1

    Public Sub Main()

        Dim strHTMFileName As String
        Dim strTextFileName As String
        Dim intHTMFileNbr As Integer
        Dim intTextFileNbr As Integer
        Dim strBuffer As String
        Dim strCurrentChar As String
        Dim intX As Integer
        Dim blnTagPending As Boolean

        strHTMFileName = My.Application.Info.DirectoryPath & "\Files_Lesson1.htm"
        strTextFileName = My.Application.Info.DirectoryPath & "\TestOut.txt"

        Console.WriteLine("Opening files ...")

        ' Open the input file ...
        intHTMFileNbr = FreeFile()
        FileOpen(intHTMFileNbr, strHTMFileName, OpenMode.Binary, OpenAccess.Read)

        ' If the file we want to open for output already exists, delete it ...
        If Dir(strTextFileName) <> "" Then
            Kill(strTextFileName)
        End If

        ' Open the output file ...
        intTextFileNbr = FreeFile()
        FileOpen(intTextFileNbr, strTextFileName, OpenMode.Binary, OpenAccess.Write)

        Console.WriteLine("Reading input file ...")

        ' Note: The "buffer" is simply a string variable into which the entire
        ' contents of the file will be read.

        ' The InputString function reads a number of bytes from a file. The first
        ' argument specifies the file number of the file from which the data is to be
        ' read. The second argument specifies how many bytes to read, which in this
        ' case is the size of the entire file (as determined by the LOF function,
        ' which returns a Long). The resulting data is stored in the "strBuffer"
        ' variable.
        strBuffer = InputString(intHTMFileNbr, CInt(LOF(intHTMFileNbr)))

        Console.WriteLine("Generating output file ...")

        ' The For loop below now processes the contents of the file character by
        ' character, writing out only the characters that are NOT enclosed in the
        ' HTML tags (i.e., it is skipping every character between a pair of angle
        ' brackets "<" and ">") ...
        For intX = 1 To Len(strBuffer)
            strCurrentChar = Mid(strBuffer, intX, 1)
            Select Case strCurrentChar
                Case "<"
                    blnTagPending = True
                Case ">"
                    blnTagPending = False
                Case Else
                    If Not blnTagPending Then
                        ' The current character is outside of the tags, so write it out ...
                        FilePut(intTextFileNbr, strCurrentChar)
                    End If
            End Select
        Next

        Console.WriteLine("Closing files ...")

        ' Close the input and output files ...
        FileClose(intHTMFileNbr)
        FileClose(intTextFileNbr)

        Console.WriteLine("Done.")
        Console.ReadLine()

    End Sub

End Module
Sample Program 4 – Using BinaryWriter and BinaryReader to Write and Read a Binary Data File
This sample program demonstrates the BinaryWriter and BinaryReader objects by first writing out a data file with fields in their native format (string, integer, date, and single). The program then uses BinaryReader to read that file back in and displays each record on the console, line by line.
In the first part of the program, the binary data file is populated from a text file of employee data. Each record of the employee text file contains employee name (read into a String variable), department number (read into an Integer variable), job title (read into a String variable), hire date (read into a Date variable), and hourly rate (read into a Single variable). The Write method of the BinaryWriter object variable is then used to populate the binary file, field by field. One thing to note is that BinaryWriter.Write has no overload for the Date type, so a Date variable must be converted to a Long Integer (Int64) with its ToBinary method before it is written to this type of binary file.
In the second portion of the program, the binary data file is read back using the ReadXXX methods of the BinaryReader object variable. The "ReadXXX" methods are a set of methods that read the specific data types you are expecting: ReadString, ReadInt32, ReadSingle, etc. A Date variable requires special handling: assuming that the Date field was written to the file using the "ToBinary" method, it must be read back as a Long Integer with the ReadInt64 method, and then the Date.FromBinary method must be used to convert it back to a Date for use in the program. Once one "record's worth" of data has been read in with the appropriate ReadXXX methods, the data is formatted into a string and displayed on the console.
Code:
Imports System.IO

Module Module1

    Public Sub Main()

        Dim strSeqEmpFileName As String
        Dim strBinEmpFileName As String
        Dim intSeqEmpFileNbr As Integer
        Dim intRecordCount As Integer
        Dim strEmpName As String
        Dim intDeptNbr As Integer
        Dim strJobTitle As String
        Dim dtmHireDate As Date
        Dim sngHrlyRate As Single

        strSeqEmpFileName = My.Application.Info.DirectoryPath & "\EMPLOYEE.txt"
        strBinEmpFileName = My.Application.Info.DirectoryPath & "\EMPLOYEE.BIN"

        '-----------------------------------------------------------------------
        ' In the first part of this sample program, we will create, or load,
        ' a binary access version of the comma-delimited sequential employee
        ' file that was used in one of the sample programs for sequential access
        ' files.
        '-----------------------------------------------------------------------

        ' Open the sequential employee file for input ...
        intSeqEmpFileNbr = FreeFile()
        FileOpen(intSeqEmpFileNbr, strSeqEmpFileName, OpenMode.Input)

        ' If the binary employee file we want to write already exists,
        ' delete it ...
        If File.Exists(strBinEmpFileName) Then
            File.Delete(strBinEmpFileName)
        End If

        ' Open the binary employee file for writing ...
        Dim objFS As New FileStream(strBinEmpFileName, FileMode.Create, FileAccess.Write)
        Dim objBW As New BinaryWriter(objFS)

        ' Initialize record count variable to keep track of how many records will
        ' be written to the binary file ...
        intRecordCount = 0

        ' This loop will read a record from the comma-delimited sequential employee file
        ' and write a corresponding record to its binary access counterpart ...
        Do Until EOF(intSeqEmpFileNbr)
            ' Read a record's worth of fields from the comma-delimited employee file,
            ' storing the fields into their corresponding variables ...
            Input(intSeqEmpFileNbr, strEmpName)
            Input(intSeqEmpFileNbr, intDeptNbr)
            Input(intSeqEmpFileNbr, strJobTitle)
            Input(intSeqEmpFileNbr, dtmHireDate)
            Input(intSeqEmpFileNbr, sngHrlyRate)
            ' Write a record to the binary file (one field at a time).
            ' The Date is written as an Int64 via ToBinary ...
            objBW.Write(strEmpName)
            objBW.Write(intDeptNbr)
            objBW.Write(strJobTitle)
            objBW.Write(dtmHireDate.ToBinary())
            objBW.Write(sngHrlyRate)
            ' Increment the record count variable ...
            intRecordCount = intRecordCount + 1
        Loop

        ' Close the sequential file and the binary file. Closing the writer also
        ' closes the underlying stream; the second Close is harmless ...
        FileClose(intSeqEmpFileNbr)
        objBW.Close()
        objFS.Close()

        '-----------------------------------------------------------------------
        ' In the next part of this sample program, we will display the records
        ' written to the binary file by reading them back and outputting their
        ' contents to the console one by one.
        '-----------------------------------------------------------------------

        ' Print headings ...
        Console.WriteLine("{0} employee records were written to the binary file.", intRecordCount)
        Console.WriteLine()
        Console.WriteLine("Contents as follows:")
        Console.WriteLine()
        Console.WriteLine("EMP NAME".PadRight(20) & " " & _
                          "DEPT".PadRight(4) & " " & _
                          "JOB TITLE".PadRight(25) & " " & _
                          "HIRE DATE".PadRight(10) & " " & _
                          "HRLY RATE".PadRight(7))
        Console.WriteLine("--------".PadRight(20) & " " & _
                          "----".PadRight(4) & " " & _
                          "---------".PadRight(25) & " " & _
                          "---------".PadRight(10) & " " & _
                          "---------".PadRight(7))

        ' Open the existing binary file for reading ...
        objFS = New FileStream(strBinEmpFileName, FileMode.Open, FileAccess.Read)
        Dim objBR As New BinaryReader(objFS)

        ' Loop thru the binary file to get one "record's worth" of fields
        ' and display each record (the fields) on a console line in each pass of the loop.
        ' The stream position is compared with the stream length to detect end of file;
        ' PeekChar tries to decode the next bytes as a character, which is not
        ' reliable on arbitrary binary data ...
        Do While objFS.Position < objFS.Length
            strEmpName = objBR.ReadString()
            intDeptNbr = objBR.ReadInt32()
            strJobTitle = objBR.ReadString()
            dtmHireDate = Date.FromBinary(objBR.ReadInt64())
            sngHrlyRate = objBR.ReadSingle()
            Console.WriteLine(strEmpName.PadRight(20) & " " & _
                              intDeptNbr.ToString.PadLeft(4) & " " & _
                              strJobTitle.PadRight(25) & " " & _
                              Format(dtmHireDate, "MM/dd/yyyy").PadRight(10) & " " & _
                              Format(sngHrlyRate, "Standard").PadLeft(7))
        Loop

        Console.WriteLine()

        ' Close the binary file ...
        objBR.Close()
        objFS.Close()

        Console.ReadLine()

    End Sub

End Module
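The Date handling described for this program can be tried in isolation, without any files on disk. The sketch below is an illustration under stated assumptions: the RoundTripDates function name is hypothetical, and a MemoryStream stands in for the FileStream so the example is self-contained. It writes each Date as an Int64 with ToBinary, then reads the values back with ReadInt64 and Date.FromBinary until the stream position reaches the stream length.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module DateRoundTrip
    ' Hypothetical helper: writes each Date as an Int64, then reads them all back.
    Public Function RoundTripDates(ByVal dtmValues As Date()) As List(Of Date)
        Dim lstResult As New List(Of Date)

        Using objMS As New MemoryStream()
            Dim objBW As New BinaryWriter(objMS)
            For Each dtmValue As Date In dtmValues
                ' BinaryWriter has no Date overload, so write the Int64 form (8 bytes).
                objBW.Write(dtmValue.ToBinary())
            Next
            objBW.Flush()

            ' Rewind to the start of the stream before reading.
            objMS.Position = 0
            Dim objBR As New BinaryReader(objMS)

            ' Read until every byte has been consumed.
            Do While objMS.Position < objMS.Length
                lstResult.Add(Date.FromBinary(objBR.ReadInt64()))
            Loop
        End Using

        Return lstResult
    End Function

    Public Sub Main()
        Dim dtmHireDates As Date() = {#8/14/2005#, #1/2/1999#}
        Dim lstBack As List(Of Date) = RoundTripDates(dtmHireDates)
        Console.WriteLine(lstBack.Count)                  ' 2
        Console.WriteLine(lstBack(0) = dtmHireDates(0))   ' True
    End Sub
End Module
```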