Writing code to read or write text files can be tricky because a text editor will not show you the exact contents of a file containing non-printable characters such as line feeds or carriage returns. This simple utility program takes a filename as a command line argument and prints out the file's exact contents, including descriptions of any non-printable or whitespace characters.
Let's get straight into coding by creating a folder and, within it, an empty file called filebytereader.py. You can also download the source code as a zip or clone/download the GitHub repository if you prefer. Open the file and enter the following code.
filebytereader.py
```python
import sys


def main():

    print("--------------------")
    print("|   codedrome.com  |")
    print("| File Byte Reader |")
    print("--------------------\n")

    if len(sys.argv) != 2:
        print("input file must be specified")
    else:
        read_file(sys.argv[1])


def read_file(filepath):

    """
    Opens the specified text file and outputs details
    of each ASCII character, one per line.
    """

    mappings = populate_mappings()
    i = 1

    try:

        print("------------------------------------------------------")
        print("| Pos   | Code | Printable  | Character              |")
        print("------------------------------------------------------")

        fin = open(filepath, "r")

        while True:

            c = fin.read(1)

            if not c:
                break

            if ord(c) >= 0 and ord(c) <= 127:
                print("| {:<5d} | {:<4d} | {:10s} | {:22} |".format(i, ord(c), str(c.isprintable()), mappings[ord(c)]))
            else:
                print("| {:<5d} | {:<4d} | {:10s} | {:22} |".format(i, ord(c), str(c.isprintable()), "[Outside ASCII range]"))

            i += 1

        print("------------------------------------------------------")

        fin.close()

    except IOError as ioe:

        print(str(ioe))


def populate_mappings():

    """
    Creates a list indexed by ASCII codes of either printable characters
    or descriptions of non-printable characters.
    """

    mappings = []

    for i in range(0, 128):
        mappings.append(chr(i))

    # replace non-printable characters with descriptions
    mappings[0] = "[null]"
    mappings[1] = "[start of heading]"
    mappings[2] = "[start of text]"
    mappings[3] = "[end of text]"
    mappings[4] = "[end of transmission]"
    mappings[5] = "[enquiry]"
    mappings[6] = "[acknowledge]"
    mappings[7] = "[bell]"
    mappings[8] = "[backspace]"
    mappings[9] = "[tab]"
    mappings[10] = "[line feed]"
    mappings[11] = "[vertical tab]"
    mappings[12] = "[form feed]"
    mappings[13] = "[carriage return]"
    mappings[14] = "[shift out]"
    mappings[15] = "[shift in]"
    mappings[16] = "[data link escape]"
    mappings[17] = "[device control 1]"
    mappings[18] = "[device control 2]"
    mappings[19] = "[device control 3]"
    mappings[20] = "[device control 4]"
    mappings[21] = "[negative acknowledge]"
    mappings[22] = "[synchronous idle]"
    mappings[23] = "[end of trans. block]"
    mappings[24] = "[cancel]"
    mappings[25] = "[end of medium]"
    mappings[26] = "[substitute]"
    mappings[27] = "[escape]"
    mappings[28] = "[file separator]"
    mappings[29] = "[group separator]"
    mappings[30] = "[record separator]"
    mappings[31] = "[unit separator]"
    mappings[32] = "[space]"
    mappings[127] = "[delete]"

    return mappings


main()
```
The main function is very simple: it checks that exactly one command line argument has been supplied and, if so, calls the read_file function. Python makes the command line arguments available in the list sys.argv, whose first element, sys.argv[0], holds the name of the script, so the list always has at least one item. The first actual command line argument is therefore sys.argv[1], which is what we pass to the read_file function.
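The argument check can be tried without touching the real command line by passing lists shaped like sys.argv. This is a small sketch; get_filepath is a hypothetical helper, not part of filebytereader.py, that mirrors the test in main.

```python
def get_filepath(argv):
    """Return the file path argument, or None unless exactly one was given.

    argv[0] holds the script name, so a valid argument list has exactly
    two elements: the script name followed by the file path.
    """
    if len(argv) != 2:
        return None
    return argv[1]


# Simulated argument lists, shaped like sys.argv for each invocation
print(get_filepath(["filebytereader.py"]))               # None
print(get_filepath(["filebytereader.py", "ascii.txt"]))  # ascii.txt
```

In a real script you would call get_filepath(sys.argv) after importing sys.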
In read_file we initialise a list using populate_mappings, which I'll describe in a moment, and then attempt to open the specified file within a try/except block. We then iterate over the file one character at a time by calling read(1) in a while loop, breaking out of the loop when read returns an empty string, which signals the end of the file.
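The read-until-empty pattern can be tried without a file on disk. This sketch uses io.StringIO as a stand-in for the file object returned by open; read_chars is a hypothetical name, and the loop is the same shape as the one in read_file.

```python
import io


def read_chars(stream):
    """Collect characters one at a time until read(1) returns ''."""
    chars = []
    while True:
        c = stream.read(1)
        if not c:  # an empty string signals end of file
            break
        chars.append(c)
    return chars


# io.StringIO stands in for an open text file here
print(read_chars(io.StringIO("ab\tc")))  # ['a', 'b', '\t', 'c']
```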
This program is intended to work only on ASCII files, not files containing Extended ASCII or Unicode, so we use the ord() function, which returns the Unicode code point of a character (identical to the ASCII code for characters 0 to 127), to check that the character's code is between 0 and 127. If it is, we print its position in the file (1-based rather than 0-based), its ASCII code, whether it is printable according to the isprintable method, and finally the value from the mappings list, which is either the character itself or a description. For characters outside the 0 to 127 range we print the same details but with the message [Outside ASCII range] in place of a mapping. Finally we close the file.
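The three checks described above can be tried on individual characters. describe is a hypothetical helper, not part of the program; it returns the code point, the isprintable result and whether the character falls in the ASCII range.

```python
def describe(c):
    """Return (code, printable, in_ascii) for a single character."""
    code = ord(c)  # Unicode code point; equals the ASCII code for 0-127
    return code, c.isprintable(), 0 <= code <= 127


for ch in ["A", " ", "\n", "\x7f", "\u00e9"]:
    print(repr(ch), describe(ch))
```

The chained comparison 0 <= code <= 127 is equivalent to the program's ord(c) >= 0 and ord(c) <= 127. On Python 3.7 and later, c.isascii() performs the same range test. Note that isprintable returns True for a space, which is why the program's output lists [space] as printable.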
At the bottom of the function we catch and print any IOError which might occur.
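In Python 3, IOError is simply another name for OSError, so the except clause catches a missing file as well as other operating system errors. The sketch below shows this with a path inside a fresh temporary directory; open_error_name is a hypothetical helper used only for illustration.

```python
import os
import tempfile


def open_error_name(path):
    """Try to open path; return the exception class name, or None on success."""
    try:
        with open(path, "r"):
            return None
    except IOError as ioe:  # IOError is an alias of OSError in Python 3
        return type(ioe).__name__


print(IOError is OSError)  # True

with tempfile.TemporaryDirectory() as tmp:
    # A path inside a fresh, empty directory cannot exist yet
    print(open_error_name(os.path.join(tmp, "missing.txt")))  # FileNotFoundError
```

One thing the except clause does not cover: a UnicodeDecodeError, raised when the file's bytes cannot be decoded with the default encoding, is a ValueError rather than an OSError, so it would propagate out of read_file.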
Now let's look at the populate_mappings function. At the core of this program is a list of strings which maps the individual characters in the file to what is actually displayed, using ASCII codes as indexes. Most characters (letters, numerals, punctuation etc.) will be displayed as themselves, but whitespace and non-printable characters will show a description instead. For example a space will be displayed as [space] and a line feed will be displayed as [line feed].
The populate_mappings function creates the required list, first initializing each element to the character whose code equals its index, e.g. mappings[97] is initialized to "a". We then replace the non-printable and whitespace characters with their descriptions.
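The same list can be built more compactly by keeping the 33 descriptions for codes 0 to 32 in their own list and looping over it with enumerate. This is a sketch of an alternative, not the code from the download; CONTROL_NAMES and populate_mappings_compact are hypothetical names, and the descriptions are copied from populate_mappings.

```python
# Descriptions for codes 0 to 32, in code order
CONTROL_NAMES = [
    "null", "start of heading", "start of text", "end of text",
    "end of transmission", "enquiry", "acknowledge", "bell",
    "backspace", "tab", "line feed", "vertical tab",
    "form feed", "carriage return", "shift out", "shift in",
    "data link escape", "device control 1", "device control 2",
    "device control 3", "device control 4", "negative acknowledge",
    "synchronous idle", "end of trans. block", "cancel",
    "end of medium", "substitute", "escape", "file separator",
    "group separator", "record separator", "unit separator", "space",
]


def populate_mappings_compact():
    """Build the 128-entry mapping list: characters or bracketed descriptions."""
    mappings = [chr(i) for i in range(128)]  # start with the characters themselves
    for code, name in enumerate(CONTROL_NAMES):
        mappings[code] = "[" + name + "]"    # overwrite codes 0 to 32
    mappings[127] = "[delete]"
    return mappings
```

Keeping the names in a data list makes it harder to skip or duplicate an index by accident, at the cost of the index no longer being visible next to each description.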
I have included a text file called ascii.txt with the download which simply contains the characters 0 to 127 in order. We can now run the program, and if you run it with ascii.txt it will produce the output shown below.
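If you don't have the download to hand, a file like ascii.txt can be generated. This is a sketch, assuming the file should contain exactly the bytes 0 to 127 and nothing else; write_ascii_file is a hypothetical helper, and the author's ascii.txt may have been created differently.

```python
import os
import tempfile


def write_ascii_file(path):
    """Write the 128 ASCII characters, codes 0 to 127, in order."""
    # newline="" stops Python translating "\n" into the platform line ending
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write("".join(chr(i) for i in range(128)))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ascii.txt")
    write_ascii_file(path)
    with open(path, "rb") as f:
        print(f.read() == bytes(range(128)))  # True
```

To create the file next to filebytereader.py, call write_ascii_file("ascii.txt").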
Running the Program
python3.7 filebytereader.py ascii.txt
The output is
Program Output (partial)
```
--------------------
|   codedrome.com  |
| File Byte Reader |
--------------------

------------------------------------------------------
| Pos   | Code | Printable  | Character              |
------------------------------------------------------
| 1     | 0    | False      | [null]                 |
| 2     | 1    | False      | [start of heading]     |
| 3     | 2    | False      | [start of text]        |
| 4     | 3    | False      | [end of text]          |
| 5     | 4    | False      | [end of transmission]  |
| 6     | 5    | False      | [enquiry]              |
| 7     | 6    | False      | [acknowledge]          |
| 8     | 7    | False      | [bell]                 |
| 9     | 8    | False      | [backspace]            |
| 10    | 9    | False      | [tab]                  |
| 11    | 10   | False      | [line feed]            |
| 12    | 11   | False      | [vertical tab]         |
| 13    | 12   | False      | [form feed]            |
| 14    | 10   | False      | [line feed]            |
| 15    | 14   | False      | [shift out]            |
| 16    | 15   | False      | [shift in]             |
| 17    | 16   | False      | [data link escape]     |
| 18    | 17   | False      | [device control 1]     |
| 19    | 18   | False      | [device control 2]     |
| 20    | 19   | False      | [device control 3]     |
| 21    | 20   | False      | [device control 4]     |
| 22    | 21   | False      | [negative acknowledge] |
| 23    | 22   | False      | [synchronous idle]     |
| 24    | 23   | False      | [end of trans. block]  |
| 25    | 24   | False      | [cancel]               |
| 26    | 25   | False      | [end of medium]        |
| 27    | 26   | False      | [substitute]           |
| 28    | 27   | False      | [escape]               |
| 29    | 28   | False      | [file separator]       |
| 30    | 29   | False      | [group separator]      |
| 31    | 30   | False      | [record separator]     |
| 32    | 31   | False      | [unit separator]       |
| 33    | 32   | True       | [space]                |
| 34    | 33   | True       | !                      |
| 35    | 34   | True       | "                      |
| 36    | 35   | True       | #                      |
| 37    | 36   | True       | $                      |
| 38    | 37   | True       | %                      |
| 39    | 38   | True       | &                      |
| 40    | 39   | True       | '                      |
| 41    | 40   | True       | (                      |
| 42    | 41   | True       | )                      |
| 43    | 42   | True       | *                      |
| 44    | 43   | True       | +                      |
| 45    | 44   | True       | ,                      |
| 46    | 45   | True       | -                      |
| 47    | 46   | True       | .                      |
| 48    | 47   | True       | /                      |
| 49    | 48   | True       | 0                      |
| 50    | 49   | True       | 1                      |
| 51    | 50   | True       | 2                      |
| 52    | 51   | True       | 3                      |
| 53    | 52   | True       | 4                      |
| 54    | 53   | True       | 5                      |
| 55    | 54   | True       | 6                      |
| 56    | 55   | True       | 7                      |
| 57    | 56   | True       | 8                      |
| 58    | 57   | True       | 9                      |
| 59    | 58   | True       | :                      |
| 60    | 59   | True       | ;                      |
| 61    | 60   | True       | <                      |
| 62    | 61   | True       | =                      |
| 63    | 62   | True       | >                      |
| 64    | 63   | True       | ?                      |
| 65    | 64   | True       | @                      |
| 66    | 65   | True       | A                      |
| 67    | 66   | True       | B                      |
| 68    | 67   | True       | C                      |
| 69    | 68   | True       | D                      |
| 70    | 69   | True       | E                      |
| 71    | 70   | True       | F                      |
```
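Position 14 in the output above shows code 10, [line feed], where ascii.txt holds code 13, [carriage return]. This is consistent with Python's universal newlines mode: a file opened in text mode with the default newline=None has "\r" and "\r\n" converted to "\n" as it is read. The sketch below demonstrates the difference on a temporary file; read_codes is a hypothetical helper, not part of the program.

```python
import os
import tempfile


def read_codes(path, newline=None):
    """Return the character codes read from path in text mode."""
    with open(path, "r", newline=newline) as fin:
        return [ord(c) for c in fin.read()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crlf.txt")
    # Write raw bytes so no translation happens on the way out
    with open(path, "wb") as f:
        f.write(b"a\rb\r\nc")

    print(read_codes(path))              # [97, 10, 98, 10, 99]
    print(read_codes(path, newline=""))  # [97, 13, 98, 13, 10, 99]
```

With newline="", line endings are returned untranslated, so one way to have read_file report carriage returns as they really are, assuming that is the behaviour you want, would be to open the file with open(filepath, "r", newline="").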