Py2HTML: Formatting Python as HTML

Reformatting Python code as HTML for inclusion in posts like this one would be incredibly tedious to do by hand so I put together a quick and dirty utility to do it for me. It's not the greatest piece of software ever written but it does the job, so I thought I might as well share it.

The Problem

Copying and pasting Python into an HTML document just isn't going to work. We need to, at the very least, carry out the following substitutions to produce something a browser will render as we want it:

  • Replace tabs with &nbsp;&nbsp;&nbsp;&nbsp;. If we don't do this the browser will just show one space. (Of course, you might have different ideas about the size of a tab!)

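  • Replace linefeeds with </br>\n. The \n isn't necessary for the browser, but does format the raw HTML better. (Strictly speaking the line break tag is <br>; </br> is not valid HTML, but browsers treat it as <br>, which is why this works.)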

  • Code written on a Windows system will have a carriage return as well as a linefeed - these need to be deleted, or specifically replaced with an empty string.

  • Spaces need to be replaced with &nbsp;. Single spaces are shown correctly by browsers, but multiple spaces are shown as only one space. For example, the Python convention is to use four spaces instead of tabs and these will not render correctly unless we replace each space with &nbsp;.

  • The < and > symbols need to be replaced with &lt; and &gt; respectively, or the browser will try to interpret them as HTML tags.
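As a rough illustration, the substitutions listed above can be sketched with chained str.replace calls. This is not the program developed below, which maps characters one at a time; the function name to_html is made up for this example. The main thing it shows is that order matters when substituting in several passes: the angle brackets have to be escaped before the line break tags are inserted, otherwise the tags themselves would be escaped. Note that, like the list above, it does nothing about ampersands.

```python
def to_html(text):
    """Apply the substitutions listed above to a string of source code."""
    # Escape angle brackets first, so the tags inserted later are not escaped.
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    # Delete Windows carriage returns.
    text = text.replace("\r", "")
    # Tabs become four non-breaking spaces, ordinary spaces become one each.
    text = text.replace("\t", "&nbsp;" * 4).replace(" ", "&nbsp;")
    # Linefeeds become a line break tag plus a newline for tidier raw HTML.
    text = text.replace("\n", "</br>\n")
    return text


print(to_html("if a < b:\n\tx = 1\n"))
```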

The Solution

What we need is a simple program which will take an input file as a command line argument, perform the above substitutions on that file's contents, and save the result in an output file with the same name as the input file but with .html appended. For example, we would convert the source code for this project to HTML with the following.

Example Program Usage

python3.7 py2html.py py2html.py

This causes the Python program py2html.py to convert its own source code to the HTML file py2html.py.html.

Coding

Let's start coding: create a new folder and within it create an empty file called py2html.py. You can download the source as a zip file or clone/download it from GitHub if you prefer. Open the file and enter or paste the following.

Source Code Links

ZIP File
GitHub

py2html.py

import sys


def main():

    """
    This program reformats the Python source code file specified
    as a command line argument as HTML and saves it to a new file
    with the original name with ".html" appended.
    """

    print("-----------------")
    print("| codedrome.com |")
    print("| Py2HTML       |")
    print("-----------------\n")

    if len(sys.argv) != 2:
        print("input file must be specified")
    else:
        generate_html(sys.argv[1])


def generate_html(inputfile):

    """
    Central function performing substitution and file saving.
    """

    outputfile = inputfile + ".html"
    print("Input file:  " + inputfile)
    print("Output file: " + outputfile)

    # The mappings list contains a character or string to substitute
    # for each character indexed using its ASCII code
    mappings = [" "] * 128
    populate_mappings(mappings)

    try:

        fin = open(inputfile, "r")
        fout = open(outputfile, "w+")

        while True:
            c = fin.read(1)
            if not c:
                break
            fout.write(mappings[ord(c)])

        fin.close()
        fout.close()

        print("Output file generated")

    except IOError as ioe:

        print(str(ioe))


def populate_mappings(mappings):

    """
    Populates the mappings list in place with the input to output
    character substitutions.
    """

    # initialize to default values
    for i in range(0, 128):
        mappings[i] = chr(i)

    # overwrite values we want to replace

    # tab
    mappings[9] = "&nbsp;&nbsp;&nbsp;&nbsp;"

    # linefeed
    mappings[10] = "</br>\n"

    # carriage return
    mappings[13] = ""

    # space
    mappings[32] = "&nbsp;"

    # <
    mappings[60] = "&lt;"

    # >
    mappings[62] = "&gt;"


if __name__ == "__main__":
    main()

First let's look at the main function. We need to check that a file name has been passed as a command line argument, so we check the length of sys.argv, which is how Python makes command line arguments available. (Note that we imported sys at the top of the file.) The first item is always the name of the program, so there is always at least one. Arguments passed by the user come after this. We therefore expect sys.argv to contain exactly two items, and an error message is printed if there are fewer or more than two. Otherwise we call generate_html with sys.argv[1]. We don't yet know whether it is a valid filename, but this is checked later when we try to open the file.
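The argument check can be tried out in isolation with a small sketch like the one below. The helper name input_file_from is hypothetical and not part of py2html.py; it takes the argument list as a parameter so that it can be exercised without actually running from the command line.

```python
def input_file_from(argv):
    """Return the single user-supplied argument, or None if the count is wrong."""
    # argv[0] is the program name, so exactly one user argument
    # means the list has exactly two items.
    if len(argv) != 2:
        return None
    return argv[1]


# One argument supplied: the filename is returned.
print(input_file_from(["py2html.py", "py2html.py"]))
# No argument supplied: None is returned.
print(input_file_from(["py2html.py"]))
```

In the real program the list passed in would be sys.argv.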

In the generate_html function we generate the output filename by appending .html to the input filename, and then print both filenames. We then create a list of 128 spaces to use for the mappings between input characters and output characters or strings. This is then passed to the populate_mappings function which I will describe in a moment.

We then try to open both files, reading from one and writing to the other; as this could go wrong the rest of the code is wrapped in a try/except block, printing a message if there is an exception. If all goes well we iterate the input file character by character, writing the corresponding character or string from the mappings list. The list indexes are the ASCII codes of the input characters they provide substitutions for so we use the ord function to get the code from input characters. Finally we close the files and print a message to let the user know the process has finished.
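One thing to be aware of is that the mappings list only has 128 entries, so a character outside the ASCII range (an accented letter in a comment, say) would raise an IndexError, which the except clause does not catch. The following is a minimal sketch of one way to handle that, assuming such characters should simply be copied through unchanged. The function name convert_file is made up for this example, and it uses a with statement so that the files are closed even if something goes wrong part way through.

```python
def convert_file(inputfile, outputfile, mappings):
    """Copy inputfile to outputfile, substituting each character via mappings."""
    # The with statement closes both files even if an exception is raised.
    with open(inputfile, "r") as fin, open(outputfile, "w") as fout:
        for line in fin:
            for c in line:
                code = ord(c)
                # Assumption: characters with no entry in mappings
                # (codes beyond the end of the list) are written unchanged.
                if code < len(mappings):
                    fout.write(mappings[code])
                else:
                    fout.write(c)
```

Any IOError raised while opening or reading the files still propagates to the caller, so this could sit inside the same try/except block as the original loop.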

Last comes the populate_mappings function. This takes the list of spaces created above and replaces each space with the character or string which is the output for the corresponding input, using the input characters' ASCII codes as the list's indexes.

Most characters will be passed to the output unchanged, for example 'A' maps to 'A'. However, '<' maps to '&lt;' and so on, as per the list above. We therefore use a loop to set every value to the character equivalent of its index, e.g. mappings[65] will be 'A'. We then overwrite the few we need to change, so mappings[9] (tab) becomes &nbsp;&nbsp;&nbsp;&nbsp;, etc.
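For comparison, Python's built-in str.translate method does the same kind of lookup using a dictionary keyed by character code, and leaves any character without an entry unchanged. The sketch below is an alternative to the 128-item list, not the approach used in py2html.py; the names SUBSTITUTIONS and convert are made up for this example.

```python
# Only the characters we want to change need an entry.
SUBSTITUTIONS = {
    9: "&nbsp;" * 4,   # tab
    10: "</br>\n",     # linefeed
    13: "",            # carriage return (deleted)
    32: "&nbsp;",      # space
    60: "&lt;",        # <
    62: "&gt;",        # >
}


def convert(text):
    """Return text with each character replaced according to SUBSTITUTIONS."""
    return text.translate(SUBSTITUTIONS)


print(convert("if a < b:\n\tx = 1\n"))
```

Because each character is looked up exactly once, there is no risk of a substitution being applied to the output of an earlier one.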

That's the coding finished, so we can now run the program using the command given above, which I'll repeat here:

Running the program on its own source code

python3.7 py2html.py py2html.py

This will give you the following output:

Program Output

-----------------
| codedrome.com |
| Py2HTML       |
-----------------

Input file:  py2html.py
Output file: py2html.py.html
Output file generated

You can then open the generated HTML in an editor and/or browser. It is not a full HTML document, and is not intended to be, although it will display properly in a browser without opening and closing HTML tags etc. What we have is a snippet which can be pasted into a blog post or other web page, including this one!