Formatting C Code as HTML

All the posts on this blog contain C source code which has been reformatted as HTML. This would be incredibly tedious to do by hand so I put together a quick and dirty utility to do it for me. It's not the greatest piece of software ever written but it does the job, so I thought I might as well share it.

The Problem

Copying and pasting C source code (or any source code for that matter) into an HTML document just isn't going to work. We need to, at the very least, carry out the following substitutions to produce something a browser will render as we want it:

  • Replace tabs with &nbsp;&nbsp;&nbsp;&nbsp;. If we don't do this the browser will just show one space. (Of course, you might have different ideas about the size of a tab!)

  • Replace linefeeds with <br/>\n. The \n isn't necessary for the browser, but does format the raw HTML better.

  • Code written on a Windows system will have a carriage return as well as a linefeed - these need to be deleted, or specifically replaced with an empty string.

  • Spaces need to be replaced with &nbsp;. Single spaces are shown correctly by browsers, but multiple spaces are shown as only one space. If we have code using four spaces instead of tabs for example, these will not render correctly unless we replace each space with &nbsp;.

  • The < and > symbols need to be replaced with &lt; and &gt; respectively, or the browser will try to interpret them as HTML tags.
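The substitutions listed above can be sketched as a single function. This is only an illustration of the list, not the approach used later in this post, and the name html_replacement is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not part of code2html.c): returns the HTML
   replacement for one input character, or NULL if the character can
   be copied to the output unchanged. */
const char *html_replacement(int c)
{
    switch (c)
    {
        case '\t': return "&nbsp;&nbsp;&nbsp;&nbsp;"; /* tab shown as four spaces */
        case '\n': return "<br/>\n";                  /* linefeed */
        case '\r': return "";                         /* drop carriage returns */
        case ' ':  return "&nbsp;";                   /* keep runs of spaces intact */
        case '<':  return "&lt;";
        case '>':  return "&gt;";
        default:   return NULL;                       /* copy unchanged */
    }
}
```

One caveat: the list says nothing about the & character. If the source contains text which already looks like an entity, such as "&lt;" inside a string literal, a browser will decode it, so mapping & to &amp; may also be worth considering.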

The Solution

What we need is a simple program which will take an input file as a command line argument, perform the above substitutions on it, and save the result in an output file with the same name as the input file with ".html" appended. For example, we would convert the source code for this project to HTML with the following.

Example Program Usage

./code2html code2html.c

This causes the executable code2html to convert the source code file code2html.c to the HTML file code2html.c.html.
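As a rough sketch of that naming rule, the output filename could be built like this. The helper name make_output_name is hypothetical, and using snprintf is just one way of doing it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "<inputfile>.html",
   or NULL if memory cannot be allocated. The caller frees the result. */
char *make_output_name(const char *inputfile)
{
    assert(inputfile != NULL);

    const char *suffix = ".html";

    /* +1 for the terminating '\0' */
    size_t size = strlen(inputfile) + strlen(suffix) + 1;

    char *outputfile = malloc(size);

    if (outputfile != NULL)
    {
        /* snprintf never writes more than size bytes, terminator included */
        snprintf(outputfile, size, "%s%s", inputfile, suffix);
    }

    return outputfile;
}
```

Calling make_output_name("code2html.c") gives "code2html.c.html", which the caller must free when finished with it.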

Coding

Let's start coding - create a new folder and within it create an empty file called code2html.c. You can download the source code as a zip or clone/download the GitHub repository if you prefer.

Source Code Links

ZIP File
GitHub

code2html.c (part 1)

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<stdbool.h>

//--------------------------------------------------------
// FUNCTION PROTOTYPES
//--------------------------------------------------------
void generate_html(char* inputfile);
void populate_mappings(char** mappings);
void set_value(char** array, int index, char* value);

//--------------------------------------------------------
// FUNCTION main
//--------------------------------------------------------
int main(int argc, char* argv[])
{
    puts("-----------------");
    puts("| codedrome.com |");
    puts("| Code2HTML     |");
    puts("-----------------\n");

    printf("argc %d\n", argc);

    for(int i = 0; i < argc; i++)
    {
        printf("argv[%d]\t%s\n", i, argv[i]);
    }

    if(argc != 2)
    {
        puts("input file must be specified");

        return EXIT_FAILURE;
    }
    else
    {
        generate_html(argv[1]);
    }

    return EXIT_SUCCESS;
}

The first few lines of main are just to demonstrate how command line arguments are passed to the program. The main function has two parameters: argc, which is the number of arguments, and argv, which holds the arguments themselves as an array of strings. First we print the count, and then use a for loop to print the arguments. There is always at least one argument, the name of the executable itself, which is always the first item. Actual command line arguments are therefore indexed 1, 2, 3 etc.

This program needs one argument, the input file. We therefore need to check that argc == 2 (the executable name plus the input file): if not we show an error message. At this stage, even if the correct number of arguments is passed we do not know whether the filename is valid. This will be handled later. Assuming argc is 2, we just call the generate_html function with argument 1, ie. argv[1].

Before we get to implementing generate_html, we'll write a couple of functions to create a mapping table. Its purpose is to map each character in the input file to the character or string to be written to the output file. Most characters will be passed to the output unchanged, for example 'A' maps to "A". However, '<' maps to "&lt;" and so on, as per the above list. The indexes of this mapping table are the ASCII codes of the input characters, so to get an output we just need to index the table using the corresponding input.

The code to create this table is in the populate_mappings function, which initially just sets all values to the character equivalents of the indexes, eg. mappings[65] will be "A" and so on. We then overwrite the few we need to change, so mappings[9] (tab) will become &nbsp;&nbsp;&nbsp;&nbsp; etc. To avoid lots of repetitive code, the actual overwriting is farmed out to the set_value function.

Go back to code2html.c and enter the following.

code2html.c (part 2)

//--------------------------------------------------------
// FUNCTION populate_mappings
//--------------------------------------------------------
void populate_mappings(char** mappings)
{
    // initialize to default values
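    for(int i = 0; i <= 127; i++)
    {
        // room for one character and the terminating '\0'
        mappings[i] = malloc(2);

        if(mappings[i] == NULL)
        {
            puts("Cannot allocate memory");

            exit(EXIT_FAILURE);
        }

        sprintf(mappings[i], "%c", i);
    }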

    // overwrite values we want to replace

    // tab
    set_value(mappings, 9, "&nbsp;&nbsp;&nbsp;&nbsp;");

    // linefeed
    set_value(mappings, 10, "<br/>\n");

    // carriage return
    set_value(mappings, 13, "");

    // space
    set_value(mappings, 32, "&nbsp;");

    // <
    set_value(mappings, 60, "&lt;");

    // >
    set_value(mappings, 62, "&gt;");
}

//--------------------------------------------------------
// FUNCTION set_value
//--------------------------------------------------------
void set_value(char** array, int index, char* value)
{
    array[index] = realloc(array[index], strlen(value) + 1);

    if(array[index] != NULL)
    {
        strcpy(array[index], value);
    }
    else
    {
        puts("Cannot allocate memory");

        exit(EXIT_FAILURE);
    }
}

The mappings are restricted to ASCII, ie. 0 to 127. They do not include extended ASCII (128 to 255), which is non-standard and should be avoided at all costs, and Unicode is not supported either. The probability of me using anything other than standard ASCII in source code is close to 0 so this shouldn't be a problem.
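If you want to check whether some text really does stay within that range, a small function along these lines would do it. This is a sketch only; count_non_ascii is a hypothetical name and is not part of code2html.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the bytes in buffer which fall outside
   the 0 to 127 range covered by the mappings table. */
size_t count_non_ascii(const unsigned char *buffer, size_t length)
{
    assert(buffer != NULL);

    size_t count = 0;

    for (size_t i = 0; i < length; i++)
    {
        /* unsigned char is never negative, so only the upper bound needs checking */
        if (buffer[i] > 127)
        {
            count++;
        }
    }

    return count;
}
```

Note that a single accented letter saved as UTF-8 takes up two bytes, both above 127, so it would be counted twice.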

We can now implement generate_html. In code2html.c enter the following.

code2html.c (part 3)

//--------------------------------------------------------
// FUNCTION generate_html
//--------------------------------------------------------
void generate_html(char* inputfile)
{
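    // room for the input filename, ".html" and the terminating '\0'
    char* outputfile = malloc(strlen(inputfile) + 6);

    if(outputfile == NULL)
    {
        puts("Cannot allocate memory");

        exit(EXIT_FAILURE);
    }

    strcpy(outputfile, inputfile);

    strcat(outputfile, ".html");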

    printf("outputfile: %s\n", outputfile);

    char* mappings[128];

    populate_mappings(mappings);

    // attempt to open files
    FILE* fpinput;
    FILE* fpoutput;
    fpinput = fopen(inputfile, "r");
    fpoutput = fopen(outputfile, "w");

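    // int rather than char, so that EOF can be told apart from every valid byte
    int c;

    int position = 0;

    if(fpinput != NULL && fpoutput != NULL)
    {
        // iterate input file,
        // writing corresponding values from mappings array to output file
        while((c = fgetc(fpinput)) != EOF)
        {
            if(c >= 0 && c <= 127)
            {
                fputs(mappings[c], fpoutput);
            }
            else
            {
                // outside the mappings table: report it and leave it out of the output
                printf("%d\t%c\t%d\n", position, c, c);
            }

            position++;
        }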

        fclose(fpinput);

        fclose(fpoutput);

        printf("%s created\n", outputfile);
    }
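    else
    {
        puts("Cannot open input or output files");

        // close whichever file did open
        if(fpinput != NULL)
        {
            fclose(fpinput);
        }

        if(fpoutput != NULL)
        {
            fclose(fpoutput);
        }
    }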

    // free up malloc'ed memory
    for(int i = 0; i < 128; i++)
    {
        free(mappings[i]);
    }

    free(outputfile);
}

In generate_html we first create the output filename and then populate the mappings array, before attempting to open the input file for reading and the output file for writing. If we cannot open one or both of the files for some reason then the corresponding file pointers will be NULL, so we need to check this before using the files.

If the files open correctly we just read the input file one character at a time in a while loop until we get to EOF (end of file). In the loop we simply output the strings from mappings corresponding to the ASCII codes of the input characters; this is a major benefit of characters being basically integers.

When all is done we just close the files and output a message. Finally we call free on the items in the mappings array and on the output filename.
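As a possible refinement, the two files could be opened one after the other so that we know which one failed, and so that an empty output file is not created when the input file is missing. The following is a sketch under those assumptions; open_files is a hypothetical helper and not part of the program in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: opens the input file for reading and only then
   the output file for writing. Returns true if both opened. On failure
   both pointers are set to NULL and nothing is left open. */
bool open_files(const char *inputfile, const char *outputfile,
                FILE **fpinput, FILE **fpoutput)
{
    assert(fpinput != NULL && fpoutput != NULL);

    *fpoutput = NULL;

    *fpinput = fopen(inputfile, "r");

    if (*fpinput == NULL)
    {
        /* on most systems this prints the filename and the reason for the failure */
        perror(inputfile);

        return false;
    }

    *fpoutput = fopen(outputfile, "w");

    if (*fpoutput == NULL)
    {
        perror(outputfile);

        /* do not leak the file which did open */
        fclose(*fpinput);
        *fpinput = NULL;

        return false;
    }

    return true;
}
```

The caller only needs to test the return value, and is responsible for closing both files if it is true.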
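One detail worth spelling out is that fgetc returns an int, not a char, so that EOF can be distinguished from every possible byte value. The sketch below shows the read loop on its own, with only < and > replaced to keep it short; the function name escape_angle_brackets is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies in to out, replacing only '<' and '>',
   and returns the number of characters read. */
long escape_angle_brackets(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    long count = 0;

    /* int, not char: it must hold every byte value (0 to 255) plus EOF */
    int c;

    while ((c = fgetc(in)) != EOF)
    {
        if (c == '<')
        {
            fputs("&lt;", out);
        }
        else if (c == '>')
        {
            fputs("&gt;", out);
        }
        else
        {
            fputc(c, out);
        }

        count++;
    }

    return count;
}
```

If c were declared as a char, a byte with the value 255 could compare equal to EOF on systems where char is signed, and the loop would stop before the end of the file.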

We can now compile the program and run it with its own source code as input.

Compile and Run

gcc code2html.c -std=c11 -o code2html

./code2html code2html.c

The output isn't very interesting, but note that it outputs the argument count as well as the arguments themselves. We should also get a message saying the HTML file has been created.

Program Output

-----------------
| codedrome.com |
| Code2HTML     |
-----------------

argc 2
argv[0] ./code2html
argv[1] code2html.c
outputfile: code2html.c.html
code2html.c.html created

You can then open the generated HTML in an editor and/or browser. It is not a full HTML document, but that is not the intention; it will still display properly in a browser without opening and closing HTML tags etc. What we have is a snippet which can be pasted into a blog post or other web page, including this one!
