Bacon’s Cipher in Python

Bacon's Cipher is a very simple and very old method of encoding a message and is now only of interest as a historical relic, but it also provides an interesting little programming project. In this article I will code it in Python.

Bacon's Cipher

Bacon's Cipher was created in 1605 by Francis Bacon, a very interesting figure who, amongst other things, made valuable contributions to "natural philosophy" which evolved into what we call science. His Wikipedia article is worth reading but I'll just concentrate on his eponymous cipher.

Portrait of Francis Bacon (1561-1626) by Paul van Somer I
Image: Wikipedia

To encipher a message each letter is first represented by a sequence of 5 As and Bs according to the following table. This of course can be considered to be a form of binary so I have also shown the corresponding sequences of 0s and 1s. (The original form of the cipher combined the letter pairs I,J and U,V so only had 24 encodings. I will use the full 26 encodings shown below.)

Letter  A/B Encoding  Binary Encoding
A       AAAAA         00000
B       AAAAB         00001
C       AAABA         00010
D       AAABB         00011
E       AABAA         00100
F       AABAB         00101
G       AABBA         00110
H       AABBB         00111
I       ABAAA         01000
J       ABAAB         01001
K       ABABA         01010
L       ABABB         01011
M       ABBAA         01100
N       ABBAB         01101
O       ABBBA         01110
P       ABBBB         01111
Q       BAAAA         10000
R       BAAAB         10001
S       BAABA         10010
T       BAABB         10011
U       BABAA         10100
V       BABAB         10101
W       BABBA         10110
X       BABBB         10111
Y       BBAAA         11000
Z       BBAAB         11001
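
For comparison, the 24-letter form mentioned above can be sketched roughly as follows. This is not part of the project code: the names ALPHABET_24, MERGED and bacon_code_24 are my own, and the sketch assumes that J shares I's code, V shares U's code, and the remaining 24 letters are numbered consecutively from 0.

```python
# Hypothetical sketch of the 24-letter scheme: J shares a code with I,
# and V shares a code with U. All names here are illustrative only.
ALPHABET_24 = "ABCDEFGHIKLMNOPQRSTUWXYZ"
MERGED = {"J": "I", "V": "U"}


def bacon_code_24(letter):
    """Return the 5-character A/B code for a letter in the 24-letter scheme."""
    letter = letter.upper()
    letter = MERGED.get(letter, letter)
    index = ALPHABET_24.index(letter)
    # format the position as 5 bits, then swap 0/1 for A/B
    return format(index, "05b").translate(str.maketrans("01", "AB"))


for ch in "IJUVZ":
    print(ch, bacon_code_24(ch))
```

With this numbering I and J both come out as ABAAA, and every letter after J has a different code from the 26-letter table above.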

For this project I will use the binary encodings because they form a sequence of 26 consecutive numbers which we can easily convert to and from ASCII using addition and subtraction.
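
As a quick illustration of that arithmetic, here is a minimal sketch of the conversion in both directions. The function names letter_to_bits and bits_to_letter are mine, chosen just for this example, and the sketch assumes upper case letters A to Z only.

```python
def letter_to_bits(letter):
    """Upper case letter -> 5-character string of 0s and 1s."""
    # ord("A") is 65, so A maps to 0, B to 1, ... Z to 25
    return format(ord(letter) - 65, "05b")


def bits_to_letter(bits):
    """5-character string of 0s and 1s -> upper case letter."""
    return chr(int(bits, 2) + 65)


for letter in "AKZ":
    bits = letter_to_bits(letter)
    print(letter, bits, bits_to_letter(bits))
```

For K this gives 01010, which matches the table, and converting 01010 back gives K again.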

After creating a list of either As and Bs or 0s and 1s we need another piece of text, the target text, which contains at least as many letters as there are symbols in that list. That means at least 5 letters of target text for every letter of plaintext. In the original form of the cipher two typefaces were used, one for letters corresponding to A and another for letters corresponding to B. In practice any two types of distinct formatting can be used, and I will use lower case for 0 and upper case for 1.

To decipher the text the binary pattern (or As and Bs) is recreated according to the case or other formatting of the text, and each block of 5 characters in the pattern is converted back to the corresponding letter. The deciphering process creates a string of uppercase letters, as the punctuation, whitespace and case of the original plaintext are lost.

Let's look at a brief example, using "Bacon." as plaintext and "Francis Bacon was a statesman and philosopher" as the target text.

Plaintext              Bacon.
Remove non-letters     Bacon
Convert to upper case  BACON
Binary Pattern         00001 00000 00010 01110 01101
Enciphered             franCis bacon wAs a STAteSMaN

As you can see the target text is truncated to the necessary length but it could be used in full with the superfluous letters all using the same formatting.
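
The alternative of keeping the whole target text could be sketched like this. It is not how the project code works: encipher_full is a hypothetical function of my own, and I have assumed that the superfluous letters are all set to lower case.

```python
def encipher_full(bits, target_text):
    """Apply bits to the letters of target_text, keeping the whole text.

    Letters beyond the end of the bit pattern are all set to lower case.
    """
    bit_iter = iter(bits)
    result = []
    for c in target_text:
        if c.isalpha():
            # once the bits run out, fall back to "0" (lower case)
            bit = next(bit_iter, "0")
            result.append(c.upper() if bit == "1" else c.lower())
        else:
            # punctuation and whitespace pass through unchanged
            result.append(c)
    return "".join(result)


bits = "0000100000000100111001101"  # the pattern for BACON
print(encipher_full(bits, "Francis Bacon was a statesman and philosopher"))
# franCis bacon wAs a STAteSMaN and philosopher
```

One consequence of this choice is that the trailing lower case letters read as 0s, so deciphering the full text would produce extra As after the real message.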

The Project

This project consists of the following files:

  • baconscipher.py

  • baconscipherdemo.py

The files can be downloaded as a zip, or you can clone/download the Github repository if you prefer.

Let's first look at baconscipher.py.

baconscipher.py

def encipher(plaintext, target_text):

    """
    Encipher plaintext to target text with binary 0
    as lower case and binary 1 as upper case
    """

    # remove all non-alphabetic characters
    # from plaintext and convert to upper case
    plaintext = ''.join([c for c in plaintext if c.isalpha()]).upper()

    # get a string of 0s and 1s representing the
    # formatting of the enciphered message letters
    binary_string = _string_to_bit_pattern(plaintext)

    # format the target text letters as upper or lower case
    # according to the bit pattern
    enciphered = _cased_text_from_bit_pattern(target_text, binary_string)

    return enciphered


def decipher(enciphered):

    """
    Decipher enciphered text assuming lower case letters
    represent binary 0 and upper case letters represent binary 1
    """

    print("\nDeciphering\n===========\n")

    # remove everything except letters
    enciphered = ''.join([c for c in enciphered if c.isalpha()])

    length = len(enciphered)
    letter_quintet = ""
    bit_pattern = ""
    deciphered = []

    for i in range(0, length, 5):

        # grab next 5 letters
        letter_quintet = enciphered[i: i+5]

        # get corresponding string of 5 bits
        bit_pattern = _letter_quintet_to_bit_pattern(letter_quintet)

        # get letter corresponding to bit pattern
        letter = chr(int(bit_pattern, 2) + 65)

        print(f"{letter_quintet} {bit_pattern} {letter}")

        deciphered.append(letter)

    return "".join(deciphered)


#------------------------------------------------------------
# "PRIVATE" FUNCTIONS
#------------------------------------------------------------

def _string_to_bit_pattern(string):

    """
    Convert string of letters to string of
    corresponding 5-bit patterns
    """

    binary_list = []
    bit_pattern = ""

    for letter in string:
        # get ASCII code, subtract 65 and format as 5 bit string
        bit_pattern = format(ord(letter) - 65, '05b')
        binary_list.append(bit_pattern)

    return "".join(binary_list)


def _cased_text_from_bit_pattern(target_text, binary_pattern):

    """
    Set case of target text according to string of 5-bit patterns

    0 => lower case
    1 => upper case

    Non-alpha characters are skipped
    """

    cased_text = []

    index = 0

    for bit in binary_pattern:

        while not target_text[index].isalpha():
            cased_text.append(target_text[index])
            index += 1

        if bit == "0":
            cased_text.append(target_text[index].lower())
        else:
            cased_text.append(target_text[index].upper())

        index += 1

    return "".join(cased_text)


def _letter_quintet_to_bit_pattern(letter_quintet):

    """
    Convert string of 5 letters to corresponding bit pattern

    Lower case => 0
    Upper case => 1
    """

    bit_pattern = []

    for c in letter_quintet:

        if(c >= "a" and c <= "z"):
            bit_pattern.append("0")
        elif(c >= "A" and c <= "Z"):
            bit_pattern.append("1")

    return "".join(bit_pattern)

encipher

The first line actually carries out three separate tasks. The list comprehension selects only alphabetical characters, the resulting list is joined into a string, and the string is then converted to upper case.

Then we call a function to create a string of bit patterns corresponding to the letter/bits mappings shown in the table above.

Finally we call another function to set the case of the individual letters in the target text according to the bit pattern string.

I'll describe the functions used here in more detail as we get to them.

decipher

Firstly we strip out everything from enciphered which is not a letter, and then declare a few variables for use within the for loop.

Next is the for loop itself: note that we iterate from 0 to the length of the enciphered text, and in particular note that we have a step of 5 as within the loop we process 5 characters at a time.

Within each iteration we grab the next 5 letters and then pass them to a function to create the corresponding sequence of 0s and 1s based on the case of the letters. The next line of code converts the string of bits to an integer, adds 65 to get the ASCII code, and then gets the actual letter using the chr function. This letter is then added to the deciphered list.
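
To make those steps concrete, here is a minimal standalone sketch that decodes a single group of five letters. It uses str.isupper instead of the module's helper function, purely to keep the example self-contained.

```python
quintet = "tHeRe"

# upper case -> "1", lower case -> "0"
bits = "".join("1" if c.isupper() else "0" for c in quintet)

value = int(bits, 2)      # binary string -> integer
letter = chr(value + 65)  # 65 is the ASCII code of "A"

print(bits, value, letter)  # 01010 10 K
```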

As you can see I have printed the variables so we can see the decipherment process in action.

Finally we return the deciphered list joined into a string.

_string_to_bit_pattern

This is used by encipher as we saw earlier, and to quote my own docstring this function will "convert string of letters to string of corresponding 5-bit patterns". To do this it iterates the letters in the string and for each one grabs the ASCII code using ord, subtracts 65 (because A = 65 in ASCII, whereas Bacon's encodings start at 0) and formats it as a 5-character binary string. This is then added to a list which we finally join and return.

_cased_text_from_bit_pattern

This is also used by the encipher function, and takes the target text and string generated by _string_to_bit_pattern. The letters in the target text are set to lower case or upper case depending on the value of the corresponding bit. Note that any non-alpha characters are skipped.

To do this we iterate the binary pattern, and within the loop firstly use a while loop to skim non-alpha characters, adding them unchanged to the final result list. When we hit a letter it is converted to upper or lower depending on the value of the current bit, and this is added to the result list. This list is then joined and returned.

_letter_quintet_to_bit_pattern

The last function is used by decipher and creates a string of 5 0s and 1s depending on the case of the letters in the input string. Note that in Python we can use comparison operators on letters without first converting them to numbers. Nice!
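
The comparisons work on the characters' code points, and Python also lets us chain them. The small sketch below, using a hypothetical helper called ascii_case_bit, shows the chained form and one thing to watch for if the text might contain non-ASCII letters: a character such as é counts as a letter for isalpha but falls outside both ranges.

```python
def ascii_case_bit(c):
    """Return "0" for a-z, "1" for A-Z, and None for anything else."""
    if "a" <= c <= "z":
        return "0"
    if "A" <= c <= "Z":
        return "1"
    return None


for c in ["q", "Q", "é", "5"]:
    print(repr(c), ascii_case_bit(c), c.isalpha())
```

Here é is reported as a letter by isalpha but gets no bit, so a group of five letters containing it would yield only four bits. For plain English text like the examples in this article that situation does not arise.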

Too Many Functions?

No, not IMHO. The three pseudo-private functions (prefixed with _) are only called once so the code in them could be embedded within the functions that call them. However, each of these three functions carries out one specific and separate task and I feel the overall code is neater, better organised and easier to read, test and maintain if separated out into several functions.

Validation (Lack Of)

If this were a piece of production code for serious use it would be necessary to carry out some validation of the parameters, specifically checking that the target text contains at least 5 letters for every letter of the plaintext, and that the enciphered text is formatted in a way that can be deciphered using Bacon's Cipher. As this is only a simple programming exercise I have not bothered doing so.
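
For anyone who does want to add such checks, the following is one possible sketch rather than a definitive implementation. The function names are hypothetical, and I have assumed that "can be deciphered" means the number of letters is a multiple of 5 and every group of 5 maps to a value from 0 to 25.

```python
def count_letters(text):
    """Count the alphabetic characters in text."""
    return sum(1 for c in text if c.isalpha())


def validate_encipher_args(plaintext, target_text):
    """Raise ValueError if target_text has too few letters for plaintext."""
    needed = 5 * count_letters(plaintext)
    available = count_letters(target_text)
    if available < needed:
        raise ValueError(
            f"target text has {available} letters but {needed} are needed"
        )


def validate_enciphered(enciphered):
    """Raise ValueError if enciphered cannot be a 26-letter Bacon message."""
    letters = [c for c in enciphered if c.isalpha()]
    if len(letters) % 5 != 0:
        raise ValueError("number of letters is not a multiple of 5")
    for i in range(0, len(letters), 5):
        group = letters[i:i + 5]
        # upper case -> 1, anything else -> 0
        value = int("".join("1" if c.isupper() else "0" for c in group), 2)
        if value > 25:
            raise ValueError(f"group {''.join(group)!r} has no letter")


validate_encipher_args("Bacon.", "Francis Bacon was a statesman")
validate_enciphered("franCis bacon wAs a STAteSMaN")
```

Both calls at the end pass silently. A target text that is too short, or a group such as HELLO (11111, which is 31), would raise a ValueError.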

Now let's move on to baconscipherdemo.py, a simple bit of code to try out the previous module.

baconscipherdemo.py

import baconscipher


def main():

    print("------------------")
    print("| codedrome.com  |")
    print("| Bacon's Cipher |")
    print("------------------\n")

    plaintext = "Knowledge and human power are synonymous."

    target_text = "There were under the law, excellent King, both daily sacrifices and freewill offerings; the one proceeding upon ordinary observance, the other upon a devout cheerfulness: in like manner there belongeth to kings from their servants both tribute of duty and presents of affection.  In the former of these I hope I shall not live to be wanting, according to my most humble duty and the good pleasure of your Majesty's employments: for the latter, I thought it more respective to make choice of some oblation which might rather refer to the propriety and excellency of your individual person, than to the business of your crown and state."

    enciphered = baconscipher.encipher(plaintext, target_text)
    print("Enciphered\n==========")
    print(enciphered)

    deciphered = baconscipher.decipher(enciphered)
    print("\nDeciphered\n==========")
    print(deciphered)


if __name__ == "__main__":
    main()

The plaintext and target text are both quotes from works by Francis Bacon. The plaintext comes from Novum Organum; Or, True Suggestions for the Interpretation of Nature, and the target text is from The Advancement of Learning.

In main we simply pass these to baconscipher.encipher and print the result, which is then passed to baconscipher.decipher, again printing the result.

Now we can run the program with this command.

Running the Program

python3 baconscipherdemo.py

This is the end result.

Program Output

------------------
| codedrome.com  |
| Bacon's Cipher |
------------------

Enciphered
==========
tHeRe wERe UnDER tHe LAw, eXcELleNt king, BOth DAily SacrificeS AnD freE
WilL OFFeRingS; The one proCEeDiNG UPoN ORdInARy obSerVancE, the otHer u
Pon A deVouT cHEerfuLNeSs: IN LikE MaNNEr theRE belONGeTh To kIngS f

Deciphering
===========

tHeRe 01010 K
wEReU 01101 N
nDERt 01110 O
HeLAw 10110 W
eXcEL 01011 L
leNtk 00100 E
ingBO 00011 D
thDAi 00110 G
lySac 00100 E
rific 00000 A
eSAnD 01101 N
freEW 00011 D
ilLOF 00111 H
FeRin 10100 U
gSThe 01100 M
onepr 00000 A
oCEeD 01101 N
iNGUP 01111 P
oNORd 01110 O
InARy 10110 W
obSer 00100 E
VancE 10001 R
theot 00000 A
HeruP 10001 R
onAde 00100 E
VouTc 10010 S
HEerf 11000 Y
uLNeS 01101 N
sINLi 01110 O
kEMaN 01101 N
NErth 11000 Y
eREbe 01100 M
lONGe 01110 O
ThTok 10100 U
IngSf 10010 S

Deciphered
==========
KNOWLEDGEANDHUMANPOWERARESYNONYMOUS