Zipf’s Law in Python

In this post I will write a project in Python to apply Zipf's Law to analysing word frequencies in a piece of text.

Zipf's Law describes a probability distribution in which the frequency of each item is the highest frequency multiplied by the reciprocal of the item's rank. The second highest frequency is therefore the highest multiplied by 1/2, the third highest is the highest multiplied by 1/3, and so on.

This is best illustrated with a graph.

[Graph: Zipf's Law in Python]
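As a rough illustration of the idea, here is a minimal sketch that counts the words in a piece of text and sets each actual frequency beside the value Zipf's Law predicts for its rank. The function name `zipf_table`, the sample sentence and the simple word-splitting pattern are my own assumptions for this example, not code from the full post.

```python
import re
from collections import Counter


def zipf_table(text, top=5):
    """Return (rank, word, actual, expected) rows for the most common words."""
    # Crude tokenisation (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common(top)
    if not ranked:
        return []
    highest = ranked[0][1]
    # Zipf's Law: expected frequency = highest frequency * (1 / rank).
    return [
        (rank, word, count, highest / rank)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]


sample = "the cat and the dog and the bird"
for rank, word, actual, expected in zipf_table(sample, top=3):
    print(f"{rank} {word:<5} actual={actual} zipf={expected:.2f}")
```

A text this short only shows the mechanics; the pattern Zipf's Law describes only becomes visible with much longer pieces of text.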


Frequency Analysis in Python

Simple codes such as substitution cyphers can be broken using a technique called frequency analysis, which I will implement in Python.

In a previous post I implemented a very simple and very insecure substitution cypher. It is insecure because each letter in the original text is always encrypted the same way. For example, the most common letter "e" might always be encrypted as "h", so if we find that "h" is the most common letter in the encrypted text then we can assume it represents "e". This can be carried out for all letters, a process called frequency analysis, which in this post I will implement in Python.
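The ranking idea can be sketched in a few lines. This is only a naive first guess under stated assumptions: the reference string `ENGLISH_BY_FREQUENCY` is an approximate ordering of English letters from most to least common, and the names `guess_key` and `apply_key` are hypothetical, not taken from the full post.

```python
import string
from collections import Counter

# Approximate order of English letters, most common first (an assumed reference).
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def guess_key(ciphertext):
    """Map each cypher letter to a guessed plain letter by frequency rank."""
    counts = Counter(c for c in ciphertext.lower() if c in string.ascii_lowercase)
    ranked = [letter for letter, _ in counts.most_common()]
    # Most common cypher letter -> "e", next -> "t", and so on.
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


def apply_key(ciphertext, key):
    """Substitute letters using the guessed key, leaving other characters alone."""
    return "".join(key.get(c, c) for c in ciphertext.lower())
```

With short messages the guessed key will usually be wrong for many letters, so in practice it is a starting point to be corrected by hand rather than a finished decryption.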


Pascal’s Triangle in Python

The numbers in the graphic below form the first five rows of Pascal's Triangle, which in this post I will implement in Python.

The first row consists of a single number, 1. In subsequent rows, each of which has one more number than the previous, values are calculated by adding the two numbers above left and above right. For the first and last values in each row we just take the single value above, therefore these are always 1.


Pascal's Triangle in its conventional centred layout
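The rule described above can be sketched as follows. The function name `pascal_rows` is hypothetical, and padding each row with a zero at both ends is just one convenient way of handling the first and last values; the full post may well do it differently.

```python
def pascal_rows(n):
    """Return the first n rows of Pascal's Triangle as lists of integers."""
    rows = []
    row = [1]
    for _ in range(n):
        rows.append(row)
        # Pair each value with its neighbour; the zero padding means the
        # edge values simply take the single value above, so they stay 1.
        row = [left + right for left, right in zip([0] + row, row + [0])]
    return rows


# Print the first five rows in a roughly centred layout.
for row in pascal_rows(5):
    print(" ".join(str(value) for value in row).center(20))
```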


The Soundex Algorithm in Python

Soundex is a phonetic algorithm, assigning values to words or names so that they can be compared for similarity of pronunciation. For this post I will write an implementation in Python.

It doesn't take much thought to realise that the whole area of phonetic algorithms is a minefield, and Soundex itself is rather restricted in its usefulness. In fact, after writing this implementation I came to the conclusion that it is rather mediocre, but at least coding it up does give a better understanding of how it works and therefore its usefulness and limitations.
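To give a flavour of what such an algorithm looks like, here is a minimal sketch of the commonly described American Soundex rules: keep the first letter, encode the following consonants as digits, skip vowels, collapse repeated codes, and pad or cut the result to four characters. It is not necessarily the version implemented in the full post, and the names `CODES` and `soundex` are my own.

```python
import string

# Consonant groups and their digits in the commonly described American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return a four-character code: first letter plus three digits."""
    letters = [c for c in name.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
```

Two names with the same code are treated as sounding alike, which is exactly where the limitations mentioned above start to show.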


Reading EXIF Data with Python and Pillow

Digital photos contain EXIF data which can be read and displayed by suitable software. In this post I will demonstrate reading EXIF data using Python.

Pretty much any smartphone, tablet or digital camera embeds EXIF data into each photograph it takes, with details of the device itself as well as the individual photo. And just about every piece of software which can display or edit digital images can read and display that data. Or at least attempt to...!

Unfortunately EXIF is not easy to deal with. It would be nice if the data consisted of a string of XML or JSON so that it could be handled using ubiquitous techniques and libraries, but the format itself is rather quirky, made worse by inconsistent implementations by different manufacturers and sometimes different devices from the same manufacturer.

You may have noticed that sometimes even popular mainstream software fails to show sensible interpretations of some EXIF data. Despite that I will have a bash at extracting the EXIF data from images using Python and the Pillow library.
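As a starting point, here is a minimal sketch that uses Pillow's `Image.getexif()` together with the `ExifTags.TAGS` lookup table to replace numeric tag ids with readable names. The function names `label_tags` and `read_exif` are hypothetical, and the sketch only covers the top-level tags that `getexif()` returns directly; it makes no attempt to interpret manufacturer-specific values.

```python
def label_tags(raw_tags, tag_names):
    """Replace numeric EXIF tag ids with names where a name is known."""
    # Unknown ids are kept as numbers rather than dropped.
    return {tag_names.get(tag_id, tag_id): value
            for tag_id, value in raw_tags.items()}


def read_exif(path):
    """Return the top-level EXIF tags of the image at `path`, keyed by name."""
    # Imported here so that label_tags can be used without Pillow installed.
    from PIL import ExifTags, Image

    with Image.open(path) as image:
        raw_tags = dict(image.getexif())
    return label_tags(raw_tags, ExifTags.TAGS)
```

Calling `read_exif("photo.jpg")`, where `photo.jpg` stands in for any image file you have to hand, should return a dictionary with keys such as `Make` and `Model` if the image carries that data, and an empty dictionary if it has no EXIF data at all.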
