In this post I will write a project in Python to apply Zipf's Law to analysing word frequencies in a piece of text.
Zipf's Law describes a probability distribution where each frequency is the reciprocal of its rank multiplied by the highest frequency. Therefore the second highest frequency is the highest multiplied by 1/2, the third highest is the highest multiplied by 1/3 and so on.
In this project I will code in Python a few of the methods of estimating Pi.
Pi is an irrational number starting off 3.14159 and then carrying on for an infinite number of digits with no pattern which anybody has ever discovered. Therefore it cannot be calculated as such, just estimated to (in principle) any number of digits.
Simple codes such as substitution cyphers can be cracked or broken using a technique called frequency analysis which I will implement in Python.
In a previous post I implemented a very simple and very insecure substitution cypher. It is insecure because each letter in the original text is always encrypted the same way, for example the most common letter "e" might always be encrypted as "h", so if we find that "h" is the most common letter in the encrypted text then we can assume it represents "e". This can be carried out for all letters, a process called frequency analysis which in this post I will implement in Python.
In this post I'll write an implementation in Python of Zeller's Congruence, a simple and elegant little formula to carry out the seemingly complex task of calculating the day of the week (Monday, Tuesday etc.) from a date.
The numbers in the graphic below form the first five rows of Pascal's Triangle, which in this post I will implement in Python.
The first row consists of a single number 1. In subsequent rows, each of which is has one more number than the previous, values are calculated by adding the two numbers above left and above right. For the first and last values in each row we just take the single value above, therefore these are always 1.
Pascal's Triangle in its conventional centred layout
Soundex is a phonetic algorithm, assigning values to words or names so that they can be compared for similarity of pronounciation. For this post I will write an implementation in Python.
It doesn't take much thought to realise that the whole area of phonetic algorithms is a minefield, and Soundex itself is rather restricted in its usefulness. In fact, after writing this implementation I came to the conclusion that it is rather mediocre but at least coding it up does give a better understanding of how it works and therefore its usefulness and limitations.
Digital photos contain EXIF data which can be read and displayed by suitable software. In this post I will demonstrate reading EXIF data using Python.
Pretty much any smartphone, tablet or digital camera embeds EXIF data into each photograph it takes, with details of the device itself as well as the individual photo. And just about every piece of software which can display or edit digital images can read and display that data. Or at least attempt to...!
Unfortunately EXIF is not easy to deal with. It would be nice if the data consisted of a string of XML or JSON so that it could be handled using ubiquitous techniques and libraries, but the format itself is rather quirky, made worse by inconsistent implementations by different manufacturers and sometimes different devices from the same manufacturer.
You may have noticed that sometimes even popular mainstream software fails to show sensible interpretations of some EXIF data. Despite that I will have a bash at extracting the EXIF data from images using Python and the Pillow library.