Benford’s Law in Python

I recently posted an article on Zipf's Law and the application of the Zipfian Distribution to word frequencies in a piece of text. A closely related concept is Benford's Law, which describes the distribution of the first digits of many, if not most, sets of numeric data. In fact the two are so closely related that the Benford Distribution can be considered a special case of the Zipfian Distribution.
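As a rough illustration of the first-digit distribution described above, the sketch below compares the proportions Benford's Law predicts with those observed in a sample data set. It is not the code from the article itself: the function names are hypothetical, the formula used is the standard statement of the law, P(d) = log10(1 + 1/d), and the powers of 2 are simply a convenient sample that is known to follow the law closely.

```python
import math
from collections import Counter


def benford_expected():
    """Expected proportion of each first digit 1-9: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Return the leading non-zero digit of a non-zero number."""
    # Strip the sign, then any leading zeros and decimal point
    return int(str(abs(n)).lstrip("0.")[0])


def first_digit_proportions(numbers):
    """Observed proportion of each first digit 1-9 (zeros are ignored)."""
    digits = [first_digit(n) for n in numbers if n != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Sample data (an assumption for illustration): the first 200 powers of 2
sample = [2 ** n for n in range(1, 201)]

expected = benford_expected()
observed = first_digit_proportions(sample)

for d in range(1, 10):
    print(f"{d}  expected {expected[d]:.3f}  observed {observed[d]:.3f}")
```

The digit 1 should lead roughly 30% of the time, with the proportions falling away steadily towards 9.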

Zipf’s Law in Python

Zipf's Law describes a probability distribution where each frequency is the reciprocal of its rank multiplied by the highest frequency. Therefore the second highest frequency is the highest multiplied by 1/2, the third highest is the highest multiplied by 1/3 and so on.

This is best illustrated with a graph.

In this post I will write a project in Python to apply Zipf's Law to what is probably its best-known use, that of analysing word frequencies in a piece of text.
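The reciprocal-of-rank rule described above can be sketched in a few lines. This is only a minimal illustration under my own assumptions, not the project from the article: the function name `zipf_table`, the simple regular expression used to split words, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter


def zipf_table(text, top_n=5):
    """Return (rank, word, actual frequency, Zipfian frequency) tuples.

    The Zipfian frequency for a rank is the highest frequency
    multiplied by 1/rank.
    """
    # Crude tokenisation: lower-case runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common(top_n)
    if not ranked:
        return []
    highest = ranked[0][1]
    return [
        (rank, word, actual, highest / rank)
        for rank, (word, actual) in enumerate(ranked, start=1)
    ]


text = "the cat and the dog and the bird saw the fox"
table = zipf_table(text, top_n=3)

for rank, word, actual, zipfian in table:
    print(f"{rank}  {word:<5} actual {actual}  Zipfian {zipfian:.2f}")
```

A sentence this short only demonstrates the arithmetic; any meaningful comparison of actual and Zipfian frequencies needs a much longer piece of text.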

Exporting PostgreSQL Data to Excel with Python

There is a baffling selection of reporting software out there with very sophisticated functionality, and users can put together reports impressive enough to satisfy any manager or board.

However, many people put pragmatism over aesthetics and will say "can't I just get the data in a spreadsheet?"

In this post I will put together a very simple solution to the problem of exporting data from PostgreSQL to an Excel spreadsheet using psycopg2 for the database access and openpyxl for the spreadsheet creation.

Tkinter Pillow Application 0.2

This is Version 0.2 of my Tkinter Pillow Application, following on from Version 0.1.

This is an ongoing project to develop an image editing application using Tkinter for the UI (with other GUI toolkits as a long-term objective) and the Pillow library for image editing functionality.

Version 0.1 established the overall architecture of the solution and enabled users to open and display an image. For Version 0.2 I will add a number of essential improvements to the user interface.

Moving Averages in Python

Everyone understands averages, both their meaning and how to calculate them. However, there are situations, particularly when dealing with real-time data, when a conventional average is of little use because it includes old values which are no longer relevant and merely give a misleading impression of the current situation.

The solution to this problem is to use moving averages, i.e. the average of the most recent values rather than all values, which is the subject of this post.
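The idea of averaging only the most recent values can be sketched as below. This is a minimal illustration rather than the implementation from the article: the class name `MovingAverage` and the window size of 3 are my own assumptions, and the values are made up.

```python
from collections import deque


class MovingAverage:
    """Average of the most recent `window_size` values only."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # A deque with maxlen silently discards the oldest value when full
        self._values = deque(maxlen=window_size)

    def add(self, value):
        """Add a new value and return the current moving average."""
        self._values.append(value)
        return sum(self._values) / len(self._values)


ma = MovingAverage(3)
averages = [ma.add(v) for v in [10, 20, 30, 40, 50]]
print(averages)  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

Until the window is full the average simply covers however many values have arrived so far; after that, each new value pushes out the oldest one.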

Z-Scores in Python

So, your child gets 78% in both physics and history. Both are pretty good grades but, as a reader of geeky blogs like this, you believe the sciences are more important than the humanities and would have preferred your child to do better in physics than history.

However, we are not necessarily comparing like with like here: 78% in one subject is probably not equivalent to 78% in another. Rather than the absolute percentages we need to calculate and compare the Z-Scores, which take into account the average and the spread (standard deviation) of the entire set of scores.
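To make the comparison concrete, here is a minimal sketch using the usual definition of a Z-Score: the value minus the mean, divided by the standard deviation. The class scores are invented purely for illustration, the function name `z_score` is hypothetical, and I have assumed the population standard deviation (`statistics.pstdev`); the sample standard deviation would give slightly different numbers.

```python
import statistics


def z_score(value, scores):
    """Number of standard deviations `value` lies from the mean of `scores`."""
    mean = statistics.mean(scores)
    # Assumption: population standard deviation, i.e. the scores are the whole class
    std_dev = statistics.pstdev(scores)
    return (value - mean) / std_dev


# Hypothetical class results, each including the child's 78%
physics_scores = [70, 74, 78, 82, 86]
history_scores = [58, 63, 68, 73, 78]

z_physics = z_score(78, physics_scores)
z_history = z_score(78, history_scores)

print(f"physics Z-Score: {z_physics:.2f}")
print(f"history Z-Score: {z_history:.2f}")
```

With these made-up figures, 78% is exactly average in physics but well above average in history, so the same percentage represents a noticeably better performance in history.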

Tkinter Pillow Application 0.1

In my post An Introduction to Image Manipulation with Pillow I commented that "You could in principle use it [Pillow] as the basis of a sort of lightweight Photoshop type application using perhaps Tkinter or PyQT". At the time I wasn't actually intending to do so but recently the idea has started to appeal to me so I thought I'd give it a go.

Although I'm not attempting to compete with Photoshop this is still a fairly ambitious project which will spread over a number of posts, and to start with I'll just get something very basic up and running.

Reading PostgreSQL Database Schemas with Python

A core principle of relational databases is that a database's schema, or the design of its tables, columns and other objects, is held within the database itself; this means we can retrieve the structure using ordinary SQL queries. In this post I will develop a simple module which uses the psycopg2 DB-API interface to retrieve the tables and columns of a PostgreSQL database.
