Image Histograms in Python

The Image module of the Pillow imaging library for Python has a method called histogram. When I first saw it I naively assumed that it generated three nice little graphics.

I was wrong! What the method actually does is to return the frequencies of the colour values 0 to 255 for the three colour channels red, green and blue, or a single set of frequencies for greyscale images. To put it another way, it gives us the raw data for the histograms.

Another Pillow module is ImageDraw, which provides a set of methods for drawing on an image. So although Pillow does not actually create histograms, it gives us all the data and drawing functionality we need to create them ourselves. Let's do so...

The Plan

As you no doubt know, each pixel in an RGB image consists of a red, a green and a blue value between 0 and 255. Plotting the frequencies of each value for each of the three channels gives us an idea of both the predominance of each colour and the overall brightness of that colour throughout the image. A practical use for such histograms is to gauge how much, if at all, the colour balance of an image needs to be adjusted.

The raw data provided by Pillow's histogram method is a list of integers. For a 24-bit colour image there are 768 values: the first 256 represent red values from 0 to 255, and the next two blocks of 256 values represent green and blue respectively. For an 8-bit greyscale image there are just 256 values.

This is a sample from a colour image, and shows that there are 152 pixels in the image with a red value of 0, 176 with a red value of 1, and so on.

Return value of the histogram method (partial)

[152, 176, 439, 1024, 2131, 2887, 3031, 2918, 2855...]
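To see the layout of the returned list for yourself, the following minimal sketch builds a tiny single-colour image in memory and checks where the counts end up. The 4 x 2 size and the colour (10, 20, 30) are arbitrary values chosen for illustration and have nothing to do with the photo used in this article.

```python
from PIL import Image

# A 4 x 2 image filled with one colour: 8 pixels, all (10, 20, 30).
# The size and colour are arbitrary illustration values.
image = Image.new("RGB", (4, 2), (10, 20, 30))

frequencies = image.histogram()

# Three blocks of 256 counts: red, then green, then blue
red = frequencies[0:256]
green = frequencies[256:512]
blue = frequencies[512:768]

print(len(frequencies))  # 768
print(red[10], green[20], blue[30])  # 8 8 8

# A greyscale (mode "L") image gives a single block of 256 counts
grey = Image.new("L", (4, 2), 5)
print(len(grey.histogram()))  # 256
```

Every pixel is counted once per channel, so each block of 256 values adds up to the number of pixels in the image.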

In this project I will write a module with a function called create_histograms. This will take a Pillow image and return a dictionary of Pillow images of histograms, one for greyscale images and three for colour images. The calling code can then either save these or display them in a GUI.

I will be using this image but you might like to use your own.

The Gimp histograms for all three channels of the image are shown below. My versions need to be the same shape of course, but instead of using grey for the histogram itself and a graduated colour bar at the bottom I will draw the individual vertical lines in the colours they correspond to.

The Project

This project consists of the following files, which can be downloaded as a zip, or you can clone/download the GitHub repository if you prefer.

  • colors_histogram.py
  • colors_histogram_demo.py

Source Code Links

ZIP File
GitHub

This is the colors_histogram module.

colors_histogram.py

from PIL import Image, ImageDraw


def create_histograms(image):

    """
    Takes a Pillow image.
    Returns a dictionary of Pillow images of colour
    histograms.
    For colour (mode "RGB") images there are 3 with
    keys "red", "green" and "blue".
    For greyscale (mode "L") images there is one with key "greyscale".
    Raises ValueError if mode is not "RGB" or "L"
    """

    if image.mode == "RGB":

        normalized_frequencies = _create_normalized_frequencies_rgb(image)

        return {"red": _create_histogram((0,), normalized_frequencies["red"]),
                "green": _create_histogram((1,), normalized_frequencies["green"]),
                "blue": _create_histogram((2,), normalized_frequencies["blue"])}

    elif image.mode == "L":

        normalized_frequencies = _create_normalized_frequencies_greyscale(image)

        return {"greyscale": _create_histogram((0,1,2), normalized_frequencies)}

    else:

        raise ValueError("Image must have mode of RGB or L")


def _create_normalized_frequencies_rgb(image):

    frequencies = image.histogram()
    max_freq = max(frequencies)

    normalized_frequencies = {"red": [f / max_freq for f in frequencies[0:256]],
                              "green": [f / max_freq for f in frequencies[256:512]],
                              "blue": [f / max_freq for f in frequencies[512:768]]}

    return normalized_frequencies


def _create_normalized_frequencies_greyscale(image):

    frequencies = image.histogram()
    max_freq = max(frequencies)

    normalized_frequencies = [f / max_freq for f in frequencies]

    return normalized_frequencies


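def _create_histogram(channels, frequencies):

    # 256 columns, one per colour value; height is width / Golden Ratio
    width = 256
    height = 158
    column_width = 1

    # White background
    im = Image.new("RGB", (width, height), (255,255,255))

    draw = ImageDraw.Draw(im)

    # RGB colour of the current column; unused channels stay at 0
    col = [0,0,0]

    for v in range(0, 256):

        for channel in channels:
            col[channel] = v

        # The y axis runs downwards, so subtract the column height from the bottom
        draw.line(xy=[(v, height),(v, height - (height * frequencies[v]))],
                  fill=tuple(col),
                  width=column_width)

    return im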

create_histograms

This function mainly calls other functions: one to create normalized frequencies (which we'll get to in a moment), and another to create the histograms themselves from those frequencies. The latter calls are made while building the dictionary that is returned.

As you can see there are two separate tasks here, one for colour images and one for black and white. Other image types will raise an exception.
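If you would rather handle other modes than reject them, one option is to convert the image before calling create_histograms. This is only a sketch: to_supported_mode is a hypothetical helper that is not part of the module, and converting everything else to "RGB" is an assumption that may not suit every image (an alpha channel, for instance, is simply discarded).

```python
from PIL import Image


def to_supported_mode(image):
    """Hypothetical helper: return the image in mode "RGB" or "L"."""
    if image.mode in ("RGB", "L"):
        return image
    # Image.convert returns a converted copy; any alpha channel is dropped
    return image.convert("RGB")


# A small semi-transparent red image, used only as test input
rgba = Image.new("RGBA", (2, 2), (255, 0, 0, 128))
print(to_supported_mode(rgba).mode)  # RGB
```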

_create_normalized_frequencies_rgb

The actual frequencies aren't much use for drawing histograms. What we need is the frequency as a fraction of the highest frequency.

In my image the highest frequency is 14966 (which, incidentally, belongs to a red value of 35). If you look at the histogram for the red channel this is represented by the peak which hits the top. For each of the frequencies, the normalized frequency is the actual frequency divided by the highest, which gives a real value between 0 and 1.

Here are a few examples:

Frequency    Max Frequency    Normalized Frequency
14966        14966            1
10522        14966            0.703060269
7483         14966            0.5
228          14966            0.015234531

These normalized values tell us how far up the histogram each column needs to go. For example the first column here goes all the way up to the top, the third goes 0.5 of the way to the top and so on. Therefore all we need to do to calculate a column height in pixels is to multiply the histogram height by the normalized value.

For RGB images we need three sets of these frequencies in a dictionary. These are calculated in list comprehensions as the dictionary is being created. Note how the list provided by Pillow is sliced up into three chunks: [0:256], [256:512] and [512:768].
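As a quick check of the arithmetic, here is a minimal stand-alone sketch that normalizes the four sample frequencies from the table and converts them to column heights. The normalize function is a simplified stand-in written for this example rather than the module's own function, and the height of 158 is the value hard coded in _create_histogram.

```python
def normalize(frequencies):
    """Divide each frequency by the highest one, giving values from 0 to 1."""
    max_freq = max(frequencies)
    return [f / max_freq for f in frequencies]


# The four sample frequencies from the table
sample = [14966, 10522, 7483, 228]
normalized = normalize(sample)

height = 158  # histogram height in pixels, as in _create_histogram

# Print each frequency, its normalized value and the column height in pixels
for f, n in zip(sample, normalized):
    print(f, round(n, 4), round(height * n, 1))
```

The highest frequency gives a column of the full 158 pixels, and the frequency that is exactly half of it gives a column of 79 pixels.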

_create_normalized_frequencies_greyscale

This works on the same principle as the previous function but is simpler as we only need one set of values.

_create_histogram

I have written a lot of data visualization code over the years in various languages, and the biggest problem is allowing for different ranges on both the x and y axes and then scaling these up or down to the required image size.

Here I haven't bothered with any of that but have just hard coded the image size as I know we are always going to be dealing with exactly 256 values. A rare luxury! The height is also hard coded to the width divided by the Golden Ratio 1.618, rounded down to the nearest integer.
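The hard-coded height of 158 can be reproduced with a couple of lines of arithmetic. This is just a sketch to show where the number comes from; the module itself simply uses the literal value.

```python
import math

width = 256

# Using the rounded value 1.618 quoted in the text
print(int(width / 1.618))  # 158

# Using the exact Golden Ratio (1 + sqrt(5)) / 2 gives the same result
golden_ratio = (1 + math.sqrt(5)) / 2
height = int(width / golden_ratio)
print(height)  # 158
```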

Next we create a new Pillow image, passing the mode "RGB", a tuple for the size, and another tuple of RGB values for the background colour (white). The ImageDraw.Draw function then gives us an object we can use to draw on the image.

The channels argument is a tuple specifying which channel or channels the histogram is being drawn for. It will be (0,) for red, (1,) for green and (2,) for blue, or (0,1,2) for greyscale where all three RGB values are the same.

We now iterate from 0 to 255. The col list is used to hold the RGB values for the current column in the histogram, and the relevant item(s) is set to the current value of v.
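The way the channels tuple selects the column colour can be seen in isolation with the short sketch below. column_colour is a hypothetical helper written for this illustration; the module does the same thing inline with a single col list that it reuses for every column.

```python
def column_colour(channels, v):
    """Return the RGB tuple for column v, given the channel indexes to set."""
    col = [0, 0, 0]
    for channel in channels:
        col[channel] = v
    return tuple(col)


print(column_colour((0,), 200))       # (200, 0, 0)
print(column_colour((2,), 200))       # (0, 0, 200)
print(column_colour((0, 1, 2), 200))  # (200, 200, 200)
```

Each column is therefore drawn in a shade that gets brighter from left to right, in red, green, blue or grey depending on the channels passed in.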

Finally we draw a line from the bottom up to the required height, calculated as described in the section on the _create_normalized_frequencies_rgb function. We also pass the col list converted to a tuple, and the line width.
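Pillow's coordinate system has its origin at the top left with y increasing downwards, which is why the column height is subtracted from the image height. The sketch below computes the two end points in the same way as the draw.line call in the module; column_endpoints is a hypothetical helper used only for this illustration.

```python
def column_endpoints(v, frequency, height=158):
    """Return the bottom and top points of column v, as passed to draw.line."""
    bottom = (v, height)
    top = (v, height - (height * frequency))
    return [bottom, top]


print(column_endpoints(35, 1.0))  # [(35, 158), (35, 0.0)]
print(column_endpoints(35, 0.5))  # [(35, 158), (35, 79.0)]
```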

The module is now finished so let's write a bit of code to try it out.

colors_histogram_demo.py

from PIL import Image

import colors_histogram


def main():

    print("--------------------")
    print("| codedrome.com    |")
    print("| Pillow Histogram |")
    print("--------------------")

    filename = "central_st_giles_colour.jpg"

    try:

        image = Image.open(filename)

        histograms = colors_histogram.create_histograms(image)

        if image.mode == "RGB":

            histograms["red"].save(filename + "_histogram_red.png", "PNG")
            histograms["green"].save(filename + "_histogram_green.png", "PNG")
            histograms["blue"].save(filename + "_histogram_blue.png", "PNG")

        elif image.mode == "L":

            histograms["greyscale"].save(filename + "_histogram_greyscale.png", "PNG")

        image.close()

    except IOError as e:

        print(e)

    except ValueError as e:

        print(e)


if __name__ == "__main__":
    main()

Firstly edit filename if you are using your own photo.

After opening an image we pass it to colors_histogram.create_histograms, catching the returned dictionary in the histograms variable. Depending on whether the image is colour or black and white we then save the three or the one histogram before closing the image.
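Note that the demo appends the suffix to the complete filename, so the saved files have names such as central_st_giles_colour.jpg_histogram_red.png. If you would rather drop the original extension, one possible approach is sketched below using the standard library's pathlib. histogram_filename is a hypothetical helper and is not part of the downloadable project.

```python
from pathlib import Path


def histogram_filename(image_filename, channel):
    """Hypothetical helper: build e.g. photo_histogram_red.png from photo.jpg."""
    path = Path(image_filename)
    # path.stem is the filename without its final extension
    return str(path.with_name(f"{path.stem}_histogram_{channel}.png"))


print(histogram_filename("central_st_giles_colour.jpg", "red"))
# central_st_giles_colour_histogram_red.png
```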

There are two possible errors we need to catch here. An IOError (which in Python 3 is an alias of OSError) is raised if there is a problem opening the image or saving the histograms, and a ValueError is raised if the image's mode isn't "RGB" or "L".

Now let's run the program.

Running the Program

python3.7 colors_histogram_demo.py

Open the folder where your code is and you will find three new png files (or one for monochrome images).

I am pleased with the result and I think using actual colours is very effective. The histograms are deliberately minimalist as their primary use is within a GUI which would provide its own border, headings and other information, similar to the Gimp screenshots above.