Reading EXIF Data with Python and Pillow

Pretty much any smartphone, tablet or digital camera embeds EXIF data into each photograph it takes, with details of the device itself as well as the individual photo. And just about every piece of software which can display or edit digital images can read and display that data. Or at least attempt to...!

Unfortunately EXIF is not easy to deal with. It would be nice if the data consisted of a string of XML or JSON so that it could be handled using ubiquitous techniques and libraries, but the format itself is rather quirky, made worse by inconsistent implementations by different manufacturers and sometimes different devices from the same manufacturer.

You may have noticed that sometimes even popular mainstream software fails to show sensible interpretations of some EXIF data. Despite that I will have a bash at extracting the EXIF data from images using Python and the Pillow library.

The Project

JPEG images opened with the Image module of the Pillow library for Python have a method called _getexif. (The leading underscore marks it as private, so it is not part of Pillow's documented interface.) It gives us a dictionary of EXIF data with the keys being numeric codes and the values being in various formats which are not always human-readable.

To obtain something suitable for display to users we therefore need to do a bit of work on this data:

  • Obtain the names corresponding to field codes. Pillow provides most of these in the dictionary PIL.ExifTags.TAGS

  • Convert the values which are not human-readable into a form that is

In this project I will write a module with a function called generate_exif_dict. This takes a filename argument and returns a dictionary with textual field names as keys and values which have been processed into a format you might want to let users of your software see.

The solution I have come up with is far from perfect; these are the main imperfections I am aware of:

  • Not all tags in PIL.ExifTags.TAGS may be present in an image's EXIF data (PIL.ExifTags.TAGS has 254 items but the image I'll use for testing only has 48 of these)

  • Not all tags in an image's EXIF data may be in PIL.ExifTags.TAGS (my image has two of these)

  • Some data may be in proprietary formats not published by manufacturers and therefore of no use to anyone else*

  • I have only attempted processing of the more common and useful tags - the more obscure ones have been left as they are

  • The module was developed around Panasonic Lumix images: photos taken on cameras or devices from other manufacturers might give inconsistent results

* My photo has a large amount of what looks like gobbledygook in MakerNote. I initially assumed this was a bug in Pillow but it is actually correct, if not very useful. Linux Mint's Xviewer displays this as "28120 bytes undefined data" and Gimp doesn't even bother showing anything.

Starting to Code

The project consists of the following files, which you can download as a zip or get by cloning the GitHub repository if you prefer. You will also need an image: either use your own or save the one above.

  • exif.py
  • exifdemo.py

This is exif.py.

exif.py

import datetime
from fractions import Fraction

from PIL import Image
import PIL.ExifTags


def generate_exif_dict(filepath):

    """
    Generate a dictionary of dictionaries.

    The outer dictionary keys are the names
    of individual items, eg Make, Model etc.

    The outer dictionary values are themselves
    dictionaries with the following keys:

        tag: the numeric code for the item names
        raw: the data as stored in the image, often
        in a non-human-readable format
        processed: the raw data if it is human-readable,
        or a processed version if not.
    """

    # Image.open raises an IOError if the file cannot be read;
    # this is left to propagate to the caller.
    with Image.open(filepath) as image:

        # _getexif() returns None for an image with no EXIF data
        exif_data_PIL = image._getexif() or {}

    exif_data = {}

    for k, v in PIL.ExifTags.TAGS.items():

        # value is None if the image does not contain this tag
        value = exif_data_PIL.get(k)

        # truncate long values such as MakerNote to 64 characters
        if len(str(value)) > 64:
            value = str(value)[:64] + "..."

        exif_data[v] = {"tag": k,
                        "raw": value,
                        "processed": value}

    exif_data = _process_exif_dict(exif_data)

    return exif_data


def _derationalize(rational):

    # rational is a (numerator, denominator) tuple
    return rational[0] / rational[1]


def _create_lookups():

    lookups = {}

    lookups["metering_modes"] = ("Undefined",
                                 "Average",
                                 "Center-weighted average",
                                 "Spot",
                                 "Multi-spot",
                                 "Multi-segment",
                                 "Partial")

    lookups["exposure_programs"] = ("Undefined",
                                    "Manual",
                                    "Program AE",
                                    "Aperture-priority AE",
                                    "Shutter speed priority AE",
                                    "Creative (Slow speed)",
                                    "Action (High speed)",
                                    "Portrait",
                                    "Landscape",
                                    "Bulb")

    lookups["resolution_units"] = ("",
                                   "Undefined",
                                   "Inches",
                                   "Centimetres")

    lookups["orientations"] = ("",
                               "Horizontal",
                               "Mirror horizontal",
                               "Rotate 180",
                               "Mirror vertical",
                               "Mirror horizontal and rotate 270 CW",
                               "Rotate 90 CW",
                               "Mirror horizontal and rotate 90 CW",
                               "Rotate 270 CW")

    return lookups


def _process_exif_dict(exif_dict):

    date_format = "%Y:%m:%d %H:%M:%S"

    lookups = _create_lookups()

    exif_dict["DateTime"]["processed"] = \
        datetime.datetime.strptime(exif_dict["DateTime"]["raw"], date_format)

    exif_dict["DateTimeOriginal"]["processed"] = \
        datetime.datetime.strptime(exif_dict["DateTimeOriginal"]["raw"], date_format)

    exif_dict["DateTimeDigitized"]["processed"] = \
        datetime.datetime.strptime(exif_dict["DateTimeDigitized"]["raw"], date_format)

    exif_dict["FNumber"]["processed"] = \
        _derationalize(exif_dict["FNumber"]["raw"])
    exif_dict["FNumber"]["processed"] = \
        "f{}".format(exif_dict["FNumber"]["processed"])

    exif_dict["MaxApertureValue"]["processed"] = \
        _derationalize(exif_dict["MaxApertureValue"]["raw"])
    exif_dict["MaxApertureValue"]["processed"] = \
        "f{:2.1f}".format(exif_dict["MaxApertureValue"]["processed"])

    exif_dict["FocalLength"]["processed"] = \
        _derationalize(exif_dict["FocalLength"]["raw"])
    exif_dict["FocalLength"]["processed"] = \
        "{}mm".format(exif_dict["FocalLength"]["processed"])

    exif_dict["FocalLengthIn35mmFilm"]["processed"] = \
        "{}mm".format(exif_dict["FocalLengthIn35mmFilm"]["raw"])

    exif_dict["Orientation"]["processed"] = \
        lookups["orientations"][exif_dict["Orientation"]["raw"]]

    exif_dict["ResolutionUnit"]["processed"] = \
        lookups["resolution_units"][exif_dict["ResolutionUnit"]["raw"]]

    exif_dict["ExposureProgram"]["processed"] = \
        lookups["exposure_programs"][exif_dict["ExposureProgram"]["raw"]]

    exif_dict["MeteringMode"]["processed"] = \
        lookups["metering_modes"][exif_dict["MeteringMode"]["raw"]]

    exif_dict["XResolution"]["processed"] = \
        int(_derationalize(exif_dict["XResolution"]["raw"]))

    exif_dict["YResolution"]["processed"] = \
        int(_derationalize(exif_dict["YResolution"]["raw"]))

    exif_dict["ExposureTime"]["processed"] = \
        _derationalize(exif_dict["ExposureTime"]["raw"])
    exif_dict["ExposureTime"]["processed"] = \
        str(Fraction(exif_dict["ExposureTime"]["processed"]).limit_denominator(8000))

    exif_dict["ExposureBiasValue"]["processed"] = \
        _derationalize(exif_dict["ExposureBiasValue"]["raw"])
    exif_dict["ExposureBiasValue"]["processed"] = \
        "{} EV".format(exif_dict["ExposureBiasValue"]["processed"])

    return exif_dict

generate_exif_dict

After opening the image with Pillow and grabbing its EXIF data with _getexif() we create an empty dictionary. We then iterate PIL.ExifTags.TAGS and if the key also exists in the image data we store it in the value variable, otherwise value is set to None.
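One consequence of iterating PIL.ExifTags.TAGS is that any tag present in the image but missing from that table is silently dropped. A minimal sketch of the alternative, iterating the image's own tags instead, might look like this. The function name name_exif_tags and the small sample tables are hypothetical stand-ins: in real use you would pass the dictionary returned by _getexif() and PIL.ExifTags.TAGS.

```python
def name_exif_tags(raw_exif, tag_names):
    """Build the tag/raw/processed structure from the tags actually present.

    raw_exif:  dict of numeric tag -> value, shaped like the result of _getexif()
    tag_names: dict of numeric tag -> name, shaped like PIL.ExifTags.TAGS
    """
    named = {}
    # "or {}" copes with _getexif() returning None for images without EXIF data
    for tag, value in (raw_exif or {}).items():
        # Unknown tags get a placeholder name instead of being dropped.
        name = tag_names.get(tag, "Unknown_{}".format(tag))
        named[name] = {"tag": tag, "raw": value, "processed": value}
    return named


# Illustrative data only: two real tag codes plus one made-up unknown code.
sample_exif = {271: "Panasonic", 272: "DMC-GX80", 99999: b"\x00\x01"}
sample_names = {271: "Make", 272: "Model"}

for name, item in name_exif_tags(sample_exif, sample_names).items():
    print(name, item)
```

The trade-off is that fields absent from the image no longer appear at all, rather than appearing with a value of None.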

Some EXIF data can be very long, for example the MakerNote I mentioned above. Any data longer than 64 characters - an arbitrary amount - is truncated and an ellipsis added.
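If you would rather keep the raw value intact and shorten it only when it is displayed, the truncation can live in a small helper. This is just a sketch: truncate_for_display is a hypothetical name, not part of the module above.

```python
def truncate_for_display(value, max_length=64):
    """Return str(value), cut to max_length characters plus "..." if longer."""
    text = str(value)
    if len(text) > max_length:
        return text[:max_length] + "..."
    return text


print(truncate_for_display("DMC-GX80"))        # short values are unchanged
print(len(truncate_for_display("x" * 200)))    # 67: 64 characters plus "..."
```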

An item is then added to the exif_data dictionary, the key being the value from PIL.ExifTags.TAGS (ie the name such as Make or Model) and the value being a dictionary containing the tag number, the raw value and the processed value. The processed value is set to the same as the raw value but some will be changed later.

Finally we pass the completed dictionary to _process_exif_dict to overwrite some of the processed values to a more human-friendly format.

_derationalize

A number of data items are delivered to us as "rationals", ie tuples of 2 numbers. For example the aperture of 8.0 is (80, 10) so we need to divide 80 by 10 to get 8.0. This function is for use by _process_exif_dict to streamline the process.
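A slightly more defensive version of the same idea is sketched below. It assumes two things which are not in the module above: that a denominator of zero should give None rather than an exception, and that a rational may arrive as an object with numerator and denominator attributes instead of a tuple (fractions.Fraction is used here to stand in for such an object). The name derationalize_safely is hypothetical.

```python
from fractions import Fraction


def derationalize_safely(rational):
    """Convert an EXIF rational to a float, or None if the denominator is 0."""
    if isinstance(rational, tuple):
        numerator, denominator = rational
    else:
        # Assumption: anything else exposes numerator and denominator
        # attributes, as fractions.Fraction does.
        numerator, denominator = rational.numerator, rational.denominator
    if denominator == 0:
        return None
    return numerator / denominator


print(derationalize_safely((80, 10)))           # 8.0
print(derationalize_safely((0, 0)))             # None
print(derationalize_safely(Fraction(120, 10)))  # 12.0
```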

_create_lookups

Some values are encoded as an index, for example MeteringMode normally takes a value between 0 and 6 (the EXIF specification also defines 255 for "other", which these tuples do not cover). The textual equivalents are added to a dictionary as tuples so that we can retrieve them using the encoded values as indexes.

The first two start at 0. Fine. The second two start at 1. Hmm...! I have padded these with empty strings.
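An alternative which avoids both the padding and an IndexError on unexpected codes is to key each lookup by the code itself. The following is only a sketch: the names METERING_MODES, RESOLUTION_UNITS and describe are hypothetical, and the "Unknown" fallback text is my own choice.

```python
METERING_MODES = {
    0: "Undefined",
    1: "Average",
    2: "Center-weighted average",
    3: "Spot",
    4: "Multi-spot",
    5: "Multi-segment",
    6: "Partial",
    255: "Other",
}

# No padding needed: the codes simply start at 1.
RESOLUTION_UNITS = {
    1: "Undefined",
    2: "Inches",
    3: "Centimetres",
}


def describe(code, table):
    """Return the text for code, or a labelled fallback for unknown codes."""
    return table.get(code, "Unknown ({})".format(code))


print(describe(5, METERING_MODES))    # Multi-segment
print(describe(2, RESOLUTION_UNITS))  # Inches
print(describe(42, METERING_MODES))   # Unknown (42)
```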

_process_exif_dict

This is probably the ugliest bit of code I have ever written, even with some of the complexity shunted out to _derationalize and _create_lookups.

After creating a date formatting string and a lookups dictionary the following processing is carried out:

  • The 3 dates are strings so are converted to actual datetime objects; when printed they look almost identical, with only the colons in the date part replaced by hyphens

  • FNumber is derationalized, and then prefixed with the letter "f"

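  • MaxApertureValue is processed in a similar way, and as the division can give many decimal places the processed value is formatted to just 1dp. Note that the EXIF specification stores this field as an APEX value rather than an f-number, so the 3.6 shown in the output below corresponds to roughly f3.5 (the f-number is 2 raised to the power of half the APEX value)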

  • FocalLength is derationalized and then suffixed with "mm"

  • FocalLengthIn35mmFilm is not a rational so just needs to be suffixed with "mm". (Why is FocalLength a rational but FocalLengthIn35mmFilm a single value? I don't know.)

  • Orientation, ResolutionUnit, ExposureProgram and MeteringMode are all set from the lookup dictionary as described above.

  • ExposureTime is derationalized and then the denominator limited as otherwise it can get ridiculously long. For example my image has an exposure time as a rational of 10/300 which is 0.0333 recurring. Because _derationalize returns a float, which cannot hold that value exactly, without limiting the denominator this is displayed as 4803839602528529/144115188075855872! I think 1/30 is precise enough.

  • ExposureBiasValue (better known as exposure compensation) is derationalized and suffixed with "EV", or what I would refer to as stops.
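Two of the conversions in the list can be sketched with the standard library alone. The first builds the Fraction straight from the numerator and denominator, so the value never passes through a float and no denominator limit is needed. The second converts an APEX aperture value, which is how the EXIF specification defines MaxApertureValue, into an f-number. Both function names are hypothetical and neither is part of exif.py.

```python
from fractions import Fraction


def format_exposure_time(raw):
    """Format a (numerator, denominator) exposure time as a reduced fraction."""
    numerator, denominator = raw
    return str(Fraction(numerator, denominator))


def apex_to_f_number(apex_value):
    """Convert an APEX aperture value Av to an f-number N = 2 ** (Av / 2)."""
    return 2 ** (apex_value / 2)


print(format_exposure_time((10, 300)))                # 1/30
print("f{:.1f}".format(apex_to_f_number(925 / 256)))  # f3.5
```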

That's all the processing for now. At a later date I'll revisit the code and add processing for a few more fields.
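One way to make adding fields less painful, and to cope with images which lack some of them, is a table of field names and conversion functions applied in a loop. The sketch below is an assumption about how that might look rather than a rewrite of the module: PROCESSORS and process_fields are hypothetical names, only three fields are shown, and any field which is missing or in an unexpected format simply keeps its existing processed value.

```python
import datetime

DATE_FORMAT = "%Y:%m:%d %H:%M:%S"

# Hypothetical table: field name -> function converting the raw value.
PROCESSORS = {
    "DateTime": lambda raw: datetime.datetime.strptime(raw, DATE_FORMAT),
    "FNumber": lambda raw: "f{}".format(raw[0] / raw[1]),
    "FocalLengthIn35mmFilm": lambda raw: "{}mm".format(raw),
}


def process_fields(exif_dict, processors=PROCESSORS):
    """Apply each processor, skipping fields that are absent or malformed."""
    for name, convert in processors.items():
        item = exif_dict.get(name)
        if item is None or item["raw"] is None:
            continue  # field not present in this image
        try:
            item["processed"] = convert(item["raw"])
        except (ValueError, TypeError, IndexError, ZeroDivisionError):
            pass  # unexpected format: leave the processed value as it was
    return exif_dict


sample = {
    "DateTime": {"tag": 306, "raw": "2019:05:16 16:26:48", "processed": None},
    "FNumber": {"tag": 33437, "raw": (80, 10), "processed": None},
    "FocalLengthIn35mmFilm": {"tag": 41989, "raw": None, "processed": None},
}

for name, item in process_fields(sample).items():
    print(name, item["processed"])
```

Adding a field then means adding one entry to the table, and a photo from a camera which omits a field no longer stops the whole function.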

Now let's write a short program to use the above module and print the results.

exifdemo.py

import exif


def main():

    print("-----------------")
    print("| codedrome.com |")
    print("| Pillow EXIF   |")
    print("-----------------")

    try:

        filepath = "vaults_theatre.jpg"

        exif_dict = exif.generate_exif_dict(filepath)

        print_exif_dict(exif_dict)

    except IOError as ioe:

        print(ioe)


def print_exif_dict(exif_dict):

    for k, v in exif_dict.items():

        if v["raw"] is not None:
            print(k)
            print("-" * len(k))
            print("    tag:       {}".format(v["tag"]))
            print("    raw:       {}".format(v["raw"]))
            print("    processed: {}\n".format(v["processed"]))


if __name__ == "__main__":
    main()

This is all very straightforward: we just call exif.generate_exif_dict and pass the result to print_exif_dict, which iterates the dictionary, printing each name as a heading and then the tag number, raw value and processed value. Fields which are not present in the image, ie those with a raw value of None, are skipped.

Now we can run the program. I am using an image called vaults_theatre.jpg, so just change the filename in main if you are using your own.

Running the Program

python3.7 exifdemo.py

This will give us the following output.

Program Output (partial)

-----------------
| codedrome.com |
| Pillow EXIF   |
-----------------

Make
----
    tag:       271
    raw:       Panasonic
    processed: Panasonic

Model
-----
    tag:       272
    raw:       DMC-GX80
    processed: DMC-GX80

Orientation
-----------
    tag:       274
    raw:       1
    processed: Horizontal

XResolution
-----------
    tag:       282
    raw:       (180, 1)
    processed: 180

YResolution
-----------
    tag:       283
    raw:       (180, 1)
    processed: 180

ResolutionUnit
--------------
    tag:       296
    raw:       2
    processed: Inches

Software
--------
    tag:       305
    raw:       GIMP 2.8.16
    processed: GIMP 2.8.16

DateTime
--------
    tag:       306
    raw:       2019:05:16 16:26:48
    processed: 2019-05-16 16:26:48

ExposureTime
------------
    tag:       33434
    raw:       (10, 300)
    processed: 1/30

FNumber
-------
    tag:       33437
    raw:       (80, 10)
    processed: f8.0

ExposureProgram
---------------
    tag:       34850
    raw:       3
    processed: Aperture-priority AE

ISOSpeedRatings
---------------
    tag:       34855
    raw:       3200
    processed: 3200

DateTimeOriginal
----------------
    tag:       36867
    raw:       2019:05:01 14:03:13
    processed: 2019-05-01 14:03:13

DateTimeDigitized
-----------------
    tag:       36868
    raw:       2019:05:01 14:03:13
    processed: 2019-05-01 14:03:13

ExposureBiasValue
-----------------
    tag:       37380
    raw:       (0, 100)
    processed: 0.0 EV

MaxApertureValue
----------------
    tag:       37381
    raw:       (925, 256)
    processed: f3.6

MeteringMode
------------
    tag:       37383
    raw:       5
    processed: Multi-segment

FocalLength
-----------
    tag:       37386
    raw:       (120, 10)
    processed: 12.0mm

ExifImageWidth
--------------
    tag:       40962
    raw:       800
    processed: 800

ExifImageHeight
---------------
    tag:       40963
    raw:       534
    processed: 534

FocalLengthIn35mmFilm
---------------------
    tag:       41989
    raw:       24
    processed: 24mm

This is an edited version of the output. The image at the top of the page has 48 pieces of EXIF data (plus 2 not in PIL.ExifTags.TAGS) but the output shown here is only the most useful. If you use a different image from a different camera manufacturer you will probably see some differences in the fields displayed.