PostgreSQL Data Manipulation in Python

In a previous post I introduced the psycopg2 Python/PostgreSQL interface and used it to create a database, a few tables and a view. In this post I will demonstrate inserting, updating, deleting and selecting data using the database created in the previous post, as well as showing what happens if we try to violate database constraints.

This post builds on the previous one so assumes you have read it and run the code, and still have the database.

The source code for this project consists of the following two files:

  • pgconnection.py
  • pgdml.py

which can be downloaded as a zip or cloned/downloaded from Github. This project shares a file, pgconnection.py, with the previous post so you might like to save the files for this post in the same folder and share pgconnection.py.

Source Code Links

ZIP File
GitHub

Now let's look at the code...

The get_connection function

pgconnection.py

import psycopg2


def get_connection(dbname):

    connect_str = "dbname={} host='localhost' user='user' password='password'".format(dbname)

    return psycopg2.connect(connect_str)

This is a simple function which attempts to connect to the specified database with hard-coded host, user and password; you'll need to change these to your own values.

Imports and the main function

pgdml.py Part 1

import datetime

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

import pgconnection


def main():

    print("------------------")
    print("| codedrome.com  |")
    print("| PostgreSQL DML |")
    print("------------------\n")

    #insert_galleries()
    #insert_photos()
    #insert_typesdemo()

    #update_photos()

    #delete_photos()

    #select_photos()

    #select_galleriesphotos()

    #insert_photo_invalid_fk()

    #delete_fk()

    #reset_serial()

We need to import psycopg2, and for this project we also need ISOLATION_LEVEL_AUTOCOMMIT which lives in psycopg2.extensions. This lets us avoid using transactions if we want to.

The main function consists mostly of function calls which are commented out. We can uncomment them and run them one at a time.

Inserting Data

The schema diagram is a reminder of two of the tables created in the previous post, and which we will now be inserting data into. The photos table has a foreign key constraint: its galleryid column references the primary key of the galleries table. We therefore need to insert data into galleries first. The next bit of code shows the function to do this, followed by the function to insert data into the photos table.

pgdml.py Part 2

def insert_galleries():

    galleries = ({"name": "London 2018", "description": "Photos of London in 2018"},
                 {"name": "Paris 2016", "description": "Photos of Paris in 2016"},
                 {"name": "Oslo 2018", "description": "Photos of Oslo in 2018"},
                 {"name": "Copenhagen 2017", "description": "Photos of Copenhagen in 2017"},
                 {"name": "Edinburgh 2015", "description": "Photos of Edinburgh in 2015"})

    try:

        conn = pgconnection.get_connection("codeinpython")
        conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
        cursor = conn.cursor()

        for gallery in galleries:

            cursor.execute("""INSERT INTO galleries(name, description)
                              VALUES (%(name)s, %(description)s);""",
                              {'name': gallery["name"], 'description': gallery["description"]})

            print("Gallery inserted")

        cursor.close()
        conn.close()

    except psycopg2.Error as e:

        print(type(e))

        print(e)


def insert_photos():

    photos = ({"galleryid": 1, "title": "London Photo 1", "description": "London Photo 1", "photographer": "Chris Webb", "datetaken": datetime.date(2018, 5, 17)},
              {"galleryid": 1, "title": "London Photo 2", "description": "London Photo 2", "photographer": "Chris Webb", "datetaken": datetime.date(2018, 5, 18)},
              {"galleryid": 2, "title": "Paris Photo 1", "description": "Paris Photo 1", "photographer": "Chris Webb", "datetaken": datetime.date(2016, 9, 1)},
              {"galleryid": 2, "title": "Paris Photo 2", "description": "Paris Photo 2", "photographer": "Chris Webb", "datetaken": datetime.date(2016, 9, 1)},
              {"galleryid": 3, "title": "Oslo Photo 1", "description": "Oslo Photo 1", "photographer": "Chris Webb", "datetaken": datetime.date(2018, 7, 5)},
              {"galleryid": 3, "title": "Oslo Photo 2", "description": "Oslo Photo 2", "photographer": "Chris Webb", "datetaken": datetime.date(2018, 7, 5)},
              {"galleryid": 4, "title": "Copenhagen Photo 1", "description": "Copenhagen Photo 1", "photographer": "Chris Webb", "datetaken": datetime.date(2017, 4, 12)},
              {"galleryid": 4, "title": "Copenhagen Photo 2", "description": "Copenhagen Photo 2", "photographer": "Chris Webb", "datetaken": datetime.date(2017, 4, 13)},
              {"galleryid": 5, "title": "Edinburgh Photo 1", "description": "Edinburgh Photo 1", "photographer": "Chris Webb", "datetaken": datetime.date(2015, 8, 21)},
              {"galleryid": 5, "title": "Edinburgh Photo 2", "description": "Edinburg Photo 2", "photographer": "Chris Webb", "datetaken": datetime.date(2015, 8, 21)})

    try:

        conn = pgconnection.get_connection("codeinpython")
        cursor = conn.cursor()

        for photo in photos:

            cursor.execute("""INSERT INTO photos(galleryid, title, description, photographer, datetaken)
                              VALUES (%(galleryid)s, %(title)s, %(description)s, %(photographer)s, %(datetaken)s);""",
                              photo)

            conn.commit()

            print("Photo inserted")

        cursor.close()
        conn.close()

    except psycopg2.Error as e:

        print(type(e))

        print(e)


def insert_typesdemo():

    row = {"intcolumn": 123, "realcolumn": 456.789, "varcharcolumn": "Now is the winter of our discontent", "datecolumn": datetime.date(2018, 2, 17), "booleancolumn": True}

    try:

        conn = pgconnection.get_connection("codeinpython")
        cursor = conn.cursor()

        cursor.execute("""INSERT INTO typesdemo(intcolumn, realcolumn, varcharcolumn, datecolumn, booleancolumn)
                          VALUES (%(intcolumn)s, %(realcolumn)s, %(varcharcolumn)s, %(datecolumn)s, %(booleancolumn)s);""",
                          row)

        conn.commit()

        print("typesdemo row inserted")

        cursor.close()
        conn.close()

    except psycopg2.Error as e:

        print(type(e))

        print(e)

The insert_galleries function first creates a tuple of dictionaries holding gallery data to insert. However, it doesn't really matter how the data to be inserted comes to you - it might be like this, some different kind of data structure, an instance of a class, or even individual variables. As long as you can get at the separate values you can use them to construct the necessary INSERT statement.

Next we enter a try block, firstly calling pgconnection.get_connection and then setting its isolation level to ISOLATION_LEVEL_AUTOCOMMIT so we don't need to bother with handling transactions. (In the next function I haven't done this and call conn.commit() instead. This is just to demonstrate the two techniques.)

We then enter a loop, iterating the galleries. The first line in the loop is at the core of this whole project - it constructs an SQL statement and then executes it. I have split the one (logical) line of code into three (actual) lines to make it easier to describe and understand.

The first line starting "INSERT INTO" is straightforward and static, in that it is copied as-is to the statement to be executed.

The second line starting "VALUES" contains placeholders for the two columns of the table. These are in the format %([column name])s. There are two important points to note here:

  • This is the format used for PostgreSQL - if you wish to adapt this code for a different RDBMS you might need to use a different style. To get the style for your own DB-API implementation use [modulename].paramstyle, eg psycopg2.paramstyle.

  • Whatever the column type always finish off the placeholder with s for string. This is because all values are converted by psycopg2 to a string for inclusion in the SQL.

The last line is the second argument to cursor.execute, and consists of a dictionary containing the values to be used in the VALUES part of the SQL statement. The keys of the dictionary must match the names in the brackets in the placeholders. In this case both values are strings so are used unchanged, but for numbers, dates etc. psycopg2 does the hard work of converting the values into the format required by the database, as we'll see a bit later.

After the loop finishes we just need to close the cursor and connection.

The insert_photos function works in almost the same way but there are two small differences:

  • As mentioned above I haven't set the connection's isolation level, therefore we need to call conn.commit().

  • In insert_galleries I constructed a dictionary in the cursor.execute function call from an existing dictionary. A waste of time of course but it does demonstrate that if your data isn't already in a dictionary you can construct one from individual values. For insert_photos I have just passed the existing dictionary as it is, saving a bit of coding and processor time.

The last insert function is insert_typesdemo. This works in the same way as the previous two, but here the table has numeric, date and Boolean columns to demonstrate that psycopg2 can convert various Python data types to SQL-friendly formats with no effort from us.

Exception Handling

Any code interacting with the outside world needs comprehensive exception handling and all the functions in this project catch psycopg2.Error. This is the superclass of several more specific exceptions and I have taken a minimalist (ie. lazy!) approach to exception handling by just catching psycopg2.Error.

In all functions I first print the specific exception type, and then the exception itself. As we'll see later psycopg2 has very helpful exception messages.

Don't use .format()

Rather than using the %([column name])s style placeholders it would probably be easier to just use {} and .format(). However, this is a very bad idea and should be avoided. This avoids inadvertently ending up with something unpleasant in the SQL (such as DROP DATABASE!) and also improves efficiency.

Running the Program

We now have enough to run, so uncomment the first three function calls in main and run this:

Running the Program

python3.7 pgdml.py

The output is

Program Output

------------------
| codedrome.com  |
| PostgreSQL DML |
------------------

Gallery inserted
Gallery inserted
Gallery inserted
Gallery inserted
Gallery inserted
Photo inserted
Photo inserted
Photo inserted
Photo inserted
Photo inserted
Photo inserted
Photo inserted
Photo inserted
Photo inserted
Photo inserted
typesdemo row inserted

Go to pgAdmin and query the tables; this is the data from galleries, photos and typesdemo respectively.

Updating Data

That's the basic principles covered and it's plain sailing from now on as those principles can be applied to updating, deleting and selecting data.

You might have spotted a deliberate typo in the photos data. In the description column of the last photo, photoid 10, Edinburgh is missing its "h", so let's run an UPDATE to correct it.

pgdml.py Part 3

def update_photos():

    update_dict = {"description": "Edinburgh Photo 2", "photoid": 10}

    try:

        conn = pgconnection.get_connection("codeinpython")
        cursor = conn.cursor()

        cursor.execute("UPDATE photos SET description = %(description)s WHERE photoid = %(photoid)s;", update_dict)

        conn.commit()

        print("Photo updated")

        cursor.close()
        conn.close()

    except psycopg2.Error as e:

        print(type(e))

        print(e)

There's nothing here we haven't seen before - a dictionary holding just the photoid of the row we want to change, and the correct description. This is used in a cursor.execute call in the same way as in the INSERT statements above. If you uncomment update_photos in main, run the program again and look at the data in the photos table you'll see the error has been corrected.

Deleting Data

Actually, we might not even like that last photograph of Edinburgh so let's delete it.

pgdml.py Part 4

def delete_photos():

    delete_dict = {"photoid": 10}

    try:

        conn = pgconnection.get_connection("codeinpython")
        cursor = conn.cursor()

        cursor.execute("DELETE FROM photos WHERE photoid = %(photoid)s;", delete_dict)

        conn.commit()

        print("Photo deleted")

        cursor.close()
        conn.close()

    except psycopg2.Error as e:

        print(type(e))

        print(e)

Familiar ground again, this time with a DELETE FROM. The only point to notice is that even though we only need photoid as the second argument to execute it still needs to be in a dictionary. Passing a variable name or a literal will raise an error.

Selecting Data

Now for a couple of functions to select data, firstly from the photos table and then from the galleriesphotos view created in the previous post.

pgdml.py Part 5

def select_photos():

    # Wildcards:
    # percent sign % for 0 or more characters
    # underscore _ for exactly 1 character
    select_dict = {"where_like": '%Oslo%'}

    try:

        conn = pgconnection.get_connection("codeinpython")
        cursor = conn.cursor()

        # To make LIKE case insensitive use ILIKE
        cursor.execute("SELECT galleryid, title, description, photographer, datetaken FROM photos WHERE description LIKE %(where_like)s;", select_dict)

        # get a list of tuples containing the data
        data = cursor.fetchall()

        cursor.close()
        conn.close()

        for row in data:

            print(row)

    except psycopg2.Error as e:

        print(type(e))

        print(e)


def select_galleriesphotos():

    try:

        conn = pgconnection.get_connection("codeinpython")
        cursor = conn.cursor()

        cursor.execute("SELECT galleryname, gallerydescription, phototitle, photodescription FROM galleriesphotos")

        # get a list of tuples containing the data
        data = cursor.fetchall()

        cursor.close()
        conn.close()

        for row in data:

            print(row)

    except psycopg2.Error as e:

        print(type(e))

        print(e)

In select_photos we firstly create a dictionary with a value to use in the WHERE clause of the SELECT statement. I have called it "where_like" but you can use anything as long as the parameter name matches. I have also hard-coded a value but for real-world applications you would use a value entered by a user as search or filter criteria. Note the comments regarding wildcards - here we're looking for anything containing "Oslo".

Most of this function is familiar but the SELECT statement as it stands is case-sensitive. As the comment says you can use ILIKE to make it case-insensitive which you would probably want to do if you were getting search/filter criteria from users.

After calling cursor.execute we then call cursor.fetchall() to actually get the data. If there is no matching data you still get a list but it will be empty so you might want to check for this and act appropriately. (There are a couple more fetch... methods, fetchmany() and fetchone() which I'll cover in a future post.)

The select_galleriesphotos works in the same way but is simpler as it has no WHERE clause.

Uncomment select_photos in main and run the program, and then do the same with select_galleriesphotos. These are the outputs.

Program Output

------------------
| codedrome.com  |
| PostgreSQL DML |
------------------

(3, 'Oslo Photo 1', 'Oslo Photo 1', 'Chris Webb', datetime.date(2018, 7, 5))
(3, 'Oslo Photo 2', 'Oslo Photo 2', 'Chris Webb', datetime.date(2018, 7, 5))

Program Output

------------------
| codedrome.com  |
| PostgreSQL DML |
------------------

('London 2018', 'Photos of London in 2018', 'London Photo 2', 'London Photo 2')
('London 2018', 'Photos of London in 2018', 'London Photo 1', 'London Photo 1')
('Paris 2016', 'Photos of Paris in 2016', 'Paris Photo 2', 'Paris Photo 2')
('Paris 2016', 'Photos of Paris in 2016', 'Paris Photo 1', 'Paris Photo 1')
('Oslo 2018', 'Photos of Oslo in 2018', 'Oslo Photo 2', 'Oslo Photo 2')
('Oslo 2018', 'Photos of Oslo in 2018', 'Oslo Photo 1', 'Oslo Photo 1')
('Copenhagen 2017', 'Photos of Copenhagen in 2017', 'Copenhagen Photo 2', 'Copenhagen Photo 2')
('Copenhagen 2017', 'Photos of Copenhagen in 2017', 'Copenhagen Photo 1', 'Copenhagen Photo 1')
('Edinburgh 2015', 'Photos of Edinburgh in 2015', 'Edinburgh Photo 2', 'Edinburg Photo 2')
('Edinburgh 2015', 'Photos of Edinburgh in 2015', 'Edinburgh Photo 1', 'Edinburgh Photo 1')

Violating Constraints!

All of the code above should have run with no problems, but one of the key responsibilities of an RDBMS is to enforce constraints and thereby maintain data integrity. Inserted data must be of the correct type for its respective column, data must exist for NOT NULL columns, foreign keys must exist in their referenced table and so on.

Let's deliberately try to violate the foreign key constraint, not for the purposes of being perverse but to test that both the database and code handle this situation correctly.

pgdml.py Part 6

def insert_photo_invalid_fk():

    invalid_photo = {"galleryid": 6, "title": "Hong Kong Photo 1", "description": "Hong Kong Photo 1", "photographer": "Chris Webb", "datetaken": datetime.date(2018, 8, 27)}

    try:

        conn = pgconnection.get_connection("codeinpython")
        cursor = conn.cursor()

        cursor.execute("""INSERT INTO photos(galleryid, title, description, photographer, datetaken)
                          VALUES (%(galleryid)s, %(title)s, %(description)s, %(photographer)s, %(datetaken)s);""",
                          invalid_photo)

        conn.commit()

        print("Photo inserted")

        cursor.close()
        conn.close()

    except psycopg2.Error as e:

        print(type(e))

        print(e)


def delete_fk():

    delete_dict = {"galleryid": 1}

    try:

        conn = pgconnection.get_connection("codeinpython")
        cursor = conn.cursor()

        cursor.execute("DELETE FROM galleries WHERE galleryid = %(galleryid)s;", delete_dict)

        conn.commit()

        print("Gallery deleted")

        cursor.close()
        conn.close()

    except psycopg2.Error as e:

        print(type(e))

        print(e)

The first of the two functions above tries to insert a photo with a galleryid of 6, thereby violating the foreign key constraint as the galleries table only has IDs of 1-5. Running it will give you this:

Program Output

------------------
| codedrome.com  |
| PostgreSQL DML |
------------------

<class 'psycopg2.IntegrityError'>
insert or update on table "photos" violates foreign key constraint "photos_galleryid_fkey"
DETAIL: Key (galleryid)=(6) is not present in table "galleries".

The people who wrote psycopg2 have come up with the revolutionary idea of having error messages which actually tell you what the error is. Brilliant!

The second function also violates the foreign key constraint in a different way: it tries to delete a gallery which still has photos attached to it. If you want to delete a gallery you will need to delete all its photos first (or assign them a different galleryid). This is what happens when you run it. They really are spoiling us with helpful messages...

Program Output

------------------
| codedrome.com  |
| PostgreSQL DML |
------------------

<class 'psycopg2.IntegrityError'>
update or delete on table "galleries" violates foreign key constraint "photos_galleryid_fkey" on table "photos"
DETAIL: Key (galleryid)=(1) is still referenced from table "photos".

Resetting IDs

When developing a database-driven application you're likely to run INSERTs, UPDATEs and DELETEs many times during testing and debugging. This might cause problems with primary and foreign keys, for example if you delete all the photos and galleries in our schema, re-INSERT the galleries and then try to re-INSERT the hard-coded photo data it will fail. This is because the galleries with IDs 1-5 used as foreign keys by the photos no longer exist - they will be replaced by 6 to 10.

This is an easy problem to fix. If you open pgAdmin and look in the Sequences node of the tree you will find the three primary key sequences for our tables. The default naming convention is simple: [tablename]_[columnname]_seq.

To reset the sequence to 1 you just need to run the SQL shown in the cursor.execute line of the reset_serial function. This is for the photos table but you can of course substitute any sequence name.

pgdml.py Part 7

def reset_serial():

    try:

        conn = pgconnection.get_connection("codeinpython")
        conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
        cursor = conn.cursor()

        cursor.execute("ALTER SEQUENCE photos_photoid_seq RESTART WITH 1")

        print("photo table serial reset")

        cursor.close()
        conn.close()

    except psycopg2.Error as e:

        print(type(e))

        print(e)

This and the previous post cover the basics of creating a database and manipulating data using psycopg2. This gets you a long way but of course there is much more ground to cover and I have a number of further articles planned on this important topic.