Creating a PostgreSQL Database in Python

This article is the first in an ongoing series covering the use of the PostgreSQL RDBMS with Python. Python itself provides no database access functionality but it does provide the Python Database API 2.0 standard (or DB-API) which is implemented by various third-party modules. The module I will be using for this and subsequent posts is psycopg2, the most popular Python/PostgreSQL DB-API module.

In this post I will introduce psycopg2 with a simple program which creates a database and then adds a few tables and views.

Target Audience

Relational database management systems and SQL are large and complex areas so I will not attempt to give an introduction to them here - this article will assume that you understand at least the basics of tables, column types, primary keys and foreign keys, as well as creating them using SQL. The code for this post assumes you have PostgreSQL installed, and ideally pgAdmin as well.

Why Create a Database Schema with Python?

The need to programatically insert, update, delete and query data is obvious, but actually creating a database by running a program is rather less common. For purely in-house projects a typical workflow would be to create the schema on a development database (perhaps using data modelling tools and/or a GUI such as pgAdmin) and then generating a script to run on the production server.

However, if you are writing database-driven software to be distributed to various sites or external clients then manually running scripts becomes too cumbersome to be practical. Many companies will want to distribute a setup program (or suite of programs) to their customers which can be run with little or no sys admin or DBA expertise.

Python is the ideal choice for such this, even if the software being distributed is written in another language. It is straightforward to create one or more standard setup programs to create the database and carry out all other installation and setup tasks, perhaps tweaking them for individual client's requirements.

For this project I will be hard-coding the various CREATE statements, but for a later project I'll look at running SQL scripts held in separate files from Python.

Even if you have no need to create a database with Python, learning to do so provides a simple route into learning how to use psycopg2.

Installing psycopg2

You can install psycopg2 with pip using the following command:

Installing psycopg2

pip install psycopg2

If you have any problems go to either the relevant pypi.org page pypi.org/project/psycopg2 or psycopg2's own installation page initd.org/psycopg/docs/install.html.

I am using PostgreSQL here because it is my favourite but the principle behind DB-API is that various implementations all work in the same way, so you should have no problems adapting this code to your own choice of RDBMS/interface combination.

If you need to find an interface to a different RDBMS go to https://wiki.python.org/moin/DatabaseInterfaces. (This page also lists no less than THIRTEEN other PostgreSQL interfaces alongside psycopg2!)

The Database Schema

For this project I will be creating a very simple database with the two tables shown in this diagram, complete with their primary key and foreign key constraints.

I will also create another table which has date and Boolean columns so that in a later post I can demonstrate inserting these data types. Finally I'll create a view which joins the two tables in the diagram to show data from each.

Generating the CREATE Statements

The chances are your database schema evolved over a period of weeks or months, with lots of fiddling about using a GUI and no definitive set of SQL scripts to recreate it. Fortunately it is easy to generate these scripts with an application such as pgAdmin.

To generate a CREATE script for an object such as a table, view or the database itself, right-click the object in the tree view, mouseover Scripts and click "CREATE Script". This will show the SQL in a new window for you to copy/paste or save to a file.

You can also generate a script for all the database objects (but not the database itself) by right-clicking the database name and selecting Backup. Enter a filename and in the Format dropdown select Plain. Now click the Dump Options #1 tab and select the "Only schema" checkbox. Then click the Backup button followed by the Done button. This may take a few seconds if your database schema is large. The resulting script isn't something you would want to hard-code into Python but as I mentioned above a later post will cover running SQL scripts in external files from Python.

DB-API Workflow

Whatever you need to do with or to your database, the basic steps are the same:

  • Call the psycopg2.connect function with the relevant connection string - this returns an object of the Connection class

  • Call the Connection object's cursor function - it returns an object of the Cursor class

  • For SELECT queries, run the Cursor object's fetchall, fetchmany or fetchone methods with a string of SQL

  • For other queries, call the Cursor object's execute method with a string of SQL, then call the Connection object's commit or rollback methods (everything happens in a transaction unless you specify otherwise)

  • Call the Cursor object's close method

  • Call the Connection object's close method

It probably goes without saying that all the above should be in a try/except block.

The Code

The code for this project consists of the following files:

  • pgconnection.py
  • pgddl.py

You can download the source code as a zip or clone/download from Github if you prefer.

Source Code Links

ZIP File
GitHub

Let's look at pgconnection.py first.

pgconnection.py

import psycopg2


def get_connection(dbname):

    connect_str = "dbname={} host='localhost' user='user' password='password'".format(dbname)

    return psycopg2.connect(connect_str)

This consists of a single short function which attempts to connect to the database given as a function argument, with the other connection properties - host name, user and password - being hard coded. Before you run this code you'll need to edit this for your own environment, including a user with rights to create databases.

As you can see, all you need do to get an open database connection is pass a connection string to psycopg2.connect. There is no exception handling as exceptions are handled by the calling code.

This is a very simplistic solution to getting a database connection but is sufficient for demonstration purposes. I'll cover more comprehensive connection string and connection management in a later post.

Now let's move on to pgddl.py which creates a database, tables and a view.

pgddl.py

from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
import psycopg2

import pgconnection


def main():

    """
    Demonstrate creation of PostgreSQL database, tables and views using psycopg2
    """

    print("------------------")
    print("| codedrome.com  |")
    print("| PostgreSQL DDL |")
    print("------------------\n")

    # Create a tuple of dictionaries containing the SQL to create database, tables and views
    queries = ({"Description": "Create database",
                "Database": "postgres",
                "SQL": "CREATE DATABASE codeinpython"},

               {"Description": "Create galleries table ",
                "Database": "codeinpython",
                "SQL": "CREATE TABLE galleries(galleryid serial PRIMARY KEY, name varchar(64) NOT NULL, description varchar(256))"},

               {"Description": "Create photos table ",
                "Database": "codeinpython",
                "SQL": "CREATE TABLE photos(photoid serial PRIMARY KEY, galleryid smallint REFERENCES galleries(galleryid) NOT NULL, title varchar(64) NOT NULL, description varchar(256) NOT NULL, photographer varchar(64) NOT NULL, datetaken date)"},

               {"Description": "Create typesdemo table ",
                "Database": "codeinpython",
                "SQL": "CREATE TABLE typesdemo(serialid serial PRIMARY KEY, intcolumn integer, realcolumn real, varcharcolumn varchar(64), datecolumn date, booleancolumn boolean)"},

               {"Description": "Create view galleriesphotos ",
                "Database": "codeinpython",
                "SQL": "CREATE VIEW galleriesphotos AS SELECT galleries.name AS galleryname, galleries.description AS gallerydescription, photos.title AS phototitle, photos.description AS photodescription FROM galleries LEFT JOIN photos ON photos.galleryid = galleries.galleryid"})

    # iterate and run queries
    try:

        for query in queries:

            conn = pgconnection.get_connection(query["Database"])
            conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
            cursor = conn.cursor()

            cursor.execute(query["SQL"])

            print("Executed {}".format(query["Description"]))

            cursor.close()
            conn.close()

    except psycopg2.ProgrammingError as e:

        print(e)


main()

At the top of the file we import psycopg2, and also ISOLATION_LEVEL_AUTOCOMMIT from psycopg2.extensions. As I mentioned above everything happens inside a transaction unless we specify otherwise and ISOLATION_LEVEL_AUTOCOMMIT allows us to do so. The psycopg2 module is also imported.

The main function consists of two sections: firstly we create a tuple of dictionaries containing the various SQL strings needed to create the bits and pieces of our database, then we iterate the tuple, running each piece of SQL in turn. In a production solution you would probably split these into two or more functions, perhaps even holding the SQL in a separate file or files: a topic for a future post.

Each dictionary has a Description which will be printed out just to keep the user informed of what's happening, the name of the database, and the SQL itself. The SQL creating tables etc. is of course run on the database itself, but the first query which creates the database is run on the postgres database.

Within the for/in loop iterating the tuple we follow the workflow described in the bullet points above: get a connection to the relevant database, set its isolation level to prevent transactions being used as they are irrelevant here, and get a Cursor. We then run the current bit of SQL with the Cursor's execute method, print a message and then close both the cursor and the connection. If we were running a series of SQL statements we knew are going to be on the same database we could just open a connection once, use it multiple times in a loop, and then close it.

The loop is enclosed in a try block which will catch any of the exceptions which could occur at various stages, including pgconnection.get_connection.

As you can see this creates a very compact and easily-maintainable database creation utility. If your schema changes all you need do it add, remove or edit the individual dictionaries in the tuple.

Now we can run the program with this command:

Running the Program

python3.7 pgddl.py

The output is

Program Output

-----------------
| codedrome.com  |
| PostgreSQL DDL |
-----------------

Executed Create database
Executed Create galleries table
Executed Create photos table
Executed Create typesdemo table
Executed Create view galleriesphotos

We get a message for each of the five CREATE statements to confirm that they have been run, and if you go to pgAdmin you can see the tables in the tree view on the left. (You will need to right-click the database and click Refresh, or hit F5.) In the screenshot one of the table nodes is expanded to show the 6 columns and 2 constraints, ie. the primary key and foreign key. If you scroll down a bit you will also see the view.

What's Next?

Having created the database, a few tables and a view we can then go on to insert, update and delete data, and also query the tables and view. This will be the topic of the next post.