Reading PostgreSQL Database Schemas with Python

A core principle of relational databases is that a database's schema, or the design of its tables, columns and other objects, is held within the database itself; this means we can retrieve the structure using ordinary SQL queries. In this post I will develop a simple module which uses the psycopg2 DB-API interface to retrieve the tables and columns of a PostgreSQL database.

Edgar Codd, the father of relational databases, set out a list of 13 RDBMS rules (numbered 0 to 12!), and Rule No 4 is...

The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.

... so let's do just that.

What Exactly is a Schema?

Firstly though I need to address the ambiguity of the term "schema". Its main meaning is the overall design of the database - its tables, columns, views, constraints and so on. However, for many years now most RDBMSs have supported an extra layer of database object organisation, also known as schemas. (Additionally, Oracle uses the term schema in a different way again, with each user having a separate schema. As this is a PostgreSQL specific post it is irrelevant here.)

In this project the word "schema" in postgresqlschemareader.py refers to the structure of the entire database. Within the code itself "table_schema" is used several times and refers to the subdivision of objects within the database.

Coding the Module

The source code for this project consists of the following two files:

  • postgresqlschemareader.py
  • postgresqlschemareader_test.py

which can be downloaded as a zip or cloned/downloaded from Github.

Source Code Links

ZIP File
GitHub

Getting a List of Tables

This is the first part of postgresqlschemareader.py, consisting of imports, a function to get a list of tables, and a function to print that list.

postgresqlschemareader.py part 1

import psycopg2
import psycopg2.extras


def get_tables(connection):

    """
    Create and return a list of dictionaries with the
    schemas and names of tables in the database
    connected to by the connection argument.
    """


    cursor = connection.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

    cursor.execute("""SELECT table_schema, table_name
                      FROM information_schema.tables
                      WHERE table_schema != 'pg_catalog'
                      AND table_schema != 'information_schema'
                      AND table_type='BASE TABLE'
                      ORDER BY table_schema, table_name""")

    tables = cursor.fetchall()

    cursor.close()

    return tables


def print_tables(tables):

    """
    Prints the list created by get_tables
    """


    for row in tables:

        print("{}.{}".format(row["table_schema"], row["table_name"]))

As well as psycopg2 I have also imported psycopg2.extras to use psycopg2.extras.RealDictCursor when creating a cursor. As you can see in the first line of the get_tables function, this is passed to connection.cursor as the cursor_factory argument to force it to return objects derived from the Python dictionary. (There is also a psycopg2.extras.DictCursor which the psycopg2 documentation describes as "dictionary-like" but later on we'll need to add key/value pairs to the dictionary of tables which you cannot do with DictCursor.)

The SQL query is straightforward: it just retrieves a couple of columns from information_schema.tables. There are many tables in the database which relate to the actual database, including information_schema.tables itself. The purpose of this function is to get the tables containing actual data so the rest are filtered out - anything in the "pg_catalog" or "information_schema" schemas are omitted, as are any tables with a type anything other than "BASE TABLE".

If you read somewhere else the suggestion that you just use "WHERE table_schema = 'public'" ignore it - it's wrong. The default schema for new tables is called "public" but if other schemas have been added their tables will be missed out.

I have just SELECT'ed the table schemas and names but if you look at information_schema.columns in pgAdmin (it lives in the Catalogs node of the tree view) there are no less than 12 columns in the table which you might need for more esoteric purposes.

Having executed the SQL we just need to call fetchall to get the dictionary, close the cursor and return the dictionary. There is no exception handling as this is handled by the calling code.

Next we have a short function to print out the dictionary returned by the previous function in a neat [table_schema].[table_name] format.

The main function

Let's try out the functions we have so far by creating a main function in a separate file, postgresqlschemareader_test.py.

postgresqlschemareader_test.py

import psycopg2

import postgresqlschemareader

def main():

    """
    Test and demonstrate the functions of the postgresqlschemareader module.
    """


    print("--------------------------")
    print("| codedrome.com          |")
    print("| Read PostgreSQL Schema |")
    print("--------------------------\n")

    try:

        conn = psycopg2.connect("dbname=codeinpython host='localhost' user='chris' password='chris'")

        tables = postgresqlschemareader.get_tables(conn)
        print("codeinpython Tables\n===================\n")
        postgresqlschemareader.print_tables(tables)

        # columns = postgresqlschemareader.get_columns(conn, "public", "photos")
        # print("Columns in photos Table\n=======================\n")
        # postgresqlschemareader.print_columns(columns)


        # tree = postgresqlschemareader.get_tree(conn)
        # print("Database codeinpython\n=====================\n")
        # postgresqlschemareader.print_tree(tree)


        conn.close()

    except psycopg2.Error as e:

        print(type(e))

        print(e)


main()

The module name is a bit cumbersome but if I attempted to abbreviate it its purpose might be difficult to decipher. I used the full name here but you might like to alias it, for example:

Making postgresqlschemareader More Concise

import postgresqlschemareader as pgsr

Within a try block we first attempt to open a database connection. I am using the database created and populated in my first two PostgreSQL/psycopg2 posts but you can substitute any database you have access to. The connection is then passed to postgresqlschemareader.get_tables, the return value of which is then passed to postgresqlschemareader.print_tables.

The next two triplets of function calls are commented out for the time being.

Running the Program

We now have enough to run the program with this command:

Running the Program

python3.7 postgresqlschemareader_test.py

The output is

Program Output

--------------------------
| codedrome.com          |
| Read PostgreSQL Schema |
--------------------------

codeinpython Tables
===================

public.galleries
public.photos
public.typesdemo

Getting a List of Columns

Now let's move on to getting a list of columns for a specified table.

postgresqlschemareader.py part 2

def get_columns(connection, table_schema, table_name):

    """
    Creates and returns a list of dictionaries for the specified
    schema.table in the database connected to.
    """


    where_dict = {"table_schema": table_schema, "table_name": table_name}

    cursor = connection.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

    cursor.execute("""SELECT column_name, ordinal_position, is_nullable, data_type, character_maximum_length
                      FROM information_schema.columns
                      WHERE table_schema = %(table_schema)s
                      AND table_name   = %(table_name)s
                      ORDER BY ordinal_position""",
                      where_dict)

    columns = cursor.fetchall()

    cursor.close()

    return columns


def print_columns(columns):

    """
    Prints the list created by get_columns.
    """


    for row in columns:

        print("Column Name:              {}".format(row["column_name"]))
        print("Ordinal Position:         {}".format(row["ordinal_position"]))
        print("Is Nullable:              {}".format(row["is_nullable"]))
        print("Data Type:                {}".format(row["data_type"]))
        print("Character Maximum Length: {}\n".format(row["character_maximum_length"]))

The get_columns function takes not only a connection but also the schema and table names we want to get columns for. The latter two arguments are used to create a dictionary for use with the cursor's execute method.

The information_schema.columns table has no less than 44 columns but I have SELECT'ed five which I think are most useful - obviously edit this to your own requirements if necessary.

The SQL finishes off by ordering the data by ordinal_position. You might get the columns back in this order without the ORDER BY clause, but then again you might not. Rows in a table have no intrinsic order and even if you get the data back in the "right" order some or even most of the time there is no guaratee this will always happen.

Again there is a short function to print out the data from the previous function.

Comment out the first three function calls in main, uncomment the next three and run the program again. This is the output.

Program Output

--------------------------
| codedrome.com          |
| Read PostgreSQL Schema |
--------------------------

Columns in photos Table
=======================

Column Name:              photoid
Ordinal Position:         1
Is Nullable:              NO
Data Type:                integer
Character Maximum Length: None

Column Name:              galleryid
Ordinal Position:         2
Is Nullable:              NO
Data Type:                smallint
Character Maximum Length: None

Column Name:              title
Ordinal Position:         3
Is Nullable:              NO
Data Type:                character varying
Character Maximum Length: 64

Column Name:              description
Ordinal Position:         4
Is Nullable:              NO
Data Type:                character varying
Character Maximum Length: 256

Column Name:              photographer
Ordinal Position:         5
Is Nullable:              NO
Data Type:                character varying
Character Maximum Length: 64

Column Name:              datetaken
Ordinal Position:         6
Is Nullable:              YES
Data Type:                date
Character Maximum Length: None

Combining Function Calls to Create a Tree

Now let's combine the previous functions to get a data structure listing the tables together with their columns.

postgresqlschemareader.py part 3

def get_tree(connection):

    """
    Uses get_tables and get_columns to create a tree-like data
    structure of tables and columns.

    It is not a true tree but a list of dictionaries containing
    tables, each dictionary having a second dictionary
    containing column information.
    """


    tree = get_tables(connection)

    for table in tree:

        table["columns"] = get_columns(connection, table["table_schema"], table["table_name"])

    return tree


def print_tree(tree):

    """
    Prints the tree created by get_tree
    """


    for table in tree:

        print("{}.{}".format(table["table_schema"], table["table_name"]))

        for column in table["columns"]:

            print(" |-{} ({})".format(column["column_name"], column["data_type"]))

The get_tree function is actually very simple as it just calls get_tables and then iterates the tables, calling get_columns on each table. The return value of get_columns is then added to the tables dictionary; this is why I used psycopg2.extras.RealDictCursor as we couldn't do this with list returned by default from execute, or the "dictionary-like" data structure returned when using psycopg2.extras.DictCursor.

Finally we have a function to print out the result of get_tree.

Comment out the second set of three function calls in main, uncomment the last set and run the program to get this:

Program Output

--------------------------
| codedrome.com          |
| Read PostgreSQL Schema |
--------------------------

Database codeinpython
=====================

public.galleries
 |-galleryid (integer)
 |-name (character varying)
 |-description (character varying)
public.photos
 |-photoid (integer)
 |-galleryid (smallint)
 |-title (character varying)
 |-description (character varying)
 |-photographer (character varying)
 |-datetaken (date)
public.typesdemo
 |-serialid (integer)
 |-intcolumn (integer)
 |-realcolumn (real)
 |-varcharcolumn (character varying)
 |-datecolumn (date)
 |-booleancolumn (boolean)

The console output shown here is of little practical use beyond testing and demonstrating the functions, but this module can be used with any front end which allows users to query or edit data, or even administer the database.