Exporting PostgreSQL Data to Excel with Python

There is a baffling selection of reporting software out there with very sophisticated functionality and users can put together reports impressive enough to satisfy any manager or board.

However, many people put pragmatics over aesthetics and will say "can't I just get the data in a spreadsheet?"

In this post I will put together a very simple solution to the problem of exporting data from PostgreSQL to an Excel spreadsheet using psycopg2 for the database access and openpyxl for the spreadsheet creation.

I have already written a few posts on using the psycopg2 PostgreSQL interface for both creating databases and manipulating data. This post will use the database and data created in those earlier posts but you can easily use it with your own database.

openpyxl

openpyxl is "A Python library to read/write Excel 2010 xlsx/xlsm files", and is simple to use but powerful. You can install it using pip/pip3 with this command:

Installing openpyxl

pip(3) install openpyxl

Here are a few useful links:

openpyxl at pypi.org

documentation at readthedocs.io

a handy tutorial on medium.com

The Project

This project consists of a simple module containing one function to take a connection and a query string as arguments and then run the query. The results will be written to an Excel file, the headings and file name of which are also function arguments.

The source code for this project consists of the following two files:

  • pgtoexcel.py
  • pgtoexcel_test.py

which can be downloaded as a zip or cloned/downloaded from Github. This project shares a file, pgconnection.py, with the previous post so you might like to save the files for this post in the same folder and share pgconnection.py.

Source Code Links

ZIP File
GitHub

Now let's look at the code...

pgtoexcel.py

import psycopg2
import openpyxl
from openpyxl.styles import Font


def export_to_excel(connection, query_string, headings, filepath):

    """
    Exports data from PostgreSQL to an Excel spreadsheet using psycopg2.

    Arguments:
    connection - an open psycopg2 (this function does not close the connection)
    query_string - SQL to get data
    headings - list of strings to use as column headings
    filepath - path and filename of the Excel file

    psycopg2 and file handling errors bubble up to calling code.
    """


    cursor = connection.cursor()
    cursor.execute(query_string)
    data = cursor.fetchall()
    cursor.close()

    wb = openpyxl.Workbook()
    sheet = wb.get_active_sheet()

    sheet.row_dimensions[1].font = Font(bold = True)

    # Spreadsheet row and column indexes start at 1
    # so we use "start = 1" in enumerate so
    # we don't need to add 1 to the indexes.

    for colno, heading in enumerate(headings, start = 1):
        sheet.cell(row = 1, column = colno).value = heading

    # This time we use "start = 2" to skip the heading row.
    for rowno, row in enumerate(data, start = 2):
        for colno, cell_value in enumerate(row, start = 1):
            sheet.cell(row = rowno, column = colno).value = cell_value

    wb.save(filepath)

Firstly we need to import psycopg2 and openpyxl, as well as Font from openpyxl.styles.

The export_to_excel function takes a connection, which needs to be open and will be left open. This enables calling code to re-use connections or a connection pool. It also takes a query string which would typically be in the form "SELECT...FROM". The other arguments are a list of strings to use as column headings and the path to save the spreadsheet to.

Within export_to_excel we first grab the data using the method already described in this post.

Now we get to the spreadsheet stuff, firstly creating a new Workbook and getting the current worksheet. Then we set the font of the first row to bold as it will contain the column headings. We use [1] as spreadsheet rows and column indexes start at 1 not 0.

Next we iterate the column headings list using the enumerate function with a start argument of 1 rather than the default 0. This is used to index the columns where we set the cell text.

Next we iterate the rows and columns of data using a nested loop, starting the row iteration at 2 to skip the headings row. Within the loop we simply set the spreadsheet cell values to the data items.

Then all we need to do is save the spreadsheet to the specified file path. There is no exception handling in this function so any exceptions will bubble up to the calling code where they can better be dealt with.

As a quick summary of using openpyxl to create a spreadsheet:

  • Call openpyxl.Workbook() to create a workbook

  • Call get_active_sheet() on the workbook to get a sheet

  • Set sheet.cell([row number], [column number]).value for each cell you want to write to

  • Call save(filepath) on the workbook

Now we can try out the function.

pgtoexcel_test.py

import psycopg2

import pgconnection
import pgtoexcel


def main():

    print("-----------------------")
    print("| codedrome.com       |")
    print("| PostgreSQL to Excel |")
    print("-----------------------")

    try:

        conn = pgconnection.get_connection("codeinpython")
        query_string = "SELECT galleryname, gallerydescription, phototitle, photodescription FROM galleriesphotos"
        filepath = "galleriesphotos.xlsx"

        pgtoexcel.export_to_excel(conn,
                                  query_string,
                                  ("Gallery Name", "Gallery Description", "Photo Title", "Photo Description"),
                                  "filepath")

    except Exception as e:
        print(type(e))
        print(e)


main()

This code uses the pgconnection.get_connection function from earlier posts but you can substitute your own database connection here. We also have query_string and filepath variables which again you can change if you wish. Then all we need to do is call pgtoexcel.export_to_excel.

The exception handling here is minimal but as pgtoexcel.export_to_excel tries to connect to a database and then write to the file system there is plenty that can go wrong so you might like to include two (or more) exception handlers here.

My usual approach in these situations is to include exception handling like this before deliberately breaking the code (eg. with an invalid database password) to see what sort of exceptions it throws up, and then expanding the exception handling into something more specialized.

We now have enough to run, so uncomment the first three function calls in main and run this:

Running the Program

python3.7 pgtoexcel_test.py

The console output is far too boring to bother showing but if you open the folder where you saved the source code you'll find a new spreadsheet has been created with the headings and data.

Apart from the bold column headings I have made no attempt to format the spreadsheet so it is pretty raw. However, there is plenty of scope for enhancement of both formatting and content, up to very complex spreadsheet-based reporting.