Moving Averages in Python

Everyone understands averages, both their meaning and how to calculate them. However, there are situations, particularly when dealing with real-time data, where a conventional average is of little use because it includes old values which are no longer relevant and which merely give a misleading impression of the current situation.

The solution to this problem, and the subject of this post, is the moving average, i.e. the average of only the most recent values rather than of all values.

To illustrate the problem I will show part of the output of the program I'll write in this post. It shows the last rows of a set of server response times in milliseconds.

Program Output - tail end of 1000 rows

------------------
| codedrome.com  |
| MovingAverages |
------------------

-------------------------------------------------
|          value|overall average| moving average|
-------------------------------------------------
.
.
.

|          46.00|          29.77|          31.00|
|          20.00|          29.76|          32.75|
|          38.00|          29.76|          37.25|
|          44.00|          29.78|          37.00|
|          49.00|          29.80|          37.75|
|          36.00|          29.80|          41.75|
|          11.00|          29.79|          35.00|
|          20.00|          29.78|          29.00|
|          40.00|          29.79|          26.75|
|          10.00|          29.77|          20.25|
|          47.00|          29.78|          29.25|
|          13.00|          29.77|          27.50|
|          24.00|          29.76|          23.50|
|          38.00|          29.77|          30.50|
|          31.00|          29.77|          26.50|
|          50.00|          29.79|          35.75|
|          32.00|          29.79|          37.75|
|          21.00|          29.78|          33.50|
|          42.00|          29.80|          36.25|
|         165.00|          29.93|          65.00|
|         256.00|          30.16|         121.00|
|         419.00|          30.55|         220.50|
|         329.00|          30.85|         292.25|
|         128.00|          30.94|         283.00|
-------------------------------------------------

Most times in the left-hand column are between 10ms and 50ms and can be considered normal, but the last five shoot up considerably. The second column shows the overall average, which we might use to monitor the server for problems. However, because so many normal times are included in it, the overall average has hardly risen at all even though the server has slowed down considerably for the last few requests, so we wouldn't realise anything was wrong. The last column shows 4-point moving averages, i.e. the averages of only the last four values. These do increase considerably, so alarm bells should start to ring.
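
As a quick check on the last column, the short sketch below recomputes the final five 4-point moving averages directly from the last eight values in the table, by averaging each window of four consecutive values. The function name window_means is purely illustrative and is not part of the project.

```python
def window_means(values, points):
    """Return the mean of each complete window of `points` consecutive values."""
    return [
        sum(values[i - points + 1 : i + 1]) / points
        for i in range(points - 1, len(values))
    ]


# The last eight values from the table above: three normal times
# followed by the five slow responses.
tail = [32, 21, 42, 165, 256, 419, 329, 128]

print(window_means(tail, 4))
# [65.0, 121.0, 220.5, 292.25, 283.0]
```

These match the last five rows of the moving average column. Averaging each window from scratch like this costs one addition per point for every new value; the class below avoids that by updating the previous average instead.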

Having explained both the problem and its solution, let's write some code. This project consists of the following files, which can be downloaded as a ZIP file, or you can clone or download the GitHub repository if you prefer.

  • movingaverageslist.py
  • movingaverages_test.py

The movingaverageslist.py file implements a class which maintains a list of numerical values; each time a new value is added, the overall average and the moving average up to and including that value are also calculated and stored.

movingaverageslist.py

class MovingAveragesList:

    """
    This class implements a list to which numeric values can be appended.
    Doing so actually appends a dictionary containing three keys:

    "value" - the value added

    "average" - the arithmetic mean of all values up to and
    including the current one

    "movingaverage" - the arithmetic mean of the most recent
    values, including the current one, up to the number
    specified by points

    The underlying list can be accessed using objectname.data.
    """


    def __init__(self, points):

        """
        The points argument specifies how many of the most recent
        values (including the new one) should be used to calculate
        each moving average.
        """


        self.data = []
        self.points = points


    def append(self, n):

        """
        Adds a dictionary of value, overall average and moving average
        to the list.
        """


        average = self.__calculate_overall_average(n)
        moving_average = self.__calculate_moving_average(n)
        self.data.append({"value": n,
                          "average": average,
                          "movingaverage": moving_average})


    def __calculate_overall_average(self, n):

        length = len(self.data)

        if length == 0:
            average = n
        else:
            average = (((self.data[length - 1]["average"]) *
                         length) + n) / (length + 1)

        return average


    def __calculate_moving_average(self, n):

        length = len(self.data)

        if length == 0:
            moving_average = n
        elif length <= self.points - 1:
            moving_average = (((self.data[length - 1]["average"]) *
                                length) + n) / (length + 1)
        else:
            moving_average = ((self.data[length - 1]["movingaverage"] * self.points) -
                              (self.data[length - self.points]["value"]) + n) / self.points

        return moving_average


    def __str__(self):

        """
        Create a grid from the data in the list.
        """


        items = []

        items.append("-" * 49 + "\n")
        items.append("|          value|overall average| moving average|\n")
        items.append("-" * 49 + "\n")

        for item in self.data:
            items.append("|{:15.2f}|{:15.2f}|{:15.2f}|\n"
                        .format(item["value"], item["average"], item["movingaverage"]))

        items.append("-" * 49)

        return "".join(items)

In __init__ we simply create an empty list and set the points attribute, i.e. the number of values used to calculate each moving average.

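In the append method, the overall and moving averages are calculated using separate methods which I'll come to in a minute. Both are calculated before anything is appended, so within them len(self.data) is the number of earlier values. Then a dictionary containing the new value and the two averages is appended to the list.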

In __calculate_overall_average we don't need to add up all the values each time. Instead we multiply the previous average by the number of values it covers, which gives their total, and then add the new value. This is then divided by the length + 1, i.e. the length the list will be once the new value is added.
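
The running-mean update can be checked in isolation. The sketch below uses a hypothetical standalone function, update_mean, which mirrors the logic of __calculate_overall_average: it folds in the first eight values from the table excerpt one at a time and compares the result with statistics.mean over the whole list.

```python
import math
import statistics


def update_mean(previous_mean, count, new_value):
    """Return the mean after adding new_value to `count` values whose mean is previous_mean."""
    if count == 0:
        return new_value
    # previous_mean * count recovers the total of the earlier values
    return (previous_mean * count + new_value) / (count + 1)


values = [46, 20, 38, 44, 49, 36, 11, 20]
mean = 0.0  # ignored when count == 0
for count, v in enumerate(values):
    mean = update_mean(mean, count, v)

# Incremental and direct results agree (allowing for float rounding)
print(math.isclose(mean, statistics.mean(values)))  # True
```

Each update needs only the previous mean and the count, so adding a value costs the same however long the list grows.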

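The __calculate_moving_average method uses a similar technique. Once the list already holds at least points values, it multiplies the previous moving average by points to recover the total of the previous window, subtracts the oldest value in that window (the one dropping out), adds the new value and divides by points. It is more complex than the overall average because it also has to allow for the list not yet having reached the length of the number of points. In this situation it just calculates the mean of whatever data the list has, which is the same as the overall average.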
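
To confirm that the subtract-the-oldest, add-the-newest update gives the same result as averaging each window directly, here is a small standalone sketch. The function rolling_means is illustrative only: it follows the same three cases as __calculate_moving_average (first value, window not yet full, window full) but works on a plain list of numbers rather than the class's dictionaries.

```python
import math


def rolling_means(values, points):
    """Return the moving average after each value, updated incrementally."""
    means = []
    for i, n in enumerate(values):
        if i == 0:
            means.append(n)
        elif i <= points - 1:
            # Window not yet full: mean of everything seen so far
            means.append((means[-1] * i + n) / (i + 1))
        else:
            # Window full: remove values[i - points], which drops out, and add n
            means.append((means[-1] * points - values[i - points] + n) / points)
    return means


values = [46, 20, 38, 44, 49, 36, 11, 20, 40, 10]
points = 4
incremental = rolling_means(values, points)

# Direct calculation: mean of the last (up to) `points` values at each step
direct = [
    sum(values[max(0, i - points + 1) : i + 1]) / min(i + 1, points)
    for i in range(len(values))
]

print(all(math.isclose(a, b) for a, b in zip(incremental, direct)))  # True
```

Note that the full-window case has to look back at values[i - points], which is why the class keeps the earlier values in its list rather than just the latest average.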

Lastly we implement __str__ which returns the data in a table format suitable for outputting to the console.

The MovingAveragesList class is now complete so let's put together a simple demo.

movingaverages_test.py

import random

import movingaverageslist


def main():

    print("------------------")
    print("| codedrome.com  |")
    print("| MovingAverages |")
    print("------------------\n")

    response_times_ms = populate_response_times()
    print(response_times_ms)

    # Quick demo of accessing the list directly.
    print(response_times_ms.data[-1])


def populate_response_times():

    """
    Create a MovingAveragesList object and populate it with
    random response times.
    """


    response_times_ms = movingaverageslist.MovingAveragesList(4)

    # Add a large number (995) of normal times
    for _ in range(995):
        response_times_ms.append(random.randint(10, 50))

    # Add a few (5) excessively long times
    for _ in range(5):
        response_times_ms.append(random.randint(100, 500))

    return response_times_ms


if __name__ == "__main__":
    main()

In main we call populate_response_times to get a MovingAveragesList object with 1000 items, and then print the object. Because we implemented __str__ in the class, print calls it and we see the table described above.

I have also added a line which prints the last item in the list just to show how to access the most recent value and averages. A possible enhancement would be to wrap this in a method to avoid rummaging around in the inner workings of the class.
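
One way that enhancement might look is sketched below. In the class itself latest would be a method taking self; here it is written as a plain function so that it can be tried against a stand-in object (a types.SimpleNamespace with a data attribute) without the rest of the project. The name latest and its behaviour of returning None for an empty list are assumptions, not part of the downloadable code.

```python
from types import SimpleNamespace


def latest(averages_list):
    """
    Return the most recent entry of a MovingAveragesList-style object
    (anything with a `data` list of dictionaries), or None if it is empty.
    """
    if not averages_list.data:
        return None
    # Copy the dictionary so callers cannot alter the stored history
    return dict(averages_list.data[-1])


# Stand-in for a populated MovingAveragesList, using the rounded
# figures from the final row of the table above
demo = SimpleNamespace(data=[{"value": 128, "average": 30.94, "movingaverage": 283.0}])
print(latest(demo))
# {'value': 128, 'average': 30.94, 'movingaverage': 283.0}
print(latest(SimpleNamespace(data=[])))  # None
```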

The populate_response_times function creates a MovingAveragesList object with a points value of 4. This is probably too low for practical purposes but it does make manual testing easier!

It then adds a large number (995) of "normal" values between 10 and 50; remember that each time a value is added, new overall and moving averages are also calculated and stored. Then five values between 100 and 500 are added to simulate a server problem before we return the object.

Now we can run the program like this...

Running the program

python3.7 movingaverages_test.py

I won't repeat the output, but you'll see 1000 rows of data whizzing up your console, followed by the dictionary for the last item.

Possible Improvements

The MovingAveragesList class has been tailored to demonstrating the problem it solves and how it solves it. In a production environment some of what it stores is unnecessary, and there are a few improvements which could make the class more efficient and useful.

  • We could drop the overall averages

  • Only the latest moving average could be kept

  • We could delete the oldest value each time a new one is added, just keeping a restricted number of the latest values

  • We could forget the list concept entirely and just keep a single moving average, updated as each new value is added (this still needs the values currently in the window, so that the oldest can be subtracted)

  • We could include a threshold and a function to be called if the threshold is exceeded, for example to send out emails if the server response time slows to an unacceptable level
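
Several of these ideas can be combined. The sketch below is one possible shape, not the project's code: it drops the overall average and the list of dictionaries, keeps only the most recent points values in a collections.deque with maxlen set, maintains a running total so each update costs the same regardless of window size, and calls a user-supplied function when the moving average exceeds a threshold. The class name MovingAverageMonitor and its threshold and on_exceed parameters are hypothetical; sending emails is left to whatever callback you pass in.

```python
from collections import deque


class MovingAverageMonitor:
    """
    Keep a simple moving average of the last `points` values and call
    `on_exceed(average)` whenever a new value leaves it above `threshold`.
    """

    def __init__(self, points, threshold=None, on_exceed=None):
        if points < 1:
            raise ValueError("points must be at least 1")
        self.window = deque(maxlen=points)  # oldest value drops off automatically
        self.total = 0.0
        self.threshold = threshold
        self.on_exceed = on_exceed

    def append(self, n):
        """Add a value and return the updated moving average."""
        if len(self.window) == self.window.maxlen:
            # The deque is full, so appending will discard window[0]
            self.total -= self.window[0]
        self.window.append(n)
        self.total += n
        average = self.total / len(self.window)
        if (self.threshold is not None and self.on_exceed is not None
                and average > self.threshold):
            self.on_exceed(average)
        return average

    @property
    def average(self):
        """Current moving average, or None if no values have been added."""
        return self.total / len(self.window) if self.window else None


# Usage: collect alerts when the 4-point average goes above 100ms
alerts = []
monitor = MovingAverageMonitor(4, threshold=100, on_exceed=alerts.append)
for t in [32, 21, 42, 165, 256, 419, 329, 128]:
    monitor.append(t)

print(monitor.average)  # 283.0
print(alerts)           # [121.0, 220.5, 292.25, 283.0]
```

As written, the callback fires on every value while the average stays above the threshold; a real alerting system would probably only notify when the threshold is first crossed. For long streams of non-integer values the running total can also accumulate a little floating-point error, which recomputing sum(self.window) occasionally would remove.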