Geometric and Harmonic Means in Python

The most commonly known and used statistical mean is the arithmetic mean, calculated by adding all values and dividing the result by the number of values. The arithmetic mean is one of a "family" of three means called the Pythagorean means, the other two being the geometric mean and the harmonic mean. In this post I will explain when you might need to use these alternatives and then show how to calculate them using Python.

The Geometric Mean

The geometric mean is the nth root of the product of n values. This means that to calculate it we need to carry out the following steps:

  • Calculate the cumulative product of the values, i.e. multiply the first by the second, multiply the result by the third and so on until we get to the end of the data

  • Calculate the nth root of the product

This looks simple at first - only slightly more complicated than calculating the arithmetic mean. However, the cumulative product can very quickly grow to be huge. A couple of hundred values of around 100 are enough to overflow a 64-bit float, and a larger dataset will break even very large data types such as a 128-bit quadruple-precision floating-point number.
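The overflow is easy to reproduce. The following is a minimal sketch of the product-then-root approach; the function name naive_geometric_mean and the test values are hypothetical and chosen only for illustration.

```python
import numpy as np


def naive_geometric_mean(data):
    # Hypothetical helper: multiply everything together,
    # then take the nth root of the product
    product = np.prod(data)
    return product ** (1.0 / len(data))


# Fine for a handful of small values
small = np.array([2.0, 8.0], dtype=np.float64)
print(naive_geometric_mean(small))

# 200 values of 100 give a product of 1e400, which is larger than
# the biggest 64-bit float (about 1.8e308), so the product becomes inf
large = np.full(200, 100.0, dtype=np.float64)
with np.errstate(over="ignore"):  # silence the overflow warning
    print(naive_geometric_mean(large))
```

The second call prints inf rather than 100, because the intermediate product cannot be represented even though the final answer is small.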

Fortunately there is a simple way to get round the problem:

  • Calculate the logarithms of all values

  • Calculate the arithmetic mean of these logarithms

  • Raise the base of the logarithms to the mean

This is the method I will be using in this project.
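The three steps above can be sketched as follows. This is an illustration under assumptions rather than the project's code: log_geometric_mean is a hypothetical name, and natural logarithms are used although any base would work.

```python
import numpy as np


def log_geometric_mean(data):
    # Step 1: logarithms of all values (natural logs here)
    logs = np.log(data)
    # Step 2: arithmetic mean of the logarithms
    mean_of_logs = np.mean(logs)
    # Step 3: raise the base (e) to the power of that mean
    return np.exp(mean_of_logs)


# 200 values of 100: the raw product (1e400) would overflow a
# 64-bit float, but the logarithms stay small
data = np.full(200, 100.0, dtype=np.float64)
print(log_geometric_mean(data))  # approximately 100
```

Because only the logarithms are summed, the intermediate values stay close to log(100), roughly 4.6, however many values there are.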

But why would you use the geometric mean instead of the familiar arithmetic mean? Let's look at an example.

The following table shows the rates of increase of house prices over 10 years, expressed as multipliers (for example, a 6.5% increase is shown as 1.065). Also shown is the value, after each increase, of a house initially worth £250,000.

Rate of increase    House value (£)
1.065               266,250.00
1.022               272,107.50
1.015               276,189.11
1.028               283,922.41
1.035               293,859.69
1.048               307,964.96
1.055               324,903.03
1.042               338,548.96
1.044               353,445.11
1.039               367,229.47

The arithmetic mean of the rates of increase is 1.0393. If we apply this rate 10 times to the original 250,000 we get 367,577.81, not the correct value of 367,229.47.

However, if we calculate the geometric mean, which is 1.0392014685133, and apply that 10 times to the initial value, we get the correct final value of 367,229.47. I have used these values in the Python code.
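The comparison can be checked with a few lines of NumPy. This is a sketch using the figures from the table; the variable names are illustrative and do not appear in means.py.

```python
import numpy as np

# Rates of increase and initial value taken from the table above
rates = np.array([1.065, 1.022, 1.015, 1.028, 1.035,
                  1.048, 1.055, 1.042, 1.044, 1.039], dtype=np.float64)
initial_value = 250_000.0

# Actual final value: apply each rate in turn
actual = initial_value * np.prod(rates)

# Final value if the arithmetic mean is applied 10 times
arithmetic = np.mean(rates)
from_arithmetic = initial_value * arithmetic ** len(rates)

# Final value if the geometric mean is applied 10 times
geometric = np.exp(np.mean(np.log(rates)))
from_geometric = initial_value * geometric ** len(rates)

print(f"Actual:               {actual:,.2f}")
print(f"Via arithmetic mean:  {from_arithmetic:,.2f}")
print(f"Via geometric mean:   {from_geometric:,.2f}")
```

The first and third lines should agree with the 367,229.47 in the table, while the second shows the overestimate of 367,577.81.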

The Harmonic Mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the data. (The reciprocal of a number n is 1/n or n^-1 - the two are equivalent.)

To calculate it we therefore need to carry out the following steps:

  • Calculate the reciprocals of our values

  • Calculate the arithmetic mean of these reciprocals

  • Calculate the reciprocal of the mean
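The steps above can be sketched as follows. The function name harmonic_mean_steps and the two speeds are made up for illustration.

```python
import numpy as np


def harmonic_mean_steps(data):
    # Step 1: reciprocals of the values
    reciprocals = 1.0 / data
    # Step 2: arithmetic mean of the reciprocals
    mean_of_reciprocals = np.mean(reciprocals)
    # Step 3: reciprocal of that mean
    return 1.0 / mean_of_reciprocals


# Made-up illustration: two speeds over equal distances
speeds = np.array([40.0, 60.0], dtype=np.float64)
print(harmonic_mean_steps(speeds))  # approximately 48, not the arithmetic mean of 50
```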

Now let's look at the reasons you might use the harmonic mean, using another example. Consider the following data which represent the average speeds and journey times of twelve 100km journeys.

Average speed (kph)    Journey time (hours)
42                     2.38
63                     1.59
78                     1.28
48                     2.08
71                     1.41
63                     1.59
73                     1.37
44                     2.27
68                     1.47
63                     1.59
73                     1.37
60                     1.67

Total time (hours)             20.07
Total distance (km)            1200
Overall average speed (kph)    59.8

Here I have calculated the overall average speed by dividing the total distance by the total time thus:

1200 / 20.07 = 59.8

However, you might try to calculate the overall average speed by taking the arithmetic mean of the individual average speeds. (These might be the only data you have so the previous method might not be possible.)

746 / 12 = 62.17

This result is what a professional statistician might describe as "wrong". Not wildly wrong in this case, but wrong all the same. As you may have guessed, to get the result which a professional statistician might describe as "right" we just need to use the harmonic mean instead.

I have used this set of data in the code, so when we run it we'll see that we get the correct result of 59.8.
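To see why the harmonic mean is the right choice here, the two calculations can be placed side by side. This is a sketch that assumes, as in the table, that every journey covers the same 100km; the variable names are illustrative.

```python
import numpy as np

# Average speeds (kph) from the table; every journey is 100 km
speeds = np.array([42, 63, 78, 48, 71, 63, 73, 44, 68, 63, 73, 60],
                  dtype=np.float64)
distance_per_journey = 100.0

# Method 1: total distance divided by total time
times = distance_per_journey / speeds
overall_speed = (distance_per_journey * len(speeds)) / np.sum(times)

# Method 2: harmonic mean of the individual average speeds
harmonic = len(speeds) / np.sum(1.0 / speeds)

print(round(overall_speed, 2))
print(round(harmonic, 2))
print(round(np.mean(speeds), 2))  # arithmetic mean, for comparison
```

The first two results match because the common distance cancels out of total distance / total time, leaving n divided by the sum of the reciprocals of the speeds, which is the harmonic mean.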

The Project

This project consists of a single file, means.py, which you can download as a zip, or clone/download the GitHub repository if you prefer.

Source Code Links

ZIP File
GitHub

I am using NumPy as it simplifies the calculations, and most people needing to calculate the geometric or harmonic mean are likely to be using NumPy anyway. However, if you want to use plain Python lists it's straightforward to modify the code accordingly. SciPy also has functions for geometric and harmonic means, scipy.stats.gmean and scipy.stats.hmean respectively. Even if you use SciPy you might find this article useful.
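As a rough idea of what a plain-list version might look like, here is a sketch using only the standard library. The function names geometric_mean_list and harmonic_mean_list are hypothetical. The results are cross-checked against the standard library's statistics module, which provides harmonic_mean and, from Python 3.8, geometric_mean.

```python
import math
import statistics


def geometric_mean_list(values):
    # Hypothetical list-based version of the logarithm method
    if any(v <= 0 for v in values):
        raise ValueError("All values must be positive")
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


def harmonic_mean_list(values):
    # Hypothetical list-based version of the reciprocal method
    if any(v <= 0 for v in values):
        raise ValueError("All values must be positive")
    return len(values) / math.fsum(1 / v for v in values)


rates = [1.065, 1.022, 1.015, 1.028, 1.035, 1.048, 1.055, 1.042, 1.044, 1.039]
speeds = [42, 63, 78, 48, 71, 63, 73, 44, 68, 63, 73, 60]

print(geometric_mean_list(rates))
print(harmonic_mean_list(speeds))

# Cross-check against the standard library
# (statistics.geometric_mean needs Python 3.8 or later)
print(statistics.geometric_mean(rates))
print(statistics.harmonic_mean(speeds))
```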

If you want to run the code as it is but haven't yet got into NumPy then you just need to install it with pip and you are ready to go. These are the links to NumPy on pypi.org, and NumPy's own site.

NumPy on pypi.org

numpy.org

The Source Code

Let's now look at the source code.

means.py

import math
import numpy as np


def main():

    print("--------------------------------")
    print("| codedrome.com                |")
    print("| Geometric and Harmonic Means |")
    print("--------------------------------\n")

    housePriceIncreases = getHousePriceIncreases()
    print("House Price Increases\n---------------------")
    print(housePriceIncreases)
    print()

    am = arithmeticMean(housePriceIncreases)
    print("Arithmetic mean (wrong)\n-----------------------")
    print(am)
    print()

    gm = geometricMean(housePriceIncreases)
    print("Geometric mean (correct)\n------------------------")
    print(gm)
    print()

    print("=============================================================\n")

    averageSpeeds = getAverageSpeeds()
    print("Average speeds\n--------------")
    print(averageSpeeds)
    print()

    am = arithmeticMean(averageSpeeds)
    print("Arithmetic mean (wrong)\n-----------------------")
    print(am)
    print()

    hm = harmonicMean(averageSpeeds)
    print("Harmonic mean (correct)\n-----------------------")
    print(hm)
    print()


def getHousePriceIncreases():

    return np.array([1.065, 1.022, 1.015, 1.028, 1.035, 1.048, 1.055, 1.042, 1.044, 1.039], dtype=np.float64)


def getAverageSpeeds():

    return np.array([42, 63, 78, 48, 71, 63, 73, 44, 68, 63, 73, 60], dtype=np.float64)


def arithmeticMean(data):

    mean = np.mean(data)

    return mean


def geometricMean(data):

    if np.any(data <= 0):
        raise ValueError("All values must be positive")

    # Calculate base 10 logarithms of data
    # Can use any base
    logarithms = np.log10(data)

    # Calculate arithmetic mean of logarithms
    logarithms_mean = np.mean(logarithms)

    # Calculate geometric mean
    # Base must be same as that used for
    # calculating logarithms, here 10 is used
    geometric_mean = math.pow(10, logarithms_mean)

    # The calculation has been split into 3 lines
    # for clarity but these can be combined into
    # a single line thus:
    # geometric_mean = math.pow(10, np.mean(np.log10(data)))

    return geometric_mean


def harmonicMean(data):

    if np.any(data <= 0):
        raise ValueError("All values must be positive")

    # Calculate reciprocals of values
    reciprocals = np.power(data, -1)

    # Calculate arithmetic mean of reciprocals
    reciprocals_mean = np.mean(reciprocals)

    # Calculate harmonic mean
    harmonic_mean = math.pow(reciprocals_mean, -1)

    # The calculation has been split into 3 lines
    # for clarity but these can be combined into
    # a single line thus:
    # harmonic_mean = math.pow(np.mean(np.power(data, -1)), -1)

    return harmonic_mean


if __name__ == "__main__":
    main()

main

In main we simply call functions to get and print NumPy arrays of hard-coded data. Then we call functions to get various means which are then printed.

getHousePriceIncreases and getAverageSpeeds

Here we just return NumPy arrays of data. Note that the data type is specified as 64-bit float.

arithmeticMean

I have included the arithmetic mean to complete the trio of Pythagorean means, using NumPy's existing mean function. This lets us print the arithmetic mean alongside the geometric and harmonic means and see that, with our datasets, it gives incorrect results.

geometricMean

Firstly we need to check that all the numbers are positive, which is very easy using NumPy's any function. If there are any invalid values we raise an error.

To calculate the geometric mean we first create a NumPy array containing the base 10 logarithms of the data values. (You can use any base as long as you are consistent throughout.) Next we grab the arithmetic mean of the logarithms, finally calculating the geometric mean by raising 10 (or whatever logarithm base you use) to the power of that mean.

I have split the calculation into three steps to make the code easier to follow but have also included a commented out single line version which does exactly the same thing in a more compact form.
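The point about the choice of base can be checked directly. This sketch repeats the calculation with base 10, base e and base 2 on the house price data; the variable names are illustrative only.

```python
import numpy as np

data = np.array([1.065, 1.022, 1.015, 1.028, 1.035,
                 1.048, 1.055, 1.042, 1.044, 1.039], dtype=np.float64)

# Same calculation with three different logarithm bases;
# the base used to undo the logarithm must match in each case
base_10 = np.power(10.0, np.mean(np.log10(data)))
base_e = np.exp(np.mean(np.log(data)))
base_2 = np.exp2(np.mean(np.log2(data)))

print(base_10, base_e, base_2)

# The results agree to within floating-point rounding
print(np.allclose([base_e, base_2], base_10))
```

The three values may differ in the last digit or two because of rounding, which is why the comparison uses np.allclose rather than ==.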

harmonicMean

Again we check for non-positive values before creating a NumPy array of reciprocals of the data values. We then get the arithmetic mean of these reciprocals and finally calculate the reciprocal of the mean.

As with the geometric mean I have included a compact one-line version of the calculation.
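To illustrate why the check for non-positive values matters, here is a small sketch. checked_harmonic_mean is a hypothetical compact stand-in for harmonicMean with the same guard, and the data containing a zero is made up.

```python
import numpy as np


def checked_harmonic_mean(data):
    # Compact stand-in for harmonicMean, with the same guard
    if np.any(data <= 0):
        raise ValueError("All values must be positive")
    return 1.0 / np.mean(1.0 / data)


# Without the guard, the zero would mean dividing by zero
speeds_with_zero = np.array([42.0, 0.0, 78.0])

try:
    checked_harmonic_mean(speeds_with_zero)
except ValueError as error:
    print(f"Rejected: {error}")
```

The same reasoning applies to the geometric mean, where the logarithm of zero or of a negative number is not a usable value.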

Running the Program

Now run the program with this command:

Running the program

python3 means.py

This is the output.

Program Output

--------------------------------
| codedrome.com                |
| Geometric and Harmonic Means |
--------------------------------

House Price Increases
---------------------
[1.065 1.022 1.015 1.028 1.035 1.048 1.055 1.042 1.044 1.039]

Arithmetic mean (wrong)
-----------------------
1.0393

Geometric mean (correct)
------------------------
1.0392014685132656

=============================================================

Average speeds
--------------
[42. 63. 78. 48. 71. 63. 73. 44. 68. 63. 73. 60.]

Arithmetic mean (wrong)
-----------------------
62.166666666666664

Harmonic mean (correct)
-----------------------
59.801457175118436

The datasets shown here are the same as the ones used in the examples above. You can therefore compare the arithmetic means, which give incorrect results in these cases, with the geometric and harmonic means, which give the correct ones.