Z-Scores in Python

So, your child gets 78% in both physics and history. Both are pretty good grades, but as a reader of geeky blogs like this one you believe the sciences are more important than the humanities, and would have preferred your child to do better in physics than in history.

However, we are not necessarily comparing like with like here: 78% in one subject is probably not equivalent to 78% in another. Rather than comparing the absolute percentages, we need to calculate and compare the Z-Scores, which take into account the mean and the spread of the entire set of scores.

The Z-Score of a percentage score measures how far that score lies from the mean percentage, in units of standard deviation. Scores above the mean have positive Z-Scores and scores below it have negative ones. (The standard deviation can be thought of as a measure of how much, on average, the individual scores vary from the mean.)
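To make the standard deviation a bit more concrete, here is a rough sketch that works out the population standard deviation by hand and checks it against statistics.pstdev. Strictly speaking it is the square root of the average squared difference from the mean, rather than a plain average of the differences. The data values are made up purely for illustration and are not the grades used later.

```python
import math
import statistics

# Illustrative values only (an assumption, not data from this project)
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)

# Population variance: the mean of the squared differences from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)

print(mean, std_dev)            # 5.0 2.0
print(statistics.pstdev(data))  # 2.0
```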

As an example, if the mean is 60 and the standard deviation is 10, the Z-Scores of 50%, 60% and 70% would be -1, 0 and 1 respectively.

The formula for calculating Z-Scores is as follows, where μ is the arithmetic mean (the "average" in everyday usage) and σ is the standard deviation.

Calculating Z-Scores

Z = (x - μ) / σ
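The formula translates almost word for word into Python. This is only a minimal sketch to check the example of a mean of 60 and a standard deviation of 10; the function name z_score is my own choice for illustration and is not part of the project code.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev


# Mean of 60 and standard deviation of 10
examples = [z_score(x, 60, 10) for x in (50, 60, 70)]
print(examples)  # [-1.0, 0.0, 1.0]
```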

For this project I will use two sets of fictitious grades with means and standard deviations of:

Subject    μ        σ
Physics    64.80    13.03
History    73.60    8.77

We can see that the average physics score is a lot lower than the average history score so 78% in physics is already looking a lot better than the same percentage in history. Let's calculate the Z-Scores of 78% for each subject using the formula above.

Z-Score of 78% in Physics

Z = (78 - 64.8) / 13.03 = 1.01

Z-Score of 78% in History

Z = (78 - 73.6) / 8.77 = 0.50

So your kid is half a standard deviation above average in history but more than a whole standard deviation above average in physics. Yippee!
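The two hand calculations can be reproduced with a few lines of Python. This is just a quick sketch using the rounded means and standard deviations quoted above; the dictionary layout and variable names are my own and are not part of the project code.

```python
# Means and standard deviations as quoted above (rounded to 2 decimal places)
subjects = {
    "Physics": {"mean": 64.80, "std_dev": 13.03},
    "History": {"mean": 73.60, "std_dev": 8.77},
}

score = 78

# Apply Z = (x - mean) / standard deviation to the same score in each subject
z_by_subject = {
    name: (score - stats["mean"]) / stats["std_dev"]
    for name, stats in subjects.items()
}

for name, z in z_by_subject.items():
    print(f"{name}: {z:.2f}")
# Physics: 1.01
# History: 0.50
```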

Coding

In this project I will write a simple function which takes a list of numbers and returns a dictionary containing the following:

  • The arithmetic mean of the data

  • The population standard deviation of the data

  • A list of dictionaries containing the original data values and their equivalent Z-Scores

I'll also write a few lines of code to test the above function and print the results.

The project consists of the following two files, which can be downloaded as a zip, or you can clone/download the GitHub repository if you prefer.

  • zscores.py
  • zscores_test.py


Let's look at zscores.py first.

zscores.py

import statistics


def calculate(data):

    """
    Returns a dictionary containing:
    The arithmetic mean of the data
    The population standard deviation of the data
    A list of dictionaries containing each data value
    and its corresponding Z-Score.
    """

    arithmetic_mean = statistics.mean(data)
    standard_deviation_population = statistics.pstdev(data)

    zscores = []

    for item in data:
        zscore = (item - arithmetic_mean) / standard_deviation_population
        zscores.append({"Value": item, "Z-Score": zscore})

    result = {"arithmetic_mean": arithmetic_mean,
              "standard_deviation_population": standard_deviation_population,
              "zscores": zscores}

    return result

Firstly we import statistics for its mean and pstdev functions. (You can of course use from statistics import mean, pstdev and just use the function names without the statistics. prefix, but my personal preference is to do it the way shown in the code.)
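A note on the choice of pstdev: the statistics module also provides stdev, which calculates the sample standard deviation (dividing by n - 1 rather than n). Here we treat the grades as the whole population, so pstdev is the one to use. The sketch below, using the physics grades from the test program, is only there to show that the two functions give slightly different answers; the variable names are my own.

```python
import statistics

# Physics grades as used in zscores_test.py
physics_results = [38, 40, 43, 43, 49, 54, 55, 57, 61, 62,
                   62, 63, 64, 64, 65, 66, 66, 67, 68, 68,
                   69, 75, 76, 78, 78, 79, 80, 82, 85, 87]

population_sd = statistics.pstdev(physics_results)  # divides by n
sample_sd = statistics.stdev(physics_results)       # divides by n - 1

print(f"{population_sd:.2f}")     # 13.03
print(sample_sd > population_sd)  # True
```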

Next we set a couple of variables with the mean and standard deviation of the data, and create an empty list.

Then we iterate over the data, calculating each Z-Score and then adding a dictionary containing the original value and its Z-Score to the list. You could combine the two lines but that would end up as a long and messy bit of code.
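For the curious, this is roughly what a more compact version might look like, using a list comprehension instead of the explicit loop. It is just a sketch: the function name zscores_compact and the sample data are hypothetical, and whether it reads better than the loop is a matter of taste.

```python
import statistics


def zscores_compact(data):
    """Hypothetical variant of the loop in calculate, using a list comprehension."""
    mean = statistics.mean(data)
    std_dev = statistics.pstdev(data)
    # One dictionary per value, built in a single expression
    return [{"Value": item, "Z-Score": (item - mean) / std_dev} for item in data]


sample = [50, 60, 70]  # made-up values for illustration
compact = zscores_compact(sample)

for entry in compact:
    print(entry["Value"], round(entry["Z-Score"], 2))
```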

Finally we combine the mean, standard deviation and Z-Scores list into a dictionary and return it. That's zscores.py finished so let's move on to zscores_test.py.
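One edge case is worth a mention before moving on: if every value in the data is identical then statistics.pstdev returns 0 and the division raises a ZeroDivisionError. The sketch below shows one possible way of handling that. The function name calculate_checked and the choice of ValueError are my own assumptions, not part of the project code.

```python
import statistics


def calculate_checked(data):
    """Hypothetical variant of calculate which rejects data with no spread."""
    arithmetic_mean = statistics.mean(data)  # raises StatisticsError if data is empty
    std_dev = statistics.pstdev(data)

    if std_dev == 0:
        # Every value equals the mean, so Z-Scores cannot be calculated
        raise ValueError("all values are identical, so Z-Scores are undefined")

    return [{"Value": x, "Z-Score": (x - arithmetic_mean) / std_dev} for x in data]


try:
    calculate_checked([70, 70, 70])
    outcome = "no error"
except ValueError as error:
    outcome = str(error)

print(outcome)
```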

zscores_test.py

import zscores


def main():

    print("-----------------")
    print("| codedrome.com |")
    print("| Z-Scores      |")
    print("-----------------\n")

    physics_results = [38,40,43,43,49,54,55,57,61,62,62,63,64,64,65,66,66,67,68,68,69,75,76,78,78,79,80,82,85,87]
    history_results = [53,55,58,58,64,68,69,69,69,70,70,72,76,76,77,77,77,77,78,79,79,79,79,80,80,81,81,83,86,88]

    physics_zscores = zscores.calculate(physics_results)
    history_zscores = zscores.calculate(history_results)

    print_zscores("Physics", physics_zscores)
    print_zscores("History", history_zscores)


def print_zscores(subject, zscores):

    """
    Print the mean, standard deviation and z-scores
    in the zscores dictionary in a grid format.
    """

    width = 28

    print("-" * width)
    print("| {:^24} |".format(subject))
    print("-" * width)
    print("| Mean        {:>12.2f} |".format(zscores["arithmetic_mean"]))
    print("| Std.Dev.    {:>12.2f} |".format(zscores["standard_deviation_population"]))
    print("-" * width)
    print("|   Scores   |  Z-Scores   |")
    print("-" * width)

    for item in zscores["zscores"]:
        print("|{:>12.2f}| {:>12.2f}|".format(item["Value"], item["Z-Score"]))

    print("-" * width)


if __name__ == "__main__":
    main()

After importing the zscores module we enter the main function, hard-coding a couple of sets of scores and then throwing them at zscores.calculate before finally printing the results with the print_zscores function.
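The fiddly part of print_zscores is the format specifiers, so here is a small sketch of the two main ones in isolation: {:>12.2f} right-aligns a number in 12 characters with 2 decimal places, and {:^24} centres text in 24 characters. The values passed in are just examples.

```python
# A data row: two right-aligned numbers, each 12 characters wide
row = "|{:>12.2f}| {:>12.2f}|".format(78, 1.0129)

# A title row: text centred within 24 characters
title = "| {:^24} |".format("Physics")

print(row)
print(title)
print(len(row), len(title))  # 28 28, matching the width variable
```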

The print_zscores function is a bit fiddly but quite straightforward, printing out the results of zscores.calculate in a table. Now let's run the program with:

Run

python3.7 zscores_test.py

The output is:

Program Output (partial)

-----------------
| codedrome.com |
| Z-Scores      |
-----------------

----------------------------
|         Physics          |
----------------------------
| Mean               64.80 |
| Std.Dev.           13.03 |
----------------------------
|   Scores   |  Z-Scores   |
----------------------------
|       38.00|        -2.06|
|       40.00|        -1.90|
|       43.00|        -1.67|
.
.
.
|       76.00|         0.86|
|       78.00|         1.01|
|       78.00|         1.01|
|       79.00|         1.09|
|       80.00|         1.17|
|       82.00|         1.32|
|       85.00|         1.55|
|       87.00|         1.70|
----------------------------

----------------------------
|         History          |
----------------------------
| Mean               73.60 |
| Std.Dev.            8.77 |
----------------------------
|   Scores   |  Z-Scores   |
----------------------------
|       53.00|        -2.35|
|       55.00|        -2.12|
|       58.00|        -1.78|
.
.
.
|       77.00|         0.39|
|       78.00|         0.50|
|       79.00|         0.62|
.
.
.
|       83.00|         1.07|
|       86.00|         1.41|
|       88.00|         1.64|
----------------------------

I have only shown part of the output, but it includes the two 78% scores we are interested in (highlighted in yellow in the original post; the program output itself is all one colour). Their Z-Scores, 1.01 for physics and 0.50 for history, match the calculations carried out earlier.
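If you would rather not scan the table by eye, the check can also be done in code. This is a small sketch which works directly from the raw grades; the helper name zscore_of is hypothetical and is not part of zscores.py.

```python
import statistics

# Grades as used in zscores_test.py
physics_results = [38, 40, 43, 43, 49, 54, 55, 57, 61, 62,
                   62, 63, 64, 64, 65, 66, 66, 67, 68, 68,
                   69, 75, 76, 78, 78, 79, 80, 82, 85, 87]
history_results = [53, 55, 58, 58, 64, 68, 69, 69, 69, 70,
                   70, 72, 76, 76, 77, 77, 77, 77, 78, 79,
                   79, 79, 79, 80, 80, 81, 81, 83, 86, 88]


def zscore_of(value, data):
    """Hypothetical helper: the Z-Score of a single value relative to data."""
    return (value - statistics.mean(data)) / statistics.pstdev(data)


physics_z = zscore_of(78, physics_results)
history_z = zscore_of(78, history_results)

print(f"Physics: {physics_z:.2f}")  # Physics: 1.01
print(f"History: {history_z:.2f}")  # History: 0.50
```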