Prime numbers have been understood at least since the Ancient Greeks, and possibly since the Ancient Egyptians. In modern times their study has intensified greatly due to their usefulness, notably in encryption, and because computers enable them to be calculated to a massively higher level than could be done by hand.
The best know (and according to Wikipedia still the most widely used) method for identifying them is the Sieve of Eratosthenes, which I will implement here in C.
The Sieve of Eratosthenes
The algorithm is described in full on Wikipedia but in summary the process is:
- Create a list of integers from 2 to the highest number you want to check, marking them all as prime
- Initialize a variable p to 2 (the lowest prime)
- Mark all multiples of p as composite (ie. non-prime)
- Set p to the next number marked as prime
- Repeat 3 and 4 until no subsequent primes are found.
The algorithm as it stands has some glaring inefficiencies and the Wikipedia article cited above does include some refinements and alternatives; there are also two or three obvious C-specific tweaks that could be made. Nevertheless, let's get stuck into a simple and straightforward implementation in C.
Writing the Code
Create a folder and within it create an empty file called sieveoferatosthenes.c. For some more complex projects I have split the code out into two or more .h and .c files, but this is quite simple so I'll put all the code in one file. Start out by typing or pasting the #includes, #defines and function prototypes into the empty file. You can download the source code or use the Github repository if you prefer.
Source Code Links
sieveoferatosthenes.c (part 1)
#include<stdio.h> #include<stdlib.h> #include<stdbool.h> #define RED "\x1B[31m" #define GRN "\x1B[32m" #define RESET "\x1B[0m" //-------------------------------------------------------- // FUNCTION PROTOTYPES //-------------------------------------------------------- bool* initialize_prime_flags(int max); void mark_multiples(bool* prime_flags, int max, int p); int find_next_prime(bool* prime_flags, int max, int current); void output_all(bool* prime_flags, int columncount, int max);
The strange looking #defines are in preparation for outputting composite and prime numbers in red and green respectively. This is a trick I found on StackOverflow and is used in the output_all function.
I'll describe the functions prototyped above in detail as and when we get to them, but basically they correspond to steps 1, 3 and 4 above, plus a function to output the results. Now let's move on to implementing the main function.
sieveoferatosthenes.c (part 2)
//-------------------------------------------------------- // FUNCTION main //-------------------------------------------------------- void main(void) { puts("-------------------------"); puts("| codedrome.com |"); puts("| Sieve of Eratosthenes |"); puts("-------------------------\n"); const int max = 100; bool* prime_flags; int p = 2; prime_flags = initialize_prime_flags(max); puts("Initial state: all marked as prime"); output_all(prime_flags, 10, max); puts("\nRunning Sieve of Eratosthenes..."); while(p != -1) { mark_multiples(prime_flags, max, p); p = find_next_prime(prime_flags, max, p); } puts("\nComposite numbers now identified"); output_all(prime_flags, 10, max); puts(""); free(prime_flags); }
Firstly we create a few variables: the maximum number we want to check for "primeness", an array of Booleans, and the current value of p. We then call initialize_prime_flags to get a data structure of Booleans of size max.
As you may have realised, although the Sieve of Eratosthenes is usually described as an algorithm for identifying prime numbers it is actually the opposite. It starts off with the assumption that all numbers are prime, and then identifies the ones that are not. The values of prime_flags are therefore initialized to true, and to emphasize this I have called output_all immediately after initialize_prime_flags to show that all the numbers are currently flagged as primes.
We then enter a while loop, calling functions to flag multiples of p as composite and then getting the next prime in the list. The latter function returns -1 if there are none left, so that value is used as the exit condition of the while loop.
The process is now complete so we just call output_all again to show which numbers are prime and which are composite.
Now we just need to implement the four functions called in main, starting with initialize_prime_flags.
sieveoferatosthenes.c (part 3)
//-------------------------------------------------------- // FUNCTION initialize_prime_flags //-------------------------------------------------------- bool* initialize_prime_flags(int max) { bool* prime_flags; prime_flags = malloc(sizeof(bool) * (max + 1)); for(int i = 2; i <= max; i++) { prime_flags[i] = true; } return prime_flags; }
This function is quite straightforward - just grab enough memory for the required number of bools and then set them all to true. The indexes of these Booleans correspond to the actual numbers they are flagging. This means that as we don't use 0 and 1 there are two Boolean's worth of memory wasted, but I hope you will agree that the convenience of exact indexing outweighs a tiny amount of wasted memory!
Next up is the output_all function.
sieveoferatosthenes.c (part 4)
//-------------------------------------------------------- // FUNCTION output_all //-------------------------------------------------------- void output_all(bool* prime_flags, int columncount, int max) { printf(GRN "PRIME " RESET); printf(RED "COMPOSITE\n" RESET); printf(" "); for(int i = 2; i <= max; i++) { if(prime_flags[i] == false) { printf(RED "%3d " RESET, i); } else { printf(GRN "%3d " RESET, i); } if(i % columncount == 0) { puts(""); } } }
This just iterates the data, outputting the values in red or green to indicate their status as composite or prime - note we start off with a key to the colour coding.
The function takes a columncount argument so if we have run out of columns we print a new line to go back to the first column. This is calculated using the modulus (remainder) operator %, for example if we asked for 10 columns and have just printed value 30, 30 % 10 = 0 so we call puts("") with an empty string so it will just print a new line. For any value of i which is not a multiple of 10 the modulus will be non-zero so puts is not called.
Now let's move on to the core function of this entire project: mark_multiples.
sieveoferatosthenes.c (part 5)
//-------------------------------------------------------- // FUNCTION mark_multiples //-------------------------------------------------------- void mark_multiples(bool* prime_flags, int max, int p) { int i; int multiplier = 2; i = p * multiplier; if(i <= max) { while(i <= max) { prime_flags[i] = false; multiplier++; i = p * multiplier; } } }
Firstly we create a couple of variables, one for the index i, and another for the multiplier set to 2. i is then set to p * multiplier to give us the first value to flag as non-prime. After checking that i hasn't blown through the end of the data we enter a while loop, setting the flag to false, incrementing the multiplier, and then calculating the next value of i or in other words the next value to be flagged as non-prime. The loop continues until we run past max.
Basically what we are doing here is multiplying the current prime by 2, 3, 4 etc. and marking the result as non-prime until we get past max. Note this is one of the inefficiencies in the basic algorithm - we may be marking numbers as non-prime which have already been flagged as multiples of lower primes. For example, 6 would be marked as non-prime because it is a multiple of 2, but would be marked non-prime again as a multiple of 3.
Finally we will implement find_next_prime.
sieveoferatosthenes.c (part 6)
//-------------------------------------------------------- // FUNCTION find_next_prime //-------------------------------------------------------- int find_next_prime(bool* prime_flags, int max, int current) { for(int i = current + 1; i <= max; i++) { if(prime_flags[i] == true) { return i; } } return -1; }
This function searches through the flags for the next prime, starting at the one after the current. If one is found it is returned; if the loop completes without a prime being found the function returns -1. (Remember that in main we use -1 as the terminating condition for the while loop.)
Now we can compile - enter this in terminal...
Compiling
gcc sieveoferatosthenes.c -std=c11 -o sieveoferatosthenes
...and then run with this command.
Running
./sieveoferatosthenes
The program output looks like this for max = 100. First we see all the numbers in green as they are initially marked as prime, then the numbers again with the composites in red. You could call output_all within the loop in main if you want to see the progress of each iteration, although it might be rather tedious for this many numbers.
Program Output
------------------------- | codedrome.com | | Sieve of Eratosthenes | ------------------------- Initial state: all marked as prime PRIME COMPOSITE 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 Running Sieve of Eratosthenes... Composite numbers now identified PRIME COMPOSITE 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100