When creating a dynamic data structure it is tempting to allocate just the amount of extra memory you need to add each item. However, malloc, realloc or calloc are quite expensive in terms of resources so for situations where many items are likely to be added it is more efficient to allocate a large block of memory in one go. This can be gradually used up, and then another block allocated when necessary. In this post I will write a simple demonstration of this principle, complete with monitoring code to verify that we are in fact improving efficiency.
This project is not intended to be a complete implementation of a data structure, just a demonstration or proof of concept. Therefore it will consist of a single source code file with just a couple of functions in addition to the main function. Create a new folder somewhere and in it create a new empty file called blockmalloc.c. You can download the source code as a zip or from Github if you prefer.
Source Code Links
Open the source code file and enter this code.
blockmalloc.c (part 1)
#include<stdio.h> #include<time.h> #include<stdlib.h> //-------------------------------------------------------- // FUNCTION PROTOTYPES //-------------------------------------------------------- long allocate_singles(); long allocate_blocks(); //-------------------------------------------------------- // FUNCTION main //-------------------------------------------------------- int main(int argc, char* argv[]) { puts("-----------------"); puts("| codedrome.com |"); puts("| Block Malloc |"); puts("-----------------\n"); const long test_count = 64; long total_ticks_singles = 0; long total_ticks_blocks = 0; // singles for(int i = 1; i <= test_count; i++) { total_ticks_singles += allocate_singles(); } printf("mean ticks of %ld calls to allocate_singles: %ld\n", test_count, total_ticks_singles / test_count); // blocks for(int i = 1; i <= test_count; i++) { total_ticks_blocks += allocate_blocks(); } printf("mean ticks of %ld calls to allocate_blocks: %ld\n", test_count, total_ticks_blocks / test_count); return EXIT_SUCCESS; }
We have prototypes for two functions, allocate_singles and allocate_blocks. The first will call realloc a large number of times to individually allocate memory for extra doubles. The second will allocate large chunks of memory, only calling realloc again when this memory is used up. In main we will call these a number of times. (I have initialized test_count to 64 - I have a slight obsession with powers of 2.) The two functions return the number of ticks they took to complete their main task, and we will keep a total of these to calculate the mean. Due to certain vagaries it is best to repeat tests such as this a number of times and take an average.
Still in main, we do a couple of for loops, one for each function, printing out the mean ticks taken by each. Obviously we are anticipating and hoping that the second set of function calls will be a lot faster than the first.
Let's start to write the functions themselves, starting off with allocate_singles.
blockmalloc.c (part 2)
//-------------------------------------------------------- // FUNCTION allocate_singles //-------------------------------------------------------- long allocate_singles() { double* data = NULL; clock_t StartTicks; clock_t FinishTicks; StartTicks = clock(); for(int i = 0; i < 10000; i++) { data = realloc(data, sizeof(double) * (i + 1)); } FinishTicks = clock(); free(data); return FinishTicks - StartTicks; }
This is a fairly straightforward function. We have a double pointer which we will dynamically allocate memory to, and a couple of variables to record start and finish ticks. Then, within a for loop, we individually allocate memory for 10,000 doubles. Yes, we are actually calling realloc 10,000 times! This is not good. We then set FinishTicks, call free, and return the difference between StartTicks and FinishTicks. I have not done anything with the memory as I don't want to distort the timings with anything not relevant to the realloc calls.
Let's move on to the allocate_blocks function.
blockmalloc.c (part 3)
//-------------------------------------------------------- // FUNCTION allocate_blocks //-------------------------------------------------------- long allocate_blocks() { double* data = NULL; clock_t StartTicks; clock_t FinishTicks; int unused_memory = 0; int allocated_memory = 0; int block_size = 1000; StartTicks = clock(); for(int i = 0; i < 10000; i++) { if(unused_memory == 0) { allocated_memory+= block_size; data = realloc(data, allocated_memory * sizeof(double)); unused_memory = block_size; } unused_memory--; } FinishTicks = clock(); free(data); return FinishTicks - StartTicks; }
This does essentially the same task as allocate_singles, ie. gets enough memory for 10,000 doubles, but does so 1,000 at a time so realloc is only called 10 times.
We start off with three more variables: unused_memory, or how much allocated memory has not yet been used, allocated_memory, the total amount of memory realloc'ed, both used and unused, and block_size, the number of extra doubles we will allocate memory for each time we call realloc. (Note that these refer to numbers of variables not bytes.)
We then enter a similar for loop as used in allocate_singles, but we check if unused_memory is 0. Only if it is do we call realloc, also increasing allocated_memory and resetting unused_memory. We can go ahead and use up another double's worth of the already-allocated memory, decrementing unused_memory as we do so. When unused_memory hits 0 realloc will be called again on the next iteration of the loop if there is one.
So that's the coding finished. Let's compile and run it. Open your Terminal and run the following.
Compile and Run
gcc blockmalloc.c -std=c11 -lm -o blockmalloc ./blockmalloc
You should see something like this.
Program Output
----------------- | codedrome.com | | Block Malloc | ----------------- mean ticks of 64 calls to allocate_singles: 280 mean ticks of 64 calls to allocate_blocks: 40
You can easily see that in this particular case allocate_blocks is 7 times faster. I ran it a number of times and got similar results each time, although the for loop in main should even out any random variation.
As I said at the beginning this is only a demonstration of concept rather than a full solution, but hopefully it will be a reminder to think carefully about all aspects of the dynamic memory process beyond just remembering to call free.