The C language has a reputation for being difficult to learn and to code in. I think this is unfair, as it is actually a very small and simple language; most of the perceived difficulty with using C comes from its memory management, or rather the lack of it. Only the smallest and simplest programs can get away with using nothing but auto variables: sooner or later you are going to have to use dynamic memory, opening yourself up to an extensive range of tricky bugs. There's no foolproof way to get round this, but you can catch most bugs before they wreak havoc in production with a brilliant little program called Valgrind.
Valgrind
I referred to Valgrind as a "little program" above but it's actually quite complex and sophisticated. If you want to investigate it in detail, the website is valgrind.org. However, the purpose of this article is just to show how to use it to catch a few of the most common memory management problems, and to demonstrate that it is easy to incorporate into the workflow of any C program you write.
Firstly though you need to install it. On Debian-based Linux distributions such as Ubuntu you just need to open up a terminal and enter
Installing Valgrind on Linux
sudo apt-get install valgrind
Once you have entered your password Valgrind will install quickly. You can check it has installed correctly by entering
Checking Valgrind Version
valgrind --version
(I wish developers would standardise on command line switches. I tried -v, -V, --v and --V and none of them worked. I had to Google "valgrind version" to find out that you need to use --version!)
The Project
This program will consist of one C source code file containing a few functions, each of which will commit some kind of memory management sin. Fear not though, for our sins will find us out, or rather Valgrind will find us out.
Create a new folder and within it create a file called usingvalgrind.c. You can download it as a zip or use the GitHub repository if you prefer.
usingvalgrind.c (part 1)
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
//--------------------------------------------------------
// FUNCTION PROTOTYPES
//--------------------------------------------------------
void uninitialized_memory();
void memory_leak();
void use_freed_memory();
void overshoot_memory();
void realloc_memory();
//--------------------------------------------------------
// FUNCTION main
//--------------------------------------------------------
int main(int argc, char* argv[])
{
    puts("-----------------");
    puts("| codedrome.com |");
    puts("| valgrind |");
    puts("-----------------\n");

    uninitialized_memory();
    memory_leak();
    use_freed_memory();
    overshoot_memory();
    realloc_memory();

    return EXIT_SUCCESS;
}
That's all quite straightforward - just a few function prototypes and a call of each of those functions in main. It should be clear from the function names just what memory management problem each causes. Now let's get on to the functions themselves. As you can see each has some commented out code which, if uncommented, will solve the problem. Leave it commented for the time being though.
usingvalgrind.c (part 2)

//--------------------------------------------------------
// FUNCTION uninitialized_memory
//--------------------------------------------------------
void uninitialized_memory()
{
    puts("Using uninitialized memory");

    int* data;

    // uncomment to fix error
    //data = malloc(sizeof(int) * 32);

    data[0] = 123;

    printf("%d\n\n", data[0]);

    // uncomment to fix error
    //free(data);
}

//--------------------------------------------------------
// FUNCTION memory_leak
//--------------------------------------------------------
void memory_leak()
{
    puts("Memory leak");

    int* data;

    data = malloc(sizeof(int) * 32);

    data[0] = 234;

    printf("%d\n\n", data[0]);

    // uncomment to fix error
    //free(data);
}

//--------------------------------------------------------
// FUNCTION use_freed_memory
//--------------------------------------------------------
void use_freed_memory()
{
    puts("Using freed memory");

    int* data;

    data = malloc(sizeof(int) * 32);

    data[0] = 345;

    // move to after printf to fix error
    free(data);

    printf("%d\n\n", data[0]);
}

//--------------------------------------------------------
// FUNCTION overshoot_memory
//--------------------------------------------------------
void overshoot_memory()
{
    puts("Overshooting memory");

    int* data;

    data = malloc(sizeof(int) * 32);

    // change indexes to between 0 and 31 to fix error
    data[32] = 456;

    printf("%d\n\n", data[32]);

    free(data);
}

//--------------------------------------------------------
// FUNCTION realloc_memory
//--------------------------------------------------------
void realloc_memory()
{
    puts("realloc memory");

    int* data = NULL;

    for(int i = 0; i < 32; i++)
    {
        data = realloc(data, (sizeof(int)) * (i + 1));
        data[i] = pow(i, 2);

        printf("%d\t%d\n", i, data[i]);
    }

    // uncomment to fix error
    //free(data);
}
In the function uninitialized_memory we have forgotten to call malloc, and are therefore trespassing on memory which doesn't belong to us.
In memory_leak we have forgotten to call free, therefore hogging a block of redundant memory until either the program terminates or, if we make a habit of that sort of thing, actually crashes.
In use_freed_memory we have called free but then carry on using the memory anyway. This might not matter for a short time but sooner or later will cause problems, because once a block is freed the allocator can hand it out again, to us or to another part of the program, for a different purpose.
The overshoot_memory function is well behaved in terms of calling malloc and free, but tries to write and read memory outside of that which it has been given.
Finally, just to check that Valgrind works just as well with realloc as it does with malloc, I have included a function realloc_memory which forgets to call free on realloc'ed memory rather than malloc'ed memory.
That's the coding finished so we can now compile and run it. In the terminal enter the following.
Compile and Run (without Valgrind)
gcc usingvalgrind.c -g -std=c11 -lm -o usingvalgrind
./usingvalgrind
Note that I have included a -g switch to tell the compiler to include debug information in the executable. This will enable Valgrind to report back the line numbers in the source code where any errors occur. Obviously that makes the executable a bit bigger so should be removed for production builds.
However, first time round we are not using Valgrind, just running the program as normal. This is the output.
Program Output
-----------------
| codedrome.com |
| valgrind |
-----------------

Using uninitialized memory
123

Memory leak
234

Using freed memory
345

Overshooting memory
456

realloc memory
0	0
1	1
2	4
3	9
4	16
5	25
6	36
7	49
8	64
9	81
10	100
11	121
12	144
13	169
14	196
15	225
16	256
17	289
18	324
19	361
20	400
21	441
22	484
23	529
24	576
25	625
26	676
27	729
28	784
29	841
30	900
31	961
So what went wrong as a result of all our memory management errors? Well, nothing at all. At least nothing you would notice when you run the program. It didn't crash or even display an error message, and the output is exactly what we expected. I cannot guarantee this will be the case for you, though: the behaviour of code like this is undefined, so it may differ between machines, compilers and even runs. It is very important to realise that bugs such as the ones we have deliberately introduced do not necessarily make themselves felt immediately.
However, we know that the code is actually full of bugs, ones that will cause serious problems if we put the code into production. So let's run the program again but this time under the umbrella of Valgrind.
Using Uninitialized Memory
We'll do this one function at a time, so in main comment out all the functions except uninitialized_memory, go back to the terminal, and run the following.
Compile and Run (with Valgrind)
gcc usingvalgrind.c -g -std=c11 -lm -o usingvalgrind
valgrind --track-origins=yes --leak-check=full ./usingvalgrind
Valgrind slows down program execution to a crawl, so even with a short program like this you might notice a delay of a second or two before anything happens. When you do see some output, it will include the following.
Program Output (partial)
==10764== Use of uninitialised value of size 8
==10764==    at 0x40079D: uninitialized_memory (usingvalgrind.c:43)
==10764==    by 0x40077F: main (usingvalgrind.c:22)
==10764==  Uninitialised value was created by a stack allocation
==10764==    at 0x400787: uninitialized_memory (usingvalgrind.c:35)
==10764==
==10764== Use of uninitialised value of size 8
==10764==    at 0x4007A7: uninitialized_memory (usingvalgrind.c:45)
==10764==    by 0x40077F: main (usingvalgrind.c:22)
==10764==  Uninitialised value was created by a stack allocation
==10764==    at 0x400787: uninitialized_memory (usingvalgrind.c:35)
Valgrind tells us we are using uninitialized memory, and gives the function name, source file name and line number. The single error of forgetting to call malloc manifests itself twice, first when we write to the memory and then again when we read it.
The root cause isn't in line 43 or 45, but we know it is a "Use of uninitialised value", and the "stack allocation" lines point us at uninitialized_memory itself. We just need to track backward from these lines a short way to realise we declared a pointer called data but forgot to allocate any memory for it. The solution is already in place: just uncomment the malloc and free lines, then run the above commands to build and run under Valgrind again.
Program Output (partial)
ERROR SUMMARY: 0 errors from 0 contexts
Now we see "0 errors". Good! Now let's carry out the same process with the next function, memory_leak.
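Before moving on, here is one way to make this kind of mistake harder to repeat: never leave a pointer uninitialized, and check what the allocator gives back. This is only a minimal sketch. The helper name make_int_buffer is my own invention for illustration and is not part of usingvalgrind.c.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not part of usingvalgrind.c).
// Allocates room for `count` ints, all set to zero.
// Returns NULL if the allocation fails.
int* make_int_buffer(size_t count)
{
    // calloc zero-fills the block, so every element starts with a defined value
    int* data = calloc(count, sizeof(int));

    if(data == NULL)
    {
        fputs("Allocation failed\n", stderr);
    }

    return data;
}
```

A caller would write something like int* data = make_int_buffer(32); and test the result against NULL before using it, then call free(data) when finished as usual.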
Memory Leak
In main comment out uninitialized_memory and uncomment memory_leak. Run the program with Valgrind again and it will spot that we have not called free.
Program Output (partial)
==3692==     in use at exit: 128 bytes in 1 blocks
==3692==   total heap usage: 1 allocs, 0 frees, 128 bytes allocated
==3692==
==3692== 128 bytes in 1 blocks are definitely lost in loss record 1 of 1
You know what you need to do: uncomment the line which calls free and run again. This time we get:
Program Output (partial)
==4249== All heap blocks were freed -- no leaks are possible
Using Freed Memory
Another problem solved. Now edit main to run use_freed_memory, and run with Valgrind. This time we get:
Program Output (partial)
==4467== Invalid read of size 4
==4467==    at 0x400845: use_freed_memory (usingvalgrind.c:86)
==4467==    by 0x40077F: main (usingvalgrind.c:24)
==4467==  Address 0x5502040 is 0 bytes inside a block of size 128 free'd
This tells us that we are trying to use freed memory, so move the call to free to after the printf, and run again. As you can see this fixes the problem.
Program Output (partial)
==4790== ERROR SUMMARY: 0 errors from 0 contexts
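A common defensive habit that goes with this fix is to set a pointer to NULL as soon as its memory has been freed, so that any later use is obviously wrong rather than quietly reading stale data. The sketch below assumes a helper called free_and_null, which is a hypothetical name of mine and not something from usingvalgrind.c or the standard library.

```c
#include <stdlib.h>

// Hypothetical helper (not part of usingvalgrind.c).
// Frees the block that *ptr points to, then sets *ptr to NULL
// so the caller is not left holding a dangling pointer.
void free_and_null(int** ptr)
{
    if(ptr != NULL)
    {
        free(*ptr);     // free(NULL) is a harmless no-op
        *ptr = NULL;
    }
}
```

In use_freed_memory this would be called as free_and_null(&data). It only resets that one variable, so any other copies of the pointer are still dangling, which is why a Valgrind run remains worthwhile.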
Overshooting Memory
Three down, two to go. Edit main to run overshoot_memory, and when run with Valgrind you'll see:
Program Output (partial)
==5087== Invalid write of size 4
==5087==    at 0x400882: overshoot_memory (usingvalgrind.c:101)
==5087==    by 0x40077F: main (usingvalgrind.c:25)
==5087==  Address 0x55020c0 is 0 bytes after a block of size 128 alloc'd
==5087==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5087==    by 0x400875: overshoot_memory (usingvalgrind.c:98)
==5087==    by 0x40077F: main (usingvalgrind.c:25)
==5087==
==5087== Invalid read of size 4
==5087==    at 0x400890: overshoot_memory (usingvalgrind.c:103)
==5087==    by 0x40077F: main (usingvalgrind.c:25)
==5087==  Address 0x55020c0 is 0 bytes after a block of size 128 alloc'd
Edit overshoot_memory to use an index between 0 and 31, and run again. Again we see the familiar output.
Program Output (partial)
==5678== ERROR SUMMARY: 0 errors from 0 contexts
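C will not stop you indexing past the end of a block, so if an index comes from anywhere you don't fully control it can be worth checking it yourself. Here is a minimal sketch of that idea, assuming you keep the element count alongside the pointer. The function set_checked is a hypothetical name used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper (not part of usingvalgrind.c).
// Stores value in data[index] only if index is within the
// `length` elements that were allocated. Returns false otherwise.
bool set_checked(int* data, size_t length, size_t index, int value)
{
    if(data == NULL || index >= length)
    {
        return false;   // out of range: nothing is written
    }

    data[index] = value;

    return true;
}
```

With a 32 element block, set_checked(data, 32, 31, 456) succeeds, while set_checked(data, 32, 32, 456) is rejected instead of writing just past the end as overshoot_memory did.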
Testing Valgrind with Realloc
We have seen that if we call malloc once Valgrind will check whether we call free once. However, realloc is a bit different: we can call it any number of times but of course only need to call free once. The final function checks that Valgrind can handle this. Edit main to run realloc_memory, and run again. The output is interesting.
Program Output (partial)
==6446==     in use at exit: 128 bytes in 1 blocks
==6446==   total heap usage: 32 allocs, 31 frees, 2,112 bytes allocated
==6446==
==6446== 128 bytes in 1 blocks are definitely lost in loss record 1 of 1
==6446==    at 0x4C2CE8E: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6446==    by 0x4008F6: realloc_memory (usingvalgrind.c:119)
==6446==    by 0x40077F: main (usingvalgrind.c:26)
==6446==
==6446== LEAK SUMMARY:
==6446==    definitely lost: 128 bytes in 1 blocks
Although our code does not call free, you can see that Valgrind counts 31 frees. The first realloc call is passed NULL and so behaves like malloc, while each of the other 31 is recorded as releasing the old block and allocating a new one. Only the final 128 byte block is never freed. Uncomment the call to free, run again, and you will see just what you expected to see.
Program Output (partial)
==6922== ERROR SUMMARY: 0 errors from 0 contexts
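One further point about realloc which the function above glosses over: if realloc fails it returns NULL but leaves the old block allocated, so assigning its result straight back to data would lose the only pointer to that block and create a leak of its own. A minimal sketch of the usual workaround follows. The name grow_int_buffer is hypothetical and not part of usingvalgrind.c.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper (not part of usingvalgrind.c).
// Resizes *data to hold new_count ints. On failure it returns false
// and leaves *data pointing at the original, still valid block.
bool grow_int_buffer(int** data, size_t new_count)
{
    if(data == NULL || new_count == 0)
    {
        return false;   // a zero-size realloc is best avoided
    }

    // Keep the result in a temporary so the old pointer is not lost
    int* resized = realloc(*data, sizeof(int) * new_count);

    if(resized == NULL)
    {
        return false;
    }

    *data = resized;

    return true;
}
```

In the loop in realloc_memory the call would become grow_int_buffer(&data, i + 1), with the loop stopping if it returns false. A single free(data) at the end is still required.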
I hope this post has removed much of the memory management angst associated with C programming - just remember to include a run of your program with Valgrind in your workflow.