Why does printf() cause a memory leak in C?

This article only applies to running compiled C or C++ programs on Linux with GNU libc.

introduction

In the introductory systems programming class I TA’d for many years, a mantra was drilled into students’ heads as if it were gospel: your program should never leak memory. We’d force students to hunt down memory leaks from fopen’d files and library functions. This was the right thing to teach, but there’s a long-running debate over whether it’s really necessary to clean up all memory, especially when a program is exiting.

The argument distinguishes between two types of memory leaks. There’s the type that grows as your program keeps running: maybe you’re allocating some buffer in a loop but never freeing it at the end of the loop body. Pretty much everyone agrees this is bad.

But if you’re allocating memory that isn’t going to keep growing over the lifetime of your program - maybe some buffer that’ll be reused throughout your program - then, the argument goes, don’t bother freeing it. Doing so just adds clutter and may be overly complex, especially if you have some weird data structure that C is morally opposed to. The operating system really should be doing this work for you: when a process exits, all of its memory is reclaimed by the operating system automatically, whether or not you free’d it beforehand. These memory-truthers argue that calling free rarely returns memory to the operating system anyway and just wastes CPU cycles among other precious hardware resources.
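To make the distinction concrete, here’s a minimal sketch (the function names are mine, purely for illustration) of the two patterns: the leak everyone agrees is bad, and the one-time allocation the memory-truthers say you can leave to the operating system.

#include <stdlib.h>
#include <string.h>

/* The bad kind: a fresh allocation on every iteration that's never
   freed, so memory use keeps growing for as long as the loop runs. */
static void growing_leak(void) {
    for (int i = 0; i < 1000; i++) {
        char *buf = malloc(4096);   /* never freed */
        if (buf)
            memset(buf, 0, 4096);
    }
}

/* The arguably-fine kind: one buffer allocated once and reused for the
   lifetime of the program. The OS reclaims it when the process exits. */
static char *scratch;

static void one_time_allocation(void) {
    if (!scratch)
        scratch = malloc(4096);
}

int main(void) {
    growing_leak();
    one_time_allocation();
}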

We don’t teach this, so when we started to see memory leaks reported by valgrind in students’ code because of a simple printf call, something was amiss.

where’s that memory?

One of the first things taught about printf is that it’s buffered. In order to buffer stdout, printf will malloc some memory to be used as a buffer and will hopefully free it before the program ends. We can see this in action with a few simple programs:

#include <stdio.h>
int main(void) {
    printf("hello ap world\n");
}

When this is run with valgrind:

$ valgrind --leak-check=full ./printf-simple
...
HEAP SUMMARY:
    in use at exit: 0 bytes in 0 blocks
  total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated

All heap blocks were freed -- no leaks are possible

Despite the fact that this program didn’t call malloc, we see that 1,024 bytes were allocated and freed. This was printf mallocing and freeing the buffer that it uses internally.
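As an aside, stdio will also let you supply the buffer yourself. Here’s a small sketch of mine (not one of the original examples) that uses the standard setvbuf call before the first printf; with a caller-supplied buffer, glibc shouldn’t need to malloc one of its own:

#include <stdio.h>

int main(void) {
    /* Hand stdout a fully buffered, caller-owned 1 KiB buffer. setvbuf
       must be called before any other operation on the stream, and the
       buffer is static so it's still valid when stdio flushes at exit. */
    static char buf[1024];
    setvbuf(stdout, buf, _IOFBF, sizeof buf);

    printf("hello ap world\n");
}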

We can see that there is no buffer malloc’d when writing to stderr, since stderr is unbuffered:

#include <stdio.h>
int main(void) {
    fprintf(stderr, "hello ap world\n");
}

Running this:

$ valgrind --leak-check=full ./fprintf-stderr
...
HEAP SUMMARY:
    in use at exit: 0 bytes in 0 blocks
  total heap usage: 0 allocs, 0 frees, 0 bytes allocated

All heap blocks were freed -- no leaks are possible

This all seems fine, but we run into problems when a program doesn’t terminate gracefully. We can terminate these programs with Ctrl+C, which sends an interrupt signal (SIGINT) to the process. This is a very ungraceful end for a process - it skips most of the cleanup work that would usually happen on exit and ends the process almost immediately. As it turns out, one of the cleanup steps that gets skipped is calling free on the buffers malloc’d by printf and its family of functions.

A simple program to demonstrate this is one that calls printf and then spins on an infinite loop:

#include <stdio.h>
int main(void) {
    printf("hello ap world\n");
    while (1)
        ;
}
Running this under valgrind and pressing Ctrl+C after a moment:

$ valgrind --leak-check=full ./printf-spin
...
Process terminating with default action of signal 2 (SIGINT)
   at 0x109160: main (in /home/jc5526/printf_spin)

HEAP SUMMARY:
    in use at exit: 1,024 bytes in 1 blocks
  total heap usage: 1 allocs, 0 frees, 1,024 bytes allocated

LEAK SUMMARY:
   definitely lost: 0 bytes in 0 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 1,024 bytes in 1 blocks
        suppressed: 0 bytes in 0 blocks

Because of this ungraceful exit, we’ve leaked the buffer that printf is supposed to clean up. Obviously there’s nothing we can do about this, since printf is supposed to call malloc and free transparently.
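While we can’t touch printf’s buffer directly, we can at least avoid the ungraceful exit. Here’s a sketch of mine of the spin-loop program with a SIGINT handler that turns Ctrl+C into a normal return from main; run under valgrind, this version should produce the leak-free HEAP SUMMARY from earlier rather than the 1,024-byte “still reachable” block:

#include <signal.h>
#include <stdio.h>

/* Set by the handler so main can leave its loop and return normally. */
static volatile sig_atomic_t interrupted = 0;

static void handle_sigint(int sig) {
    (void)sig;
    interrupted = 1;
}

int main(void) {
    signal(SIGINT, handle_sigint);
    printf("hello ap world\n");
    while (!interrupted)
        ;
    /* Returning from main is a normal exit, so the end-of-program
       cleanup that Ctrl+C would otherwise skip gets to run. */
}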

glibc shenanigans

It might seem like there’s just a missing call to free here, but as it turns out that’s not at all the case. It goes back to the argument discussed before - should libc really be calling free on this buffer? Is it just a waste of time, if the buffer will be free’d anyways by the operating system?

The libc developers evidently took that view and optimized printf by making it leak its buffer. In reality, there’s some magic happening behind the scenes to give you the illusion of not leaking memory. When you run a program that uses printf normally, without a memory checking tool like valgrind, printf will malloc a buffer and use it. It doesn’t bother to call free on this buffer, but the developers of glibc recognized that programs like valgrind exist, and that they would report the memory leaked due to this implementation of printf.

Their solution is to provide a function that explicitly frees any buffers that may have been created by printf or any other glibc function. This function is __libc_freeres, and it’ll get called by valgrind right before a program ends. Note that this is something that valgrind has to explicitly call – it won’t be called when you’re running your program normally. In other words, our programs don’t leak memory when run with valgrind, but do leak memory when run normally.
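Nothing stops a curious program from doing what valgrind does here, by the way. The sketch below is mine, is glibc-specific, and leans on an undocumented internal symbol, so treat it as an experiment rather than something to rely on: it declares __libc_freeres by hand and calls it as the very last thing before the process exits.

#include <stdio.h>
#include <unistd.h>

/* Internal glibc symbol; there is no public header that declares it. */
extern void __libc_freeres(void);

int main(void) {
    printf("hello ap world\n");

    fflush(NULL);       /* flush everything while stdio is still intact */
    __libc_freeres();   /* release glibc's internal allocations */
    _exit(0);           /* skip any further libc cleanup that might touch
                           the now-freed structures */
}

If this works as I expect, the process ends with glibc’s internal allocations already released, without any help from valgrind.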

Valgrind provides a flag to disable this extra functionality. You can try running valgrind with the flag --run-libc-freeres=no, as I do here:

#include <stdio.h>
int main(void) {
    printf("hello ap world\n");
}
$ valgrind --leak-check=full --run-libc-freeres=no ./hello
==1094309== Command: ./hello
==1094309==
hello ap world
==1094309==
==1094309== HEAP SUMMARY:
==1094309==     in use at exit: 1,024 bytes in 1 blocks
==1094309==   total heap usage: 1 allocs, 0 frees, 1,024 bytes allocated
==1094309==
==1094309== LEAK SUMMARY:
==1094309==    definitely lost: 0 bytes in 0 blocks
==1094309==    indirectly lost: 0 bytes in 0 blocks
==1094309==      possibly lost: 0 bytes in 0 blocks
==1094309==    still reachable: 1,024 bytes in 1 blocks
==1094309==         suppressed: 0 bytes in 0 blocks

Look at that! The most basic C program, hello world, actually leaks memory! 

It’s not just printf that leaks memory - many glibc functions do this internally. As it turns out, when valgrind handles a program exit caused by the SIGINT signal, it doesn’t call __libc_freeres. This is relatively new behavior that changed between semesters of our class, which is why it took us by surprise.

The reasoning is in this bug report: when a program receives a fatal signal, such as an unhandled SIGINT, valgrind terminates the program. Before termination, valgrind attempts to call final_tidyup, which runs __libc_freeres (and __gnu_cxx::__freeres for C++) to free memory allocated internally by glibc (or libstdc++). However, if the program gets the fatal signal while inside a critical section within glibc, its data structures may be left in an inconsistent state, causing __libc_freeres to crash. That crash takes down valgrind itself just before it produces its error summary, rendering the whole valgrind run unusable. It’s therefore considered better policy not to run __libc_freeres on fatal-signal termination, since leaving some resources uncleaned is expected in that scenario.

conclusion

So maybe we shouldn’t really be teaching students that they should always free memory if glibc itself isn’t following that rule. But writing perfectly optimized code isn’t the point of an introductory systems class - that comes later, when you know enough to start doing bad things intentionally.