Adding environment variables is not thread-safe

When a quick Python debugging exercise turns into a lesson in how adding environment variables is unsafe in a multi-threaded application.

Seeing commands fail with return code 32512? Your execve syscalls failing with -1 EFAULT? It might just be putenv throwing a wrench into your multi-threaded application.

The Start: s2i-python-container Tests Are Failing For Me

Like all good debugging exercises, this one started with changing a lightbulb and ended with me pulling the engine out of the car.

I wanted to make a small contribution to the s2i-python-container project, so I ran the associated tests by following their contributor guide. Some failures caught my eye, and I noticed an issue with the uWSGI tests: when a test attempted to install uWSGI using pip in a container, the setup would fail with an error code and fail the test. No errors were printed to the logs or terminal, just an abrupt halt while compiling some C dependencies using gcc.

At first I assumed this was due to missing dependencies, because the build logs of the failing containers indicated they were missing libxml2-config, but this turned out to be a red herring. As I dug deeper, I realised the failure rate was not 100%: it was possible to get a passing test run at random. This began to smell like a race condition in the uWSGI installation.

Digging Into uWSGI

Running the tests for s2i-python-container didn't provide a good debugging interface, so I decided to check whether the issue was reproducible by running the uWSGI setup.py installer directly. At first I couldn't get a clean reproduction, but after copying the CFLAGS from the s2i-python-container logs into my environment (luckily uWSGI automatically inherits it from the env), the failure began to reproduce.

The key thing about the CFLAGS seemed to be its length: it was a very long string, at over 4,000 characters.

Now that I could reproduce the issue in uWSGI, I could pinpoint the failing code to line 120 of uwsgiconfig.py. The code checks the return code of os.system (which runs gcc) and, if a failure is detected, silently exits setup.py with an error code.
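
The pattern looks roughly like this (a paraphrase for illustration, not the verbatim uWSGI source):

import os
import sys

# stand-in for the very long gcc command line that uWSGI builds up
compile_cmd = "gcc -c core.c -o core.o"

ret = os.system(compile_cmd)
if ret != 0:
    # any failure is collapsed into exit code 1, with nothing printed
    sys.exit(1)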

After finding this, I saw that the bug had already been picked up in the project and a workaround deployed (using subprocess instead of os.system). The fix is available in newer versions of uWSGI which, as of the time of writing this post, had not yet been pushed to PyPI.
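
The workaround has roughly this shape (my paraphrase of the approach, not the exact uWSGI patch):

import subprocess
import sys

compile_cmd = "gcc -c core.c -o core.o"  # stand-in for the real command line

# subprocess forks and execs the shell itself (in cpython's default path),
# so the child works from its own copy of the environment instead of
# glibc's system() handing the live environ pointer to execve
ret = subprocess.call(compile_cmd, shell=True)
if ret != 0:
    sys.exit(1)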

The interesting thing in the bug report is that there were suspicions about the cause (thread-unsafe use of os.system) but no concrete proof. So I decided to keep digging.

The Exit Code

Now that I had the line of code that caused the problem, I could change the error handling to show the return code from os.system rather than silently dropping it (the original code translated any error into exit code 1). This showed os.system was returning a weird value: 32512.
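
That number is a raw wait status, not an exit code: the exit code lives in the high byte. Decoding it gives 127, which is what glibc's system() reports when the shell itself could not be executed:

>>> 32512 >> 8                         # the exit code sits in the high byte
127
>>> import os
>>> os.waitstatus_to_exitcode(32512)   # helper available since Python 3.9
127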

Googling this turns up a variety of potential causes, including one where the os.system command line string contains too many characters. Unfortunately that isn't the cause here, as it only triggers beyond 65,533 characters and is a persistent issue rather than a race condition. The error code looks like it indicates some problem with the system executing the command.

When I applied strace to debug the associated syscalls (checking that they were called, and what they returned), I could see the execve syscall occurring and returning the error -1 EFAULT. This suggests a memory address became unreachable during the execve call.
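
The invocation looked something like this (command adapted for illustration; -f follows forked children and -s raises the string-truncation limit):

$ strace -f -e trace=execve -s 99999 python3 uwsgiconfig.py --build 2>&1 | grep EFAULT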

A snippet from the execve(2) manpage:

EFAULT pathname or one of the pointers in the vectors argv or
       envp points outside your accessible address space.

Looking at the command line string for clues, I found uWSGI was (in addition to the long CFLAGS) appending further long gcc flags, resulting in a command line string of over 13,000 characters. I started to think I had found a race condition bug in cpython or glibc: perhaps the string holding the command line for os.system was being freed before execve had finished copying it. This would only become visible as the command line string grew longer (taking longer to copy) and as more threads ran in parallel (increasing the chance of a collision).

However, after digging through the code for cpython, glibc and the kernel, I could see no point where a free before use could occur. Everything was chained together and waiting patiently for execve to finish before any memory was freed.

I decided to extract the reproduction case from uWSGI so I could analyse it more easily.

Trying to Reproduce Outside of uWSGI

I set up a small Python script mirroring the conditions in the uwsgiconfig.py code: a thread pool running several os.system calls in parallel, fed from the main thread. The workers were configured to crash the script with an exit code if any os.system call failed.
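
The harness looked something like the following (a condensed sketch, not the exact script):

import os
from concurrent.futures import ThreadPoolExecutor

def run(cmd):
    # workers crash the whole process if any os.system call fails
    ret = os.system(cmd)
    if ret != 0:
        print(f"[os.system failure] ouch! return code = {ret}")
        os._exit(1)

# the command under test varied between experiments (long echos, gcc, ...)
cmd = "echo " + "x" * 4000 + " > /dev/null"

with ThreadPoolExecutor(max_workers=8) as pool:
    for _ in range(1000):
        pool.submit(run, cmd)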

The first experiment was to echo large strings with the thread pool. This was unsuccessful: no errors occurred, and all the threads happily echoed out their long strings. I kept increasing the string length but still could not provoke any errors.

I next tried adding large environment variables from the Python threads, but this also yielded no failures.

I then changed the workload from calling echo to spawning gcc (compiling bzip2.c) to align it more closely with what uWSGI was doing. Still no errors occurred. I even copied the C files from uWSGI directly to compile, in case the issue related to the C code itself, but this was also fruitless.

Had my machine magically entered a state where the race condition was gone? I could still reproduce directly and consistently using uWSGI, so I realised there must be something I was missing.

Returning to uWSGI (uwsgiconfig.py)

I dug further into how the uWSGI Python build works and learned that the build is divided into three main steps:

  1. compile the core code
  2. compile the plugins
  3. link the uwsgi binary together

I discovered that if I toggled off the plugins compilation step, the bug would disappear!

In the plugins compilation step, uWSGI uses exec to run each plugin's Python code directly in the build process before compiling the related C code. This happens in parallel with the gcc calls from the core code compilation, as everything is fed into the same thread pool. This was what my reproduction case was missing.

I found that if I disabled the build of one particular plugin (named the python plugin), the bug would disappear. This plugin executed some Python code that added an environment variable using os.environ. I commented out that specific line and the bug disappeared in uWSGI.
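
The offending line had roughly this shape (the variable name here is my invention, not the plugin's actual code); in cpython, this innocuous-looking assignment reaches glibc's environment machinery:

import os

# assigning into os.environ forwards to os.putenv, which lands in glibc's
# putenv/setenv and can re-allocate the environ array
os.environ["UWSGI_EXAMPLE_FLAG"] = "1"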

It now looked like a race between os.system and os.putenv could cause os.system to fail to execute its command. Now I had enough information to make a better guess at what was going on and to try to reproduce it outside of uWSGI.

Reproduction Case Outside of uWSGI

Returning to my reproduction case, I reverted the thread pool back to simply echoing helloworld (suppressed by piping to /dev/null) and set up a loop in the main thread to add random environment variables using os.putenv. Success! Now I could reproduce the issue almost every time.
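
The heart of the reproduction is roughly the following (a condensed sketch of the same idea; the full script is on GitHub):

import os
from threading import Thread

def worker():
    while True:
        ret = os.system("echo helloworld > /dev/null")
        if ret != 0:
            print(f"[os.system failure] ouch! return code = {ret}")
            os._exit(1)

# workers keep exec'ing shells in parallel...
for _ in range(8):
    Thread(target=worker, daemon=True).start()

# ...while the main thread keeps growing the environment, forcing glibc
# to re-allocate the environ array out from under a concurrent execve
i = 0
while True:
    os.putenv(f"RACE_VAR_{i}", "x" * 64)
    i += 1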

I made this reproduction case publicly available on my GitHub. You can test it out by running the run.sh script on a Linux machine with multiple CPU cores.

$ ./run.sh
[pid 1981587] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x5632d7c5d160 /* 82 vars */) = -1 EFAULT (Bad address)
[pid 1981586] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x5632d7c5d160 /* 82 vars */) = -1 EFAULT (Bad address)
[pid 1981593] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x5632d7cb1b50 /* 6 vars */) = -1 EFAULT (Bad address)
[pid 1981592] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x5632d7cb1b50 /* 6 vars */) = -1 EFAULT (Bad address)
[pid 1981594] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x5632d7cb1b50 /* 6 vars */ <unfinished ...>
)                                       = -1 EFAULT (Bad address)
[pid 1981591] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x5632d7cb1b50 /* 6 vars */) = -1 EFAULT (Bad address)
[pid 1981590] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x5632d7cb1b50 /* 6 vars */) = -1 EFAULT (Bad address)
[pid 1981589] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x5632d7cb1b50 /* 6 vars */) = -1 EFAULT (Bad address)
[pid 1981588] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x5632d7cb1b50 /* 6 vars */) = -1 EFAULT (Bad address)
[thread 5][os.system failure] ouch! return code = 32512
[thread 3][os.system failure] ouch! return code = 32512
[thread 5][os.system failure] ouch! return code = 32512
[thread 3][os.system failure] ouch! return code = 32512
[thread 0][os.system failure] ouch! return code = 32512
[thread 7][os.system failure] ouch! return code = 32512
[thread 2][os.system failure] ouch! return code = 32512
[thread 4][os.system failure] ouch! return code = 32512
[thread 6][os.system failure] ouch! return code = 32512

I also ported the reproduction to a C program to show that the issue is not in cpython but is a limitation of glibc.

$ make all
gcc main.c -o main

$ make run
strace -f -s99999 -e trace=execve -e quiet="attach,exit,thread-execve" -Z ./main 2>&1 | grep -v "\-\-\-"
[pid 3099883] execve("/bin/sh", ["sh", "-c", "echo helloworld > /dev/null"], 0x1212120 /* 16 vars */) = -1 EFAULT (Bad address)
cmd failed! error=32512
302 successful runs before failure

Looking into glibc, I can see that the issue stems from a race condition around the environment variable array pointer. The glibc system function passes this pointer directly to execve (the kernel makes a deep copy of it during the syscall), while the glibc putenv function may re-allocate the array when an environment variable is added. This makes the two functions racy, since the following interleaving is possible (a mitigation sketch follows the list):

  1. execve is called with the current environ pointer
  2. putenv re-allocates the array, moving the environment to a new memory address
  3. execve tries to copy the contents of the old pointer, but that memory is no longer valid, so the syscall fails with EFAULT
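
As an aside, one way to sidestep the race from the Python side (my own suggestion, distinct from the uWSGI workaround) is to avoid handing glibc's live environ pointer to execve at all, by passing an explicit snapshot of the environment:

import os
import subprocess

# take a snapshot; the env= argument makes cpython build a fresh envp
# array for the child, one that putenv never touches
env_snapshot = os.environ.copy()
subprocess.run("echo helloworld > /dev/null",
               shell=True, env=env_snapshot, check=True)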

The Lesson Learned: putenv Is Not Thread-Safe!

Now that I had a root cause nailed down, it was easier to read through the literature and learn that this is a known limitation of putenv. The glibc man pages document whether each function is safe to call from multiple threads using attributes.

A snippet from the attributes(7) manpage:

MT-Safe
    MT-Safe or Thread-Safe functions are safe to call in the
    presence of other threads.  MT, in MT-Safe, stands for
    Multi Thread.

    Being MT-Safe does not imply a function is atomic, nor
    that it uses any of the memory synchronization mechanisms
    POSIX exposes to users.  It is even possible that calling
    MT-Safe functions in sequence does not yield an MT-Safe
    combination.  For example, having a thread call two MT-
    Safe functions one right after the other does not
    guarantee behavior equivalent to atomic execution of a
    combination of both functions, since concurrent calls in
    other threads may interfere in a destructive way.

    Whole-program optimizations that could inline functions
    across library interfaces may expose unsafe reordering,
    and so performing inlining across the GNU C Library
    interface is not recommended.  The documented MT-Safety
    status is not guaranteed under whole-program optimization.
    However, functions defined in user-visible headers are
    designed to be safe for inlining.

MT-Unsafe
    MT-Unsafe functions are not safe to call in a
    multithreaded program.

The system(3) function carries an MT-Safe attribute, so it is documented as thread-safe, but the putenv(3) function carries an MT-Unsafe attribute (specifically const:env), which should make it clear not to use putenv in a multi-threaded context, as it modifies the environment in an unsafe manner.

┌──────────────────────────┬───────────────┬─────────────────────┐
│Interface                 │ Attribute     │ Value               │
├──────────────────────────┼───────────────┼─────────────────────┤
│putenv()                  │ Thread safety │ MT-Unsafe const:env │
└──────────────────────────┴───────────────┴─────────────────────┘

An interesting thing I noticed while reading the manpages is that the system(3) page does not mention that it hands the environment to execve without any guard against concurrent modification. Meanwhile getenv(3) carries a special note in its attributes (MT-Safe env) indicating that it accesses env and, while marked MT-Safe, is at risk if env is modified in parallel. Technically I suppose this is fair (system only passes the environment pointer to the kernel; it does not dereference it itself), but it would be nice if some kind of warning about this safety gap were visible on the man page.

Then again, it does make sense to put the onus on putenv usage rather than marking every glibc function that happens to touch the environment with additional MT-Safe conditions.

┌──────────────────────────────────┬───────────────┬─────────────┐
│Interface                         │ Attribute     │ Value       │
├──────────────────────────────────┼───────────────┼─────────────┤
│getenv(), secure_getenv()         │ Thread safety │ MT-Safe env │
└──────────────────────────────────┴───────────────┴─────────────┘

So, in summary: if you see -1 EFAULT randomly coming out of your execve syscalls, take the hint and make sure you're not setting or adding environment variables in parallel with spawning commands!
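
A minimal sketch of the safe ordering (the variable name is illustrative): finish mutating the environment before any threads exist, and only then fan out.

import os
import threading

# do all environment mutation while the process is still single-threaded
os.environ["EXAMPLE_MODE"] = "1"  # illustrative variable name

def worker():
    # spawning commands concurrently is fine once nobody calls putenv anymore
    if os.system("echo helloworld > /dev/null") != 0:
        raise RuntimeError("os.system failed")

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()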