Like any other tool, POSIX signals come with rules for using them wisely, securely, and safely. Programming language standards, man pages, and the POSIX standard itself have documented these rules for a long time. However, I often encounter critical bugs related to POSIX signals even in skilled developers' code. These bugs can be found in both commercial and open source projects. So let's talk about the important stuff once again. (By the way, a note to newcomers in software development: contributing fixes for obvious bugs in POSIX signal handlers to open source projects is a great way to sharpen your skills and add cases to your portfolio. Fortunately, there are a lot of projects with such bugs.)
We published and translated this article with the copyright holder's permission. The author is Kirill Ovchinnikov (email - firstname.lastname@example.org). The article was originally [RU] published on Habr.
Well, first things first. What happens when a process receives a signal? The signal handler can be called in any thread of the process in which this specific signal (for example, SIGINT) is not blocked. If there are several such threads, the kernel chooses one of them. Most often it is the main thread of the program, but this is not guaranteed, and you should not count on it. The kernel creates a special frame on the stack for the signal handler. This frame stores the information required for the process to continue working. This information includes the program counter register (the address from which code should be executed), architecture-specific registers that are necessary for resuming the interrupted program, the thread's current signal mask, etc. After that, the signal handler function is called directly in this thread.
What does this mean? It means that the execution of any thread that does not block our signal can be interrupted at any time. At absolutely any moment. It can be interrupted even in the middle of any function or system call. Now suppose the interrupted function has some static, global, or thread-local internal state: a buffer, some flags, a mutex, or something else. Calling the function again before it has finished working may lead to completely unpredictable results. In computer science, such a function is called non-reentrant.
Let's take a function from stdio.h, for example, the well-known printf(). It uses a statically allocated data buffer inside, along with counters and indexes that store the amount of data and the current position in the buffer. None of this is updated atomically. If we catch a signal while printf() is executing and run its handler in some thread, and this handler also calls printf(), the function will work with an incorrect internal state. At best, it will simply produce an incorrect result. At worst, the whole program will crash with a segmentation fault.
Another example: malloc() and free() are non-reentrant on most platforms because they use a static data structure inside that stores which memory blocks are free. The problem is compounded by the fact that malloc()/free() can be implicitly used in the depths of other library functions, and you may not even know about it.
Therefore, there is such a thing as async-signal-safety. Namely, the POSIX standard explicitly lists a strictly limited set of functions that may be called in signal handlers, and nothing beyond that set.
List of functions allowed:
Note that the function list varies between different POSIX standard versions, and changes can occur in two directions. For example, fpathconf(), pathconf(), and sysconf() were considered safe in the 2001 standard. In the 2008 standard they are not safe anymore. fork() is still a safe function. However, for a number of reasons, there are plans to remove it from the list in future versions of the standard.
And now the most important thing. An attentive eye may notice that this list doesn't contain printf(), syslog(), or malloc(). So you can't use these functions in a signal handler and, in theory, you can't use anything that calls these functions internally. You can't write to std::cout and std::cerr in C++ either. These operations are non-reentrant as well.
Many C standard library functions are also non-reentrant: for example, almost all functions from <stdio.h>, many functions from <string.h>, and a number of functions from <stdlib.h> (although some of them are in the allowed list). Moreover, the C language standard explicitly prohibits calling almost anything from the standard library in signal handlers, except abort(), _Exit(), quick_exit(), and signal() itself:
ISO/IEC 9899:2011 §7.14.1.1 The signal function
5. If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if ... the signal handler calls any function in the standard library other than the abort function, the _Exit function, the quick_exit function, or the signal function with the first argument equal to the signal number corresponding to the signal that caused the invocation of the handler.
So, if you really want to output something to the console from the signal handler, you can do it with the old-fashioned method:
write(1,"Hello World!", 12);
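A bare write() call can be interrupted or can write only part of the buffer. The following is a minimal sketch of a helper that stays within write() alone; the names signal_safe_puts() and report_signal() are hypothetical and exist only for this illustration. The length is computed by hand because strlen() is not on the async-signal-safe list in every version of the standard.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: prints a NUL-terminated string using only write().
 * It retries on EINTR and on short writes, and leaves errno untouched. */
int signal_safe_puts(int fd, const char *msg)
{
    int saved_errno = errno;
    size_t len = 0;
    size_t done = 0;

    while (msg[len] != '\0')   /* hand-rolled strlen() */
        len++;

    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before writing anything: retry */
            errno = saved_errno;
            return -1;
        }
        done += (size_t)n;     /* short write: continue with the remainder */
    }

    errno = saved_errno;
    return 0;
}

/* Hypothetical handler that reports the signal on standard error. */
void report_signal(int signo)
{
    (void)signo;
    signal_safe_puts(STDERR_FILENO, "signal received\n");
}
```

The handler still has to be installed by the program, and the message is deliberately a fixed string: formatting a number safely would need its own hand-written conversion.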
Still, it is good practice (by the way, explicitly recommended in the libc documentation) to make signal handlers as simple and short as possible. For example, you can write() to a pipe, and in another thread (or in the main event loop of your program) you can select() on this pipe. You can also wait for and process signals in a dedicated thread (through sigwait(), taking care to set up the correct signal mask in advance). Or the simplest option: reduce the signal handler to setting a flag variable that is processed in the main program loop. However, flag variables are not that simple either. That's what the next paragraph is about.
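The pipe-based approach might look roughly like the sketch below. The function names signal_pipe_install() and signal_pipe_wait() are invented for this example, and the code assumes a single pipe per process that is set up once, before the handler can run. The handler reads the descriptor from a volatile sig_atomic_t variable, which is common POSIX practice but stricter readings of the C standard only cover assignments to such objects.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int wake_read_fd = -1;                    /* used by the event loop */
static volatile sig_atomic_t wake_write_fd = -1; /* used by the handler */

/* The handler only forwards the signal number into the pipe. */
static void forward_signal_to_pipe(int signo)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signo;
    ssize_t n = write(wake_write_fd, &byte, 1);  /* dropped if the pipe is full */
    (void)n;
    errno = saved_errno;
}

/* Hypothetical setup: creates a non-blocking pipe and installs the handler. */
int signal_pipe_install(int signo)
{
    int fds[2];
    struct sigaction sa;

    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFL);
        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    wake_read_fd = fds[0];
    wake_write_fd = fds[1];

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_signal_to_pipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Waits up to timeout_sec seconds. Returns the signal number,
 * 0 on timeout, or -1 on error. The timeout restarts after EINTR. */
int signal_pipe_wait(int timeout_sec)
{
    for (;;) {
        fd_set readable;
        struct timeval tv;
        unsigned char byte;
        ssize_t n;
        int rc;

        FD_ZERO(&readable);
        FD_SET(wake_read_fd, &readable);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        rc = select(wake_read_fd + 1, &readable, NULL, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;              /* select() is never restarted automatically */
        if (rc <= 0)
            return rc;

        n = read(wake_read_fd, &byte, 1);
        if (n == 1)
            return byte;
        if (n < 0 && (errno == EINTR || errno == EAGAIN))
            continue;
        return -1;
    }
}
```

A main loop would call signal_pipe_install(SIGTERM) once and then include the read end of the pipe in its own select() set next to sockets and other descriptors, so all real work happens outside the handler.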
Let's look at the same item from the C language standard:
ISO/IEC 9899:2011 §7.14.1.1 The signal function
5. If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static or thread storage duration that is not a lock-free atomic object other than by assigning a value to an object declared as volatile sig_atomic_t
Modern C++ standards say the same thing. The logic here is exactly the same as in the previous paragraph. First, since the signal handler can be called at absolutely any moment, it's important that the non-local variables you deal with in the handler are updated atomically. Otherwise, if the program is interrupted at the wrong moment, you may get incorrect content in the variables. Second, from the point of view of the function being executed, these variables are changed by "something else", so it's important that accesses to them are not optimized out by the compiler. Otherwise, the compiler may decide that the variable's value cannot change between loop iterations and will leave out the check altogether or keep the variable in a processor register. Therefore, as static/global flags, you can use either atomic types that can be changed from the signal handler (if they are lock-free on your platform), or the sig_atomic_t type with the volatile qualifier, which was created specifically for this purpose.
And God forbid you lock a mutex in the signal handler, the same mutex that is used in another part of the program or in handlers of other signals. This is a direct path to deadlock. Therefore, you can also forget about condition variables as flags.
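As an illustration of the flag approach, here is a minimal sketch in which the handler does nothing except assign to a volatile sig_atomic_t object. The names install_stop_flag() and work_until_stopped() are hypothetical, and the "work" is just a counter standing in for a real loop body.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* The only object the handler touches: a volatile sig_atomic_t flag. */
static volatile sig_atomic_t stop_requested = 0;

static void request_stop(int signo)
{
    (void)signo;
    stop_requested = 1;   /* a plain assignment, nothing else */
}

/* Hypothetical setup function: installs the flag-setting handler. */
int install_stop_flag(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_stop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical main loop: performs up to max_iterations units of work and
 * re-reads the flag on every iteration because it is volatile.
 * Returns the number of iterations completed. */
long work_until_stopped(long max_iterations)
{
    long done = 0;

    while (!stop_requested && done < max_iterations) {
        /* one unit of real work would go here */
        done++;
    }
    return done;
}
```

Without the volatile qualifier, a compiler would be entitled to read stop_requested once before the loop and never look at it again.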
It's simple. If you call any function in the signal handler that can theoretically change the errno global variable, save the current errno value at the beginning of the signal handler somewhere, and restore it back at the end. Otherwise, you can break some outer code that checks that same errno.
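A minimal sketch of this pattern follows. The handler deliberately makes a failing write() call so that errno would be clobbered if it were not restored; the function names are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void errno_preserving_handler(int signo)
{
    int saved_errno = errno;          /* save on entry */
    ssize_t n;

    (void)signo;
    n = write(-1, "x", 1);            /* fails and sets errno to EBADF */
    (void)n;

    errno = saved_errno;              /* restore on exit */
}

/* Hypothetical check: returns 1 if errno kept its value across the handler,
 * 0 if the handler leaked its own errno, -1 on setup failure. */
int errno_survives_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = errno_preserving_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    errno = ERANGE;                   /* pretend outer code just set errno */
    if (raise(SIGUSR1) != 0)
        return -1;
    return errno == ERANGE;
}
```

If the two saved_errno lines are removed from the handler, the outer code would observe EBADF instead of the value it was about to check.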
Let's start with the fact that signal() has a significant advantage: it's included in the C language standard, whereas sigaction() is already a purely POSIX thing. On the other hand, the behavior of signal() can vary widely in different operating systems. Moreover, there are mentions on the Internet that the behavior of signal() may vary even with different versions of the Linux kernel.
First, a little bit of history for you.
On the original UNIX systems, calling a signal handler previously set with signal() reset the handler to SIG_DFL, and the system did not block the delivery of further instances of the signal. Nowadays this is equivalent to calling sigaction() with the SA_RESETHAND | SA_NODEFER flags. In other words, we received the signal, processed it -> the handler was reset to the standard one. And therefore, having finished processing the received signal, we had to remember to call signal() again and set our function again instead of the standard handler. System V also provided these semantics for signal(). This situation was bad because the next signal might be sent and delivered to the process again before the handler had time to reestablish itself. Furthermore, rapid delivery of the same signal could result in recursive invocations of the handler.
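The old reset-on-delivery behavior can be reproduced today with sigaction(), which makes it easy to observe. The sketch below installs a handler with SA_RESETHAND | SA_NODEFER, delivers the signal once, and then queries the current disposition; the function names are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t one_shot_calls = 0;

static void one_shot_handler(int signo)
{
    (void)signo;
    one_shot_calls = one_shot_calls + 1;
}

/* Hypothetical check: returns 1 if the handler ran exactly once and the
 * disposition was reset to SIG_DFL afterwards, 0 otherwise, -1 on error. */
int handler_is_reset_after_delivery(void)
{
    struct sigaction sa, current;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = one_shot_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;   /* original UNIX semantics */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    one_shot_calls = 0;
    if (raise(SIGUSR1) != 0)                   /* first delivery runs the handler */
        return -1;

    /* Query only: do not raise again, the default action would now
     * terminate the process. */
    if (sigaction(SIGUSR1, NULL, &current) != 0)
        return -1;
    return one_shot_calls == 1 && current.sa_handler == SIG_DFL;
}
```

The window between the first delivery and re-installing the handler is exactly where a second signal would hit the default action.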
BSD improved on this situation. When a signal is received, the signal handler is not reset. But this was not the only change in behavior: further instances of the signal are blocked from being delivered while the handler is executing. In addition, some blocking system calls (such as read() or wait()) are automatically restarted if interrupted by a signal handler. The BSD semantics are equivalent to calling sigaction() with the SA_RESTART flag.
The situation on Linux is as follows:
So, the main differences between signal() and sigaction() are as follows:
Therefore, to avoid unexpected situations and portability problems, the Open Group Base Specifications recommend that you don't use signal(). Use sigaction() in new code instead.
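For comparison with signal(), here is a minimal sketch of installing a handler through sigaction() with the behavior spelled out explicitly: the handler stays installed after delivery, the signal is blocked while its handler runs, and interrupted system calls are restarted. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handled = 0;

static void count_signal(int signo)
{
    (void)signo;
    handled = handled + 1;
}

/* Hypothetical wrapper: installs a handler with explicit, portable semantics. */
int install_persistent_handler(int signo, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);   /* block nothing extra while the handler runs */
    sa.sa_flags = SA_RESTART;   /* no SA_RESETHAND, no SA_NODEFER */
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical check: the handler is installed once and must survive
 * both deliveries. Returns the number of handler invocations. */
int deliveries_after_two_raises(void)
{
    handled = 0;
    if (install_persistent_handler(SIGUSR1, count_signal) != 0)
        return -1;
    raise(SIGUSR1);
    raise(SIGUSR1);
    return handled;
}
```

Because every flag is stated in the call, the result does not depend on which historical signal() semantics the platform happens to implement.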
A child process created via fork() inherits the installed signal handlers of its parent. During an execve(), signal handlers are reset to the default, but the settings of blocked signals remain unchanged for the newly started process. So, if, for example, you ignored SIGINT, SIGUSR1, or something else in the parent, and the running process is counting on them, this can lead to interesting consequences.
If multiple standard (non-real time) signals are sent to a process, the order in which the signals are delivered is unspecified.
Standard signals do not queue. If multiple instances of a standard signal are sent to the process while that signal is blocked, then only one instance of the signal is marked as pending (and the signal will be delivered just once when it is unblocked).
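This coalescing is easy to observe. The sketch below blocks SIGUSR1, sends it to the process several times, unblocks it, and counts how often the handler ran. It assumes a single-threaded program (sigprocmask() is only specified for that case), and count_coalesced_deliveries() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t deliveries = 0;

static void count_delivery(int signo)
{
    (void)signo;
    deliveries = deliveries + 1;
}

/* Sends SIGUSR1 `sends` times while it is blocked, then unblocks it.
 * Returns how many times the handler ran, or -1 on error. */
int count_coalesced_deliveries(int sends)
{
    struct sigaction sa;
    sigset_t usr1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_delivery;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &usr1, NULL) != 0)
        return -1;

    deliveries = 0;
    for (int i = 0; i < sends; i++)
        kill(getpid(), SIGUSR1);      /* each one only marks the signal pending */

    /* The pending signal is delivered before sigprocmask() returns. */
    if (sigprocmask(SIG_UNBLOCK, &usr1, NULL) != 0)
        return -1;
    return deliveries;
}
```

On Linux, count_coalesced_deliveries(3) is expected to return 1, so a handler that counts events will undercount whenever signals arrive while blocked.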
Everything I wrote above is in the documentation. In general, there is a lot of interesting, useful, and unexpected information there, especially in the Portability, Bugs, and Known issues sections.
For example, I really like the description of the getlogin()/cuserid() functions:
Sometimes it does not work at all, because some program messed up the utmp file. Often, it gives only the first 8 characters of the login name.
And an even more beautiful one:
Nobody knows precisely what cuserid() does; avoid it in portable programs.
That's it. Clean code to you!