2010-02-27

Suppressing SIGPIPE in a library

[[ People still seem to come here occasionally, but I think the post below is outdated: POSIX 2008 introduced MSG_NOSIGNAL for send(2), which Linux has supported since 2.2; FreeBSD/OSX have SO_NOSIGPIPE for setsockopt(2); and all other Unix flavours are more or less dead (and Windows never had SIGPIPE in the first place). ]]


When writing a library, it is sometimes necessary to suppress the SIGPIPE that the OS generates when a thread writes to a pipe with no reader on the other end. Surprisingly, this is not as straightforward as one might think. I implemented it twice myself (in Cache::Memcached::Fast prior to 0.18, and in XProbes prior to 0.4) and got it wrong both times. (It's worth noting that the bug could not affect applications using these libraries unless they do sophisticated handling of SIGPIPE themselves in a multi-threaded context, which is very rare.) Considering the problem once more, I finally arrived at the correct solution.

A bit of background: certain write system calls (write(), send(), etc.), when used on a pipe or a stream-oriented socket with no one reading on the other end, generate SIGPIPE in addition to failing with the EPIPE error code. Such a SIGPIPE is synchronous (it is generated before the system call returns) and per-thread (it is sent to the thread that made the call). Compare this with SIGINT, which is generated when you press Ctrl-C on the console: it is asynchronous (its generation is not the result of a particular action of the application itself) and per-process (it is sent to the process as a whole, and any thread that doesn't block it may be chosen to handle it).

Signal disposition is the action associated with a signal. It may be the default (for SIGPIPE, killing the process), ignore (a delivered signal has no effect whatsoever), or a custom handler registered with sigaction(). Signal disposition is a per-process attribute, i.e., the action is shared among all threads. Additionally, a signal may be either blocked or unblocked in a given thread (a per-thread attribute). While blocked, the signal remains pending until it is unblocked later (or, if the signal was sent to the whole process, it may be delivered to some other thread that doesn't block it).

There are two kinds of signals: standard and real-time. Real-time signals may carry additional data, and they are queued: more than one instance of the same signal may be pending at a given moment. Standard signals are not queued; normally only one instance of a given signal may be pending. When a standard signal is generated while it is already pending, the second instance is "merged" with the first, or, in other words, simply lost.
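To make the background concrete, here is a small sketch (POSIX, tested in thought against Linux semantics) that blocks SIGPIPE in the calling thread, writes twice to a pipe whose read end has been closed, and then inspects what is pending. The names struct sigpipe_demo and run_sigpipe_demo() are made up for illustration. Each write() should fail with EPIPE, SIGPIPE should show up as pending for this thread, and sigtimedwait() should be able to collect only one instance, because the two standard signals merge.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical container for what the experiment observes. */
struct sigpipe_demo {
    int epipe_count;    /* how many write() calls failed with EPIPE */
    int pending_after;  /* 1 if SIGPIPE was pending after the writes */
    int consumed;       /* how many SIGPIPEs sigtimedwait() could collect */
};

static struct sigpipe_demo run_sigpipe_demo(void)
{
    struct sigpipe_demo d = {0, 0, 0};
    const struct timespec zero = {0, 0};
    sigset_t mask, old, pending;
    int fds[2];
    int i;

    /* Block SIGPIPE in this thread only, so it stays pending instead of
       killing the process. */
    sigemptyset(&mask);
    sigaddset(&mask, SIGPIPE);
    pthread_sigmask(SIG_BLOCK, &mask, &old);

    if (pipe(fds) == -1) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return d;
    }
    close(fds[0]);  /* nobody is reading any more */

    /* Each write fails with EPIPE and generates a synchronous SIGPIPE
       directed at this thread. */
    for (i = 0; i < 2; i++)
        if (write(fds[1], "x", 1) == -1 && errno == EPIPE)
            d.epipe_count++;

    sigemptyset(&pending);
    sigpending(&pending);
    d.pending_after = sigismember(&pending, SIGPIPE) == 1;

    /* Collect pending SIGPIPEs without waiting; the second one was merged
       into the first, so only one can be collected. */
    while (sigtimedwait(&mask, NULL, &zero) == SIGPIPE)
        d.consumed++;

    close(fds[1]);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return d;
}
```

Two failed writes but a single collectable SIGPIPE is exactly the "merging" of standard signals described above.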

A library implementation should provide encapsulation: if it generates SIGPIPE as part of its internal operation, that SIGPIPE should not be visible to the user. Since the EPIPE error code is enough to report the condition, we have to somehow suppress the SIGPIPE that comes along with it. It should be noted that many modern OSes have flags to suppress SIGPIPE either per connection (SO_NOSIGPIPE) or per request (MSG_NOSIGNAL). However, some more conservative systems provide neither, and the rest of this post is about them.
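Where one of these flags exists, suppression is close to a one-liner. Below is a minimal sketch assuming a stream socket; send_nosigpipe() is a hypothetical helper name, and a real library would set SO_NOSIGPIPE once, right after creating or accepting the socket, rather than on every call as this simplified version does.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send on a stream socket without raising SIGPIPE,
   using whichever OS-level flag is available. */
static ssize_t send_nosigpipe(int fd, const void *buf, size_t len)
{
#if defined(MSG_NOSIGNAL)
    /* Per request: POSIX 2008, Linux since 2.2. */
    return send(fd, buf, len, MSG_NOSIGNAL);
#else
# if defined(SO_NOSIGPIPE)
    /* Per connection: FreeBSD/OSX. Simplified: normally done once. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof on) == -1)
        return -1;
# endif
    /* With neither flag available, this send() can still raise SIGPIPE. */
    return send(fd, buf, len, 0);
#endif
}
```

Either way the caller just sees -1 with errno set to EPIPE, and the process keeps running even with the default SIGPIPE disposition.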

The trap most implementations (including my earlier versions) fall into is viewing the problem as "ignoring SIGPIPE", whereas the better term would be "suppressing SIGPIPE". Thinking in terms of ignoring, one quickly arrives at sigaction() with SIG_IGN. However, this changes the signal disposition for the whole process and thus may affect other threads that are not prepared for it.
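For contrast, here is a sketch of why the SIG_IGN route leaks out of the library: a hypothetical library function running in one thread switches SIGPIPE to SIG_IGN, and afterwards every other thread observes the changed disposition. The names naive_library_call() and disposition_leaked() are illustrative only and are not taken from any real library.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* A "library" routine that naively ignores SIGPIPE via sigaction(). */
static void *naive_library_call(void *arg)
{
    struct sigaction ign;
    (void)arg;
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    /* Disposition is per-process: this affects EVERY thread. */
    sigaction(SIGPIPE, &ign, NULL);
    return NULL;
}

/* Returns 1 if, after the library thread ran, the calling thread also
   sees SIGPIPE as ignored; -1 if the thread could not be created. */
static int disposition_leaked(void)
{
    pthread_t t;
    struct sigaction cur;

    if (pthread_create(&t, NULL, naive_library_call, NULL) != 0)
        return -1;
    pthread_join(t, NULL);

    sigaction(SIGPIPE, NULL, &cur);  /* query only, no change */
    return cur.sa_handler == SIG_IGN;
}
```

Build with -pthread on systems that require it. Even if the library restored the old action afterwards, another thread could hit SIGPIPE in between and get the wrong behaviour.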

Instead, we have to temporarily suppress SIGPIPE in the thread that executes library code. Here is how. To suppress SIGPIPE, we first check whether it is pending. If it is, it must be blocked in this thread, and we have to do nothing: if the library generates another SIGPIPE, it will be merged with the pending one, which is a no-op. If SIGPIPE is not pending, we block it in this thread, noting whether it was already blocked. Then we are free to do our writes.

To restore SIGPIPE to its original state, we do the following. If SIGPIPE was pending originally, we do nothing. Otherwise, we check whether it is pending now. If it is (meaning that our actions generated one or more SIGPIPEs), we wait for it in this thread, thus clearing its pending status. For this we use sigtimedwait() with a zero timeout, to avoid blocking in the scenario where a malicious user sends SIGPIPE manually to the whole process: we would see it pending, but another thread might handle it before we had a chance to wait for it. After clearing the pending status, we unblock SIGPIPE in this thread, but only if it wasn't blocked originally. This link shows the complete code (the suppress_sigpipe() and restore_sigpipe() functions).
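Since the original code is only available through the link, what follows is a reconstruction of the steps just described, not the author's actual implementation. It is a sketch assuming POSIX threads and sigtimedwait(); struct sigpipe_state and the helper sigpipe_set() are hypothetical names, and every sigset_t is cleared with sigemptyset() before use, as suggested in the comments.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Hypothetical record of SIGPIPE's state before suppression. */
struct sigpipe_state {
    int was_pending;  /* SIGPIPE already pending (hence blocked) on entry */
    int was_blocked;  /* SIGPIPE already blocked in this thread on entry */
};

/* Hypothetical helper: a set containing only SIGPIPE. */
static void sigpipe_set(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGPIPE);
}

static void suppress_sigpipe(struct sigpipe_state *st)
{
    sigset_t pending, sigpipe_only, old_mask;

    sigemptyset(&pending);
    sigpending(&pending);
    st->was_pending = sigismember(&pending, SIGPIPE) == 1;
    st->was_blocked = 0;
    if (st->was_pending)
        return;  /* already blocked; a new SIGPIPE merges with it */

    /* Block SIGPIPE in this thread and remember whether it was blocked. */
    sigpipe_set(&sigpipe_only);
    sigemptyset(&old_mask);
    pthread_sigmask(SIG_BLOCK, &sigpipe_only, &old_mask);
    st->was_blocked = sigismember(&old_mask, SIGPIPE) == 1;
}

static void restore_sigpipe(const struct sigpipe_state *st)
{
    sigset_t pending, sigpipe_only;
    const struct timespec zero = {0, 0};
    int rc;

    if (st->was_pending)
        return;  /* leave the caller's pending SIGPIPE untouched */

    sigpipe_set(&sigpipe_only);
    sigemptyset(&pending);
    sigpending(&pending);
    if (sigismember(&pending, SIGPIPE) == 1) {
        /* Our writes raised SIGPIPE: consume it. The zero timeout avoids
           hanging if another thread takes a process-wide SIGPIPE first. */
        do
            rc = sigtimedwait(&sigpipe_only, NULL, &zero);
        while (rc == -1 && errno == EINTR);
    }

    if (!st->was_blocked)
        pthread_sigmask(SIG_UNBLOCK, &sigpipe_only, NULL);
}
```

A library call would bracket its writes with suppress_sigpipe(&st) and restore_sigpipe(&st) on every return path. Per the comments below, systems lacking sigtimedwait() could use sigwait() instead, at the cost of blocking if another thread consumes a process-wide SIGPIPE first.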

7 comments:

  1. Thank you, that's exactly what I've been looking for!
    Your explanation is very clear, as well as the code.

  2. It's worth noting that at least some Solaris and OpenBSD systems have neither MSG_NOSIGNAL nor SO_NOSIGPIPE, but also lack sigtimedwait() (or it resides in some library other than libc that I failed to figure out). It is fairly safe to replace the latter with sigwait() (if someone sends SIGPIPE by hand, our process would be dead anyway).

  3. Useful post. I'm slightly worried we would silently swallow SIGPIPE if it would be generated between sigpending and pthread_sigmask though.

  4. The point was to suppress only SIGPIPE that we might generate ourselves, and we can't cause SIGPIPE between sigpending and pthread_sigmask because there are no write() calls there ;).

  5. See the thread on comp.unix.programmer:

    https://groups.google.com/forum/?fromgroups#!topic/comp.unix.programmer/Spk9NrhSMPk

    Minor nit: suppress_sigpipe() doesn't empty "pending" or "blocked" before use. See the second DESCRIPTION paragraph here: http://pubs.opengroup.org/onlinepubs/9699919799/functions/sigismember.html

    1. Thanks, I agree that this should be fixed.

  6. Thank you for this wonderful pointer.
