pthread_cancel considered harmful

Now, time and again someone comes up with a problem that they think they can solve using pthread_cancel, and every time I have tried to discourage it. The context is usually some C++ application. In some cases it is fairly well designed, and in other cases it has been a mess. But no matter what, there is no way to make this work without one of two outcomes. Either they have to make huge changes to their application that they really do not want to make, or their application will crash from time to time while a cancellation is being processed.

The problem lies with how pthread_cancel interacts with some other features of C++, specifically exception handling. Consider the following code:

#include <unistd.h>
#include <pthread.h>
#include <iostream>
#include <string>
using namespace std;
struct Sleepy
{
  ~Sleepy()
  {
    cerr<<"...zZzZz..."<<endl;
    sleep(999);
  }
};
 
void* sleepyThread(void*)
{
  try 
  {        
    Sleepy f;
    cerr<<"Fall asleep...\n";
    throw string("WAKE UP!!"); // unwinding from here runs ~Sleepy()
  } catch(string& e)
  {
    cerr<<"Caught exception:"<<e<<endl;
  }
  return NULL;
}

int main()
{
  pthread_t thread;
  pthread_create(&thread,NULL,&sleepyThread,NULL);
  sleep(1); //Give the new thread time to get to the sleeping part...
  cerr<<"lets try to cancel it..."<<endl;
  pthread_cancel(thread);
  pthread_join(thread,NULL);
  cerr<<"All is done now..."<<endl;
}

Most likely, if you compile this using a recent compiler on Linux (g++ with glibc), you will end up with something like this:

Fall asleep...
...zZzZz...
lets try to cancel it...
terminate called without an active exception
Aborted

What happens here is that the thread throws an exception, and during stack unwinding it calls the destructor of the Sleepy object. We all know that we are not supposed to throw exceptions from destructors. A destructor is likely to be called while another exception is being processed, and two exceptions propagating at the same time make the C++ runtime call std::terminate(), which aborts the process.

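“But there is no exception being thrown here!!”, I hear you say. Well, not true. The sleep function inside the destructor is a cancellation point. When pthread_cancel is called on a thread sitting in a cancellation point, what actually happens is that a special exception is thrown to do the stack unwinding. In glibc this is the forced unwind, which C++ code sees as abi::__forced_unwind. Here it is thrown while the string exception is still unwinding the stack. Oh dear 🙂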

Now this is a problem whenever you want to call, from a destructor, one of the system calls that also act as cancellation points. Unfortunately, there are a lot of these system calls, and some of them you really do want to call in a destructor. Wouldn’t you want to call close(), fclose() or closedir()? POSIX requires close() to be a cancellation point and allows fclose() and closedir() to be one. This is certainly going to mess with your design. RAII classes are out the window for some common cases… and what else will have to follow to make the code cancellation safe?

There are probably simple cases where cancellation will work as expected, but as soon as you start doing more complex work inside a thread, chances are that you will end up aborting your entire process instead of cancelling one thread. The fun part is that unless you deliberately force the timing, as I did with sleep() in the example, the race goes your way and it will work as expected 9999 out of 10000 times. But that one time when you are running a demo in front of your CEO, the close() will happen just as you cancel, and boooooom……

So what is the solution, then? Well, for sure, if cancelling your threads is important, you need to design for it. My bet is that something like Boost’s interruptible threads is closer to what you really need than pthread_cancel.
