Tag Archives: multithreading

std::thread and std::future in C++11

This is a quick note to chapter 4 of C++ Concurrency in Action.

1. std::thread

In C++11, It’s quite simple to create a separate thread using std::thread. Following code will simply output “hello world” or “world hello”:

2. std::mutex and std::condition_variable

If you need synchronization between threads, there are std::mutex and std::condition_variable. The semantics are the same with that in pthread library. Here’s a simple producer/consumer demo:

3. std::future with std::async()

C++11 also simplifies our work with one-off events with std::future. std::future provides a mechanism to access the result of asynchronous operations. It can be used with std::async(), std::packaged_task and std::promise. Starting with std::async():

std::async() gives two advantages over the direct usage of std::thread. Threads created by it are automatically joined. And we can now have a return value. std::async() decides whether to run the callback function in a separate thread or just in the current thread. But there’s a chance to specify a control flag(launch::async or launch::deferred) to tell the library, what approach we want it to run the callback.

When testing With gcc-4.8, foo() is not called. But with VC++2013, it does output “hello”.

4. std::future with std::packaged_task

With std::async(), we cannot control when our callback function is invoked. That’s what std::packaged_task is designed to deal with. It’s just a wrapper to callables. We can request an associated std::future from it. And when a std::packaged_task is invoked and finished, the associated future will be ready:

In waiter() and waiter2(), future::get() blocks until the associating std::packaged_task completes. You will always get “in pt” before “after f.get()” and “in pt2” before “after f2.get()”. They are synchronized.

5. std::future with std::promise

You may also need to get notified in the middle of a task. std::promise can help you. It works like a lightweight event.

Future and Promise are the two separate sides of an asynchronous operation. std::promise is used by the “producer/writer”, while std::future is used by the “consumer/reader”. The reason it is separated into these two interfaces is to hide the “write/set” functionality from the “consumer/reader”:

Again in waiter() and waiter2(), future::get() blocks until a value or an exception is set into the associating std::promise. So “setting p” is always before “f.get()” and “setting p2” is always before “f2.get()”. They are synchronized.

NOTE: std::future seems to be not correctly implemented in VC++2013. So the last two code snippet do not work with it. But you can try the online VC++2015 compiler(still in preview as this writing), it works.

Different Semantics Between Win32 Events and Condition Variables

Following the last post, I’m trying to implement a thread pool for practise, which supposed to work under both Windows and Linux platform. But the different semantics between Win32 events and condition variables makes it impossible to code in a unified logic. First, Linux uses mutex and condition variable to keep synchronization. While there is only event under Windows. Then, pthread_cond_signal() does nothing if no thread is currently waiting on the condition:

But under Windows, code below simply pass through:

And, under Windows Vista and later versions, a new series of synchronization API was introduced to align with the Linux API:

Spurious Wakeups


One of the two basic synchronisation primitives in multithreaded programming is called “condition variables”. Here’s a small example:

Here, the call to “c.wait()” unlocks the mutex (allowing the other thread to eventually lock it), and suspends the calling thread. When another thread calls ‘notify’, the first thread wakes up, locks the mutex again (implicitly, inside ‘wait’), sees that variable is set to ‘true’ and goes on.

But why do we need the while loop, can’t we write:

We can’t. And the killer reason is that ‘wait’ can return without any ‘notify’ call. That’s called spurious wakeup and is explicitly allowed by POSIX. Essentially, return from ‘wait’ only indicates that the shared data might have changed, so that data must be evaluated again.

Okay, so why this is not fixed yet? The first reason is that nobody wants to fix it. Wrapping call to ‘wait’ in a loop is very desired for several other reasons. But those reasons require explanation, while spurious wakeup is a hammer that can be applied to any first year student without fail.

The second reason is that fixing this is supposed to be hard. Most sources I’ve seen say that fixing that would require very large overhead on certain architectures. Strangely, no details were ever given, which made me wonder if avoiding spurious wakeups is simple, but all the threading experts secretly decided to tell everybody it’s hard.

After asking on comp.programming.thread, I at least know the reason for Linux (thanks to Ben Hutchings). Internally, wait is implemented as a call to the ‘futex’ system call. Each blocking system call on Linux returns abruptly when the process receives a signal — because calling signal handler from kernel call is tricky. What if the signal handler calls some other system function? And a new signal arrives? It’s easy to run out of kernel stack for a process. Exactly because each system call can be interrupted, when glibc calls any blocking function, like ‘read’, it does it in a loop, and if ‘read’ returns EINTR, calls ‘read’ again.

Can the same trick be used to conditions? No, because the moment we return from ‘futex’ call, another thread can send us notification. And since we’re not waiting inside ‘futex’, we’ll miss the notification(A third thread can get it, and change the value of predicate. — gonwan). So, we need to return to the caller, and have it reevaluate the predicate. If another thread indeed set it to true, we’ll break out of the loop.

So much for spurious wakeups on Linux. But I’m still very interested to know what the original reasons were.

Also see the explanation for spurious wakeups on the linux man page: pthread_cond_signal.
Last note: PulseEvent() in windows(manual-reset) = pthread_cond_signal() in linux, while SetEvent() in windows(auto-reset) = pthread_cond_broadcast() in linux, see here and here. And spurious wakeups are also possible on windows when using condition variables.

Logging in Multithreaded Environment Using Thread-Local Storage

Generally, A logger is a singleton class. The declaration may look like:

The Init function is used to set log name or maybe other configuration information. And We can use the Write function to write logs.

Well, in a multithreaded environment, locks must be added to prevent concurrent issues and keep the output log in order. And sometimes we want to have separate log configurations. How can we implement it without breaking the original interfaces?

One easy way is to maintain a list of all available Logger instances, so that we can find and use a unique Logger in each thread. The approach is somehow like the one used in log4j. But log4j reads configuration files to initialize loggers, while our configuration information is set in runtime.

Another big issue is that we must add a new parameter to the GetInstance function to tell our class which Logger to return. The change breaks interfaces.

By utilizing TLS (thread-local storage), we can easily solve the above issues. Every logger will be thread-local, say every thread has its own logger instance which is stored in its thread context. Here comes the declaration for our new Logger class, boost::thread_specific_ptr from boost library is used to simplify our TLS operations:

Simply use boost::thread_specific_ptr to wrap the original 2 static variables, and they will be in TLS automatically, that’s all. The implementation:

Our test code:

Output when using the original Logger may look like:

When using the TLS version, it may look like:

Everything is in order now. You may want to know what OS API boost uses to achieve TLS. I’ll show you the details in boost 1.43:

The underlying API is TlsGetValue under windows and pthread_getspecific under *nix platforms.