Synchronization costs

In the classic implementation, a thread that tries to acquire a mutex has to make a system call and thus switch to kernel context. If the mutex is already taken, the kernel puts the thread to sleep and wakes it up again when the mutex is released.

However, modern OS platforms offer low-overhead locking mechanisms, such as the futex on Linux or the CRITICAL_SECTION on Windows. The basic idea is to check an atomic counter in user space first and, if the mutex is free (also known as uncontended), to skip the switch to kernel context altogether. In this way, the uncontended case can be made very cheap.
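The fast-path idea can be sketched in a few lines of C++. This is only an illustration, not how a futex or a CRITICAL_SECTION is actually implemented: the class name FastPathLock, its counters, and the helper uncontendedFastPathHits are hypothetical, and the slow path simply yields and retries where a real implementation would ask the kernel to put the thread to sleep.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration of "check an atomic in user space first".
class FastPathLock {
public:
    void lock() {
        bool expected = false;
        // Fast path: a single atomic compare-and-swap, no kernel involved.
        if (locked_.compare_exchange_strong(expected, true,
                                            std::memory_order_acquire)) {
            fastPathHits_.fetch_add(1, std::memory_order_relaxed);
            return;
        }
        // Slow path: the lock is contended. A real implementation would
        // sleep in the kernel here; this sketch just yields and retries.
        slowPathHits_.fetch_add(1, std::memory_order_relaxed);
        do {
            std::this_thread::yield();
            expected = false;
        } while (!locked_.compare_exchange_weak(expected, true,
                                                std::memory_order_acquire));
    }

    void unlock() {
        bool wasLocked = locked_.exchange(false, std::memory_order_release);
        assert(wasLocked && "unlock() called on a free lock");
        (void)wasLocked;
    }

    long fastPathHits() const { return fastPathHits_.load(); }
    long slowPathHits() const { return slowPathHits_.load(); }

private:
    std::atomic<bool> locked_{false};
    std::atomic<long> fastPathHits_{0};
    std::atomic<long> slowPathHits_{0};
};

// Locks and unlocks n times from the calling thread only, so every
// acquisition is uncontended and should take the fast path.
inline long uncontendedFastPathHits(int n) {
    FastPathLock lock;
    for (int i = 0; i < n; ++i) {
        lock.lock();
        lock.unlock();
    }
    return lock.fastPathHits();
}
```

When only one thread uses the lock, every call to lock() succeeds on the first compare-and-swap, so the slow path is never entered.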

So remember: mutexes aren't slow per se; it is contention that slows things down!
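One way to see the effect of contention is to time the same number of locked increments, once from a single thread and once spread over several threads. The following is a rough sketch rather than a rigorous benchmark: the names RunResult and runLockedIncrements are hypothetical, thread start-up is included in the measured time, and the actual numbers depend heavily on the platform and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

struct RunResult {
    long long counter;    // final value of the shared counter
    double milliseconds;  // wall-clock time for the whole run
};

// Each of `threads` threads increments one shared counter
// `itersPerThread` times, always under the same std::mutex.
inline RunResult runLockedIncrements(int threads, int itersPerThread) {
    assert(threads > 0 && itersPerThread >= 0);
    std::mutex m;
    long long counter = 0;

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < itersPerThread; ++i) {
                std::lock_guard<std::mutex> guard(m);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    auto elapsed = std::chrono::steady_clock::now() - start;

    return {counter,
            std::chrono::duration<double, std::milli>(elapsed).count()};
}
```

Comparing, for example, runLockedIncrements(1, 4000000) with runLockedIncrements(4, 1000000) keeps the total work the same, so any difference in the measured time comes mainly from the threads competing for the mutex.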

You might sometimes still see the thundering herd problem mentioned in conjunction with thread synchronization, but modern platforms largely no longer suffer from it. "Thundering herd" describes the situation in which several threads were waiting on a lock and all of them were woken up at once, although only one could acquire the lock and proceed; the others used up CPU time only to go back to sleep.
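The wake-up pattern behind the term can be reproduced at application level with a condition variable. The sketch below is only an analogue of the problem, not a description of how any kernel wakes mutex waiters: the names HerdStats and wakeAllForOneToken are hypothetical, a single "token" stands in for the lock, and the losing threads simply exit where real waiters would go back to sleep.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct HerdStats {
    int wakeups;  // threads that were woken up for the token
    int winners;  // threads that actually got the token
};

// Parks `waiters` threads on one condition variable, then makes a
// single token available and wakes all of them at once.
inline HerdStats wakeAllForOneToken(int waiters) {
    assert(waiters > 0);
    std::mutex m;
    std::condition_variable cv;
    int parked = 0;         // threads that have reached the wait
    int tokens = 0;         // stands in for "the lock is free"
    bool released = false;  // guards against spurious wakeups
    HerdStats stats{0, 0};

    std::vector<std::thread> pool;
    for (int i = 0; i < waiters; ++i) {
        pool.emplace_back([&] {
            std::unique_lock<std::mutex> lock(m);
            ++parked;
            cv.wait(lock, [&] { return released; });
            ++stats.wakeups;  // this thread was woken up...
            if (tokens > 0) { // ...but only one finds the token
                --tokens;
                ++stats.winners;
            }
            // A real loser would go back to sleep; here it just exits.
        });
    }

    // Wait until every thread is parked, then release exactly one token.
    for (;;) {
        {
            std::lock_guard<std::mutex> guard(m);
            if (parked == waiters) {
                tokens = 1;
                released = true;
                break;
            }
        }
        std::this_thread::yield();
    }
    cv.notify_all();  // wakes the whole herd

    for (auto& th : pool) th.join();
    return stats;
}
```

With eight waiters, all eight threads are woken but only one of them wins the token. Waking a single waiter instead, as std::condition_variable::notify_one does, avoids the wasted wake-ups.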