Lately I’ve been super interested in language models for concurrency, after hearing a fascinating talk from Adam Solove on synchronizable abstractions for UI.1 I’ve been working my way through a handful of books, including PCPH, the Concurrency section of PFPL, and most recently Concurrent Programming in ML, by John Reppy.
In particular, I think I’ve found a race condition in one of the code listings of Concurrent Programming in ML. After introducing the listing itself, we’ll walk through a trace that shows the errant behavior, then propose a small change that prevents it from happening.
Setup: Snippets from the Book
Before we begin, here’s the listing in full. It’s a sample implementation of a 1-element concurrent buffer, using condition variables. It supports creation, insertion, and removal.
You might also want to reference this exerpt which explains the semantics of the concurrency primitives at play in the snippet above: locks and condition variables. Study the listing above and exerpt below for a moment. See if you can spot a race, or are convinced the code is correct.
The semantics of the expression
withLock mu f x
are that first the lock mu is acquired, then the function f is applied to x, and then the function’s result is returned after releasing the lock.
The basic operations on condition variables are
val wait : condition -> unit
which causes a process to block on the condition variable, and
val signal : condition -> unit
which wakes up one waiting process. A condition variable is associated with a specific mutex lock, which must be held when performing a wait operation on the variable. The semantics of the wait operation are that the mutex lock is released, and then the process is blocked; when the condition is signaled, the next process in the condition’s waiting queue is unblocked and it reacquires the mutex lock and proceeds. A signal operation on a condition variable that has an empty waiting queue has no effect; in this sense condition variables are memoryless.
— Concurrent Programming in ML, section 2.4.2
A trace to expose the problem
The problem I see has to do with SML’s eager evaluation: before calling
f e, we evaluate
e to a value
v. Then substitution
kicks in and we substitute
v into the body of
f. For us,
that means that in the definition of
!data to a value before we run the body of
acquire the lock. When inserting into an empty queue,
NONE. And since this happens outside the
withLock if two calls to
insert attempt to acquire the lock at the same time, they’ll both think
the queue is empty when they wake up! When this happens, the one to wake
up second will unknowingly overwrite what the first one inserted.
Here’s a sample trace of a program allocating a buffer and then doing two concurrent insertions:
Notice how the stale read allowed two consecutive inserts. What we
wanted was for the second insert to wake up, see that the buffer is
full, then wait for the
dataEmpty condition variable to wake it up.
So having the
!data outside the lock is not good.
Fixing the stale read
The solution to this is to delay evaluating
!data until the body of the
waitLp function, which only executes when we have the lock. This
ensures that we don’t read a stale value for the content of the buffer.
Pretty small bug, and it doesn’t detract from the main point of the
listing, which is to show how to use condition variables in a sort of
“mutually recursive” style where
dataEmpty wakes up
dataAvail which wakes up
This also underscores how difficult it really is to ensure correctness in the presence of concurrency! That’s exactly why I’ve been reading about all these language models for concurrency, to better understand how we can leverage our programming language to ensure our programs are correct by construction.
Unfortunately, the talk isn’t online (Adam presented it at work), so the blog post linked above is the next best thing!↩