Lately I’ve been super interested in programming language models for concurrency, after hearing a fascinating talk from Adam Solove on synchronizable abstractions for UI. Unfortunately, the talk isn’t online (Adam presented it at work), so the blog post linked above is the next best thing!
I’ve been working my way through a handful of books, including PCPH (Parallel and Concurrent Programming in Haskell), the Concurrency section of PFPL (Practical Foundations for Programming Languages), and most recently Concurrent Programming in ML, by John Reppy.
In particular, I think I’ve found a race condition in one of the code listings of Concurrent Programming in ML. After introducing the listing itself, we’ll walk through a trace that shows the errant behavior, then propose a small change that prevents it from happening.
Setup: Snippets from the Book
Before we begin, here’s the listing in full. It’s a sample implementation of a 1-element concurrent buffer, using condition variables. It supports creation, insertion, and removal.
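For readers without the book at hand, here’s a rough rendering of the shape of that listing. This is my own reconstruction in OCaml, using the standard library’s Mutex and Condition modules, not Reppy’s SML; the names mu, data, dataEmpty, and dataAvail mirror the book’s, and insert deliberately reproduces the withLock mu waitLp (!data) pattern we’ll discuss below.

(* A 1-element buffer guarded by a mutex and two condition variables.
   OCaml reconstruction for illustration, not the book's SML.
   Requires OCaml 5, or the threads library on OCaml 4. *)
type 'a buffer = {
  mu         : Mutex.t;            (* the book's mu *)
  data       : 'a option ref;      (* the buffer's single slot *)
  data_empty : Condition.t;        (* signaled when the slot is emptied *)
  data_avail : Condition.t;        (* signaled when the slot is filled *)
}

(* Acquire mu, apply f to x, release mu, and return f's result. *)
let with_lock mu f x =
  Mutex.lock mu;
  let result = f x in
  Mutex.unlock mu;
  result

let create () = {
  mu = Mutex.create ();
  data = ref None;
  data_empty = Condition.create ();
  data_avail = Condition.create ();
}

let insert buf v =
  (* Loop until the slot is empty, then fill it and wake a reader. *)
  let rec wait_lp = function
    | Some _ ->
        Condition.wait buf.data_empty buf.mu;
        wait_lp !(buf.data)
    | None ->
        buf.data := Some v;
        Condition.signal buf.data_avail
  in
  (* Note: !(buf.data) is evaluated here, before with_lock takes mu. *)
  with_lock buf.mu wait_lp !(buf.data)

let remove buf =
  (* Loop until the slot is full, then empty it and wake a writer. *)
  let rec wait_lp = function
    | None ->
        Condition.wait buf.data_avail buf.mu;
        wait_lp !(buf.data)
    | Some v ->
        buf.data := None;
        Condition.signal buf.data_empty;
        v
  in
  with_lock buf.mu wait_lp !(buf.data)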
You might also want to reference this excerpt, which explains the semantics of the concurrency primitives at play in the snippet above: locks and condition variables. Study the listing above and the excerpt below for a moment, and see if you can spot a race or convince yourself that the code is correct.
The semantics of the expression
withLock mu f x
are that first the lock mu is acquired, then the function f is applied to x, and then the function’s result is returned after releasing the lock.
The basic operations on condition variables are
val wait : condition -> unit
which causes a process to block on the condition variable, and
val signal : condition -> unit
which wakes up one waiting process. A condition variable is associated with a specific mutex lock, which must be held when performing a wait operation on the variable. The semantics of the wait operation are that the mutex lock is released, and then the process is blocked; when the condition is signaled, the next process in the condition’s waiting queue is unblocked and it reacquires the mutex lock and proceeds. A signal operation on a condition variable that has an empty waiting queue has no effect; in this sense condition variables are memoryless.
— Concurrent Programming in ML, section 2.4.2
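In OCaml standard library terms (again, an illustration rather than the book’s API), these semantics correspond to Condition.wait releasing the mutex while blocked and reacquiring it before returning, and Condition.signal waking at most one waiter. The helper below is made up purely to show that shape:

(* wait_until_empty is a made-up helper illustrating the quoted
   semantics; it is not part of the book's listing. *)
let wait_until_empty (mu : Mutex.t) (data_empty : Condition.t)
    (data : 'a option ref) : unit =
  Mutex.lock mu;
  while Option.is_some !data do
    (* mu is released while this call blocks, and reacquired on wakeup *)
    Condition.wait data_empty mu
  done;
  (* mu is held again here, and the slot is known to be empty *)
  Mutex.unlock mu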
A trace to expose the problem
The problem I see has to do with SML’s eager evaluation: before calling a function, as in f e, we first evaluate the argument e to a value v. Then substitution kicks in, and we substitute v into the body of f. For us, that means that in the definition of insert:
fun insert (* ··· *) =
  (* ··· *)
  withLock mu waitLp (!data) (* ··· *)
we evaluate !data to a value before we run the body of withLock to acquire the lock. When inserting into an empty queue, !data evaluates to NONE. And since this happens outside the withLock, if two calls to insert attempt to acquire the lock at the same time, they’ll both think the queue is empty when they wake up! When this happens, the one to wake up second will unknowingly overwrite what the first one inserted.
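To make that evaluation order concrete, here’s a tiny standalone OCaml example (OCaml is call-by-value like SML; the function and strings are made up for illustration):

(* The argument expression is evaluated before the function body runs:
   this prints "evaluating the argument" before "entering the function body". *)
let apply_and_report f x =
  print_endline "entering the function body";
  f x

let () =
  apply_and_report print_endline
    (print_endline "evaluating the argument"; "the argument value")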
Here’s a sample trace of a program allocating a buffer and then doing two concurrent insertions:
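In outline, the interleaving is:

1. Thread A calls insert and evaluates !data to NONE, since the buffer is empty.
2. Thread B also calls insert and also evaluates !data to NONE.
3. A acquires mu, runs waitLp NONE, writes its value into the buffer, signals dataAvail, and releases mu.
4. B acquires mu and runs waitLp with its stale NONE argument, so it also thinks the buffer is empty and overwrites A’s value.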
Notice how the stale read allowed two consecutive inserts. What we wanted was for the second insert to wake up, see that the buffer is full, and then wait for the dataEmpty condition variable to wake it up later. So having the !data read outside the lock is not good.
Fixing the stale read
The solution to this is to delay evaluating !data until the body of the waitLp function, which only executes when we have the lock. This ensures that we don’t read a stale value for the content of the buffer.
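In the OCaml sketch from earlier (again, my rendering rather than the book’s code, reusing that sketch’s buffer type and with_lock), the fix amounts to passing waitLp a unit argument and reading data inside its body, after with_lock has acquired mu:

(* Fixed insert: the read of data now happens while mu is held. *)
let insert buf v =
  let rec wait_lp () =
    match !(buf.data) with
    | Some _ ->
        Condition.wait buf.data_empty buf.mu;
        wait_lp ()
    | None ->
        buf.data := Some v;
        Condition.signal buf.data_avail
  in
  with_lock buf.mu wait_lp ()

(In that sketch, remove reads !data the same way, so the analogous change applies there too.)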
Pretty small bug, and it doesn’t detract from the main point of the listing, which is to show how to use condition variables in a sort of “mutually recursive” style where dataEmpty wakes up insert, which signals on dataAvail, which wakes up remove.
This also underscores how difficult it really is to ensure correctness in the presence of concurrency! That’s exactly why I’ve been reading about all these programming language models for concurrency: to better understand how we can leverage our programming language to ensure our programs are correct by construction.