Lately I’ve been super interested in language models for concurrency,
after hearing a fascinating talk from Adam Solove on synchronizable
abstractions for UI. Unfortunately, the talk isn’t online (Adam presented it
at work), so the blog post linked above is the next best thing!
I’ve been working my way through a handful of books,
including PCPH,
the Concurrency section of PFPL, and most recently
Concurrent
Programming in ML, by John Reppy.
In particular, I think I’ve found a race condition in one of the code listings of Concurrent Programming in ML. After introducing the listing itself, we’ll walk through a trace that shows the errant behavior, then propose a small change that prevents it from happening.
Setup: Snippets from the Book
Before we begin, here’s the listing in full. It’s a sample implementation of a 1-element concurrent buffer, using condition variables. It supports creation, insertion, and removal.
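For readers without the book at hand, here is a rough reconstruction of the listing’s shape, pieced together from the discussion later in this post. The record layout and the newMutex/newCond constructor names are my own guesses, and the book’s actual code differs in its details; the part that matters here is the shape of insert, in particular its withLock mu waitLp (!data) call.

datatype 'a buffer = BUF of {
      data      : 'a option ref,   (* NONE = empty, SOME v = full *)
      mu        : mutex,
      dataAvail : condition,       (* signaled when a value is inserted *)
      dataEmpty : condition        (* signaled when the value is removed *)
    }

(* newMutex/newCond are assumed constructors for the lock and conditions *)
fun buffer () = BUF {
      data = ref NONE,
      mu = newMutex (),
      dataAvail = newCond (),
      dataEmpty = newCond ()
    }

fun insert (BUF {data, mu, dataAvail, dataEmpty}) v = let
      fun waitLp NONE     = (data := SOME v; signal dataAvail)
        | waitLp (SOME _) = (wait dataEmpty; waitLp (!data))
      in
        withLock mu waitLp (!data)
      end

fun remove (BUF {data, mu, dataAvail, dataEmpty}) = let
      fun waitLp NONE     = (wait dataAvail; waitLp (!data))
        | waitLp (SOME v) = (data := NONE; signal dataEmpty; v)
      in
        withLock mu waitLp (!data)
      end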
You might also want to reference this excerpt, which explains the semantics of the concurrency primitives at play in the snippet above: locks and condition variables. Study the listing above and the excerpt below for a moment, and see if you can spot a race or convince yourself the code is correct.
The semantics of the expression
withLock mu f x
are that first the lock mu is acquired, then the function f is applied to x, and then the function’s result is returned after releasing the lock.
The basic operations on condition variables are
val wait : condition -> unit
which causes a process to block on the condition variable, and
val signal : condition -> unit
which wakes up one waiting process. A condition variable is associated with a specific mutex lock, which must be held when performing a wait operation on the variable. The semantics of the wait operation are that the mutex lock is released, and then the process is blocked; when the condition is signaled, the next process in the condition’s waiting queue is unblocked and it reacquires the mutex lock and proceeds. A signal operation on a condition variable that has an empty waiting queue has no effect; in this sense condition variables are memoryless.
— Concurrent Programming in ML, section 2.4.2
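To make the withLock semantics concrete, here is one way it could be defined. This is only a sketch, not the book’s definition: acquire and release are assumed primitives on the mutex, and it ignores exceptions.

(* Sketch only: acquire/release are assumed mutex primitives.  A real
   implementation would also release the lock if f raised an exception. *)
fun withLock mu f x = (
      acquire mu;
      let
        val y = f x
      in
        release mu;
        y
      end)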
A trace to expose the problem
The problem I see has to do with SML’s eager evaluation: before calling a function f on an argument expression e, we first evaluate e down to a value v, and only then substitute v into the body of f. For us, that means that in the definition of insert:

fun insert (* ··· *) =
      (* ··· *)
      withLock mu waitLp (!data) (* ··· *)

we evaluate !data to a value before the body of withLock ever runs to acquire the lock. When inserting into an empty buffer, !data evaluates to NONE. And since this read happens outside the withLock, if two calls to insert attempt to acquire the lock at the same time, they’ll both think the buffer is empty when they get it! When this happens, the one that acquires the lock second will unknowingly overwrite what the first one inserted.
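If the evaluation-order point feels abstract, here is a tiny standalone snippet (mine, nothing to do with the buffer) that shows it: the argument expression is evaluated before the callee’s body runs.

fun callee x = (print "callee body runs\n"; x)
val () = callee (print "argument evaluated first\n")
(* prints:
 *   argument evaluated first
 *   callee body runs
 *)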
Here’s a sample trace of a program allocating a buffer and then doing two concurrent insertions:
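(In the trace below, the thread names P and Q and the inserted values 1 and 2 are mine, just for illustration.)

1. The main thread allocates the buffer, so data holds NONE.
2. Thread P calls insert with 1: it evaluates !data to NONE and then calls withLock mu waitLp NONE, acquiring the lock mu.
3. Thread Q calls insert with 2: it also evaluates !data to NONE, then calls withLock and blocks waiting for mu.
4. P runs waitLp NONE under the lock: it sets data := SOME 1, signals dataAvail, and releases mu.
5. Q acquires mu and runs waitLp NONE, but that NONE is its stale read from step 3; the buffer actually holds SOME 1.
6. Q takes the empty branch: it sets data := SOME 2 and signals dataAvail, silently clobbering P’s value.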
Notice how the stale read allowed two consecutive inserts. What we wanted was for the second insert to wake up, see that the buffer is full, and then wait on the dataEmpty condition variable until a remove signals it. So having the !data read outside the lock is not good.
Fixing the stale read
The solution to this is to delay evaluating !data until the body of the waitLp function, which only executes when we have the lock. This ensures that we don’t read a stale value for the content of the buffer.
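Concretely, and still sketching against my reconstruction above rather than quoting the book, insert becomes something like:

fun insert (BUF {data, mu, dataAvail, dataEmpty}) v = let
      fun waitLp () = (case !data          (* read while holding mu *)
             of NONE => (data := SOME v; signal dataAvail)
              | SOME _ => (wait dataEmpty; waitLp ()))
      in
        withLock mu waitLp ()
      end

Every read of data now happens inside waitLp, which only runs while mu is held, so a second insert that loses the race sees SOME, waits on dataEmpty, and retries as intended.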
Pretty small bug, and it doesn’t detract from the main point of the listing, which is to show how to use condition variables in a sort of “mutually recursive” style where dataEmpty wakes up insert, which signals on dataAvail, which wakes up remove.
This also underscores how difficult it really is to ensure correctness in the presence of concurrency! That’s exactly why I’ve been reading about all these language models for concurrency, to better understand how we can leverage our programming language to ensure our programs are correct by construction.