Sorbet thinks that loop_count is nil at the start of the rescue block, which causes it to
declare the puts line unreachable (because nil is never truthy).
But why? Clearly we can see that loop_count is an Integer. We’d expect Sorbet to at
least think loop_count has type T.nilable(Integer), if not simply Integer outright.
Sometimes Sorbet takes shortcuts—especially when the short cut is good enough 99% of the time
while being simple and fast. Sorbet’s approach to rescue and exception handling is one
of these shortcuts.
Control flow and rescue in Sorbet
I mentioned in my last post that Sorbet builds a control flow graph (CFG)
in order to model control-flow sensitive types throughout the body of a method. For rescue
nodes, it pretends that there are only two jumps into the rescue block: once before any
any code in the begin block has run, and once after all the code in begin block has
run. It looks a little something like this:
An example CFG with a rescue block
An example CFG with a rescue block
This is a simplified view of a CFG in Sorbet.If you have Graphviz installed, you can get Sorbet to dump its internal CFG for a given
file with the cfg-view.sh script in the Sorbet repo. The CFG for the example above looks
like like this.
The boxes contain hunks of straight-line
code (code without control flow), and all control flow is made explicit by branching on a
specified variable at the end of each block. For the case of rescue, the branches read from a
magical <exn-value> variable, which Sorbet treats as unanalyzable: Sorbet
doesn’t attempt to track how the value is initialized nor how control flow affects it.
Knowing this, we can explain the weird dead code error from the snippet above:
The first jump into the rescue block happens before any code in the begin block runs.
At that point, the loop_count variable is still uninitialized, and thus has type NilClass
when taking that branch.
The second jump into the rescue block happens after the begin block finishes. But from
Sorbet’s point of view, the begin block in the loop_count snippet never finishes! Sorbet
sees the infinite while true loop and says, “it doesn’t matter where to branch at the end
of the begin block—the infinite loop won’t allow control to flow there.”If you’re fearless, you can prove that this is what’s happening in the rendered
CFG for the above example.
Despite
it being able to tell that loop_count has type Integer inside the begin, only the
NilClass branch is live in the rescue block.
That second point about while true is simply a bug. Sorbet should be smart enough to
suspend the normal flow-sensitivity rules for infinite loops while checking code in a begin
block that has a rescue.Of course, easier said than done.
But still, fixing that bug would only fix half the problem:
Sorbet would think that loop_count has type T.nilable(Integer) in the rescue
body, but we said the best outcome would be for Sorbet to know that loop_count is always
initialized, having type Integer.
Before we can see what it would take for Sorbet to infer Integer, some history.
A brief history of rescue in Sorbet
Sorbet’s first commit dates to October 3, 2017. Six weeks later, the initial support for
rescue landed. The pull request description is from a time when all pull requests
were not public, so I’ll quote it here:
This does most of the work in the CFG, preserving the semantics in the desugarer. […]
It introduces a series of uncomputable ifs since 0 or more instructions from the first
block will execute then one of the rescues could match and then if none do the else will
match.
The approach it’s describing is what most people might do intuitively: any instruction in a
begin might raise,This is not quite true: x = 0 doesn’t raise, and Sorbet can see that syntactically. This might be something to take advantage of in the future.
so let’s record an unanalyzable jump after each instruction,
into the rescue block. In picture form:
A diagram depicting the implementation described in the above commit
A diagram depicting the implementation described in the above commit
Note how every line of the begin body gets its own, tiny basic block with an unanalyzable
jump to the rescue block, or to the next line of the body. This implementation wouldn’t have
exhibited the bug in our loop_count example—there would be a jump after the loop_count = 0
assignment into the rescue block, which would have been enough for Sorbet to infer a type of
Integer (regardless of whether that while true bug were fixed method).
But importantly, this original approach was thrown out, almost exactly 9 months
laterAnd only two days after I joined the team 😅
, when the shortcut we’ve been discussing arrived. In fact, a
comment from that commit persists unchanged in the codebase today:
We have a simplified view of the control flow here but in practise it has been reasonable on
our codebase. We don’t model that each expression in the body or else could throw,
instead we model only never running anything in the body, or running the whole thing. To do
this we have a magic Unanalyzable variable at the top of the body using rescueStartTemp
and one at the end of the else using rescueEndTemp which can jump into the rescue handlers.
Why adopt this shortcut approach if it causes bugs like this to happen? Well for starters,
the original approach had an even more insidious bug. Consider this example:
begin x = might_raise()# ...rescue# Sorbet would assume `x` was always setputs(x)end
In this example, the first instruction in the block is an assignment (x = ...). But Sorbet
would only record the jump after the assignment entirely, not between the method call and the
assignment. This meant Sorbet would think that x was unconditionally set, but in fact it’s
not set when might_raise() does actually raise. At the time, Sorbet tripped this bug all
the time on real-world code—there were beta users of Sorbet chiming in on the PR eagerly
waiting for the bug to be fixed. Meanwhile, code that looked like our loop_count example
either did not exist or was simply rewrittenThere’s an easy workaround: use T.let to pin the type of the variable outside the begin
block.
to avoid the bug.
But this still doesn’t quite paint the full picture. I’ve told you, “There was a bug, and Sorbet
fixed it by introducing another bug.” Which leads us to out second point: having a lot of tiny,
jumpy basic blocks is slow to typecheck. There are lots of reasons:
After building a CFG, Sorbet does a bunch of post-processing on it. For example, it computes
which variables each basic block reads and writes, tries to dealias variable writes, and
merges adjacent blocks if possible. When there are fewer basic blocks, these post-processing
steps run faster to begin with, and are more likely to drastically shrink the CFG size
(making type inference run faster).
When typechecking a method, every basic block in the CFG requires its own Environment
data structure, which maps variables to their types within that block. Sorbet either has to
allocate a separate Environment per block (what it currently does) or incur complexity
from making them copy-on-write. Allocating is slow and we really
like to avoid complexity.
Before typechecking a basic block, Sorbet has to merge the environments of all incoming
blocks. Merging environments involves doing slow type checking operations, like checking
whether two possibly-arbitrary types are subtypes of each other and allocating union types.
Not only are there more basic blocks, there are more instructions! Each <exn-value>
variable has to be populated before Sorbet can branch on it, and that each tiny basic block
has not only the single instruction inside it, but also an extra instruction to initialize
the <exn-valu> variable. That basically doubles the number of instructions Sorbet emits for
a begin block, making more work for type inference.
The new “only before and after” shortcut is pretty clever. In practice, it models the case when
even the very first assignment might raise, while generating far fewer basic blocks, thus
running much faster.
The bigger picture
There are a lot of these clever “good enough” tricks in Sorbet. Many of them are only possible
because of the stakes: Sorbet already allows T.untyped, so depending on your viewpoint,
either:
There is already the holy grail of all hacks in the type system, so why not cut more corners
when it makes sense.
BecauseT.untyped is in the type system, at least the user can use T.untyped to not be
completely blocked from writing the code they need to write.
Either way, in some sense the stakes are low. In a compiler where the stakes for being wrong
are higher (the code computes the wrong answer), maybe shortcuts aren’t the best idea. And in
fact, multipledistinctexceptionchanges landed in Sorbet’s
CFG code to support the Sorbet Compiler.
It’s now been over four years since we shipped the change to model rescue this way. I’m not
aware of a single incident caused by this shortcut, and I can only even remember explaining
this behavior to a confused Sorbet user twice. I can’t find any performance numbers from when
the original change landed, but we can still put it into perspective: it’s the difference
between a handful of people being confused over the course of 4 years, or thousands of people
getting faster typechecking results thousands of times per day. Seems like a reasonable
trade-off.