People always ask me, “Why does Sorbet think this is nil? I just checked that it’s not!” So much so that it’s at the very top of the Sorbet FAQ.
That doc answers what’s happening and how to fix it, but it doesn’t really answer why it behaves this way. A common follow-up question looks something like this:
Having to use local variables as mentioned in Sorbet’s limitations of flow-sensitivity docs is annoying. Idiomatic Ruby doesn’t use local variables nearly as much as Sorbet requires. What gives?
TL;DR: Sorbet’s type inference algorithm requires being given a fixed data structure that models control flow inside a method. Type inference doesn’t get to change that structure, so the things Sorbet learns from inference don’t retroactively change Sorbet’s view of control flow. (This is in line with the other popular type systems for dynamically typed languages.) As a result, control flow must be a function of local syntax alone (variables), not global or semantic information (methods).
But that’s packing a lot in at once, so let’s take a step back.
In this post whenever I say type inference I basically mean assigning types to variables, and using the types of variables to resolve calls to methods. Type inference in Sorbet needs two things:
- A symbol table, which maps names to global program definitions (like classes and methods, and their types). Sorbet spends a ton of time building a symbol table representing an entire codebase before it ever starts running type inference.
- A control flow graph, which is a data structure that models the flow of control through a single method. Sorbet builds these graphs on the fly right before running type inference.
Since type inference requires the control flow graph, clearly building the control flow graph can’t require type inference. Instead, Sorbet has to build the control flow graph using only the method’s abstract syntax tree (or AST). Since all Sorbet has is an AST, the control flow graph reflects only syntactic observations, like “these two variables are the same” and “an if condition branches on the value of this variable.” Sorbet can draw these observations exclusively from the syntactic structure of the current method, with no need to consult the symbol table, let alone run inference.
This brings us to our central conflict: knowing which method (or methods!) a given call site resolves to is not a syntactic property. Consider this snippet:
if [true, false].sample
  x = 0
else
  x = nil
end
x.even?
The meaning of x.even? depends on the type of x, which depends on the earlier control flow in the method.
That means that if a program branches on a method return value, Sorbet cannot draw any interesting observations about control flow.
This gets to be a problem for methods whose meaning involves some claim like, “I always return the same thing every time I’m called.” Sorbet can’t know whether x.foo refers to one of those constant methods or a method that returns a random number every time, so it has to assume the worst.
Here’s a pathological example:
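A minimal sketch of such a case, assuming two hypothetical classes, `Stable` and `Flaky`, and a hypothetical `example` method (none of these names come from Sorbet). `Stable#foo` is an `attr_accessor`, while `Flaky#foo` returns something different on its second call, so `example(true)` returns `true` and `example(false)` raises `NoMethodError`:

```ruby
# Hypothetical class: `foo` is a plain accessor, so repeated calls agree
# (as long as nothing assigns to it in between).
class Stable
  attr_accessor :foo

  def initialize(foo)
    @foo = foo
  end
end

# Hypothetical class: `foo` returns 0 on the first call and nil afterwards.
class Flaky
  def initialize
    @calls = 0
  end

  def foo
    @calls += 1
    @calls == 1 ? 0 : nil
  end
end

# Hypothetical method: the type of `x` depends on which branch ran.
def example(use_stable)
  x = use_stable ? Stable.new(0) : Flaky.new

  if x.foo       # first call: checked to be non-nil
    x.foo.even?  # second call: only safe if `foo` returns the same thing
  end
end
```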
Note the two calls to x.foo at the very end of the snippet:
- Knowing whether the second call to x.foo is non-nil requires knowing whether x.foo returns the same thing across subsequent calls.
- Knowing that requires knowing whether foo refers to an attr_accessor method or some other method.
- Knowing that requires knowing the type of x.
- Knowing that requires understanding the control flow in the method.
- So we can’t make understanding the control flow in the method require knowing whether the second call to x.foo returns the same thing.
- But we can make it require knowing whether a variable has been assigned to between two variable accesses.
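That last point is what makes the usual fix work. A minimal sketch, assuming a hypothetical `Box` class and `even_foo?` method: storing the result of `x.foo` in a local variable means the check and the use both read the same variable, with no assignment in between, and that much is visible from syntax alone.

```ruby
# Hypothetical class whose `foo` may be nil.
class Box
  attr_accessor :foo

  def initialize(foo)
    @foo = foo
  end
end

# Hypothetical method: call `x.foo` once and branch on the local variable.
def even_foo?(x)
  foo = x.foo   # a single method call, stored in a local variable
  if foo        # `foo` is not reassigned between this check...
    foo.even?   # ...and this use, so it cannot have become nil
  else
    false
  end
end

puts even_foo?(Box.new(4))    # prints "true"
puts even_foo?(Box.new(nil))  # prints "false"
```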
Properties and attributes in other languages
Unfortunately, this all means that Sorbet can only track control flow-sensitive types on variables, not methods. This is the exact same limitation that other popular gradual type checkers have, except for one difference: both JavaScript and Python make a syntactic distinction between method calls (which have parentheses) and property/attribute access (which doesn’t):
x.foo   # <- syntactically a property (JS) or attribute (Python)
x.foo() # <- syntactically a method call
In Ruby, both x.foo and x.foo() correspond to method calls, so Sorbet models them as such. (This is true even if foo was defined with attr_reader :foo!) But in TypeScript, Flow, and mypy, that small, syntactic difference is enough to allow treating properties and attributes differently from methods. (And maybe in other control-flow sensitive type systems, too. Feel free to send me more examples.)
→ View example in TypeScript Playground
→ View example in Try Flow
→ View example in mypy Playground
In all the above examples, we see that the type of variable.property is aware of control flow, while the types of expression().property and variable.method() are not.
Unfortunately, the direct analogue to properties in Ruby is instance variables like @property, which have the limitation that they can only be accessed inside their owning class. It’s as if JavaScript only allowed this.property instead of allowing the call site to be any arbitrary expression like x.property. In Ruby, you can’t write x.@property. (You can do something similar: x.instance_variable_get(:@property), but again this is a method, not a property access. Someone could have overridden the .instance_variable_get method!)
If you do use instance variables in Ruby with Sorbet, they behave comparably to their counterparts in other languages. (There’s a known bug in the implementation at the time of writing, but it occurs somewhat rarely in practice so we haven’t prioritized fixing it.)
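A minimal sketch of what that looks like, assuming a hypothetical `Counter` class whose `@count` is meant to hold an `Integer` or `nil`. The type declaration Sorbet would need is left out so that the snippet runs as plain Ruby:

```ruby
# Hypothetical class; under Sorbet, @count would be declared as a nilable Integer.
class Counter
  def initialize(count = nil)
    @count = count
  end

  def even_count?
    if @count        # branch directly on the instance variable
      @count.even?   # @count is a variable, not a method call, so no local is needed
    else
      false
    end
  end
end

puts Counter.new(2).even_count?  # prints "true"
puts Counter.new.even_count?     # prints "false"
```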
Seen from this lens, I think it’s fair to say that Sorbet is doing the best it can with what it has. If you disagree and have a suggestion for how Sorbet could do better, feel free to reach out.
Extra thoughts
It’s maybe worth noting that even the Ruby VM itself cheats a little here: yes, x.foo is technically a method call, but if that method was defined via attr_reader, the Ruby VM has special handling to make it run much, much faster than had the method been defined manually. So while you can think of these two things as doing the same thing, the first one will run much faster:
attr_reader :foo
def foo; @foo; end
I take this to mean that even the Ruby VM itself realizes that there is value in having something property-like. It just unfortunately didn’t make it into the language itself.
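To compare the two definitions directly, here is a rough sketch using Ruby’s standard `benchmark` library, with hypothetical classes `WithAttrReader` and `WithManualReader`. Both readers are ordinary methods that return the same value; the timings depend on the machine and the Ruby version, so no particular numbers are claimed here.

```ruby
require "benchmark"

# Hypothetical class: reader defined with attr_reader.
class WithAttrReader
  attr_reader :foo

  def initialize
    @foo = 42
  end
end

# Hypothetical class: the same reader written out by hand.
class WithManualReader
  def initialize
    @foo = 42
  end

  def foo; @foo; end
end

a = WithAttrReader.new
b = WithManualReader.new

# As far as the language is concerned, both are regular methods.
puts WithAttrReader.instance_method(:foo).class    # prints "UnboundMethod"
puts WithManualReader.instance_method(:foo).class  # prints "UnboundMethod"

# Time many calls to each reader; compare the reported numbers yourself.
n = 1_000_000
Benchmark.bm(12) do |bm|
  bm.report("attr_reader") { n.times { a.foo } }
  bm.report("def foo")     { n.times { b.foo } }
end
```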
It’s interesting to imagine a future where Sorbet treats x.foo and x.foo() separately. For example, it could require that non-constant, nullary methods be written with trailing () even though Ruby doesn’t require it. Then a follow-up change might be able to build on that invariant, to treat x.foo like a property access instead of a method call.
But not only are there some high-level design and low-level technical problems standing in the way of implementing this right now, there’s also a social problem: almost every Ruby style guide and linter requires the opposite, namely that nullary methods never be called with () explicitly. Solving social problems tends to involve waging holy wars, which is never all that fun.
And to throw another wrench into the picture: recent versions of JavaScript added getters, which allow executing an arbitrary method on property access. Python has had computed properties, via the built-in property, since version 2.2. Notably, TypeScript, Flow, and mypy do not treat getters the same way as methods, even though they arguably should for soundness:
→ View example on TypeScript Playground
→ View example on Try Flow
→ View example on mypy Playground
If it were not so common in Ruby for all nullary methods to be called without (), instead of just those defined with attr_reader or something similar, maybe Sorbet could have chosen the same trade-off.