People always ask me, “Why does Sorbet think this is nil? I just checked that it’s not!” So much so that it’s at the very top of the Sorbet FAQ.
That doc answers what’s happening and how to fix it, but it doesn’t really answer why it behaves this way. A common follow-up question looks something like this:
“Having to use local variables as mentioned in Sorbet’s limitations of flow-sensitivity docs is annoying. Idiomatic Ruby doesn’t use local variables nearly as much as Sorbet requires. What gives?”
TL;DR: Sorbet’s type inference algorithm requires being given a fixed data structure that models control flow inside a method. Type inference doesn’t get to change that structure, so the things Sorbet learns from inference don’t retroactively change Sorbet’s view of control flow. (This is in line with the other popular type systems for dynamically typed languages.) As a result, control flow must be a function of local syntax alone (variables), not global or semantic information (methods).
But that’s packing a lot in at once, so let’s take a step back.
In this post whenever I say type inference I basically mean assigning types to variables, and using the types of variables to resolve calls to methods. Type inference in Sorbet needs two things:
- A symbol table, which maps names to global program definitions (like classes and methods, and their types). Sorbet spends a ton of time building a symbol table representing an entire codebase before it ever starts running type inference.
- A control flow graph, which is a data structure that models the flow of control through a single method. Sorbet builds these graphs on the fly right before running type inference.
Since type inference requires the control flow graph, clearly building the control flow graph can’t require type inference. Instead, Sorbet has to build a control flow graph using only the method’s abstract syntax tree (or AST). Since all Sorbet has is an AST, the control flow graph reflects only syntactic observations, like “these two variables are the same” and “an if condition branches on the value of this variable.” Sorbet can draw these observations exclusively from the syntactic structure of the current method, with no need to consult the symbol table, let alone run inference.
This brings us to our central conflict: knowing which method (or methods!) a given call site resolves to is not a syntactic property. Consider this snippet:
if [true, false].sample
  x = 0
else
  x = nil
end
x.even?
The meaning of x.even? depends on the type of x, which depends on the earlier control flow in the method. That means that if a program branches on a method return value, Sorbet cannot draw any interesting observations about control flow.
This gets to be a problem for methods whose meaning involves some claim like, “I always return the same thing every time I’m called.” Sorbet can’t know whether x.foo refers to one of those constant methods or a method that returns a random number every time, so it has to assume the worst.
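To make "assume the worst" concrete, here is a minimal plain-Ruby sketch. The `Source` class and the `unsafe_even?` and `safe_even?` helpers are hypothetical names made up for illustration, not part of Sorbet. The first helper checks a method result and then calls the method again. The second stores the result in a local variable, which is the kind of rewrite the flow-sensitivity docs ask for.

```ruby
# Hypothetical class: `maybe_number` may return a different value on every call.
class Source
  def initialize(values)
    @values = values
  end

  # Returns the next queued value, or nil once the queue is empty.
  def maybe_number
    @values.shift
  end
end

# Checks the method result, then calls the method a second time.
# The second call can return nil even though the first one did not.
def unsafe_even?(source)
  if source.maybe_number
    source.maybe_number.even?
  end
end

# Stores the result in a local variable, so the value that was checked
# is the same value that gets used.
def safe_even?(source)
  n = source.maybe_number
  n.even? if n
end
```

With `Source.new([2])`, `safe_even?` returns `true`, while `unsafe_even?` raises `NoMethodError` because the second call to `maybe_number` returns `nil`. A type checker that only sees syntax cannot rule this case out for an arbitrary method, which is why only the local-variable version carries the non-nil fact forward.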
Here’s a pathological example:
Note the two calls to x.foo at the very end of the snippet:
- Knowing whether the second call to x.foo is non-nil requires knowing whether x.foo returns the same thing across subsequent calls.
- Knowing that requires knowing whether foo refers to an attr_accessor method or some other method.
- Knowing that requires knowing the type of x.
- Knowing that requires understanding the control flow in the method.
- So we can’t make understanding the control flow in the method require knowing whether the second call to x.foo returns the same thing.
  - But we can make it require knowing whether a variable has been assigned to between two variable accesses.
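The chain of reasoning above can be sketched in plain Ruby. This is only an illustration under assumptions: the `Stable` and `Flaky` classes and the `pick` and `call_twice` helpers are hypothetical names. Both classes respond to `foo`, but only the `attr_accessor` version returns the same value on consecutive calls, and which one `x` refers to depends on the branch taken.

```ruby
# Hypothetical classes: both respond to `foo`, but only one is stable.
class Stable
  attr_accessor :foo

  def initialize(foo)
    @foo = foo
  end
end

class Flaky
  def initialize
    @calls = 0
  end

  # Returns 0 on the first call and nil on every later call.
  def foo
    @calls += 1
    @calls == 1 ? 0 : nil
  end
end

# Which `foo` will run depends on which branch assigned `x`,
# so it depends on control flow, not on syntax alone.
def pick(use_stable)
  if use_stable
    x = Stable.new(0)
  else
    x = Flaky.new
  end
  x
end

# Returns the results of two consecutive calls to `x.foo`.
def call_twice(x)
  [x.foo, x.foo]
end
```

Here `call_twice(pick(true))` returns `[0, 0]`, while `call_twice(pick(false))` returns `[0, nil]`. The two call sites are textually identical, so nothing in the syntax of `x.foo` says whether the second call is safe.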
Properties and attributes in other languages
Unfortunately, this all means that Sorbet can only track control flow-sensitive types on variables, not methods. This is the exact same limitation that other popular gradual type checkers have, except for one difference: both JavaScript and Python make a syntactic distinction between method calls (which have parentheses) and property/attribute access (which doesn’t):
x.foo   # <- syntactically a property (JS) or attribute (Python)
x.foo() # <- syntactically a method call
In Ruby, both x.foo and x.foo() correspond to method calls, so Sorbet models them as such. (This is true even if foo was defined with attr_reader :foo!) But in TypeScript, Flow, and mypy, that small syntactic difference is enough to allow treating properties and attributes differently from methods. (Maybe other control-flow-sensitive type systems do this too. Feel free to send me more examples.)
→ View example in TypeScript Playground
→ View example in Try Flow
→ View example in mypy Playground
In all the above examples, we see that the type of variable.property is aware of control flow, while the types of expression().property and variable.method() are not.
Unfortunately, the direct analogues to properties in Ruby are instance variables like @property, which have the limitation that they can only be accessed inside their owning class. It’s like if JavaScript only allowed this.property instead of allowing the call site to be any arbitrary expression like x.property. In Ruby, you can’t write x.@property. (You can do something similar: x.instance_variable_get(:@property), but again this is a method, not a property access. Someone could have overridden the .instance_variable_get method!)
If you do use instance variables in Ruby with Sorbet, they behave comparably to their counterparts in other languages. (There’s a known bug in the implementation at the time of writing, but it occurs somewhat rarely in practice, so we haven’t prioritized fixing it.)
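As a rough plain-Ruby sketch of the instance-variable pattern: the `Box` class and its `even_property?` method are hypothetical names, and the type declaration a real Sorbet file would need for `@property` (for example via `T.let`) is left out so the sketch runs without any gems. The point is that the check and the use both read `@property` directly, with no method call in between.

```ruby
# Hypothetical class; `@property` holds either nil or an Integer.
class Box
  def initialize(property)
    @property = property
  end

  # Reads the instance variable directly: the check and the use are both
  # variable accesses, so no method call sits between them.
  def even_property?
    if @property
      @property.even?
    else
      false
    end
  end
end
```

`Box.new(2).even_property?` returns `true` and `Box.new(nil).even_property?` returns `false`. Because `@property` is a variable rather than a method call, this is the form in which the text says Sorbet can carry the non-nil fact from the check to the use.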
Seen from this lens, I think it’s fair to say that Sorbet is doing the best it can with what it has. If you disagree and have a suggestion for how Sorbet could do better, feel free to reach out.
Extra thoughts
It’s maybe worth noting that even the Ruby VM itself cheats a little here: yes, x.foo is technically a method call, but if that method was defined via attr_reader, the Ruby VM has special handling to make it run much, much faster than had the method been defined manually. So while you can think of these two things as doing the same thing, the first one will run much faster:
attr_reader :foo
def foo; @foo; end
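One way to compare the two definitions yourself is a small timing sketch. The class names and the `time_calls` helper are hypothetical, and the measured difference is an assumption to verify on your own setup: it varies with Ruby version and JIT settings and may be small.

```ruby
# Hypothetical classes that differ only in how `foo` is defined.
class WithAttrReader
  attr_reader :foo

  def initialize(foo)
    @foo = foo
  end
end

class WithManualReader
  def initialize(foo)
    @foo = foo
  end

  def foo; @foo; end
end

# Times `iterations` calls to `obj.foo` with a monotonic clock.
# Returns the elapsed time in seconds.
def time_calls(obj, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { obj.foo }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

fast = time_calls(WithAttrReader.new(1), 1_000_000)
slow = time_calls(WithManualReader.new(1), 1_000_000)
puts format("attr_reader: %.4fs, manual def: %.4fs", fast, slow)
```

From the caller's side the two classes are indistinguishable: both `foo` methods return the stored value, and only the timing differs.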
I take this to mean that even the Ruby VM itself realizes that there is value in having something property-like. It just unfortunately didn’t make it into the language itself.
It’s interesting to imagine a future where Sorbet treats x.foo and x.foo() separately. For example, it could require that non-constant, nullary methods be written with trailing () even though Ruby doesn’t require it. Then a follow-up change might be able to build on that invariant, to treat x.foo like a property access instead of a method call.
But not only are there some high-level design and low-level technical problems standing in the way of implementing this right now, there’s also a social problem: almost every Ruby style guide and linter requires the opposite, namely that nullary methods never be called with () explicitly. Solving social problems tends to involve waging holy wars, which is never all that fun.
And to throw another wrench into the picture: recent versions of JavaScript added getters, which allow executing an arbitrary method on property access. Python has had computed @property declarations since version 2.2. Notably, TypeScript, Flow, and mypy simply do not treat getters the same way as methods, even though they arguably should for soundness:
→ View example on TypeScript Playground
→ View example on Try Flow
→ View example on mypy Playground
If it were not so common in Ruby for all nullary methods to be called without (), instead of just those defined with attr_reader or something similar, maybe Sorbet could have chosen the same trade-off.