Mutation makes typechecking Ruby harder than typechecking many other programming languages. Most people will immediately think I mean mutation in the sense of x += 1 or something, but that’s not what I’m referring to. In fact, that’s the easy kind of mutation to model in a type system.
What I mean is that nearly everything worth knowing statically about a Ruby program involves mutation. Defining a class?
class A
end
That mutates the global namespace of constants. After those lines run, all code in the project can reference the class A.
Defining a method?
class A
  def foo
    puts 'hello'
  end
end
The method foo is undefined just before the def block (at runtime!), but defined after: mutation again.
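To make the "at runtime" part concrete, here is a minimal sketch that asks Ruby what exists before and after each definition runs. The class name Greeter and the constants BEFORE_DEF and AFTER_DEF are made up for this illustration.

```ruby
# Greeter is a hypothetical name; any constant that is not yet defined works.
before_class = Object.const_defined?(:Greeter)   # the class body has not run yet

class Greeter
  # Inside the class body, self is Greeter, so this asks Greeter directly.
  BEFORE_DEF = method_defined?(:hello)           # the `def` below has not run yet

  def hello
    'hello'
  end

  AFTER_DEF = method_defined?(:hello)            # `def` has now mutated the class
end

after_class = Object.const_defined?(:Greeter)

puts [before_class, Greeter::BEFORE_DEF, Greeter::AFTER_DEF, after_class].inspect
# prints [false, false, true, true]
```

Nothing here is special syntax for "declaring" things: each answer depends on how far execution has progressed.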
Ruby provides things like attr_reader and attr_accessor to define getter and setter methods:
class B
  attr_reader :foo
end
attr_reader is not a Ruby keyword, contrary to popular belief: it’s a method on the singleton class which takes an argument. It defines an instance method called foo as a side effect by mutating the class B.
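One way to convince yourself that attr_reader is an ordinary method call is to invoke it from outside the class body, long after the class was defined. This is only a sketch; Point is a hypothetical class.

```ruby
# Point is a hypothetical class used only for this illustration.
class Point
  def initialize(x)
    @x = x
  end
end

puts Point.method_defined?(:x)   # false: no getter has been defined

# Call attr_reader like any other method. `send` is used so the sketch also
# works on older Rubies where attr_reader was a private method.
Point.send(:attr_reader, :x)

puts Point.method_defined?(:x)   # true: the class was mutated
puts Point.new(3).x              # 3
```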
It’s the same for mixing modules into classes:
module M; end
class C
  include M
end
include is another method disguised as a keyword, one which mutates the class’s list of ancestors.
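The change to the ancestor list is observable directly. A small sketch, with hypothetical names Loud and Speaker:

```ruby
# Loud and Speaker are hypothetical names, for illustration only.
module Loud; end
class Speaker; end

ANCESTORS_BEFORE = Speaker.ancestors.dup

# `include` is just a method call on the class; `send` keeps the sketch working
# on Rubies older than 2.5, where `include` was private.
Speaker.send(:include, Loud)

ANCESTORS_AFTER = Speaker.ancestors

puts ANCESTORS_BEFORE.include?(Loud)    # false
puts ANCESTORS_AFTER.include?(Loud)     # true
puts ANCESTORS_AFTER.first(2).inspect   # [Speaker, Loud]
```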
One of my least favorite Ruby features: you can redefine (not override) a method:
class D
  attr_reader :foo
  alias_method :old_foo, :foo
  def foo
    puts 'Calling D#foo'
    old_foo
  end
end
Because D#foo is defined by the attr_reader line, the subsequent def overwrites it (akin to mutating a local variable, like x += 1). Oh, and that alias_method? Another method that looks like a keyword and mutates the class.
Even the way libraries work in Ruby is powered by mutation:
require 'some_gem'
require is a method (again, not a keyword) that looks up and runs arbitrary Ruby code, whose result we discard. It’s only convention that the primary side effect of the require’d code is to mutate the global namespace, defining more classes and methods.
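Here is a sketch of that behavior using a throwaway file instead of a real gem. The file name tiny_lib.rb and the module TinyLib are invented for the example; the point is that the file’s final value is thrown away and only its side effects remain.

```ruby
require 'tmpdir'

# Write a throwaway "library" to a temp directory, then require it.
# tiny_lib.rb and TinyLib are hypothetical names for this illustration.
RESULT = Dir.mktmpdir do |dir|
  path = File.join(dir, 'tiny_lib.rb')
  File.write(path, <<~RUBY)
    module TinyLib
      VERSION = '0.0.1'
    end
    :this_value_is_discarded
  RUBY

  before   = Object.const_defined?(:TinyLib)
  returned = require(path)   # runs the file; its last expression is not returned
  after    = Object.const_defined?(:TinyLib)

  [before, returned, after]
end

puts RESULT.inspect   # prints [false, true, true]
```

The return value of require only says whether the file was newly loaded. Everything useful the file did, it did by mutating global state.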
DSLs and metaprogramming
It would be one thing if Ruby constrained the places where this mutation could occur. But instead, it provides first-class support for these features anywhere Ruby code runs. Everything we’ve seen so far can be hidden behind arbitrary computation at runtime:
- With Module#const_set, a Ruby program can compute an arbitrary name and use it to create a new constant at runtime. Module#define_method does the same for methods.
- Again, require is a method, so it can occur wherever other methods are called.
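As a small sketch of the first point, the names below are computed from strings at runtime, so nothing in the source text spells out Alpha, Beta, alpha?, or beta?. Registry is a hypothetical class.

```ruby
# Registry is a hypothetical class used only for this illustration.
class Registry; end

%w[alpha beta].each do |name|
  # Compute a constant name at runtime and define it on Registry.
  Registry.const_set(name.capitalize, Class.new)

  # Compute a method name at runtime and define it on Registry.
  # `send` keeps this working on Rubies where define_method was private.
  Registry.send(:define_method, "#{name}?") { true }
end

puts Registry.constants.sort.inspect   # [:Alpha, :Beta]
puts Registry.new.alpha?               # true
puts Registry.new.beta?                # true
```

A tool that only reads the source would have to evaluate the loop to learn these names, and in real programs the list of names often comes from a config file or a database.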
It’s not uncommon to see Ruby libraries embrace this rather than avoid it (Rails definitely does). Ruby programs frequently build up large abstractions and do tons of computation which at the end of the day result in a define_method or a const_set.
Rubyists call this “metaprogramming” or “building DSLs” but I call it like I see it: mutation.
Modeling mutation
Type systems are notoriously bad at modeling this kind of mutation. Look at other typed, object-oriented languages: Java, Scala, C++, … Each of these languages forbids this kind of mutation. (Whether because it’s hard to implement support for it or because they’re making a value judgement is beyond me.)
So how can Sorbet model this? Mostly, it just cheats. Err, “approximates.” From my experience working on the Sorbet team, I can think of three main ways it cheats.
First, Sorbet assumes that if a class or method might exist, it does exist, and universally throughout a project. Frequently this assumption is backed up by an autoloader. For example, Rails includes an autoloader that loads constants lazily on demand, so that the programmer doesn’t have to sprinkle require statements throughout the code. But how do autoloads work? Mutation again 🙂.
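To illustrate the idea (not Rails’s autoloader itself), here is a sketch using Ruby’s built-in Module#autoload, which registers a file to run the first time a constant is referenced. The file name lazy_thing.rb and the class LazyThing are invented for the example.

```ruby
require 'tmpdir'

# lazy_thing.rb and LazyThing are hypothetical names for this illustration.
AUTOLOAD_STATES = Dir.mktmpdir do |dir|
  path = File.join(dir, 'lazy_thing.rb')
  File.write(path, "class LazyThing; end\n")

  # Register the file; nothing is loaded yet.
  Object.autoload(:LazyThing, path)
  pending = !Object.autoload?(:LazyThing).nil?   # still waiting to be loaded

  klass = LazyThing                              # first reference runs the file

  loaded = Object.autoload?(:LazyThing).nil?     # now an ordinary constant
  [pending, klass.name, loaded]
end

puts AUTOLOAD_STATES.inspect   # prints [true, "LazyThing", true]
```

Both steps are mutation: registering the autoload changes the constant table, and the first reference changes it again.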
Sorbet also pretends that all include, extend, and alias_method statements in a class run first, before all other code at the top-level of that class. It restricts method redefinitions: the old and new methods must take the same number and kinds of arguments. And it restricts alias_method: you can only alias to a method on your class, not to a parent class. Sorbet makes no attempt to model undef_method at all (another method-not-keyword!).
Second, Sorbet cheats by implementing heuristics for the most common DSLs. To support attr_reader, Sorbet says, “Hey, this method call happens to be to some method named attr_reader. I’m not sure if it’s to Module#attr_reader or to some other attr_reader definition or to any definition at all, but it’s provided with a single Symbol argument, the result is discarded, and it’s called at the syntactic top-level of a class, so I bet that it is a call to Module#attr_reader.” It’s similar for many other popular DSLs: it makes decent educated guesses.
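Why is this only a guess? Because Ruby lets a class supply its own attr_reader. The sketch below is a contrived, hypothetical example (Config and fetch_timeout are made up) where a call that looks exactly like the common case does something else entirely. I have not checked what Sorbet reports for this program; it only shows why a purely syntactic heuristic can be wrong.

```ruby
# Config is a hypothetical class whose own attr_reader shadows Module#attr_reader.
class Config
  def self.attr_reader(name)
    # Defines fetch_<name> instead of a getter called <name>.
    define_method("fetch_#{name}") { "value of #{name}" }
  end

  # Looks like the usual DSL call: one Symbol argument, result discarded,
  # at the top level of a class body.
  attr_reader :timeout
end

puts Config.method_defined?(:timeout)         # false
puts Config.method_defined?(:fetch_timeout)   # true
puts Config.new.fetch_timeout                 # value of timeout
```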
But after all that, it sort of gives up. Sorbet makes no attempts to work backwards from a call to define_method or const_set inside a method body to learn that a class or method might have been defined somewhere. Instead, it cheats one last time and uses runtime information.
As a part of initializing a Sorbet project, Sorbet requires (read: executes) as much code in a project as it can: all the gems listed in the Gemfile and all the Ruby files in the current folder. Afterwards, it can see the result of all that’s been mutated thus far (via reflection) and serialize what it sees into RBI files for the static checker. This is still imperfect (it completely misses things that are defined after require time), but empirically it finds most of the remaining undiscovered definitions.
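The general shape of that trick can be sketched in a few lines. To be clear, this is not Sorbet’s implementation: rbi_stub is a hypothetical helper, Widget is a made-up class, and the output ignores parameters and types entirely. It only shows that after the code has run, reflection can see methods that were defined dynamically.

```ruby
# Widget is a hypothetical class that defines methods in three different ways.
class Widget
  attr_reader :name
  define_method(:price) { 0 }   # invisible to a purely syntactic scan for `def`

  def self.build
    new
  end
end

# rbi_stub is a hypothetical helper: it reflects on a loaded class and prints
# a signature-less stub for every method defined directly on it.
def rbi_stub(klass)
  lines = ["class #{klass.name}"]
  klass.singleton_methods(false).sort.each { |m| lines << "  def self.#{m}; end" }
  klass.instance_methods(false).sort.each  { |m| lines << "  def #{m}; end" }
  lines << 'end'
  lines.join("\n")
end

puts rbi_stub(Widget)
# class Widget
#   def self.build; end
#   def name; end
#   def price; end
# end
```

Anything defined later, for example inside a method that has not been called yet, would not show up, which is the imperfection mentioned above.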
Beyond mutation
Don’t get me wrong, those approximations are really useful and effective. But really, the way Sorbet handles mutation in a codebase is by incentivizing people to get rid of it.
Sorbet can type check a project in seconds, but it takes minutes to re-generate all RBI files. When Sorbet can see things statically, there’s also a canonical place to write a type annotation for it.
It’s a much better experience to click “Go to Definition” and jump to the actual source definition rather than to an auto-generated RBI file.
And arguably, if it’s easy for Sorbet to understand what’s defined and where, it’s easier for a programmer to understand. Understandable code lets people iterate faster, and it is less brittle and harder to break by accident.
Programming languages are tools to change and structure the way we think. In the long run, all code can be changed. We adopt type systems specifically to help guide these changes, which I’ve touched on before. When it comes to mutation in Ruby, Sorbet makes a solid effort to model the helpful parts, while providing guide rails and suggestions to deal with the rest.
Appendix A: By comparison with typed JavaScript
You might say, “the things that you’re talking about aren’t unique to Ruby! It’s the same for all dynamic programming languages!” But is that true in practice?
Let’s compare our Ruby snippets from before with JavaScript.
Ruby:
class A
  def self.my_dsl(name)
    define_method(name) do; end
  end
end
JavaScript:
class A {
  static myDsl(name) {
    this.prototype[name] = function() {}
  }
}
First I’ll point out: the mutation becomes way more obvious in the JavaScript program! But second: both TypeScript and Flow report static errors on this program. They both complain that there’s no type annotation declaring that it’s ok to treat this.prototype as if it were a key-value mapping.
The fact that both Flow and TypeScript report an error here speaks to how common this idiom is in practice: it’s not common, and they’d rather not encourage programs like this, so they forbid it.
Here’s another example, first in Ruby:
require 'some_gem'
SomeNamespace::SomeClass.new
And then in JavaScript:
import someNamespace from 'some_package';
new someNamespace.SomeClass();
With no RBI files declaring whether SomeNamespace::SomeClass exists or not, Sorbet will report an error that the class doesn’t exist. But in TypeScript and Flow, the code is just fine, even if there’s no type declaration file. Both can still see that whatever value is imported will be bound to the someNamespace variable (even if it’s treated as any).
Sorbet is thus forced to come up with ways to generate RBI files for all new projects, because without them Sorbet would be crippled: it would have no way to distinguish between a class name that has actually been typoed vs one that is typed correctly but for which there’s no visible definition. Meanwhile, TypeScript and Flow work completely fine in new codebases out of the box.
So my claim is that no, it isn’t the same: these problems are unique to Ruby, because the design of the language and the culture of its use so pervasively promote or require mutation.
Appendix B: More things that are actually mutation
- freeze (ironic: to prevent mutation on a class or object… we mutate it!)
- private / private_class_method (not keywords! These are methods that take a Symbol; it just so happens that def foo; end is an expression that evaluates to the symbol :foo. Which is why there’s both private and private_class_method, because def self.foo; end also evaluates to :foo, so private def self.foo; end would attempt to mark an instance method named :foo private, even if it didn’t exist!)
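Both points can be checked in a few lines. This is a sketch with hypothetical names (Vault, secret, build). It spells out private :build by hand, which is what private def self.build; end amounts to once the def has evaluated to its Symbol.

```ruby
# Vault, secret, and build are hypothetical names for this illustration.
class Vault
  # `def` is an expression; its value is the method name as a Symbol.
  DEF_RESULT = (def secret; 42; end)
  private :secret   # same effect as writing `private def secret; 42; end`

  SINGLETON_DEF_RESULT = (def self.build; new; end)

  # There is no *instance* method called build, so `private` has nothing to mark.
  PRIVATE_ERROR =
    begin
      private :build
      nil
    rescue NameError => e
      e.class
    end

  private_class_method :build   # the method that does the intended job
end

puts Vault::DEF_RESULT.inspect             # :secret
puts Vault::SINGLETON_DEF_RESULT.inspect   # :build
puts Vault::PRIVATE_ERROR                  # NameError
puts Vault.respond_to?(:build)             # false: build is now private

# freeze flips a flag on the object it is supposed to protect.
FROZEN_BEFORE = Vault.frozen?
Vault.freeze
FROZEN_AFTER = Vault.frozen?
puts [FROZEN_BEFORE, FROZEN_AFTER].inspect   # [false, true]
```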