Conventional knowledge is that union types are common and intersection types are rare. But actually that’s not the case—intersection types show up in nearly every program Sorbet type checks thanks to control flow.
Union types in
Sorbet are incredibly common, which should be no surprise. In Sorbet,
T.nilable(...) is sugar for
T.any(NilClass, ...). T.nilable shows up all
over the place and probably catches more bugs than any other feature in
Sorbet.
Sorbet also has intersection
types. While union types mean “either this or that,” intersection
types mean “both this and that.” On first glance, intersection types
seem like some super niche feature which only benefits a handful of
programs. In Stripe’s Ruby monorepo, the strings T.any and
T.nilable occur nearly 300 times more than
T.all does.
But those numbers hide something critical: intersection types power Sorbet’s control flow-sensitive typing. They’re actually present in every Ruby program, but just a little hard to spot. Let’s look at how pervasive they are with a few examples:
# typed: strict
extend T::Sig
class Parent; end
class Child < Parent; end
sig {params(x: Parent).void}
def example1(x)
case x
when Child
T.reveal_type(x) # Revealed type: `Child`
end
endHere x starts out having type Parent, but
inside the case statement Sorbet treats x as
having the more specific type Child. There’s no
T.all in sight, but that’s because it’s hiding. Sorbet
doesn’t just throw away the fact that it knew
x <: Parent. Instead, it uses T.all to
update its type for x to
T.all(Parent, Child).
T.all(Parent, Child) is equivalent to Child
because Child is a subtype of Parent. If types
represent sets of values, then the set of values represented by
Child is a subset of the set of values represented by
Parent, so the intersection of those two sets would just
leave Child.If you’re not convinced, consider: with
T.all(Parent, Child) we should be able to call all the
methods on Parent and all the methods on
Child. But Child inherits
Parent’s methods, so any method Parent has
will already be on Child. So Child is
equivalently good as T.all(Parent, Child).
Sorbet attempts to simplify a large type to a smaller, equivalent type when it can for two reasons:
Usability – Most users don’t know that
T.allmeans “intersection type” or even what intersection types are. (And even those who do still end up drawing Venn diagrams from time to time!) It only gets more complicated whenT.alls andT.anys nest inside each other.Meanwhile,
Childis a super easy type to understand, and leads to nice, short error messages.Performance – Checking whether one type is a subtype of another is a super common operation, so it has to be fast. By collapsing
T.all(Parent, Child)toChild, Sorbet does at least half as much work when checking subtyping (probably more, because of some common path optimizations).When this simplification happens, Sorbet even skips an allocation entirely. Cnstructing
T.all(Parent, Child)in Sorbet short circuits and returns a reference to the already allocatedChildtype that was passed as an argument (with ownership tracked viastd::shared_ptrIf you’re looking for a good intro to modern C++ things likeshared_ptr, I can’t recommend this blog post series enough.
).
Let’s look at another example of control flow:
# typed: strict
extend T::Sig
class A; end
class B; end
sig {params(a_or_b: T.any(A, B)).void}
def example2(a_or_b)
case a_or_b
when A
T.reveal_type(a_or_b) # Revealed type: `A`
end
endThis example method accepts either A or B
(T.any(A, B)) and then branches on whether
a_or_b is an instance of A. Again: Sorbet
doesn’t throw away that it knows a_or_b <: T.any(A, B).
Instead it updates its knowledge of the type of a_or_b
using T.all to get T.all(T.any(A, B), A).
Realizing that this is equivalent to A is a bit
trickier:
T.all(T.any(A, B), A)
# Distribute
T.any( T.all(A, A) , T.all(B, A) )
# T.all(A, A) is just A (idempotence)
T.any( A , T.all(B, A) )
# A and B are classes (not mixins) and neither inherits the other.
# It's impossible to have a value of that type, so it's bottom:
T.any( A , T.noreturn )
# bottom is the identity of union
AYou can start to see how usability and performance and might get a little out of hand if Sorbet didn’t keep attempting to simplify things! The cumulative effect of all the control flow in a program would result in huge, unweidly types.
Until now you could claim that I’ve been hyping up intersection types as the solution to problems that were self-imposed. That if we just invented some other method for modeling control flow, it would have been naturally usable or naturally performant, and we wouldn’t have had problems in the first place. So next let’s look at some examples to see why intersection types really are the most natural solution:
module I1
def foo1; end
end
module I2
def foo2; end
end
sig {params(x: I1).void}
def example3(x)
x.foo1 # Works outside
case x
when I2
x.foo1 # Should (and does) still work inside
x.foo2
end
endUnlike in the other examples, this is the first example where had we
tried to implement control-flow-sensitive typing by throwing away the
old type and using the new type instead it wouldn’t have worked. The key
thing to notice: this example uses modules. Outside the
case of course calling x.method_from_1 works
because x starts out at type I1. But if we
treated x as only I2 inside the
when I2, we’d start reporting an error for calling
x.method_from_1 because it doesn’t exist on
I2.
Unlike intersecting unrelated classes (our T.all(B, A)
example from earlier), intersecting unrelated modules does’t collapse to
T.noreturn. There’s nothing stopping some class from
including both I1 and I2. Instances of that
class would be values of type T.any(I1, I2):
class SomeClass
include I1
include I2
end
# This type assertion is okay:
T.let(SomeClass.new, T.all(I1, I2))So at least for implementing certain cases of flow sensitive typing, we’ll need intersection types anyways. Then for these certain cases we’d incur the usability and performance problems we discovered earlier and have to solve them.
But more than that, intersection types are fundamentally easier to work with compared to some ad hoc approach to flow sensitive typing. Type system bugs are weird. It’s frquently harder to figure out whether the current behavior is buggy in the first place than it is to find the cause!
In that light, intersection types present an elegant, robust model for arriving at what the correct behavior should be, independent of what Sorbet’s existing behavior is. It’s clear how intersection types interact with union types, and with subtyping, and with generics, and with variance, etc.
By repurposing intersection types to model control flow sensitivity, when things go wrong there’s a framework for discovering what’s right.
(Speaking of repurposing, intersection types also play an important role in how Sorbet suggests potential method signatures! That’s three birds with one stone.)