Another Look at Typed Array Access

Disclaimer: this post was first drafted as a Stripe-internal email. On December 10, 2022 I republished it here, largely unchanged from the original. See Some Old Sorbet Compiler Notes for more. The Sorbet Compiler is still largely an experimental project: this post is available purely for curiosity’s sake.

Any benchmark numbers included in this post are intended to be educational about how the Sorbet Compiler approaches speeding up code. They should not be taken as representative or predictive of any real-world workload, and are likely out-of-date with respect to improvements that have been made since this post originally appeared.

Last week in Types Make Array Access Faster we compared the Ruby VM’s performance on array accesses with the Sorbet Compiler’s performance on array accesses, as an example of how making types available to the Sorbet Compiler let it speed up code. The snippet under scrutiny was basically this operation:

xs[0]

but repeated many (10M) times to make the performance difference obvious.

The data we collected looked like this:

| benchmark | interpreted | compiled | interpreted, minus while | compiled, minus while | compiler speedup, w/o while |
|---|---|---|---|---|---|
| while_10_000_000.rb | 0.205s | 0.048s | | | |
| untyped_array_aref.rb | 0.282s | 0.174s | 0.077s | 0.126s | 0.61x |
| typed_array_aref.rb | 0.282s | 0.061s | 0.077s | 0.013s | 5.92x |

And our ultimate conclusion was:

With type information, Sorbet-compiled code is even faster than both the interpreted code and the compiled but untyped code.

But there was an interesting caveat along the way:

The array access operation is actually slower than the Ruby VM if Sorbet doesn’t have type information (0.61x speedup is less than 1, so it’s a slowdown).

The idea was that for our plain xs[0] program, the compiler was actually slower than the interpreter.
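The arithmetic behind those speedup figures can be sketched as follows. The timings are copied from the table above; the constant and method names are hypothetical, chosen only for this illustration. The loop-only baseline is subtracted from each measurement, and the interpreted remainder is divided by the compiled remainder.

```ruby
# Timings in seconds, copied from the table above.
# WHILE_BASELINE and speedup_without_while are illustrative names,
# not part of any Sorbet tooling.
WHILE_BASELINE = { interpreted: 0.205, compiled: 0.048 }.freeze

# Subtract the cost of the bare while loop from each measurement,
# then compare what is left: interpreted time / compiled time.
def speedup_without_while(interpreted, compiled)
  interpreted_only = interpreted - WHILE_BASELINE[:interpreted]
  compiled_only = compiled - WHILE_BASELINE[:compiled]
  (interpreted_only / compiled_only).round(2)
end

untyped = speedup_without_while(0.282, 0.174)
typed = speedup_without_while(0.282, 0.061)

puts "untyped_array_aref.rb: #{untyped}x" # → 0.61x
puts "typed_array_aref.rb:   #{typed}x"   # → 5.92x
```

A ratio below 1 means the compiled code spent more time on the array access itself than the interpreter did.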

Why was the compiler slower?

It turns out that array access is one of the operations the Ruby VM is already pretty good at, because it’s special-cased. We can check this by looking at the bytecode instructions that the Ruby VM uses to evaluate an array access:

❯ ruby --dump=insns -e 'xs = []; xs[0]'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,14)> (catch: FALSE)
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] xs@0
0000 newarray                     0                                   (   1)[Li]
0002 setlocal_WC_0                xs@0
0004 getlocal_WC_0                xs@0
0006 putobject_INT2FIX_0_
0007 opt_aref                     <callinfo!mid:[], argc:1, ARGS_SIMPLE>, <callcache>
0010 leave

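Here’s how to read this output: newarray 0 builds the empty array, setlocal_WC_0 stores it in xs, getlocal_WC_0 pushes xs back onto the stack, putobject_INT2FIX_0_ pushes the integer 0, opt_aref performs the xs[0] access (a call to [] with one argument), and leave returns from the script.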

That’s interesting! Instead of treating array access like any other method call (did you know that square brackets are just a method call in Ruby?), the Ruby VM treats it as a special, optimized instruction called opt_aref. Checking the implementation of that instruction, we find that the optimization only works if the method receiver (xs in this case) is exactly an instance of the Array or Hash class.
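To make the "square brackets are just a method call" point concrete, here is a small sketch that spells the same access three ways. The variable names are only for illustration. The last part assumes the code runs on MRI (CRuby), where RubyVM::InstructionSequence is available, and prints the line of the disassembly that mentions opt_aref.

```ruby
xs = [2, 4, 6]

# Three spellings of the same call to the method named [].
via_brackets = xs[0]
via_dot = xs.[](0)
via_send = xs.public_send(:[], 0)

puts [via_brackets, via_dot, via_send].inspect

# Array really does define an instance method named [].
has_aref = Array.instance_methods.include?(:[])
puts has_aref

# Assumption: running on MRI, where this constant exists.
# Other Ruby implementations may not provide it.
if defined?(RubyVM::InstructionSequence)
  disasm = RubyVM::InstructionSequence.compile("xs = []; xs[0]").disasm
  puts disasm.lines.grep(/opt_aref/)
end
```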

In other words, it’s easy to defeat this optimization by subclassing Array:

class MyArray < Array
end

xs = MyArray.new([2])
xs[0]

In this case, since xs is not exactly Array or Hash anymore, the optimization won’t apply, and the Ruby VM falls back to calling a method named [] on xs with argument 0.
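One way to see why the VM has to be conservative here: a subclass is free to redefine [], so only an exact Array (or Hash) receiver can safely take the fast path. The sketch below models the "exactly an instance of Array" condition with a plain class comparison. That comparison is an illustration of the idea, not the VM's actual C code, and LoudArray is a hypothetical class added for this example.

```ruby
class MyArray < Array
end

# Hypothetical subclass that changes what [] returns.
class LoudArray < Array
  def [](index)
    "overridden: #{super}"
  end
end

plain = [2]
sub = MyArray.new([2])
loud = LoudArray.new([2])

# An exact class match holds only for the plain Array.
puts plain.class == Array # → true
puts sub.class == Array   # → false

# The subclass is still an Array in the is_a? sense.
puts sub.is_a?(Array)     # → true

# Same syntax, but the call dispatches to whatever [] the class defines.
puts sub[0]               # → 2
puts loud[0]              # → overridden: 2
```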

We can see the effect of this by writing another Sorbet compiler benchmark, and adding it to our table:

| benchmark | interpreted | compiled | interpreted, minus while | compiled, minus while | compiler speedup, w/o while |
|---|---|---|---|---|---|
| while_10_000_000.rb | 0.205s | 0.048s | | | |
| untyped_array_aref.rb | 0.282s | 0.174s | 0.077s | 0.126s | 0.61x |
| typed_array_aref.rb | 0.282s | 0.061s | 0.077s | 0.013s | 5.92x |
| untyped_array_subclass_aref.rb | 0.388s | 0.172s | 0.183s | 0.124s | 1.48x |

By changing the untyped Array to an untyped subclass of Array, the interpreter slows down by an extra 0.106s (0.388s versus 0.282s), but our compiled version doesn’t care whether it was the Array case or MyArray case, because they’re both untyped.

Editing note: These numbers are unchanged from when I first measured in September 2020. They do not necessarily reflect the Sorbet Compiler’s current performance.
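For readers who want to observe the interpreter-side difference themselves, here is a rough timing sketch. It is not the benchmark harness used for the table: the time_aref helper is hypothetical, it measures only the plain Ruby interpreter (not Sorbet-compiled code), and it does not subtract the cost of the while loop. Absolute numbers will vary by machine and Ruby version.

```ruby
class MyArray < Array
end

# Hypothetical helper: time `iterations` array accesses on `xs`
# using a monotonic clock, and return the elapsed seconds.
def time_aref(xs, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  i = 0
  while i < iterations
    xs[0]
    i += 1
  end
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

iterations = 10_000_000
array_time = time_aref([2], iterations)
subclass_time = time_aref(MyArray.new([2]), iterations)

puts format("Array:   %.3fs", array_time)
puts format("MyArray: %.3fs", subclass_time)
```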

Now that the Ruby VM hasn’t effectively special cased our benchmark, the compiler starts to shine! This is another reason why we’re really optimistic about the impact of the compiler. Our initial plans were to speed up typed code, and count on other teams adding types everywhere. While adding types definitely helps (look at that 5.92x speedup!), the compiler can still speed up certain kinds of untyped code, too.