Another Look at Typed Array Access

Disclaimer: this post was first drafted as a Stripe-internal email. On December 10, 2022 I republished it here, largely unchanged from the original. See Some Old Sorbet Compiler Notes for more. The Sorbet Compiler is still largely an experimental project: this post is available purely for curiosity’s sake.

Any benchmark numbers included in this post are intended to be educational about how the Sorbet Compiler approaches speeding up code. They should not be taken as representative or predictive of any real-world workload, and are likely out-of-date with respect to improvements that have been made since this post originally appeared.

Last week in Types Make Array Access Faster, we compared the Ruby VM’s performance on array accesses with the Sorbet Compiler’s, as an example of how making types available to the Sorbet Compiler lets it speed up code. The snippet under scrutiny was basically this operation:

xs[0]

but repeated many (10M) times to make the performance difference obvious.
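For a rough idea of the shape of such a benchmark, here is a minimal sketch. It is not the actual benchmark file from the Sorbet repo: the helper names while_only and array_aref are made up for illustration, and it uses Ruby’s standard Benchmark module to time an empty while loop and the same loop with an array access in the body.

```ruby
require 'benchmark'

# Hypothetical helper: the bare loop, used as a baseline.
def while_only(n)
  i = 0
  while i < n
    i += 1
  end
  i
end

# Hypothetical helper: the same loop, with xs[0] in the body.
def array_aref(n)
  xs = [1, 2, 3]
  last = nil
  i = 0
  while i < n
    last = xs[0]
    i += 1
  end
  last
end

n = 10_000_000
baseline = Benchmark.realtime { while_only(n) }
total    = Benchmark.realtime { array_aref(n) }

puts format('while only:  %.3fs', baseline)
puts format('with xs[0]:  %.3fs', total)
# Subtracting the baseline approximates the cost of the accesses alone.
puts format('minus while: %.3fs', total - baseline)
```

Timing the bare loop separately is what makes the “minus while” columns possible: the loop overhead is measured once and subtracted from every other measurement.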

The data we collected looked like this:

benchmark	interpreted	compiled	interpreted, minus while	compiled, minus while	compiler speedup, w/o while
while_10_000_000.rb	0.205s	0.048s	—	—	—
untyped_array_aref.rb	0.282s	0.174s	0.077s	0.126s	0.61x
typed_array_aref.rb	0.282s	0.061s	0.077s	0.013s	5.92x

And our ultimate conclusion was:

With type information, Sorbet-compiled code is even faster than both the interpreted code and the compiled but untyped code.

But there was an interesting caveat along the way:

The array access operation is actually slower than the Ruby VM if Sorbet doesn’t have type information (0.61x speedup is less than 1, so it’s a slowdown).

The idea was that for our plain xs[0] program, the compiler was actually slower than the interpreter.
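To make the arithmetic behind the last column concrete, here is a small sketch that recomputes it from the timings in the table. The constant and method names are hypothetical; the numbers are copied from the table above.

```ruby
# Timings in seconds, copied from the table above.
WHILE_ONLY = { interpreted: 0.205, compiled: 0.048 }
ROWS = {
  'untyped_array_aref.rb' => { interpreted: 0.282, compiled: 0.174 },
  'typed_array_aref.rb'   => { interpreted: 0.282, compiled: 0.061 },
}

# Subtract the loop baseline from each measurement, then divide the
# interpreted time by the compiled time.
def speedup(row, baseline)
  interpreted = row[:interpreted] - baseline[:interpreted]
  compiled    = row[:compiled] - baseline[:compiled]
  (interpreted / compiled).round(2)
end

ROWS.each do |name, row|
  puts "#{name}: #{speedup(row, WHILE_ONLY)}x"
end
```

This reproduces the 0.61x and 5.92x figures: a ratio below 1 means the compiled code spent more time on the array accesses than the interpreter did.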

Why was the compiler slower?

It turns out that array access is one of the operations the Ruby VM is already pretty good at, because it’s special-cased. We can check this by looking at the bytecode instructions that the Ruby VM uses to evaluate an array access:

❯ ruby --dump=insns -e 'xs = []; xs[0]'
== disasm: #<ISeq:<main>@-e:1 (1,0)-(1,14)> (catch: FALSE)
local table (size: 1, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 1] xs@0
0000 newarray                     0                                   (   1)[Li]
0002 setlocal_WC_0                xs@0
0004 getlocal_WC_0                xs@0
0006 putobject_INT2FIX_0_
0007 opt_aref                     <callinfo!mid:[], argc:1, ARGS_SIMPLE>, <callcache>
0010 leave

Here’s how to read this output:

We used the special --dump=insns flag to the ruby command line. You can try this at home!
There’s some stuff we don’t need on the first few lines, and then the bytecode instructions start with the line reading 0000.
The actual instruction that corresponds to the xs[0] expression is at offset 0007. The name of the instruction is opt_aref.
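The same disassembly can also be obtained from inside a Ruby program. A minimal sketch, assuming CRuby (MRI), since RubyVM::InstructionSequence is specific to that implementation and the exact output format varies between Ruby versions:

```ruby
# RubyVM::InstructionSequence is specific to CRuby (MRI); other Ruby
# implementations may not provide it.
iseq = RubyVM::InstructionSequence.compile('xs = []; xs[0]')
disasm = iseq.disasm

# Print only the line for the array access instruction.
puts disasm.lines.grep(/opt_aref/)
```

This is handy for checking quickly whether a given expression compiles to one of the specialized opt_* instructions.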

That’s interesting! Instead of treating array access like any other method call (did you know that square brackets are just a method call in Ruby?), the Ruby VM treats it as a special, optimized instruction called opt_aref. Checking the implementation of that instruction, we find that the optimization only works if the method receiver (xs in this case) is exactly an instance of the Array or Hash class.

In other words, it’s easy to defeat this optimization by subclassing Array:

class MyArray < Array
end

xs = MyArray.new([2])
xs[0]

In this case, since xs is not exactly Array or Hash anymore, the optimization won’t apply, and the Ruby VM falls back to calling a method named [] on xs with argument 0.
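As a rough illustration of both points, here is a sketch in plain Ruby. The helper exact_array_or_hash? is hypothetical: it only mirrors the “exactly Array or Hash” condition described above, while the real check lives in the VM’s C source and may test more than this.

```ruby
class MyArray < Array
end

# Hypothetical helper that mirrors the "exactly Array or Hash" condition.
# instance_of? is false for subclasses, unlike is_a?.
def exact_array_or_hash?(receiver)
  receiver.instance_of?(Array) || receiver.instance_of?(Hash)
end

plain = [2]
sub   = MyArray.new([2])

puts exact_array_or_hash?(plain) # true
puts exact_array_or_hash?(sub)   # false

# Square brackets are an ordinary method named [], so these are equivalent:
puts sub[0]                  # 2
puts sub.[](0)               # 2
puts sub.public_send(:[], 0) # 2
```

The subclass still answers [] exactly as Array does; the only difference is which path the VM takes to get there.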

We can see the effect of this by writing another Sorbet compiler benchmark, and adding it to our table:

benchmark	interpreted	compiled	interpreted, minus while	compiled, minus while	compiler speedup, w/o while
while_10_000_000.rb	0.205s	0.048s	—	—	—
untyped_array_aref.rb	0.282s	0.174s	0.077s	0.126s	0.61x
typed_array_aref.rb	0.282s	0.061s	0.077s	0.013s	5.92x
untyped_array_subclass_aref.rb	0.388s	0.172s	0.183s	0.124s	1.48x

By changing the untyped Array to an untyped subclass of Array, the interpreter slows down by an extra 0.106s, but our compiled version doesn’t care whether it was the Array case or the MyArray case, because they’re both untyped.

(Editing note: These numbers are unchanged from when I first measured in September 2020. They do not necessarily reflect the Sorbet Compiler’s current performance.)
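The interpreter side of this comparison is easy to try at home. Here is a minimal sketch, with a hypothetical aref_loop helper, that times the same loop against a plain Array and against a subclass instance. The absolute numbers will depend on the Ruby version and machine, and need not match the table.

```ruby
require 'benchmark'

class MyArray < Array
end

# Hypothetical helper: index into xs n times inside a while loop.
def aref_loop(xs, n)
  last = nil
  i = 0
  while i < n
    last = xs[0]
    i += 1
  end
  last
end

n = 10_000_000
array_time    = Benchmark.realtime { aref_loop([2], n) }
subclass_time = Benchmark.realtime { aref_loop(MyArray.new([2]), n) }

puts format('Array:   %.3fs', array_time)
puts format('MyArray: %.3fs', subclass_time)
puts format('extra:   %.3fs', subclass_time - array_time)
```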

Now that the Ruby VM’s special case no longer applies to our benchmark, the compiler starts to shine! This is another reason why we’re really optimistic about the impact of the compiler. Our initial plans were to speed up typed code, and count on other teams adding types everywhere. While adding types definitely helps (look at that 5.92x speedup!), the compiler can still speed up certain kinds of untyped code, too.