I recently learned that linkers are really cool. It all started when I saw an error message that looked something like this:
I already wrote about finding where this error was coming from. The tl;dr is that it was coming from GNU’s libc implementation:
That led me to a fun exploration of how Linux linkers work, and how Ruby C extensions rely on them.
I always knew that Ruby C extensions existed (that they break all the time is a constant reminder…) but I never really connected the dots between “here’s some C code” and how Ruby actually runs that code.
Ruby C extensions are just shared libraries following certain conventions. Specifically, a Ruby C extension might look like this:
The important part is that the name of that Init_my_lib function matters. When Ruby sees a line that requires my_lib, it looks for a file called my_lib.so (or my_lib.bundle on macOS), asks the operating system to load that file as a shared library, and then looks for a function with the name Init_my_lib inside the library it just loaded.
When that function runs, it’s a chance for the C extension to do the same sorts of things that a normal Ruby file might have done if it had been require’d. In this example, it defines a method foo at the top level, almost like the user had written normal Ruby code like this:
That’s kind of wild! That means:
- C programs can load libraries dynamically at runtime, using arbitrary user input.
- C programs can then ask if there’s a function defined in that library with an arbitrary name, and get a function pointer to call it if there is!
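To make those two points concrete, here is a minimal sketch (not Ruby's code) of a C program looking up a function by a name it only knows at runtime. The helper names lookup_symbol and call_abs_by_name are made up for this example, and abs is just a convenient libc function to look for. Passing NULL as the path asks for the running program and the libraries it already has loaded.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Look up `symbol_name` in the shared library at `library_path`.
 * Both strings can come from user input at runtime.
 * A NULL path means "the running program and what it already loaded".
 * Returns NULL if the library or the symbol cannot be found.
 * (Hypothetical helper, not part of any real API.) */
static void *lookup_symbol(const char *library_path, const char *symbol_name) {
    void *handle = dlopen(library_path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    /* The handle is deliberately left open: closing it could unload
     * the code that the returned pointer refers to. */
    return dlsym(handle, symbol_name);
}

/* Find libc's abs() by name and call it through a function pointer.
 * Returns -1 if the lookup fails (abs itself never returns a negative
 * value for the inputs used here, so -1 is an unambiguous failure). */
static int call_abs_by_name(int value) {
    void *sym = lookup_symbol(NULL, "abs");
    if (sym == NULL) {
        return -1;
    }
    /* POSIX allows converting dlsym's void* result to a function pointer. */
    int (*abs_fn)(int) = (int (*)(int))sym;
    return abs_fn(value);
}
```

Nothing in this file mentions abs except as a string, which is the part I found surprising. On some systems (older glibc versions, for example) you may need to link with -ldl for this to build.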
I was pretty shocked to learn this, because my mental model of how linking worked was that it split evenly into two parts:
“My application is statically linked, where all the code and libraries my application depends on are compiled into my binary.”
“My application is dynamically linked, which means my binary pre-declares some libraries that must be loaded before my program can start running.”
There’s actually a third option!
Then I looked into what code Ruby actually calls to do this. Ruby uses the dlopen(3) function in libc to request that an arbitrary user library be loaded. From the man page:
The function dlopen() loads the dynamic shared object (shared library) file named by the null-terminated string filename and returns an opaque “handle” for the loaded object.
— man dlopen
The next thing Ruby does with this opaque handle is to find if the thing it just loaded has an Init_<...> function inside it:
Ruby uses dlsym(3) (again in libc) to look up a function with an arbitrary name (buf) inside the library it just opened (handle). That function must exist: if it doesn’t, it’s not a valid Ruby C extension and Ruby reports an error.
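Here is a rough sketch of that lookup step, under my own assumptions rather than copied from Ruby's source. The helpers init_symbol_name and find_init_function are hypothetical names, and the rule for deriving the name (strip the directory and the extension, then prefix Init_) is a simplification of whatever Ruby really does.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Build "Init_<name>" from a library path such as "./ext/my_lib.so".
 * Returns 0 on success, -1 if the result does not fit in buf.
 * (Simplified, assumed naming rule for illustration only.) */
static int init_symbol_name(const char *path, char *buf, size_t buflen) {
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;        /* drop directories */
    const char *dot = strrchr(base, '.');
    size_t name_len = dot ? (size_t)(dot - base) : strlen(base); /* drop extension */

    int written = snprintf(buf, buflen, "Init_%.*s", (int)name_len, base);
    return (written < 0 || (size_t)written >= buflen) ? -1 : 0;
}

/* Look up the init function for `path` in an already-opened handle.
 * Returns NULL (after printing a message) when it is missing. */
static void *find_init_function(void *handle, const char *path) {
    char buf[256];
    if (init_symbol_name(path, buf, sizeof buf) != 0) {
        fprintf(stderr, "library name too long: %s\n", path);
        return NULL;
    }
    void *sym = dlsym(handle, buf);
    if (sym == NULL) {
        fprintf(stderr, "%s is not a valid extension: no %s\n", path, buf);
    }
    return sym;
}
```

The string passed to dlsym is assembled at runtime, so the compiler never sees the name Init_my_lib anywhere in the loader.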
If dlsym found a function with the right name, Ruby stores a function pointer to it in init_fct, which Ruby immediately dereferences and calls:
It’s still kind of mind bending to think that C provides this level of “dynamism.” I had always thought that being a compiled language meant that the set of functions a C program could call was fixed at compile time, but that’s not true at all!
This search led me down a rabbit hole of learning more about linkers, and now I think they’re super cool—and far less cryptic! I highly recommend Chapter 7: Linking from Computer Systems: A Programmer’s Perspective if this was interesting to you.