Union types are powerful yet often overlooked. At work, I’ve been using Flow which thankfully supports union types. But as I’ve refactored more of our code to use union types, I’ve noticed that our bundle size has been steadily increasing!
In this post, we’re going to explore why that’s the case. We’ll start with a problem which union types can solve, flesh out the problem to motivate why union types are definitely the solution, then examine the resulting cost of introducing them. In the end, we’ll compare Flow to other compile-to-JS languages on the basis of how they represent union types in the compiled output. I’m especially excited about Reason, so we’ll talk about it the most.
Setup: Union Types in a React Component
Let’s say we’re writing a simple React 2FA (two-factor authentication) modal. We’ll be using Flow, but you can pretend it’s TypeScript if you want. The mockup we were given has three screens.
In this mockup:
- There’s a loading state while we send the text message.
- We’ll show an input for the code after the message is sent.
- There’s no failure screen (it hasn’t been drawn up yet).
We’ll need some way for our component to know which of the three screens is visible. Let’s use a union type in Flow:
type Screen =
  | 'LoadingScreen'
  | 'CodeEntryScreen'
  | 'SuccessScreen';
Union types are a perfect fit! 🎉 They document intent and help guard against mistakes: both fellow developers and our type checker know “these are all the cases.” In particular, Flow can warn us when we’ve forgotten a case.
Our initial implementation is working great. After sharing it with the team, someone suggests adding a “cancel” button in the top corner. It doesn’t make sense to cancel when the flow has already succeeded, so we’ll exclude it from the last screen.
No problem: let’s write a function called needsCancelButton to determine whether we need to put a cancel button in the header of a particular screen:
const needsCancelButton = (screen: Screen): boolean => {
  // Recall: 'SuccessScreen' is the last screen,
  // so it shouldn't have a cancel button.
  return screen !== 'SuccessScreen';
};
Short and sweet. 👌 Everything seems to be working great, until…
Optimizing for Exhaustiveness
The next day, we get some updated mocks from the design team. This time, they’ve also drawn up a “failure” screen for when the customer has entered the wrong code too many times.
We can handle this by adding a case to our Screen type:
type Screen =
  | 'LoadingScreen'
  | 'CodeEntryScreen'
  | 'SuccessScreen'
  // New case to handle too many wrong attempts:
  | 'FailureScreen';
But now there’s a bug in our needsCancelButton function. 😧 We should only show a cancel button on screens where it makes sense, and 'FailureScreen' is not one of those screens. Our first reaction after discovering the bug would be to exclude 'FailureScreen' too:
const needsCancelButton = (screen: Screen): boolean => {
  // && (not ||): the screen must be neither of the two final screens.
  return (
    screen !== 'SuccessScreen' &&
    screen !== 'FailureScreen'
  );
};
But we can do better than just fixing the current bug. We should write code so that when we add a new case to a union type, our type checker alerts us before a future bug even happens. What if instead of a silent bug, we got this cheery message from our type checker?
Hey, you forgot to add a case to needsCancelButton for the new screen you added. 🙂
(your friendly, neighborhood type checker)
Let’s go back and rewrite needsCancelButton so that it will tell us this when adding new cases. We’ll use a switch statement with something special in the default case:
const impossible = <T>(x: empty): T => {
  throw new Error('This case is impossible.');
};

const needsCancelButton = (screen: Screen): boolean => {
  switch (screen) {
    case 'LoadingScreen':
      return true;
    case 'CodeEntryScreen':
      return true;
    case 'SuccessScreen':
      return false;
    default:
      // (I named this function 'absurd' in my earlier post:
      // https://blog.jez.io/flow-exhaustiveness/)
      // This function asks Flow to check for exhaustiveness.
      //
      // [flow]: Error: Cannot call `impossible` with `screen` bound to `x` because string literal `FailureScreen` [1] is incompatible with empty [2].
      return impossible(screen);
  }
};
Now Flow is smart enough to give us an error! Making our code safer, one switch statement at a time. 😅 Union types in Flow are a powerful way to use types to guarantee correctness. But to get the most out of union types, always access them through a switch statement. Every time we use a union type without an exhaustive switch statement, we make it harder for Flow to tell us where we’ve missed something.
(“Always” is a very strong statement. Please use your best judgement. But know that if you’re not using a switch, you’re shifting the burden of exhaustiveness and correctness from the type checker to the programmer!)
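If you’re following along in TypeScript instead, the same trick works with TypeScript’s never type standing in for Flow’s empty. Here is a minimal sketch, assuming the four-screen Screen type from above and handling every case so that it type-checks; deleting any case (or adding a fifth screen without handling it) should make tsc reject the impossible(screen) call, much like the Flow error above.

```typescript
// Make this file a module so our Screen type doesn't clash with the
// DOM's global Screen interface.
export {};

// The four-screen union from the Flow example, written in TypeScript.
type Screen =
  | 'LoadingScreen'
  | 'CodeEntryScreen'
  | 'SuccessScreen'
  | 'FailureScreen';

// `never` plays the role of Flow's `empty`: the only way to call this
// is with a value the type checker has proven can't exist.
const impossible = (x: never): never => {
  throw new Error(`This case is impossible: ${String(x)}`);
};

const needsCancelButton = (screen: Screen): boolean => {
  switch (screen) {
    case 'LoadingScreen':
      return true;
    case 'CodeEntryScreen':
      return true;
    case 'SuccessScreen':
      return false;
    case 'FailureScreen':
      return false;
    default:
      // Every case is handled, so `screen` has type `never` here.
      // If a case were missing, this call would be a type error.
      return impossible(screen);
  }
};
```

The run-time throw only fires if something outside the type system (say, a string parsed from JSON) sneaks in an unexpected value.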
Correctness, but at what cost?
You might not have noticed, but we paid a subtle cost in rewriting our needsCancelButton function. Let’s compare our two functions:
// ----- before: 62 bytes (minified) -----
const needsCancelButton = (screen) => {
  return screen !== 'SuccessScreen';
};

// ----- after: 240 bytes (minified) -----
const impossible = (x) => {
  throw new Error('This case is impossible.');
};

const needsCancelButton = (screen) => {
  switch (screen) {
    case 'LoadingScreen':
      return true;
    case 'CodeEntryScreen':
      return true;
    case 'SuccessScreen':
      return false;
    default:
      return impossible(screen);
  }
};
With just an equality check, our function was small: 62 bytes minified. But when we refactored to use a switch statement, its size shot up to 240 bytes! That’s nearly a 4x increase, just to get exhaustiveness. Admittedly, needsCancelButton is a bit of a pathological case. But in general, as we make our code bases safer using Flow’s union types of string literals, our bundle size bloats!
Types and Optimizing Compilers
One of the many overlooked promises of types is the claim that by writing our code with higher-level abstractions, we give more information to the compiler. The compiler can then generate code that captures our original intent, but as efficiently as possible.
Flow is decidedly not a compiler: it’s only a type checker. To run JavaScript annotated with Flow types, we first strip the types (with something like Babel). All information about the types vanishes when we run the code. (Even though TypeScript defines both a language and a compiler for that language, in practice it’s not much different from Flow here. A goal of the TypeScript compiler is to generate JavaScript that closely resembles the original TypeScript, so it doesn’t do compile-time optimizations based on the types.)
What could we achieve if we kept the types around all the way through compilation?
Reason (i.e., ReasonML) is an exciting effort to bring all the benefits of the OCaml tool chain to the web. In particular, Reason uses OCaml’s mature optimizing compiler alongside BuckleScript (which compiles OCaml to JavaScript) to emit great code.
To see what I mean, let’s re-implement our Screen type and needsCancelButton function, this time in Reason:
type screen =
  | LoadingScreen
  | CodeEntryScreen
  | SuccessScreen;

let needsCancelButton = (screen: screen): bool => {
  switch (screen) {
  | LoadingScreen => true
  | CodeEntryScreen => true
  | SuccessScreen => false
  };
};
Looks pretty close to JavaScript with Flow types, doesn’t it? The biggest difference is that the case keyword was replaced with the | character. Making the way we define and use union types look the same is a subtle reminder to always pair union types with switch statements! More than being a nice reminder, it makes it easy to copy and paste our type definition as boilerplate to start writing a new function!
Another difference: Reason handles exhaustiveness checking out of the box. 🙂
What does the Reason output look like?
// Generated by BUCKLESCRIPT VERSION 3.0.1, PLEASE EDIT WITH CARE
'use strict';

function needsCancelButton(screen) {
  if (screen >= 2) {
    return false;
  } else {
    return true;
  }
}
Not bad! Because Reason knows our switch is exhaustive, it optimized the entire switch statement down to a single if statement. In fact, it gets even better: when we run this through uglifyjs, it removes the redundant true / false:
"use strict";
function needsCancelButton(n){
return !(n>=2)
}
Wow! This is actually better than our initial, hand-written equality check. Reason compiled what used to be a string literal 'SuccessScreen' to just the number 2. Reason can do this safely because values of custom-defined types in Reason aren’t strings at run time, so it doesn’t matter if the names get mangled.
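To make the integer trick concrete, here is a hand-written TypeScript imitation of that representation; it is not BuckleScript output. It assumes what the compiled code above suggests: constructors that carry no data are numbered 0, 1, 2 in declaration order, so SuccessScreen is the largest tag and the whole match becomes one comparison.

```typescript
// Make this file a module so our Screen type doesn't clash with the
// DOM's global Screen interface.
export {};

// Hand-written imitation of the integer representation (not compiler output).
// Constructors without arguments are numbered in declaration order.
const LoadingScreen = 0;
const CodeEntryScreen = 1;
const SuccessScreen = 2;

type Screen =
  | typeof LoadingScreen
  | typeof CodeEntryScreen
  | typeof SuccessScreen;

// SuccessScreen has the largest tag and is the only screen without a
// cancel button, so one comparison covers every case: the same shape
// as the minified output !(n>=2).
const needsCancelButton = (screen: Screen): boolean =>
  !(screen >= SuccessScreen);
```

Comparing bare numbers like this is only safe because the type checker guarantees screen is one of the three tags.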
Taking a step back, Reason’s type system delivered on the promise of types in a way Flow couldn’t:
- We wrote high-level, expressive code.
- The type checker gave us strong guarantees about our code’s correctness via exhaustiveness.
- The compiler translated that all to tiny, performant output.
I’m really excited about Reason. 😄 It has a delightful type system and is backed by a decades-old optimizing compiler tool chain. I’d love to see more people take advantage of improvements in type systems to write better code!
Appendix: Other Compile-to-JS Runtimes
The above analysis only considered Flow + Babel and Reason. But then I got curious about how other typed languages that compile to JavaScript compare on the optimizations front:
TypeScript
Despite being both a language and a compiler, TypeScript maintains a goal of compiling to JavaScript that closely resembles the source TypeScript code. TypeScript has three language constructs for working with exhaustiveness:
1. union types (identical to the Flow unions that we’ve been talking about),
2. enums, which are sort of like defining a group of constant variables all at once, and
3. const enums, which are like enums except that they’re represented more succinctly in the compiled output.
TypeScript’s unions of string literals are represented the same way as Flow’s, so I’m going to skip (1) and focus instead on (2) and (3).
TypeScript’s enum and const enum are subtly different. Not having used the language much, I’ll refer you to the TypeScript documentation to learn more about the differences. But for sure, const enums compile much better than normal enums.
Here’s what normal enums look like in TypeScript. They’re even worse than unions of string literals:
var Screen_;
(function (Screen_) {
    Screen_[Screen_["LoadingScreen"] = 0] = "LoadingScreen";
    Screen_[Screen_["CodeEntryScreen"] = 1] = "CodeEntryScreen";
    Screen_[Screen_["SuccessScreen"] = 2] = "SuccessScreen";
})(Screen_ || (Screen_ = {}));
var impossible = function (x) {
    throw new Error('This case is impossible.');
};
var needsCancelButton = function (screen) {
    switch (screen) {
        case Screen_.LoadingScreen:
            return true;
        case Screen_.CodeEntryScreen:
            return true;
        case Screen_.SuccessScreen:
            return false;
        default:
            return impossible(screen);
    }
};
So for normal enums:
- It’s not smart enough to optimize away the impossible call.
- It keeps around a JavaScript object representing the collection of enum values at run time, in a format that doesn’t minify well.
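The post doesn’t show the TypeScript source for that output, so here is a plausible reconstruction (the enum name Screen_ comes from the compiled code; the rest is an assumption). The last two lines show the run-time object from the second bullet: a normal numeric enum maps each name to a number and each number back to its name.

```typescript
// A plausible reconstruction of the source behind the enum output above.
enum Screen_ {
  LoadingScreen, // 0
  CodeEntryScreen, // 1
  SuccessScreen, // 2
}

const impossible = (x: never): never => {
  throw new Error(`This case is impossible: ${String(x)}`);
};

const needsCancelButton = (screen: Screen_): boolean => {
  switch (screen) {
    case Screen_.LoadingScreen:
      return true;
    case Screen_.CodeEntryScreen:
      return true;
    case Screen_.SuccessScreen:
      return false;
    default:
      return impossible(screen);
  }
};

// The enum is a real object at run time, with a reverse mapping:
// Screen_.SuccessScreen is 2, and Screen_[2] is "SuccessScreen".
const tag: number = Screen_.SuccessScreen;
const reverseName: string = Screen_[2];
```

Both directions of that mapping, string keys included, end up in the bundle.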
And then here’s what const enums look like. You can see that TypeScript represents them under the hood without any sort of Screen_ object:
var impossible = function (x) {
    throw new Error('This case is impossible.');
};
var needsCancelButton = function (screen) {
    switch (screen) {
        case 0 /* LoadingScreen */:
            return true;
        case 1 /* CodeEntryScreen */:
            return true;
        case 2 /* SuccessScreen */:
            return false;
        default:
            return impossible(screen);
    }
};
So for const enums:
- It uses numbers instead of strings.
- It still uses a switch statement, instead of reducing to just an if statement.
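Again reconstructing source the post doesn’t show: a sketch assuming the only source-level change from the enum version is the const keyword. When compiled with tsc, each Screen_.X reference is then inlined as a number literal (with the member name kept in a comment, as in the output above), so no Screen_ object is emitted.

```typescript
// Reconstructed source for the const enum output above (an assumption).
const enum Screen_ {
  LoadingScreen,
  CodeEntryScreen,
  SuccessScreen,
}

const impossible = (x: never): never => {
  throw new Error(`This case is impossible: ${String(x)}`);
};

// With tsc, each Screen_.X below is replaced by its numeric value at
// compile time, so the switch compares plain numbers.
const needsCancelButton = (screen: Screen_): boolean => {
  switch (screen) {
    case Screen_.LoadingScreen:
      return true;
    case Screen_.CodeEntryScreen:
      return true;
    case Screen_.SuccessScreen:
      return false;
    default:
      return impossible(screen);
  }
};
```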
PureScript
PureScript is another high-level language like Reason. Both Reason and PureScript have data types where we can define unions with custom constructor names. Despite that, PureScript’s generated code is significantly worse than Reason’s.
"use strict";
var LoadingScreen = (function () {
    function LoadingScreen() {};
    LoadingScreen.value = new LoadingScreen();
    return LoadingScreen;
})();
var CodeEntryScreen = (function () {
    function CodeEntryScreen() {};
    CodeEntryScreen.value = new CodeEntryScreen();
    return CodeEntryScreen;
})();
var SuccessScreen = (function () {
    function SuccessScreen() {};
    SuccessScreen.value = new SuccessScreen();
    return SuccessScreen;
})();
var needsCancelButton = function (v) {
    if (v instanceof LoadingScreen) {
        return true;
    };
    if (v instanceof CodeEntryScreen) {
        return true;
    };
    if (v instanceof SuccessScreen) {
        return false;
    };
    throw new Error("Failed pattern match at Main line 10, column 1 - line 10, column 39: " + [ v.constructor.name ]);
};
- It’s generating ES5 classes for each data constructor.
- It compiles pattern matching to a series of instanceof checks.
- Even though it knows the match is exhaustive, it still emits a throw statement in case the pattern match fails!
Admittedly, I didn’t try that hard to turn on optimizations in the compiler. Maybe there’s a flag I can pass to get this Error to go away. But that’s pretty disappointing, compared to how small Reason’s generated code was!
Elm
I list Elm in the same class as Reason and PureScript. Like the other two, it lets us define custom data types, and will automatically warn us when pattern matches aren’t exhaustive. Here’s the code Elm generates:
var _user$project$Main$needsCancelButton = function (page) {
    var _p0 = page;
    switch (_p0.ctor) {
        case 'LoadingScreen':
            return true;
        case 'CodeEntryScreen':
            return true;
        default:
            return false;
    }
};
var _user$project$Main$SuccessScreen = {ctor: 'SuccessScreen'};
var _user$project$Main$CodeEntryScreen = {ctor: 'CodeEntryScreen'};
var _user$project$Main$LoadingScreen = {ctor: 'LoadingScreen'};
- It’s using string literals, much like Flow and TypeScript.
- It’s smart enough to collapse the last case to just use default (at least it doesn’t throw in the default case!).
- The variable names are long, but these would still minify well.
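Elm’s representation is essentially what TypeScript programmers write by hand as a discriminated union: each value is an object whose string tag names its constructor. A minimal sketch mimicking the output above (the ctor field name is borrowed from Elm’s output; everything else is illustrative), including the collapse of the last case into default:

```typescript
// Make this file a module so our Screen type doesn't clash with the
// DOM's global Screen interface.
export {};

// Each constructor is an object tagged with its name as a string,
// mirroring Elm's {ctor: 'SuccessScreen'} values.
type Screen =
  | { ctor: 'LoadingScreen' }
  | { ctor: 'CodeEntryScreen' }
  | { ctor: 'SuccessScreen' };

const LoadingScreen: Screen = { ctor: 'LoadingScreen' };
const CodeEntryScreen: Screen = { ctor: 'CodeEntryScreen' };
const SuccessScreen: Screen = { ctor: 'SuccessScreen' };

const needsCancelButton = (page: Screen): boolean => {
  switch (page.ctor) {
    case 'LoadingScreen':
      return true;
    case 'CodeEntryScreen':
      return true;
    default:
      // Only 'SuccessScreen' can reach this branch, so, like Elm's
      // output, there is no separate case for it.
      return false;
  }
};
```

The tag strings ship in the bundle, and because default stands in for the last case, TypeScript would not flag a newly added constructor in this function.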
It’s interesting to see that even though Reason, PureScript, and Elm all have ML-style datatypes, Reason is the only one that uses an integer representation for the constructor tags.