Anjana Vakil: Immutable data structures for functional JS | JSConf EU


Hi, everybody. How we doing? We got caffeinated? Feeling good? Nice. So I’m Anjana Vakil, hello. You can find me on Twitter at my name and
today I’d like to talk to you about immutable data structures for functional programming
in JavaScript. We’re going to take a look at what immutable
data structures are, why they’re a really cool way to handle the immutability that we
typically use when we’re doing functional programming and how we can do that in JavaScript
because I hear y’all like JavaScript! So a little about me. I’m probably the only not-web-developer in
the room. I am an engineer for Uber Research. I work with them to develop a custom query
language for data in the scientific research funding domain. I’m also an alum of the Recurse Center, which
is a fantastic programming community in New York City, and I am an alum of the Outreach
Program, which if you have haven’t heard of it, it’s getting women and more folks involved
in these by giving them internships at Mozilla. So I’m really happy to chat about those things
if you want to come grab me after the talk. But you might know that I like functional
programming. I think it rocks. Anybody else agree with me that functional
programming is cool? Yeah! Yeah, so functional programming is a pretty
great way to avoid some of the headaches of like imperative and object-oriented programming. In functional programming, what we typically
do is conceive of our programs as being just pure functions. That means their transform their inputs to
outputs, and that’s all they do. They don’t have my side effects like changing
things in the console, and my taking things in the global state are side effects. But our data becomes data in, data out, and
transformers of data. And one thing that goes hand-in-hand with
this, with avoiding side effects is immutable data. Immutable data meaning once we’ve created
it, it never changes. So this is a really good way of changing something
accidental outside of your function. If everything is immutable, you can’t change
anything. So immutability another thing that rocks and
it rocks pretty hard for other reasons that we’ll see in a moment. But speaking of rocks, let’s talk about rocks. So this is a rock, and immutability rocks
in the way that rocks rock. Now I don’t know about you, but I’ve been
going to a lot of tech conferences recently and I’ve been feeling like there has enough
poetry. So I’d like to read you a poem:
Nobody sits like this rock sits. You rock, rock. The rock just sits and is. You show us how to just sit here. And that’s what we need. It’s so true, so deep. This is from —
Don’t thank me, thank I Heart Huckabees, that’s a great movie. Check it out. So this is really how immutable data rocks. It just sits there. It just is. Once we’ve created it, it never changes and
that’s amazing because it can help us avoid some of the headaches of immutability. So with immutability, we have some things
pretty easy, but other things become harder and we’ll see how that looks. So let’s say I have an array called foo and
it’s got some numbers in it. Hm, and I’m already bored. Let’s make it more fun. Let’s say I have a zoo with some animals — more
fun! And I decided that I want to change something
up about my zoo. Maybe I want to replace that rabbit there
with something a little more exotic. Like an alien! So this is cool. I’m happy because I wanted a more exotic zoo. I got an alien in my zoo now. I didn’t have to change anything except for
that one little cell in my array. That’s pretty sweet but my co-worker over
was expecting zoo to be filled with earth beings, earth animals, and wasn’t accounting
for there being an alien in it. Who put that in there? Now my program doesn’t work anymore. Who did that? So immutability has a couple problems. We have to manage who’s been changing what,
when — who’s been putting which animals in the zoo. We have to have a lot of overhead to manage
that state, and that gives us headaches as individuals, and as teams. We also get bugs in the code because maybe
I was only planning — or my co-worker was only planning — to handle terrestrial beings
and didn’t have a case of aliens being accounted for, and that broke something. So these are some side effects of immutability
that don’t make us happy. Let’s try doing things the immutable way. So in an immutable world, my array, my zoo,
once I’ve created it, it just sits and is forever. I cannot change it. What I can do if I want a new zoo that’s more
exotic is I can make a copy that’s the same size as my original array, and I can make
the modification I want, so I can put my alien in there in place of the rabbit. And so this is pretty sweet because now my
co-worker is maybe, and they’re, like, whoo, nothing broke in my program, and it’s all
still animal creatures but I had to copy over that whole array. I had to allocate the space for that entire
array, even all of the stuff that didn’t change. I had to copy all of that over, as well. So this means that my code runs pretty slow. And it also takes up a lot of memory. It takes up a lot of space and time. The complexity on those things are bad because
copying is a waste of both time and space. It makes us sad face! We don’t want that. So if we want to do immutability, we must
be able to find a better way of doing that. Luckily for us, a lot of very smart folks
have been thinking very hard about this problem for a while, and they’ve come up with some
really good solutions for how we can deal with immutability efficiently. immutable data structures! So immutable data structures is a term that
you may have heard about, with functional programming, or also in terms of React where
they come in handy. Technically, an immutable data structure is
like the rock, it just sits, and is once you create it. It never changes. But also hear the term persistent data structures
banged about. Sometimes these are used interchangeably,
but they have slightly different meanings. So if immutable data is data that never changes,
persistent data is data for which we have access to old versions. So as we’ve been creating new modified versions
of our data structures, we keep the old versions around. You might hear about partially persistent
data structures where we can look at the old versions, we can access them, but we can’t
go back and update any of them. All we can update is the most current version
that we have. And then you might also hear about fully persistent
data structures where we can actually time travel, we can go back and update any of our
past versions. And if this is starting to ring a bell like
it’s version control like git, it’s sort of the same idea. So we’re going to talk about these as persistent
immutable data structures, they’re both persistent, and immutable. Let’s see how this works. The key to all of this is we want the old
versions of our data, like, my original zoo to stay put. We just want to to sit like the rock but we
want new versions to be created efficiently. So what magical tricks do we have to use to,
like, make this happen? Do we have to make invocations do dances to
the gods of space and time complexity? No. It’s very simple. Trees and sharing. Isn’t that sweet? These two simple concepts will get us efficient
immutable data. How? So let’s talk about trees because trees rock
pretty hard, as well, alternative, unfortunately I don’t have a poem for that, sorry. Imagine that we could find a way to represent
our zoo array as a tree. So one thing I could do is I could put all
of my animals — all of my values — in the leaves of a tree, and I could make it so that
each leaf holds one value, one animal. But they might get lonely, so let’s put them
with a buddy. Let’s put them 2×2. So each of our leaves will have two values
and we’ll hope that the buddies get along and not each each other — looking at you,
tiger, number six, don’t eat that koala, and we can go up to intermediate nodes up and
up, until we get to the root node of the whole structure, and now that root is an array represented
previously by a tree. So this is my tree now in this structure. So given this type of structure, how do we
update something? Given that my data is immutable, and it can
never change, how can I handle the fact that it has an alien in it. So here what I would do is I would take the
node that contains the value that I want to change. So in this case it would be the 0/1 node that
you see on the bottom of the screen. And so I make a new copy where I’ve still
got my monkey but I’ve changed the rabbit to an alien. And then I need to copy any of the intermediate
nodes in the tree that were pointing to the node that I changed. So I basically trace a path up towards the
root of the tree, which, now, I’ve got a new root, which means another version of the data
structure. So this technique of making this update by
copying the path from the leaf I changed to the root is called path copying. That’s pretty cool because now I didn’t have
to copy the entire array; I just had to copy the nodes on the way from the root to the
leaf that I changed. So if we’ve turned in something linear and
copying into something logarithm. That’s pretty cool, that’s more performant,
and the data of this is that all of these nodes in yellow here, so most of the tree
is shared between the two versions, between the old version and the new. And so this saves me a lot of space because
I can actually reuse the parts of the original version that didn’t change, whereas, before,
I had to copy those over, as well. So this means that what was before, like,
a lot of memory consumption becomes a lot smaller because you don’t have to store as
many copies of the things if they didn’t change. And that’s called structural changing because
we’re sharing the structure of the tree between the two versions. So we’ve been talking about updating things
but how do we get at the values in our data structure? How do we access them? Well, it turns out this isn’t just a tree,
it’s a special type of tree called a TRIE tree, which originally came from the world
“retrieval,” so people could, I guess, call it tree, which is funny because we also call
TREE trees, so we can call them “tries” if we want. So a try is a type of tree, where the leaves
represent the values, and the paths to the value are the keys that that data is associated
with. So often you see TRIEs with values stored
as keys. So, for example, if I have T stored as a key,
what I do to get to the T is I trace the tree one letter at a time. Then I go to T, and then to E, and then to
EA, is my key, and then my value there is three. Because everything at the end sounds like
“ee” in this talk. So this is pretty cool, but in our data structure,
we weren’t using words, we just wanted an array-type thing, we wanted indeces, right? So the insight here is if we treat the index
as a binary number, then we can pretend that that’s kind of, like, our word and we can
descend the tree, bit-by-bit as if each representation of our binary representation is a letter. So let’s see how that works. If I’m trying to get at item five in my array,
so the animal at index five, I’d convert that to binary, so that’s one, zero, one, and then
I step through that as if it was a word. I step through it letter-by-letter, bit-by-bit. So I go from the root to the branch. I have a choice of either zero or one. I go to branch one first. And then I go to branch zero, and then I take
the thing on the one side. So I go one, zero, one, down my tree and then
I end up at my frog at index five. So this is a pretty simple insight but it
ends up being incredibly powerful because it allows us to quickly traverse this tree
structure, which lets us use that structural sharing to more efficiently represent our
new copies of our immutable data structure. And, importantly, we don’t have to be using
a binary tree, meaning we have two branches from each node. That fits pretty well on a slide, but actually
what you mostly see is a 32-way branching. So in our trees that we’ve been looking at,
we’ve kind of had one bit of information per level. And we’ve been descending bit-by-bit but if
we had a 32-way branching tree, it would be five bits of information that we would be
representing at each level. So that would look something like this. If we had a much bigger number, like, 18,977,
in binary, that’s that bunch of ones and zeros. This would be a really deep tree if I had
to descend into it one at a time, it would be like 15 levels deep. Too much, too long. So if I’d make more branches at each level,
then I can chunk this up into kind of 5-bit letters as it were, and descend the tree that
it’s now only three levels using the 32-way branching. So this is kind of a tradeoff between how
deep your tree is going to be, and how big the nodes are going to be because if I have
just one bit of information at each level then I have really small nodes. That’s quick to copy over but I have to go
very, very deep down in the tree for a larger array. And generally, research has found that 32
is a pretty good tradeoff between the depth of the tree. So what we’ve seen is a bitmap vector TRIE. That’s just jargon. We don’t need to care about that. But if you need something to Google, you can
Google that. This is cool for array-type of things and
we have an index we want to jump there, but what about objects? We also want to be able to associate objects
with arbitrary keys, not just indeces, so we want non-integers as keys, how does that
work? So if I want a version of my data structure
where it’s no longer an array but it’s something like an object where I’m associated letters
with each of my animals like M for monkey, and P for panda, et cetera, what I can do
is I can take my keys, in this case, they’re letters, and hash them to get a number that
represents the key. So that each key will have its own number. They won’t be in order necessarily, but that’s
okay. Objects don’t have to be in order. And then we can use the hash of that number
in binary to descend the tree as before. So if I wanted to look up the value associated
with key “F,” I could hash F, get some number, and let’s say I get five, like, A, B, C, D,
E, five. And that would be represented in binary as
one, and I descend the tree as before, here, for simplicity, just using a one bit at a
time, two-way branching tree. But typically we would be doing this with
32 branches per level. So, again, we just descend the tree using
the binary representation of our key, in this case, we used a hash function to transform
it from some arbitrary object into a number and we get the animal we want — in this case,
our frog. Cool. So that, if you want to Google it, the thing
you could Google is a hash array mapped TRIE. And this was a data structure parented by
Phil Bagwell, and Rich Hickey, kind of started using it, and a lot of these an implemented
in languages like Clojure to implement the data efficiently. There’s a ton of optimizations that are usually
done on these data structures to make them super-duper fast and lots of details that
we’re not covering here but this is the basic idea. Trees to represent our data, structural sharing
so that we can reuse as much information as possible between the old versions and the
new versions. And this idea of using binary representations
of our keys, whether indeces, or hashed keys to descend the tree to find the thing we’re
looking for. So to recap, mutability induces headaches. It is to be avoided especially if you’re doing
functional programming where the essential idea is to not have side effects and only
be using pure functions that don’t change anything except do the computation on the
input and return the output. Immutability, on the other hand, is great
because if I’m using immutable data, I can’t mess up my co-worker’s program by making the
zoo she only thought was animals suddenly have an alien in it. But copying is a really bad way of handling
data because it is not efficient neither with respect to time, nor space. And structural sharing, using these tree structures
— or TRIE structures, and sharing as much information from one version to the next is
the really performant way to do this. And so you’re probably thinking, okay, these
data structures are pretty cool. But what am I supposed to do with them? I’m not going to be building boxes of emoji
here, am I? No, you don’t have to. In JavaScript, there are some really great
libraries out there to help you use these right off the bat. There are various solutions but I’m going
to talk about a couple of them. So one is called Mori. Mori is basically a port of Clojure script
by David Nolan that allows you to leverage the implementations of these data structures
from ClojureScript, which is the version of Clojure which targets JavaScript from the
comfort of your vanilla JavaScript. And it’s got a bit more of a Clojure feel
to it. A bit more of a functional language feel. The API is functional and we’re going to see
what that looks like in a moment. But that’s one thing that kind of sets this
library apart. On the other hand, there’s also Immutable.js. This is a library put out by Facebook. It was created by Lee Byron. And this is a JavaScript implementation of
these data structures. So it has a bit more of that native JavaScript
feel to it. It doesn’t have kind of the Clojure background
brought in. And that means it’s got a more object-oriented
style API, although it is still returning new versions of data structures instead of
changing mutable structures in place. So let’s see what those look like. This is how you might use Mori to create what’s
called a vector. A vector is the data structure from Mori that
you’d probably be using as an array-type thing. So I’ve got a vector that I’m calling A because
it’s sort of array-ish. It’s got one and two in it. And if I want to push something onto that,
the function that I’d use is conj. This is from the Clojure called, Lisp-speak. And what I would put in is the original A,
and then what I want, which is, in this case, three. And you’ll see that this creates this new
structure on the right. These vector, one, two, and one, two, three,
they look different because they’re not really JavaScript arrays although you can convert
back and forth. But the point is this cong function returns
a new value which I can catch as A2 and I can prove to myself that my original A didn’t
change by using the count function to see how many things are in it. And there’s only two things in it. But I can prove that my version, A2, has the
third thing by trying to access, by using the get function to trying to get two, which
it tells me, it is indeed three. This is the same thing that you would use
in Immutable.js. Here you would use Immutable.js.list.of, that’s
interesting syntax. But it creates something more like a JavaScript
array. Although it is not an array, it is a JS list. That I’ll call an array and if I want to add
something onto a new version of A, I use this sort of dot-method notation that we’re used
to. I’d say a.push(3), but, importantly, this
is not changing a. It’s just returning a new value of a, which
I’m going to capture as a2 and I can prove to myself that it didn’t change. A.size tells me it’s two, and if I try to
get the item at index two, I find that it’s three, as I expected. So, similarly, for what are called maps, which
is kind of the key-value object that we might be using, if I create an object, o, which
is going to be my Mori hashmap data structure, I’m associating a is one, b with two, again,
we see that the syntax is a little different from our regular JavaScript beastlier not
regular JavaScript objects. They’re super special immutable data structures,
they need special syntax. And so if I want to change the value of one
of my keys, I can use this asoc function, and then change the value of three in my new
version, o2, and then I can prove to myself that the original didn’t change by using the
get function to make sure that a in the original one — o, is one, and the a in o2 is three,
as I would expect. And it looks quite similar in Immutable.js
except the structure is called map, not hashmap, and I can pass in a little JavaScript object,
and it gives me a little o, a little more JavaScript syntax than we’re used to. This has a bit more of a syntax and feel that
you might be used to from JavaScript programming, I can use the set method on o to create a
new version where a is now three, and I can use the get methods on my old version o, and
my new-version o2 to prove to myself that the old one didn’t change. So these are really immutable data structures. They look really weird if you try to look
at them in the console just as JavaScript objects. They’re really fun to kind of poke down into
because they have this complicated tree structure. So I highly recommend that you try out these
libraries and see what works for you. I can tell you really just briefly before
I run out of time here, that how they compare is basically, again, Mori is from the Clojure
world, it’s ClojureScript. But the Immutable.js has more of the o.get()
kind of feel to it, if you’re comfortable writing JavaScript like that. However, for me, it gives me a little bit
of a cognitive dissonance there because it looks like we’re mutating things with those
calls — we’re not — but for me, to get more into the mindset of functional programming,
I prefer the functional programming of Mori because it gets to the way that we conceive
things as inputs and not just outs. We don’t want to be in the mindset of making
changes in place to objects. There’s also some minor performance differences
between the two, Mori is a bit faster, and Immutable.js is a bit smaller. But they’re both great options, try them out,
and I hope one of them works for you. So that’s my talk. I hope it’s been useful. Go forth and don’t mutate your data! Here’s some references for you.