An ode to immutability

(WARNING: this is a mostly technical post. If you don't realy care about it, look at the categories of the post before spending your time trying to understand what I mean by "immutability")

I've been working lately on a very large and interesting project at work that is taking a lot of my brain cycles. There are lots of technical challenges that I'm trying to resolve, the biggest of them being performance. Lots of weekends and late nights later, it's still here...

Anyway, for the last week or so, I was working on a piece of code to improve performance of feature extraction from large amounts of text. My features are mostly textual and what I need to do is identify where in the text terms appear. Sounds easy, right? Well... I'll skip the details of my approaches so far to try to jump right into my immutability ode here on my latest approach.

So I have an iterator over the parsed string that identifies where in the graph it is. An iterator patern is easy, in its by-the-OO-patterns-book not stateless at all. The challenge is that there are parts of the string that I find that I could interpret it multiple ways, i.e., try multiple directions on my parsing state machine.

At this point a CS purist will say that if I don't really need to have any memory (which I don't), I could just increase the size of my parse graph and still keep moving in one direction only. That is true, but that would basically blow up my graph in size, in a factor of N^2, which is not a very pleasant solution.

So I found myself changing my code to be recursive and creating an Iterator.clone() which would copy the state of the iterator, so that I could handle those different cases.

void handleEntry(ItemIterator it) {
    Item item = it.next();
    if(shouldSplit(item)) {
        handleEntry(split(it.clone());
    }
    handleEntry(it);
}

After making this started becoming more complicated and deciding whether to clone it or not because trickier and had to be decided after the "handleEntry(it)" (because it depended on the result of handleEntry()), I found myself just always cloning my iterators. Then, if I was cloning it all the time, why not make it immutable? And that's what I did and that made this code cleaner, with only one minor detail: the iterator became a little "fatter":

void handleEntry(ItemIterator it) {
    it = it.next();
    Item item = it.getItem();
    if(shouldSplit(item)) {
        handleEntry(it);
    }
    handleEntry(it);
}

So now the Iterator actually keeps the Item around and not just produces it and moves on. Implementation-wise, though, that doesn't really affect things much.

The interesting thing is that not only this code became cleaner, but I suddenly had a code that I could switch around from being recursive to being iterative without having to worry about state management, as the piece that contained most of the state could be put into a stack and retrieved it later without having to worry about it having a different state when I retrieved it later. It saved me a lot of work! So go immutable objects! Maybe I should start coding in Haskell or Erlang (I played with Erlang before and it was interesting - more on that some other day)...