Optimizing Search in JavaScript

At Skiplist, it is JavaScript turtles all the way down.

It is like a zombie virus. The language has taken over everything, and my hand has a bite. Do I embrace my inner JavaScript kitty and become what I have always feared, or should I chop it off while I have the chance?

This week I optimized an in-memory cache lookup. The customer dataset was a few orders of magnitude larger than anticipated, and as a result, we had to refactor the cache's underlying data structure.

The initial version of the cache was pair programmed and written using TDD; I like to embrace my inner agile technical coach when I can. It may only be a coincidence, but the test suite made it easy to refactor the implementation details so that lookups now occur in constant time. The technical details of the optimization trick are described below.

The source for the examples below can be found on GitHub.

Declarative Programming

Imperative code tells the how. Declarative code tells the what.

Let's look at a simple example of gathering the ages of three people:

const people = [
  {id: 1, name: "Jason", age: 38},
  {id: 2, name: "Justin", age: 34},
  {id: 3, name: "Josh", age: 33}
]

// imperative
const ages = []
for(let person of people) {
    ages.push(person.age);
}

// declarative 
const ages = people.map(person => person.age)

JavaScript offers a handful of built-in declarative helper functions on Array: map(), reduce(), filter(), forEach(), and find(). The indoctrinated claim is that declarative code is expressive, elegant, and functional... "clean". I agree; not caring how the sausage is made allows you to enjoy it that much more! Yet there are times when the how is important, such as when optimizing for speed or memory.
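For a quick taste, here are a few of those helpers applied to the people array above:

// filter: keep only the people 34 or older
const over34 = people.filter(person => person.age >= 34);

// reduce: sum all of the ages
const totalAge = people.reduce((sum, person) => sum + person.age, 0);

// find: the first person matching a predicate
const justin = people.find(person => person.name === "Justin");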

Using Find to Search for a Value

What about a similar scenario where you are looking up a person by id in a list of a million entries?

const idToFind = 1000000;
const person = people.find(person => person.id === idToFind);

The above statement is clean: find the first person whose id is 1000000. In contrast, the imperative approach to the same linear search takes about half a dozen more lines of code. Simple is awesome. Simple is clean. But Big O notation tells us linear search is literally the worst, O(n). We sacrifice performance for cleanliness, a trade-off I personally will pick 99.99% of the time. #empatheticprogramming
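For comparison, here is that same linear search written imperatively (essentially what the Search class below wraps):

let found = null;
for (let i = 0; i < people.length; i++) {
  if (people[i].id === idToFind) {
    found = people[i];
    break;
  }
}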

If the keys are unique and our data set is of a manageable size, we can increase performance by turning our list of people into a map of people by id, then performing a hash lookup, O(1), on the id! We pay, at worst, a one-time O(n) arrangement step, and then get an O(1) lookup for each record.
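In sketch form, the arrangement step and lookup look like this (the full, class-based version appears in the Optimized section below):

// one-time O(n) arrangement step
const peopleById = {};
people.forEach(person => peopleById[person.id] = person);

// O(1) hash lookup per record
const match = peopleById[idToFind];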

Code Example

As good stewards of software craftsmanship, let's start off with a failing JavaScript unit test to assert the expected behavior.

const assert = require('assert');
const Search = require("./search");

describe('Search', function () {
  const people = [];

  before(() => {
    people.push({id: 1, name: "Jason", age: 38});
    people.push({id: 2, name: "Justin", age: 34});
    people.push({id: 3, name: "Josh", age: 33});
  });

  it('should return the correct element', function () {
    const expectedName = "Justin";
    const search = new Search(people);

    const person = search.find(2);

    assert.equal(person.name, expectedName);
  });
});

Imperative

At this point, we have a red test. Let's implement our first approach, a traditional imperative search using a for loop. 

class Search {
  constructor(people) {
    this.people = people;
  }

  find(id) {
    for(let person of this.people) {
      if(person.id === id) {
        return person;
      }
    }
  }
}

module.exports = Search;

Before moving forward, I set up a test harness to evaluate performance.

const Search = require("./search");

const testIterations = 1000;

function testPerformance(searchId) {
  // build searchId items so the target is the last element,
  // the worst case for a linear search
  const items = [];

  for (let i = 1; i <= searchId; i++) {
    items.push({id: i});
  }

  const search = new Search(items);

  const timings = [];
  const totalStart = new Date();
  for (let i = 0; i < testIterations; i++) {
    const start = new Date();
    search.find(searchId);
    const stop = new Date();
    timings.push(stop - start);
  }
  const totalStop = new Date();

  const timingsSum = timings.reduce((total, current) => total + current, 0);
  console.log(`Average time for find for ${searchId} records: ${timingsSum / testIterations} ms`);
  console.log(`Total time for find for ${searchId} records: ${totalStop - totalStart} ms`);
}

const smallId = 3;
const largeId = 1000000;

testPerformance(smallId);
testPerformance(largeId);

// performance output:
// >>> Average time for find for 3 records: 0.001 ms
// >>> Total time for find for 3 records: 2 ms
// >>> Average time for find for 1000000 records: 2.617 ms
// >>> Total time for find for 1000000 records: 2906 ms

Declarative

Now that we have a green test asserting the exact behavior we are looking for, plus a performance test harness, we are free to refactor the internals of the find method! Moving from imperative to declarative looks like this.

// ...

  find(id) {
    return this.people.find(person => person.id === id);
  }

// ...

// performance output:
// >>> Average time for find for 3 records: 0.001 ms
// >>> Total time for find for 3 records: 2 ms
// >>> Average time for find for 1000000 records: 2.356 ms
// >>> Total time for find for 1000000 records: 2690 ms

With the move from imperative to declarative, we gain code cleanliness. Yet our search time on a million records is still close to two and a half milliseconds, relatively unchanged. In the majority of cases, this delay is totally acceptable.

Optimized

Finally, what if we were performing this search within a nested loop over a large collection? Two and a half milliseconds per search of a million records may be totally unacceptable at that point. So, let's use a hash map (in JavaScript, a plain object works as an associative array)!

class Search {
  constructor(people) {
    // one-time O(n) arrangement step: index people by id
    const peopleMap = {};
    people.forEach(person => peopleMap[person.id] = person);
    this.people = peopleMap;
  }

  find(id) {
    // O(1) hash lookup instead of an O(n) scan
    return this.people[id];
  }
}

module.exports = Search;

// performance output:
// >>> Average time for find for 3 records: 0.001 ms
// >>> Total time for find for 3 records: 2 ms
// >>> Average time for find for 1000000 records: 0 ms
// >>> Total time for find for 1000000 records: 302 ms
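As an aside, if the ids were not numbers, or you would rather not stuff keys onto a plain object, a Map is a drop-in alternative. A minimal sketch, not the version benchmarked above:

class Search {
  constructor(people) {
    // Map takes an iterable of [key, value] pairs
    this.peopleById = new Map(people.map(person => [person.id, person]));
  }

  find(id) {
    return this.peopleById.get(id); // also an O(1) lookup on average
  }
}

module.exports = Search;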

Final Thoughts

Know your data structures and algorithms. A one-time O(n) arrangement step bought us O(1) lookups, taking a million-record search from milliseconds to effectively instant. Write clean, declarative code by default, and reach for the optimization when the data demands it.

Cheers,

Justin Beall (VP of Engineering at Skiplist)