Learning data structures & algorithms - Hash table

[Part 2 ]: Hash table..hash what?

Continuing from the previous post, If you take the bells and whistles out of it, a data structure is nothing but a container to store data. Algorithm is nothing but connecting these data structures together to write a program. A good analogy is, we put words & tunes tother to produce music. Data structure is like our words, algorithm is like the tunes. When both are in harmony, we get awesome music.

From a javascript perspective, think about stack & heap.

Variables are stored in heap, and function calls in stack.

Programming is about trade-offs. What is the priority for the requirement? Is it the speed? is it storage space? is it something else?

By default, we already have some basic data structures that we already know. For eg, arrays. We also would have heard of stacks, queues, linked lists, graphs, trees, dictionaries or hash tables.

So naturally, my first question was, why should I go for all these alternatives, when I already know arrays & objects.

Arrays:

Arrays are of two types

  • Static arrays have fixed size
  • Dynamic ones have their memory getting dynamically allocated

Arrays are useful when sequence & order matters. We already get utility methods like push, append, prepend, etc. We access them via its index.

Is it always the best choice?

Say, if we have 10,000 records, and if we are looking for the 9999th item, basically, we need to loop 9998 items to reach to it. if we need to check if something is available or not, we need to go through the full loop to confirm if the search is success or not.

When searching and indexing faster is the priority, & we don't bother about the sequence, objects are a better choice.

Irrespective of an object having one property or a 1000, object[key] gives value, object.hasProperty('propName') gives a yes /no.

With Big O, we look at the broader picture. Rather than getting an exact value, most times, we get a bit more broader, elaborate value when it comes to BigO. Most times, we look at the worst case complexity only. With a program, most times, what we need to know is also the same. We know the program works when there is 10 users, what if it is a black friday sale and there is a million users logged in together? Do we really care about 100 or 123 users?

Let's look at the complexity of arrays.

  • For search, we have O(1), because we know the index, and arrays are sequenced. If i ask, what is the 50th item in an array, we can do a array[49] to get its value.

  • With arrays, we can push values either at the start or at the end. Either ways, we have O(1). But, there is a small caveat here.

If we add a new item at the start, every other index shifts by 1. If we add the item at the end, then none of them will be changed as well.

For insert / delete, we can do it anywhere in the array, via splice, slice etc. So, the big O is O(n).

With arrays, the key is always a numeric one.

const myArray = [];
myArray.push(10);
myArray.pop();
myArray.append(3);

Hash tables / Dictionaries

We can set the key as well as value in a hash table.

One limitation with hash table is that, we have limited memory to play around with. Imagine a theatre with 100 seating capacity. It's the ticket number that decides which row, which seat are we getting.

There is something called a hash function who decides where in the memory, is this data getting stored. Think of a ticket counter person who decides the seat number for us.

Whereas with arrays, we directly provide the index, here in hash tables, we provide that index to a hash function, who gives us the address of the memory location where the item is stored. So basically, a mediator middleman who decides the memory address for us.

My first question was, big deal, why do we need this when we have arrays.

Look at the Big O complexity of the hash tables -

For inserting, searching, deleting, look up, all have a complexity of O(1). With the key, I exactly know who goes where, and what to search for. But the problem being, hash tables have limited memory.

Again, we circle back to trade offs. Nothing is perfect. ( Like the marriage proposals of a traditional South Indian culture, the whole family comes together and assess the trade offs, - Is the bride's family well-settled, is she educated? is she working in IT field? Does she have siblings, does her hair length is till her knees.. when some criteria matches, the other doesn't! .. pun & self mocking intended.. )

I told, the Hash tables have limited memory. So, what happens, when there is 100 seats and we try to allocate seats to 150 people? It overlaps!

Now that is called hash collision, which is nothing but a technical jargon wrapper for saying "Seats may overlap"

Think if dad and son gets same seat number - And son is sitting at dad's lap to watch movie. That's hash collision! 😎

With hash tables, the hash function randomly assigns a memory location. That location may already have something already existing.

That's when we tack on more and more requirements, I need O(1) as the complexity, but I don't need arrays, I can't use hash tables, it is having collision. Now what? That's when we look into Linked lists, which can solve this collision issue.

See, necessity is the mother of invention.

Thats how, we move towards different choices for data structures.

I need to carry something - Ok, here is a plastic bag. But the item is heavy, Ok, here is my bag - This holds weight, But my item is a sofa set - Fine, we will rent a small van to carry them. See, .. when we start asking more details, we get different set of requirements.

Let's see how we can write hash tables.

Screen Shot 2021-06-06 at 3.51.30 PM.png

We know, a hash table is of limited size. So lets implement step 1 first.

class HashTable {
  constructor(size) {
     this.data = new Array(size)
  }
}

We alos know, the memory allocation is done by a hash function that accepts an index key and returns a memory.

For simplicity sake, lets take a reduced version of hash stripping off the logic and replacing it with a simple one.

class HashTable {
  constructor(size) {
     this.data = new Array(size)
  }
  _hash(key) {
    if(key === 'grapes'){
      return 'AC1010'
    }
    if(key === 'banana') {
      return 'BC2020'
    }
    else {
      return 'CC3030'
    }
  }
}

Now, we need to get as well as set the hash table values.

  set(key, value) {
    // get the memory location
    const memoryLocation = this._hash(key);
    // is it empty or not? 
    if(!this.data[memoryLocation]) {
      // It is empty, so create an array & push
      this.data[memoryLocation] = [];
      this.data[memoryLocation].push([key, value])
    }
    else {
      // The emory location already has something
      this.data[memoryLocation].push([key, value])
    }
  }

And also, we need to get them . There is a caveat here.

We need to get the memory location first, and from there, we need to get the item corresponding to a key. Then only we will get the desired values.

  get(key) {
    const memoryLocation = this._hash(key);
    const buckets = this.data[memoryLocation];
    if(buckets.length) {
      for(let i = 0; i < buckets.length; i++) {
        if(buckets[i][0] === key){
          return buckets[i][1];
        }
      }
    }
    return undefined;
  }

And to see the thing working,

const myHashTable = new HashTable(25);
myHashTable.set('grapes', 100);
console.log(myHashTable.get('grapes')) // gives 100

Okay, so why cant we use Hash tables all the time?

There are downsides for this too, starting with limited memory space. Think of a series of drawers without numbers. We have no idea , on which drawer does our item contain. We need to close & open every single one to verify. In the memory, the items are put randomly on top of one another. So, we need to loop through each draw in the memory location to figure out. That's what the for loop in the code snippet is for.

In the next post, we see Linked lists.

Enjoying the posts?

let me know your thoughts!

No Comments Yet