Easy Wins Optimizations – HashSet

HashSet is a data structure introduced in .Net 3.5 framework. The main advantage of HashSet is that it adds and fetches items in very low complexity (constant amortized time). We’ll try to see how much faster than List it can be. In the example we’ll go over code that stores all historical lottery results and checks if some results appear among the historical ones.

Registering results

public abstract class ResultsRegistrarBase
{
protected ICollection<long> codes;

public void AddResult(int num1, int num2, int num3, int num4, int num5, int num6)
{
long code = ConvertToCode(num1, num2, num3, num4, num5, num6);

codes.Add(code);
}

public bool ContainsResult(int num1, int num2, int num3, int num4, int num5, int num6)
{
long code = ConvertToCode(num1, num2, num3, num4, num5, num6);

return codes.Contains(code);
}

private static long ConvertToCode(int num1, int num2, int num3, int num4, int num5, int num6)
{
return num1 + (long) Math.Pow(10, 0) + num2 * (long) Math.Pow(10, 2) +
num3 * (
long) Math.Pow(10, 4) + num4 * (long) Math.Pow(10, 6) +
num5 * (
long) Math.Pow(10, 8) + num6 * (long) Math.Pow(10, 10);
}
}

This class is in charge of storing and finding the results. It assumes it gets the numbers sorted and the ConvertToCode method maps all the numbers to one “big” number”. For example: 1,5,20,30,45,49 will be mapped to 494530200501. The codes collection will be initialized in a base class:
The naive code:

public class NaiveResultsRegistrar : ResultsRegistrarBase
{
public NaiveResultsRegistrar()
{
codes =
new List<long>();
}
}

The code based on HashSet:

public class HashSetResultsRegistrar : ResultsRegistrarBase
{
public HashSetResultsRegistrar()
{
codes =
new HashSet<long>();
}
}

How much faster is it than the naive code?

The difference is huge. In a simple case where we have 250,000 historical results and we’re querying the collection 50,000 times the average running time of the naive code is 1:28 minutes while the code based on HashSet takes only 0.2 seconds. The HashSet code run ~360 times faster than the naive code. The reason for this major difference is that List has to iterate over all the items in order to determine if it contains the requested item. HashSet uses a hash method, without getting to implementation details, it enables the Contains method look in very few entries in order to determine whether or not an item is contained in the collection.
This major improvement was received with minimal change in the code. So in cases where we query a large collection about whether some items are in it, usage of HashSet can give us major running time  improvement with minimal change.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s