December | 2009 | Theory to Practice to Theory

HashSet is a data structure introduced in .Net 3.5 framework. The main advantage of HashSet is that it adds and fetches items in very low complexity (constant amortized time). We’ll try to see how much faster than List it can be. In the example we’ll go over code that stores all historical lottery results and checks if some results appear among the historical ones.

Registering results

public abstract class ResultsRegistrarBase
{
    protected ICollection<long> codes;

    public void AddResult(int num1, int num2, int num3, int num4, int num5, int num6)
    {
        long code = ConvertToCode(num1, num2, num3, num4, num5, num6);

        codes.Add(code);
    }

    public bool ContainsResult(int num1, int num2, int num3, int num4, int num5, int num6)
    {
        long code = ConvertToCode(num1, num2, num3, num4, num5, num6);

        return codes.Contains(code);
    }

    private static long ConvertToCode(int num1, int num2, int num3, int num4, int num5, int num6)
    {
        return num1 + (long) Math.Pow(10, 0) + num2 * (long) Math.Pow(10, 2) +
               num3 * (long) Math.Pow(10, 4) + num4 * (long) Math.Pow(10, 6) +
               num5 * (long) Math.Pow(10, 8) + num6 * (long) Math.Pow(10, 10);
    }
}

This class is in charge of storing and finding the results. It assumes it gets the numbers sorted and the ConvertToCode method maps all the numbers to one “big” number”. For example: 1,5,20,30,45,49 will be mapped to 494530200501. The codes collection will be initialized in a base class:
The naive code:

public class NaiveResultsRegistrar : ResultsRegistrarBase
{
    public NaiveResultsRegistrar()
    {
        codes = new List<long>();
    }
}

The code based on HashSet:

public class HashSetResultsRegistrar : ResultsRegistrarBase
{
    public HashSetResultsRegistrar()
    {
        codes = new HashSet<long>();
    }
}

How much faster is it than the naive code?

The difference is huge. In a simple case where we have 250,000 historical results and we’re querying the collection 50,000 times the average running time of the naive code is 1:28 minutes while the code based on HashSet takes only 0.2 seconds. The HashSet code run ~360 times faster than the naive code. The reason for this major difference is that List has to iterate over all the items in order to determine if it contains the requested item. HashSet uses a hash method, without getting to implementation details, it enables the Contains method look in very few entries in order to determine whether or not an item is contained in the collection.
This major improvement was received with minimal change in the code. So in cases where we query a large collection about whether some items are in it, usage of HashSet can give us major running time improvement with minimal change.

Make data objects declare columns as Data Members

This step is very simple: all that’s needed is to define in the DBML editor that the serialization mode is Unidirectional.

By defining this serialization mode the generated code will be decorated with DataContract and DataMember attributes on classes and columns. By doing so the objects are compatible with WCF service and declared as part of the contract.

Visit the objects associations properties before returning them from the service

For example, if we’re dealing with this simple schema and our service returns a customer with requested ID.

Assuming we have this naive code:

public class StoreService : IStoreService
{
public Customer GetCustomer(int id)
{
using (var store = new StoreDataContext())
{
return store.Customers.First(dbCustomer => dbCustomer.ID == id);
}
}
}

The client will get the correct customer, but it won’t have the Orders property filled, even if the customer has orders the property value will be null. The reason for this behavior is that the generated code doesn’t execute the query against the database when it’s serializing. The solution is to access all data members before it serializes.

public class StoreService : IStoreService
{
public Customer GetCustomerWithRelations(int id)
{
using (var store = new StoreDataContext())
{
Customer customer = store.Customers.First(dbCustomer => dbCustomer.ID == id);
VisitDataMembers(customer);
return customer;
}
}

public static void VisitDataMembers(object dataMember)
{
if (dataMember == null)
{
return;
}

bool isEnumerable = dataMember is IEnumerable;
if (isEnumerable)
{
foreach (var item in (IEnumerable)dataMember)
{
VisitDataMembers(item);
}
}

var properties = dataMember.GetType().GetProperties();
IEnumerable<PropertyInfo> dataMemberProperties =
properties.Where(property => property.IsDefined(typeof(DataMemberAttribute), true));
foreach (PropertyInfo dataMemberProperty in dataMemberProperties)
{
var dataMemberValue = dataMemberProperty.GetValue(dataMember, null);
VisitDataMembers(dataMemberValue);
}
}
}

This VisitDataMembers method ensures all data members’ properties are accessed at least once before serialization and the client will be able to use all the data which is accessible for the data object tree. For example in the sample schema, the client will be able to get from customer to orders to products to materials.

	Eli Shalom on Shortest path on sparse g…
	anon on Shortest path on sparse g…
	kopiluwak nya on Chicken, Shut Up!
	Omer Mor on Improving performance using pa…
	ravyoli on Improving performance using pa…

Theory to Practice to Theory

Monthly Archives: December 2009

Easy Wins Optimizations – HashSet

Registering results

How much faster is it than the naive code?

How to pass LINQ-To-SQL data objects with associations through WCF service?

Make data objects declare columns as Data Members

Visit the objects associations properties before returning them from the service