This is the first post of easy wins optimizations posts. The need for these posts comes from the fact the most of us hardly ever need it 🙂 Sounds weird? It should. What it means is that we don’t need it most of the time; the most naive code does the best work for us. When we encounter the case that requires optimization we hardly remember the simplest ways of doing it.
In this post on sorted list we’ll go over an example of optimizing a simple case.
How to find a person in a collection whose age is closest to mine?
Assume we have a person class with age defined on it:
public class Person
{
public double Age { get; private set; }
public Person(double age)
{
Age = age;
}
}
What we want is to collect many people into a list, and any time we asked about an age, we find the closest person to this age. We can think of it as an application for a dating service that matches people by age.
The naive solution
So what can be simpler than collecting all the people in a list? Probably nothing, this is how we do it:
public class NaivePersonFinder : IPersonFinder
{
private readonly List<Person> persons = new List<Person>();
public void Add(Person person)
{
persons.Add(person);
}
public Person FindPeopleInRange(double requiredAge)
{
Person closestPerson = null;
double smallestDiffrence = double.MaxValue;
foreach (var person in persons)
{
double difference = Math.Abs(person.Age - requiredAge);
if (difference < smallestDiffrence)
{
smallestDiffrence = difference;
closestPerson = person;
}
}
return closestPerson;
}
}
The code iterates over the collection in the most straightforward way there is and updates the closest person whenever it encounter a closer person. Simple! Isn’t it?
But I need faster results!
The sorted list solution
How does sorted list help us here? Sorted list allow us to search items using Binary Search. Binary search saves a lot of time since it has to query very few items in the collection in order to find the closest one. For example, to find the closest item in a collection of 1,000 items it’ll have to query ~10 and saves iteration over the hole 1,000.
Some code
public class SortedPersonFinder : IPersonFinder
{
private readonly SortedList<double, Person> persons = new SortedList<double, Person>();
public void Add(Person person)
{
persons.Add(person.Age, person);
}
public Person FindPeopleInRange(double requiredAge)
{
if (persons.ContainsKey(requiredAge))
{
return persons[requiredAge];
}
IList<double> sortedAges = persons.Keys;
int index = BinarySearchIndex(requiredAge);
var closestPersons = new List<Person>(3);
if (index > 0)
{
closestPersons.Add(persons[sortedAges[index - 1]]);
}
if (index < sortedAges.Count - 1)
{
closestPersons.Add(persons[sortedAges[index + 1]]);
}
closestPersons.Add(persons[sortedAges[index]]);
return closestPersons.OrderBy(person => Math.Abs(person.Age - requiredAge)).First();
}
private int BinarySearchIndex(double requiredAge)
{
var sortedAges = persons.Keys;
var scopeStart = 0;
var scopeEnd = sortedAges.Count - 1;
int index = 0;
while (scopeEnd > scopeStart )
{
index = (scopeStart + scopeEnd) / 2;
double ageAtIndex = sortedAges[index];
if (requiredAge > ageAtIndex)
{
scopeStart = index + 1;
}
else
{
scopeEnd = index;
}
}
return index;
}
}
The code is more complex than the naive version, but it finds the results much faster. It uses .Net Sorted List collection. This is a collection that updates a collection of keys whenever the collection changes. The keys are kept ordered and we can easily access them and binary search them.
How much faster is it?
In a collection of 100,000 people and 50,000 queries the running time of the sorted list is much more faster than the naive:
The naive solution did it in average time of ~100 seconds
The sorted list solution had average time of ~6 seconds. This is a major improvement; it means that it ran more the 15 times faster than the naive solution.
Conclusion
If we have a collection in which order can be applied AND we also need quick results for queries based on the order, sorted list can easily improve running time. We must always remember that the running time is usually negligible even in the most naive code. If no optimization required – don’t optimize, more gain will be earned from simple and maintainable code.