Improving performance using page guards

The problem we're facing today is somewhat unusual. Given:

  • n contiguous arrays
  • Each array has m cells
  • Each cell is a Boolean flag

We receive a stream of signals; each signal is an absolute offset from the start of the first array. For each signal we need to set the corresponding flag AND the first flag of its array. The motivation for setting the first flag is to enable quick filtering of arrays that have any flag set.
For example, consider a usage tracking system for n websites and m users. If user i visited website j, we'd like to record that by setting the ith flag in the jth array. After some time, we'd like to query which sites had any visit and who visited them.
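To make the layout concrete, here is a minimal sketch in portable C (not Windows-specific) of the query side. It assumes the n arrays are stored back to back, one byte per flag, and it reads only flag 0 of each array to find the arrays (websites) with any visit. The function name `list_visited_arrays` and the dense layout are illustrative assumptions, not code from the original system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: n arrays of m one-byte flags are stored back to
   back starting at base. Writes the indices of arrays whose first flag
   is set into out[] and returns how many were found. Only n bytes are
   read instead of n * m, which is the point of maintaining flag 0. */
size_t list_visited_arrays(const unsigned char *base, size_t n, size_t m,
                           size_t *out)
{
    assert(base != NULL && out != NULL && m > 0);
    size_t count = 0;
    for (size_t j = 0; j < n; ++j) {
        if (base[j * m] != 0) {   /* flag 0 of array j */
            out[count++] = j;
        }
    }
    return count;
}
```

Answering "who visited site j" then only requires scanning the m flags of the arrays returned here.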

The intuitive solution

Assuming you don't care too much about performance, the solution is straightforward: whenever you set a flag in an array, also set the flag at offset 0 of that array. If the input is index, then the array index is index / m and the item index is index % m. Pretty simple. For simplicity, the source of indexes will be an array named items and the address of the first array will be baseAddress. In the code below each array starts on its own page, so the page size, dwPageSize, serves as the array stride:

for (int i = 0; i < numOfItems; ++i)
{
    // Set the flag addressed by the signal
    char* hitAddress = baseAddress + items[i];
    *hitAddress = 1;

    // Set flag 0 of the array (page) that contains it
    char* blockStartAddress = hitAddress - (hitAddress - baseAddress) % dwPageSize;
    *blockStartAddress = 1;
}
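The same loop can be written in portable C using the index / m and index % m mapping described above. This sketch assumes a dense layout (array j starts at byte j * m), whereas the Windows loop above gives every array its own page; the function name `apply_signals_naive` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline: for every signal, set the addressed flag and
   unconditionally set flag 0 of the array that contains it. */
void apply_signals_naive(unsigned char *base, size_t m,
                         const size_t *items, size_t num_items)
{
    assert(base != NULL && m > 0);
    for (size_t i = 0; i < num_items; ++i) {
        size_t array_index = items[i] / m;   /* which array (website) */
        size_t item_index  = items[i] % m;   /* which flag (user) */
        base[array_index * m + item_index] = 1;  /* the flag itself */
        base[array_index * m] = 1;               /* the "array has hits" flag */
    }
}
```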

It is clear that the first action, *hitAddress = 1, cannot be avoided. But what about setting the flag at index 0? We could guard it with a condition (write it only if it isn't set yet), but that still reads flag 0 on every signal, so it won't improve performance much. So how can we improve that part?
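To see how much of that work is wasted, the hypothetical sketch below (dense layout again, one byte per flag) applies the conditional variant and counts how many signals find flag 0 already set. With repeated visits almost every signal does, yet each one still pays for the read of flag 0. The function name is an illustrative assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: apply signals with a conditional write to flag 0
   and return how many signals found flag 0 already set, i.e. how many
   writes the unconditional version would have wasted. */
size_t count_redundant_first_flag_writes(unsigned char *base, size_t m,
                                         const size_t *items, size_t num_items)
{
    assert(base != NULL && m > 0);
    size_t redundant = 0;
    for (size_t i = 0; i < num_items; ++i) {
        unsigned char *first = base + (items[i] / m) * m;
        if (*first) {          /* this read still happens on every signal */
            ++redundant;
        } else {
            *first = 1;
        }
        base[items[i]] = 1;    /* the flag itself */
    }
    return redundant;
}
```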

Enabling page guards

Windows provides several memory protections; one of them is the page guard (PAGE_GUARD). When allocating a new memory region, we can mark it as guarded. This means that each page in the region (a page being the fixed-size unit, determined by the OS, in which virtual memory is managed) raises a STATUS_GUARD_PAGE_VIOLATION exception the first time it is accessed. When the exception is raised, the guard status of that page is removed, so later accesses proceed normally. We would like to use this mechanism to avoid re-setting the flag at index 0.

To define such a region, we will use the VirtualAlloc function:

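char* baseAddress = (char*)VirtualAlloc(NULL, TOTAL_SIZE,
    MEM_RESERVE | MEM_COMMIT,     // allocation type: reserve and commit at once
    PAGE_READWRITE | PAGE_GUARD); // protection: read/write, guarded on first access
// baseAddress is NULL if the allocation failed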

It returns a pointer to a region of TOTAL_SIZE bytes. If the page size is P, the region contains TOTAL_SIZE / P pages (VirtualAlloc rounds the size up to a whole number of pages).
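Because the size is rounded up, the page count is a ceiling division. A minimal sketch of that arithmetic; the helper name is hypothetical, and on Windows the page size would come from GetSystemInfo (the dwPageSize field of SYSTEM_INFO).

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages covered by an allocation of total_size bytes when the
   page size is page_size (ceiling division). */
size_t pages_spanned(size_t total_size, size_t page_size)
{
    assert(page_size > 0);
    return (total_size + page_size - 1) / page_size;
}
```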

Tracking page hit

As mentioned, the first time memory inside a page is accessed, an exception is raised. We would like to catch it. To do so in the fastest way, we will use the Windows SetUnhandledExceptionFilter API. The filter is a simple function that receives the exception information and decides how to treat it. It has three options:

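  1. Handle it (EXCEPTION_EXECUTE_HANDLER)
  2. Dismiss it and continue execution from the point of the exception (EXCEPTION_CONTINUE_EXECUTION)
  3. Pass it on to the default handling (EXCEPTION_CONTINUE_SEARCH)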

As a simple filter, we can ask the runtime to ignore all page guard exceptions:

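LONG WINAPI SmartFilter(_EXCEPTION_POINTERS *ep)
{
    // Not a page guard hit: let the default handling deal with it
    if (ep->ExceptionRecord->ExceptionCode != STATUS_GUARD_PAGE_VIOLATION)
    {
        return EXCEPTION_CONTINUE_SEARCH;
    }

    // The guard is already removed, so retrying the access succeeds
    return EXCEPTION_CONTINUE_EXECUTION;
}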

Once it is registered as the filter, all page guard exceptions will be ignored:

SetUnhandledExceptionFilter(&SmartFilter);

Extending the exceptions filter logic

Let's assume that every array is smaller than the operating system page size, and that we don't mind the extra space spent padding each array to a full page, so each array starts on its own page. We'll denote the page size by dwPageSize.
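If the signals arrive as dense offsets (index = j * m + i), padding every array to its own page means the offsets have to be remapped before the write. A minimal sketch, assuming m does not exceed the page size and the base address is page-aligned; the function name `padded_offset` is hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map a dense offset (array j, flag i -> j * m + i)
   to the offset in a layout where every array starts on its own page. */
size_t padded_offset(size_t dense_index, size_t m, size_t page_size)
{
    assert(m > 0 && m <= page_size);
    size_t array_index = dense_index / m;
    size_t item_index = dense_index % m;
    return array_index * page_size + item_index;
}
```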
Now, we can make our SmartFilter really smart. We will add to it the logic for setting the first flag on each array. Assuming baseAddress is some global variable:

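LONG WINAPI SmartFilter(_EXCEPTION_POINTERS *ep)
{
    if (ep->ExceptionRecord->ExceptionCode != STATUS_GUARD_PAGE_VIOLATION)
    {
        return EXCEPTION_CONTINUE_SEARCH;
    }

    // ExceptionInformation[1] holds the address whose access raised the exception
    char* hitAddress = (char*)ep->ExceptionRecord->ExceptionInformation[1];
    // Round down to the start of the page, which is the start of the array
    char* blockStartAddress = hitAddress - (hitAddress - baseAddress) % dwPageSize;

    // The guard was removed from this page, so this write does not fault again
    *blockStartAddress = 1;

    // Retry the original write, which now succeeds
    return EXCEPTION_CONTINUE_EXECUTION;
}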

We extract the exact address being accessed from ep->ExceptionRecord->ExceptionInformation[1]. From it, it's easy to compute the start address of the page. With this filter registered, the first write to any array triggers the filter, which sets that array's first flag, so we can be sure the first flag is set whenever any flag in the array is set.
Now we can simplify the original code, which was in charge of setting the first flag whenever a flag was set:
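Because VirtualAlloc returns a page-aligned base address, hitAddress - (hitAddress - baseAddress) % dwPageSize is just the hit address rounded down to its page. The sketch below checks this on plain integer addresses. The bit-mask form assumes the page size is a power of two, which holds for the usual 4 KB pages but is an assumption here; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset form used in SmartFilter: subtract the distance into the page. */
uintptr_t page_start_mod(uintptr_t hit, uintptr_t base, uintptr_t page)
{
    assert(page != 0 && hit >= base);
    return hit - (hit - base) % page;
}

/* Mask form: clear the low bits. Matches the offset form only when page
   is a power of two and base is page-aligned. */
uintptr_t page_start_mask(uintptr_t hit, uintptr_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return hit & ~(page - 1);
}
```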

for (int i = 0; i < numOfItems; ++i)
{
    // Only the flag itself; SmartFilter sets flag 0 on the first touch of each page
    char* hitAddress = baseAddress + items[i];
    *hitAddress = 1;
}
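For readers without Windows, a similar one-time trap can be approximated on Linux and other POSIX systems with mprotect and a SIGSEGV handler. This is a different mechanism swapped in for illustration, not the Windows page guard API: POSIX mprotect has no one-shot flag like PAGE_GUARD, so the pages start as PROT_NONE and the handler re-enables access itself. The first write to a page faults, the handler sets that page's first flag, and the faulting write is retried. The sketch assumes one array per page; all names are hypothetical, and calling mprotect from a signal handler is common practice but not on POSIX's async-signal-safe list.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *g_base;                   /* start of the tracked region */
static size_t g_size;                  /* size of the region in bytes */
static size_t g_page;                  /* system page size */
static volatile sig_atomic_t g_faults; /* number of first-touch traps */

/* Plays the role of SmartFilter: unprotect the touched page, set its
   first flag and return, so the faulting write is executed again. */
static void first_touch_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t hit = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)g_base;
    if (hit < base || hit - base >= g_size) {
        signal(sig, SIG_DFL); /* not our region: let the fault crash normally */
        return;
    }
    char *page_start = g_base + ((hit - base) / g_page) * g_page;
    mprotect(page_start, g_page, PROT_READ | PROT_WRITE);
    *page_start = 1; /* first flag of the array living in this page */
    g_faults++;
}

/* Hypothetical setup: map num_pages inaccessible pages and install the
   handler. Returns the region base, or NULL on failure. */
char *track_pages(size_t num_pages)
{
    assert(num_pages > 0);
    g_page = (size_t)sysconf(_SC_PAGESIZE);
    g_size = num_pages * g_page;
    void *p = mmap(NULL, g_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    g_base = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = first_touch_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* some systems report this kind of fault as SIGBUS */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return NULL;
    return g_base;
}

/* The simplified loop: only the flag write, no explicit write to flag 0. */
void apply_signals(char *base, const size_t *items, size_t num_items)
{
    for (size_t i = 0; i < num_items; ++i)
        base[items[i]] = 1;
}
```

As in the Windows version, only the first write to each page pays for a trap; every later write to that page is a plain store.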

Comparing the results

To make the comparison interesting, let's assume we have 10000 arrays (websites in the tracking system), each with 25000 flags (users, for example). To make it demanding, we'll assume that during a short period 10% of the arrays were visited, with 250000000 signals sent through the stream (repeated actions on a website by the same users are allowed). On average, the running times were:

Approach           Time (seconds)
Straightforward    1.744
Page guards        0.78

As can easily be seen, the page guards solution more than halves the runtime (0.78 / 1.744 ≈ 0.45, a saving of about 55%).

Conclusion

The operating system provides a few very fast facilities that can be exploited. Even though most of them are designed for other purposes, they can still be useful in cases like this one, which requires a one-time signal per memory region. Since Windows puts a lot of focus on backward compatibility, such exploits are not too risky. As usual: if it doesn't require performance optimization, don't do it. The cost of maintaining the code might not be worth it.

2 responses to “Improving performance using page guards”

  1. Interesting.
    How does this compare to allocating the pages with allocation type MEM_WRITE_WATCH, and using the GetWriteWatch/ResetWriteWatch to get/reset the write-tracking state?
