Skip to content

Optimizing your native program

November 2, 2011

This is an attempt at explaining some common tehniques to optimize native code. I find that the terminology used when talking about this is intimidating, so my article should make for an easy sunday evening read.


Have the body of a method copied inside the method that it is called from:

public class Cpp
    public void Foo()
       int i = 10;

    public void Bar(int i)
        std::out << i * i * i;

    //after inlining
    public void Foo()
       int i = 10;
       std::out << i * i * i;

In fact, in C++ you can use the keyword inline to tell the compiler it’s a good idea to do this since you expect the function to be called a lot. Of course, the compiler, being a disobedient beast that he is and knowing that there are a lot of stupid people out there, doesn’t have to listen to you. There’s quite a bit of theory behind when inlining is a good thing and when it goes bad, but suffice to say your best bet are small hot functions.

Partial inlining

This is the same as inlining but the compiler only copies a block of code instead of the entire function. Usually the block copied is a branch that is used frequently.

Virtual Call Speculation

Sounds fancy huh? Imagine the classical hierarchy Cat is an Animal and Dog is an Animal. The optimization would look like:

// --- before optimization --- //


// --- after optimization --- //

if (myAnimal is Cat)

//call Cat directly

if (myAnimal is Dog)

//call Dog directly

Register Allocation

Well, there are only so many registers to go around (I’m talking about CPU registers not the Windows thing) . They are fast but not many … actually the fastest memory is a register because of its physical location inside the CPU. As such, there is a lot of rivalry for them (the fancy word is contention, but it sounds a bit disturbing). This can be solved by good old repurposing of the registry – i.e. save its value somewhere and use it for something hotter (again, I use temperature to depict usage). On a personal note, if you could do this manually, you should probably work with the windows kernel team (or whatever OS you fancy).

Basic Block or Function Layout
Change how the basic blocks of a function are layed out in memory and you will improve the cache efficiency. The key to this is to have a heat map – which blocks are hot (used frequently) and which are cold. Same things can be done at a function level.

Dead Code / Data Separation

Dead code is the code that is not hit during normal scenarios. This could be defensive programming calls ( if param is null then assert), code that is dead due to churn or code that is simply hit only by wacky testers. Moving this code to a separate memory location increases the density of hot code and it decreases the working set (i.e. the memory actually used by the program or to be more academic, those pages in memory recently used). Why? Because dead code would be put in some memory page and never get loaded.

Conditional Branch Optimization
Aparently some shapes of the switch statement are expensive. Imagine that case 6 is hotter than the rest. Then

// this is faster

if (i==6)

    switch (i)
        case 1: //
        case 2: //

Size / Speed Optimization

Programs can be optimized for either size or speed by a switch in the compiler. This is done by using either longer instruction sequences, which is faster but needs more memory or the opposite. But this type of optimization is per the entire program and you would get much better results if you could optimize:

  • hot paths for speed
  • slow paths for size

The grand finale?

In hope I shed some light of what really is just high density of fancy words, your next questions should be: how the hell do I do this and what speed up should I expect?

To answer in reverse order: about 20% when using the Profile Guided Optimizer (free, here). It works by drilling on profiling information, so you need to collect this information by running your most common scenarios. The MSDN article explains the whole process. Enjoy!

One Comment leave one →
  1. Alexander permalink
    November 19, 2011 2:58 am

    Wow, does Visual Studio optimization really give you 20% performance increase?

    Regardless, using faster algorithms gives you much more, so make sure you do that first 🙂

    Thanks, great explanation!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: