Sunday, February 25, 2007

There are many things they don’t tell you in school (about C++)

The C++ language is not dead, nor has it been replaced by Java or anything else. Instead, C has evolved from a readable assembler into its latest variant: C#. Every generation of C has left its legacy, and even today we cannot neglect the previous generations of this versatile language. The dominant flavour today is still C++, and it was my choice for implementing my diploma work.


I thought I had an excellent knowledge of this language, but my diploma work turned out to be quite complex, with many interacting modules to implement. I quickly realised I had to learn new aspects of this eminent programming language, aspects that are not the first things to meet a programmer's eye. The truths learned in computer science lectures soon turned into shades of grey, where I had to weigh disadvantages against benefits.


AI and computer graphics are particularly tricky things to program, as you want them to run as fast as conceivably possible. Even if computers are getting faster and faster, they never seem to be speedy enough (if they ever will be). So I turned to code optimisation, and here are some general ideas I gathered during my work.


Design Patterns

With increasing code size it becomes critical to use a structured design methodology. The wise programmer invests time in building solid utility libraries, and it is advantageous to view your code as a series of reusable components. If you go in the wrong direction it is very hard to recover, and you end up producing spaghetti code. At first you might think of object-oriented design and inheritance, but you should rather think in terms of design patterns. From these we get two key ideas: the component and its interface. The nice thing here is that components can use inheritance, but interfaces do not need to be implemented through inheritance. Interfaces work better with object composition, as in the flyweight pattern.


If we completely separate an interface from its implementation we get several benefits (a sketch follows the list):

  1. It is easy to switch implementation without affecting the rest of the application. This is good for experimentation.
  2. One can switch implementation during runtime!
  3. Fewer dependencies mean faster recompilation times, and fewer occasions where everything has to be recompiled.
  4. New interfaces can be added afterwards.
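
To make this concrete, here is a minimal sketch of what such a separation could look like. The names IRenderer and CreateRenderer are my own inventions, not from any particular library.

    // Renderer.h -- the abstract interface that the rest of the program sees.
    class IRenderer {
    public:
        virtual ~IRenderer() {}
        virtual void BeginFrame() = 0;
        virtual void DrawPolygon(const float* vertices, int vertexCount) = 0;
        virtual void EndFrame() = 0;
    };

    // A factory hides which implementation sits behind the interface,
    // so it can be swapped (even at runtime) without touching the callers.
    IRenderer* CreateRenderer(bool useSoftwareFallback);

Callers only ever include Renderer.h, so swapping one implementation for another (benefits 1 and 2) only means changing what CreateRenderer() returns.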

Virtual Functions

To my surprise I discovered that virtual functions drain performance. A virtual function can be compared to a function pointer. When the compiler generates a virtual function call on an object, the call goes through a virtual function table that works as a redirection instruction; the table itself exists once per class, but every object carries a pointer to it. The extra cost is the indirect jump through this table, which is needed because the target address is not known in advance, and it can incur an extra penalty if the redirection target is not in the CPU cache.

  1. Avoid virtual calls in performance-critical areas like loops, for example a virtual DrawPolygon() call per polygon (see the sketch after this list).
  2. The built-in RTTI is not optimal, but we can write our own.
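
As an illustration of the first point, one common trick is to move the loop to the other side of the virtual call, so the indirection is paid once per batch instead of once per polygon. A minimal sketch, again with invented names:

    #include <cstddef>
    #include <vector>

    struct Polygon { /* vertex data */ };

    class IRenderer {
    public:
        virtual ~IRenderer() {}
        // One virtual call per polygon: the indirection is paid inside the loop.
        virtual void DrawPolygon(const Polygon& p) = 0;
        // One virtual call per batch: the loop runs inside the implementation,
        // so the table lookup happens only once.
        virtual void DrawPolygons(const Polygon* polys, std::size_t count) = 0;
    };

    void RenderScene(IRenderer& r, const std::vector<Polygon>& scene)
    {
        if (!scene.empty())
            r.DrawPolygons(&scene[0], scene.size());  // one indirection for the whole scene
    }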

Despite all this, abstract interfaces have attractive features and are very well suited to modules that may need to be replaced: the render engine, the database, the behaviour system, the drawing tool.


Always keep interfaces and implementations as separate parts when possible. Try to avoid having one interface depend on another interface; that is achieved by not having any "#include" in the interface's header file.
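
A sketch of what that looks like in practice (the class names are invented): a forward declaration is enough as long as the header only deals in pointers or references.

    // Renderer.h -- depends on Texture without including Texture.h.
    class Texture;                      // forward declaration

    class Renderer {
    public:
        void Draw(const Texture* tex);  // only Renderer.cpp needs the full
    };                                  // definition, so it includes Texture.h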


Templates are the top notch of efficiency and a good alternative to abstract classes, so if we want to avoid virtual functions it is a good idea to investigate templates more closely. Only the template member functions that are actually called have any code generated for them. The compiler and linker are, by the way, normally expected to keep only one copy of the code for each instantiation.
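
Here is a small sketch of using a template parameter instead of an abstract base class (the names are my own). The call is resolved at compile time, so there is no virtual table involved and the compiler is free to inline it:

    #include <cstddef>

    struct Polygon { /* vertex data */ };

    // Static polymorphism: any class with a matching DrawPolygon() will do.
    template <class Renderer>
    void RenderScene(Renderer& r, const Polygon* polys, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            r.DrawPolygon(polys[i]);    // non-virtual, possibly inlined
    }

    class SoftwareRenderer {
    public:
        void DrawPolygon(const Polygon&) { /* rasterise (omitted) */ }
    };

The price is that the renderer type must be known at compile time, so this cannot replace the runtime switching that abstract interfaces give us.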


Good Methodology

Cyclic dependency, where two classes both need something from the other, is best treated by moving the shared data out into a separate class that both can depend on directly.
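
For example (Player, World and GameClock are invented names), instead of letting the two classes include each other's headers, the shared state gets a home of its own:

    // Both classes depend on GameClock; neither depends on the other.
    struct GameClock {
        float timeOfDay;
    };

    class World {
    public:
        explicit World(GameClock& c) : clock(c) {}
    private:
        GameClock& clock;
    };

    class Player {
    public:
        explicit Player(GameClock& c) : clock(c) {}
    private:
        GameClock& clock;
    };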


Every function in an abstract interface is a virtual function, and every time such a function is called through the abstract interface, the computer has to go through one extra layer of indirection. This extra layer costs noticeable time in tight loops. Otherwise it is not a big problem on modern computers, but if you want to optimise your application further you can skip the virtual functions completely.


Providing a Null Object for every resource type lets us do without checking for NULL every time we try to access a resource. It also gives greater control over the resource's lifetime and better debugging options.


Construction and Destruction

There are cases where the compiler generates code behind your back, and this is an area where time can be saved by following good conventions.

  1. Delaying the initialisation of an object until you know you really need it saves memory when it turns out you do not.
  2. Using an "init()" function instead of relying on the constructor to do the initialisation can speed things up for classes that are often instantiated but not used (see the sketch after this list). Constructors are good for preventing the use of un-initialised data, but they are time-consuming.
  3. Do not declare big objects inside loops; you will pay for construction and destruction in every iteration.
  4. Use initialisation lists in constructors.
  5. Avoid calls to "operator=" where possible, as it can allocate more memory than necessary.
  6. The same can be said about any operator that returns a value; these are also slow functions. Therefore, favour "operator+=" over "operator+", i.e. write "a += b" rather than "a = a + b".
  7. Empty constructors are optimised away by compilers.
  8. By the way, it is better to use the copy constructor than "operator=", i.e. "T b(a)" rather than "T b; b = a;".
  9. Use pre-increment instead of post-increment when possible.
  10. Instead of using the costly "new" and "delete" we can override them and redirect allocations to pre-allocated objects.
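
A small sketch combining points 1, 2 and 4 (the Particle class and SpawnBurst function are invented):

    class Particle {
    public:
        // Cheap constructor: just an initialisation list, no heavy work (point 4).
        Particle() : alive(false), x(0.0f), y(0.0f) {}

        // The expensive set-up is deferred until the particle is used (points 1 and 2).
        void Init(float startX, float startY)
        {
            x = startX;
            y = startY;
            alive = true;
        }

    private:
        bool  alive;
        float x, y;
    };

    void SpawnBurst(Particle* pool, int poolSize, int burstSize)
    {
        // Only the particles we actually need pay for their set-up.
        for (int i = 0; i < burstSize && i < poolSize; ++i)
            pool[i].Init(float(i), 0.0f);
    }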

Macros vs Inline Functions

  1. Expressions passed as arguments to an inline function are evaluated once, before the function body is entered; expressions passed to a macro can be evaluated more than once, which can be unsafe.
  2. Inline functions can be debugged; macros cannot, because they are expanded by the preprocessor in advance.
  3. Macros are good for creating small script-like languages, perfect for building state machines. Macro arguments can be turned into string constants, so they can provide good debug information without any lookup tables, especially in switches over enums (see the sketch after this list).
  4. The "inline" keyword works well for small methods, three lines or less, such as accessors for private members that are time-critical in a rendering loop.
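
Here is a small sketch of point 3: the preprocessor's "#" operator turns a macro argument into a string literal, so a switch over an enum can print the name of each case without any lookup table (the enum and macro names are my own):

    #include <cstdio>

    enum State { STATE_IDLE, STATE_WALK, STATE_ATTACK };

    // Expands to: case STATE_IDLE: return "STATE_IDLE";
    #define CASE_NAME(s) case s: return #s

    static const char* StateName(State s)
    {
        switch (s) {
            CASE_NAME(STATE_IDLE);
            CASE_NAME(STATE_WALK);
            CASE_NAME(STATE_ATTACK);
            default: return "UNKNOWN";
        }
    }

    int main()
    {
        std::printf("%s\n", StateName(STATE_WALK));  // prints STATE_WALK
        return 0;
    }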

STL::set and STL::map

One cannot just look at the documented algorithmic running time when choosing an STL collection. An STL set guarantees that insertion, deletion, and look-up are O(log n), but it makes no guarantees about memory usage(!). The typical implementation is a red-black tree, and it is common for memory to be allocated every time a node is inserted. A look-up is likely to thrash the cache, because the elements are scattered across memory. The set also has a significant overhead for managing the search tree, which is a huge waste of memory if all you store in it is pointers. The memory cost is of course worth it if we store many elements and do lots of membership queries.


STL::vector

An STL vector has amortised constant-time insertion at the end of the collection because it only reallocates memory occasionally. Scanning the entire vector takes O(n) time, but the constant is likely to be small because the elements of a vector are stored contiguously and are therefore cache friendly. Unfortunately, there are situations where the memory can become fragmented. A delete takes O(n) time because all elements after the deleted one are copied forward one step, but if the elements are pointers that copy amounts to a single bulk memory move (a "memmove()"), and that is fast. If you do not use the collection often the difference is unimportant, but it is surprising how well a vector can outperform a set.
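
To make the comparison concrete, here is a sketch of the same membership test done with a set and with a sorted vector (the two helper functions are my own). Both are O(log n) on paper, but the vector version walks contiguous memory:

    #include <algorithm>
    #include <set>
    #include <vector>

    // std::set: O(log n), but the tree nodes are scattered around the heap.
    bool InSet(const std::set<int>& s, int value)
    {
        return s.find(value) != s.end();
    }

    // Sorted std::vector: also O(log n) via binary search, but the elements
    // are contiguous and therefore cache friendly.
    bool InSortedVector(const std::vector<int>& v, int value)
    {
        return std::binary_search(v.begin(), v.end(), value);
    }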


Small is Faster

Memory is precious and limited, and because small is also fast, we should strive for smaller executables. A few things to watch:

  1. Do not "inline" functions that are called inside loops; this will produce huge functions. Sometimes the compiler inlines without your instruction (!)
  2. Disable debugging information generated by the compiler.
  3. Let the compiler optimise for size instead of speed (!) This sometimes works well when cache coherency kicks in.
  4. Use constant strings ("const char*") when possible. The dynamic string "std::string" wastes a lot of memory.
  5. Avoid RTTI and “dynamic_cast” to save space.

Null Objects

The unwise programmer is in a constant war with pointers, but this is a relationship that can blossom into friendship: the wise programmer learns to use weak references in the form of smart pointers, an indirection layer to the resource object that counts references and automatically redirects to a Null Object if the resource object is deleted. This Null Object lets us stop checking whether a pointer is NULL, and it has some useful debugging effects as well.
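
A minimal sketch of the Null Object half of this idea (the Sound classes are invented, and the reference-counting smart pointer itself is left out):

    class Sound {
    public:
        virtual ~Sound() {}
        virtual void Play() = 0;
    };

    // The null object: always valid, silently does nothing.
    class NullSound : public Sound {
    public:
        virtual void Play() {}
    };

    static NullSound g_nullSound;   // one shared instance for the whole program

    Sound* LoadSound(const char* filename)
    {
        (void)filename;             // unused in this sketch
        Sound* loaded = 0;          // pretend the load failed
        /* ... real loading code would go here ... */
        return loaded ? loaded : &g_nullSound;
    }

    void PlayFootstep(Sound* s)
    {
        s->Play();                  // safe even if the load failed; no NULL check
    }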


More reading

For more interesting insights into the above areas, I recommend the following sources.

Freeman, E. et al. (2004) Head First Design Patterns: Your Brain on Design Patterns

Gamma, E. et al. (1994) Design Patterns: Elements of Reusable Object-Oriented Software

The whole Game Programming Gems series (2000-2006)
