Compile time string hash

Although we can use a simple stack to store data and remember their location, it will be easy for a computer but hard for us. Thereby we often name our variables with a clear and memorable name to better remember and understand them later. No need to be cryptic.

We always had this need to define and name things and that’s particularly true in programming.

While I was learning about Vulkan, I wanted to write a class that will store Vulkan function pointers and that class should also provide a simple accessor to retrieve these functions from a single name. Most of programmers will write a class like the following one (documentation  and asserts are omitted for readability purposes) :

We are often tempted to use a readable string as a key for the map (note that we use an unordered_map because ordering is not important here) but there are several performance issues.

The first issue is string dynamic memory allocation. Even though there is no direct call to new or malloc in this code, this is done internally in the std:string class. So for each function load, there is at least one unnecessary memory dynamic allocation. The second issue is runtime hashing. Because the map key type is std::string, there is a costly hash function with a linear algorithmic complexity involved for each map access. The longer the string is, the longer it takes to compute. Why not directly use a readable literal identifier to avoid memory allocation and runtime hashing ? This is achieved with compile time string identifier. In fact, in C++, we can ask the compiler to execute code at compile time, our hash function for example …

It was possible to do that in C++11 with recursive templates or recursive constexpr functions but this is now possible without recursion since c++14 constexpr. We can start writing our compile time hash function. First we need the following constants to avoid to hard code them :

Then the C++11 hash function :

This function is tail recursive. It will simply iterate over the string and update the hash parameter each time. The function will return as soon as it will encounter the string terminator. With C++11 constexpr limitation, the function body must only have one return statement, that why we are using the ternary operator. Moreover this code will not be evaluated at compile time. We can force the compiler to evaluate the function with template or enumeration. Binding the result of the function to an enum member will actually force the function evaluation (because enum are evaluated at compile time). If the function cannot be evaluated at compile time, it will raise an error. We can also do the same with template parameters because they are also deduced at compile time.

We can now define two helper macros :

Putting it all together :

We can also make an improvement to remove the recursion if the C++ used is >=  C++14.

Finally we can improve our Vulkan function loader with our string identifiers :

With just a little more code, we can deal with compile time string hash as identifier and avoid dynamic memory allocation and runtime hashing. To conclude, this code is subject to further improvements like

  • Add assertions
  • Replace macros with constexpr functions
  • Change the hash function for a better one
  • etc..

The hash function proposed in this article is strong enough for general purposes. However, I strongly discourage the use of this function for cryptographic/security usages. It will expose you code to many vulnerabilities.

You can find the complete source code here on my GitHub in the repository Articles.

Thanks for reading !

Leave a Reply

Your email address will not be published. Required fields are marked *

Copyright © 2018 Vincent STEHLY--CALISTO. All Rights Reserved.

Up ↑