Consulting your compiler for interpretation of your code.
17 / 10 / 2020

Many programmers think that their job is to create, innovate and engineer in a way that makes collaboration most effective - to, in the end, create a product that meets some requirements, whether that means delivering a piece of engineering that is as reliable, fast and secure as possible, or simply providing a POC of some project ASAP. We have all been there, we know how it works. Except that most of a programmer's work is not like this.

We might think that we spend most of our time creating, innovating, designing and whatnot, but actually, as Geohot once said, a programmer's job is to translate ideas into code that the compiler then transforms into machine code, which works as the programmer's implementation of the idea. In fact, according to Jim Keller, only around 1% of the work (in designing chips) is the creative part. So the rest of it is the translation part. If the programmer translates part of the contents of his/her mind into some kind of intermediate form, usually high-level code, for the compiler to finish the work, then it is very important that the compiler interprets that piece of code the way the programmer expected, right?

In this post I will share some of my ways of inspecting the compiler's interpretation of the code, using C++ and llvm/clang. This is not a tutorial of any kind; ideas are only briefly mentioned and the reader should experiment with them on their own.

Example problem

Let's say we have a piece of code:
#include <cstdint>
#include <limits>
#include <cstdio>

int main()
{
	uint16_t v1 = std::numeric_limits<uint16_t>::max(), v2 = 10;

	if (const auto res = v1 + v2; res == 9) {
		puts("Haha, I know how computers count, I am l33te$t h4ck0r");
	}
}
          

You might expect this code to always print this extremely funny string. You are not a freshman, right? You know that unsigned numbers "wrap around" when you overflow them, and you also know that the C++ standard defines unsigned arithmetic as modulo 2^n, so the wrap-around is well-defined behavior. Yeah, all cool, but the code doesn't print the string. Let's see what the compiler thinks about this code.

Viewing AST

Compilers usually turn the user's code from its human-readable form into an AST (abstract syntax tree) first. In clang there is an easy way of getting the AST pretty-printed by the compiler:

clang source.cc -std=c++17 -fsyntax-only -Xclang -ast-dump

And for the code above, we would get (with the parts of the AST that come from the includes removed):

[Image "ast-dump1": clang AST dump of the example code]

I recommend that the reader study this AST output on their own. I will just briefly highlight some concepts.

First we can see a shared variable declaration statement, "DeclStmt". This node has two children of type VarDecl, which is a variable declaration. We can see that the type of the declared variables is uint16_t, which here is an alias for unsigned short. The first one is initialized with an expression - a "CallExpr" of type unsigned short - which is the call to our numeric_limits::max(). The second variable is initialized with IntegerLiteral 10. This is fine, that's what we would expect.

The interesting part for us is the if statement, IfStmt in AST dump.

It is composed of three parts: the declaration statement DeclStmt (this is a C++17 feature btw), the equality operator BinaryOperator '==', and a CompoundStmt, which happens to be the code that is executed when the condition evaluates to true.

The declaration statement holds our const auto res declaration and initialization, and as we can see the deduced type is const int! That's interesting. Let's see why that is. Our res variable is the result of operator + applied to two expressions that, as the AST tells us, were implicitly cast to int. Ok, it makes sense that adding two ints results in an int, but why did this conversion happen? Unfortunately, the AST dump will not help answer this question and the user has to consult the C++ language reference for the answer.

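I will help with this one though. When an arithmetic operation is performed, the operands are converted according to the rules described in the section "Usual arithmetic conversions" [expr.arith.conv], which (for the case in the example above) states that the integral promotion rules are applied. In short, an operand of an integer type narrower than int is promoted to int when int can represent all of its values, so on a platform with a 32-bit int the sum is computed as the int 65545 and never wraps to 9. Diving deeper into those rules is out of the scope of this post, however I highly advise the reader to consult the C++ standard.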

Ok, so the answer seems to be that auto deduced a type that the user might not expect. The reader can try adding an explicit cast or changing the type from const auto to uint16_t to check that the if condition then in fact holds and the program prints this extremely funny string.

Bonus: Asking the compiler for the deduced type (trick from Scott Meyers' book)

If we suspect that deducing the wrong type is the problem, we might use another way to check the deduced type. This trick is shamelessly stolen from Scott Meyers' book - Effective Modern C++, ISBN: 9781491908389.

I will not go into why to do this or how it works. For details check the internet or Scott's book. The modified code that shows us the deduced type looks like this:

#include <cstdint>
#include <limits>
#include <cstdio>

template <typename T>
class TD;

int main()
{
	uint16_t v1 = std::numeric_limits<uint16_t>::max(), v2 = 10;

	if (const auto res = v1 + v2; res == 9) {
		TD<decltype(res)> asdf;
		puts("Haha, I know how computers count, I am l33te$t h4ck0r");
	}
}
          

Which results in a compilation error revealing the deduced type:

holz@ACO > clang++ source.cc -std=c++17
source.cc:13:21: error: implicit instantiation of undefined template 'TD<const int>'
                TD <decltype(res)> asdf;
                                  ^
source.cc:6:7: note: template is declared here
class TD;
      ^
1 error generated.
          

The interesting part is 'TD<const int>', which reveals what the type of res really is. Scott has done a good job describing why we can't ask the compiler about deduced types in some more standard way, and I encourage the reader once again to consult his book for details.

That's all I wanted to share. I am sure that the reader will find more interesting ways of leveraging the AST dump feature of clang to inspect their own code.