In my previous post I talked about recursion problems in a Fibonacci function using 64-bit variables as function parameters, compiled using the Microsoft Visual C++ compiler. It turned out that while tail recursion was enabled by the compiler using 32-bit types it didn't really when switching to 64-bit ones. Just as a reminder, Tail Recursion is an optimization performed by the compiler. It is the process of transforming certain types of tail calls into jumps instead of function calls.
My conclusion was that tail recursion is not handled properly by the Visual C++ compiler and a possible explanation could be the presence of a bug.
The calculation of Fibonacci sequences of big integers is not an everyday task but it can still be a reliable example to show how tail calls are implemented.
Not happy with my conclusions and following several suggestions of users' comments (here on the blog, on Reddit and on Stack Overflow) I wanted to understand more about this issue and to explore other solutions using different compilers.
Let's pick again the Visual Studio project I used in my previous post and hosted on GitHub. This was the Fibonacci function used:
typedef unsigned long long ULONG64;
ULONG64 fib_tail_x64(ULONG64 n, ULONG64 res, ULONG64 next)
{
if (n == 0) {
return res;
}
return fib_tail_x64(n - 1, next, res + next);
}
As you might recall, compiling it in Release mode (which is generally needed to enable tail recursion due to the compiler optimization command /Ox), and targeting Win32 platforms resulted in having tail recursion disabled:
Tail Recursion is not Enabled! It's an ugly assembly though I didn't omit frame stack pointers
One thing I didn't try to do (well to be honest I actually completely forgot) was to pay attention to the solution platform chosen to build the project. In Visual Studio Win32, which means to compile the solution using x86 processor instructions, is the default configuration set. Looking carefully at the generated assembly in the picture above it can be seen that registers used EAX, EBX, etc are, accordingly with the configuration chosen, 32-bit processor registers.
Switching configuration to x64, building and running the project resulted in this assembly:
Tail Recursion is back!!!
Surprise!! Tail recursion is back, x64 registers are used instead (RAX, RDX, etc.) and the assembly is much shorter and cleaner!
Summarizing what said so far for tail calls we can take the following table as a reference:
It seems then that the problem is only "restricted" when a x86 platform is targeted and x64 variables used.
At this stage I honestly couldn't really say "it's definitely a bug in the MS compiler" so still not satisfied I went trying the same experiment using another compiler.
This time I decided to change operating system altogether and use Clang, installed on my Macbook as part of XCode and based on LLVM.
Terminal - clang version
To view the generated assembly I used a handy tool called Hopper Disassembler, available on both OS X and Linux (but for gdb debugger would have sufficed too I guess).
I repeated the exact same experiment. The following image shows the first generated assembly:
Tail Recursion is clearly enabled: JNE (Jump Not Equal) is just a conditional jump when ZF (the "zero" flag) is equal to 0. The red arrow on the left points at the address where the program execution jumps if the condition is true. The careful observer might have already noticed that this code is not compiled for 32-bit processors. The assembly registers are actually 64-bit (RBD, RDI, etc.).
Our goal here is to verify instead that targeting a 32-bit platform tail recursion would still be enabled.
It is possible to force the compiler to generate 32-bit code, using the compiler flag -m32. Building again the solution in 32-bit mode I obtained:
Tail Recursion enabled!
Bingo!!! By looking at the type of registers we can state that the executable is targeting the correct architecture we chose, and look at the last instruction the surprisingly result: Tail Recursion is enabled for 32-bit too!!!!!
It seemed that a lot of users could not reproduce this problem (see comments below and on Reddit too). I went playing with the compiler optimization options, tweaking some parameters. By default using Release mode this is the default configuration for optimization:
Default optimization options
After a few attempts I changed 'Whole Program Optimization' from /GL (yes) to No. I changed 'Omit Frame Pointers' too as I was pointed out that the generated x86 assembly, with frame pointers, is way longer and uglier (and we probably save a few clock cycles avoiding some cache misses too). There is a nice explanation on Stack Overflow if you want to read more about omitting frame pointers.
However, with those changes in mind, I recompiled the solution using Win32 and I got this surprising result:
Tail recursion for x86. Worked by disabling 'Whole Program Optimization'
A few users pointed out that regardless of WPO (Whole Program Optimization) they did not experience any issue with tail recursion. This has puzzled me for a while until I realized I was not paying attention to one important detail: the way I was calling the fibonacci function from the main. Since so far I've used this function call to test my x64 fibonacci:
ULONG64 res = fib_tail_x64(40, 0, 1);
So far so good, nothing special there. I'm just passing literals as parameters.
What if we use a variable, or instead a function pointer? Let's try this:
ULONG64 a = 40;
ULONG64 res = fib_tail_x64(a, 0, 1);
Although it doesn't look like a big change, calling the fib_tail_x64 using variables, it enables tail recursion regardless of WPO (as long as an optimization flag >= /O2 is used). A Reddit user highlighted that calling the function indirectly through pointers (regardless of using literals) would produce the same result:
volatile bool a = true;
ULONG64 res = 0;
ULONG64(*ptr)(ULONG64 n, ULONG64 res, ULONG64 next) = NULL;
if (a)
{
ptr = &fib_tail_x64;
}
res = ptr(40, 0, 1);
Although I was initially sure that a possible explanation for the absence of tail recursion was the presence of a bug in the Visual C++ compiler it turned out it is not. The assemblies produced by both VC++ and Clang are very similar for x86 platforms. My first conclusion was to keep in mind is to disable 'Whole Program Optimization', if and only if you are calling the x64 recursive function directly and passing literals as parameters.
After a second analysis it emerged that for a few users this was not happening as they were calling the function directly using variables or indirectly through function pointers.
I would be still interested to hear some thoughts from somebody working on the Visual C++ compiler team to know the reason behind this "issue".
See you soon!
This article was originally published at Coding Adventures Blog. Republished by the author permission.