Skip to content

Performance in the Fibonacci benchmark is almost entirely determined by factors other than function call overhead #358

@dlnnlsn

Description

@dlnnlsn

In the readme for the Fibonacci benchmark, we see the following requirements for different implementations of the benchmark:

ALL IMPLEMENTAITONS MUST...
Have a function that recursively compute a fibonacci number with this naive algorithm
Base case for input 0
Base case for input 1
Must make two recursive calls for each non-base invocation
No result caching, conversion to tail recursion, or iterative solutions.

This is because the benchmark Emphasizes function call overhead, stack pushing / popping, and recursion.

This is misleading. The recursive calls in the reference implementation are tail calls, and gcc at optimization levels higher than -O1 does optimize this by eliminating one of function calls. Thus the compiler produces assembly that is roughly equivalent to the following c code:

int32_t fibonacci(int32_t n) {
  int32_t result = 0;
  while (n > 1) {
    result += fibonacci(n - 1);
    n -= 2;
  }
  if (n == 1) result += 1;
  return result;
}

The Fortran compiler gfortran goes even further. It "unrolls" the recursive calls, and produces assembly that is roughly equivalent to the following c code:

int32_t fibonacci(int32_t n) {
  if (n == 0) return 0;
  if (n == 1) return 1;
  if (n == 2) return 1;
  if (n == 3) return 2;
  int32_t a = fibonacci(n - 3);
  int32_t b = fibonacci(n - 4);
  if (n == 4) return a + 2;
  a += b;
  int32_t c = fibonacci(n - 5);
  b += c;
  a += b;
  if (n == 5) return b + 1;
  return a + b + c + fibonacci(n - 6);
}

(See the output on Compiler Explorer)

If tail calls are not optimized, then the reference implementation requires 331160280 function calls to calculate fibonaaci(40), while the fortran version only requires 313883 function calls. This is calculated as follows:

  • If T_n is the number of function calls that a naive recursive solution requires to calculate the nth Fibonacci number, then we have that T_0 = T_1 = 0, and T_{n + 2} = 2 + T_{n + 1} + T_n.
  • If U_n is the number of function calls that the fortran optimization requires to calculate the nth Fibonacci number, then we have that U_0 = U_1 = U_2 = U_3 = 0, U_4 = 2, U_5 = 3, and U_{n + 6} = 4 + U_n + U_{n + 1} + U_{n + 2} + U_{n + 3}.

Thus, contrary to its stated goal, using this benchmark to compare c and fortran does not give us an accurate comparison of the overhead that each language introduces when making recursive function calls. Instead the difference in performance is almost entirely a result of the compilers' ability to remove the function calls altogether.

At the very least, the readme should clarify that while the source code for each implementation is not allowed to eliminate one of the recursive calls, and must make exactly two recursive calls on each iteration, and that these recursive calls must calculate the (n - 1)th and (n - 2)th Fibonacci numbers, the compiler may still make whatever optimizations it is capable of.

Metadata

Metadata

Assignees

No one assigned

    Labels

    levenshteinAnything regarding the levenshtein benchmark/test

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions