The algorithm here, fast doubling, is sequential (there is no parallelism) and is quite fast, needing fewer than a hundred operations to arrive at the final result. This runs in nanoseconds on a CPU, so the benchmark in the article is mostly measuring communication time rather than actual computation time.
They also do not make use of the fact that matrix multiplication is associative, so M⁴ = M² × M², M⁸ = M⁴ × M⁴, and so on. That means one can compute F32 using 5 matrix multiplications; the article's approach uses 31.
It wouldn’t surprise me at all if doing those 5 matrix multiplications on a CPU were faster than doing 31 on the GPU.
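For concreteness, here is a minimal sketch of that repeated-squaring idea in Python (the modulus 9837 is borrowed from the snippets further down in this thread, not from the article):

MOD = 9837  # same illustrative modulus as the snippets below

def matmul(x, y):
    # 2x2 matrix product, entries reduced mod MOD
    return [[(x[0][0]*y[0][0] + x[0][1]*y[1][0]) % MOD,
             (x[0][0]*y[0][1] + x[0][1]*y[1][1]) % MOD],
            [(x[1][0]*y[0][0] + x[1][1]*y[1][0]) % MOD,
             (x[1][0]*y[0][1] + x[1][1]*y[1][1]) % MOD]]

def fib(n):
    result = [[1, 0], [0, 1]]      # identity
    base = [[1, 1], [1, 0]]        # M; M^n = [[F(n+1), F(n)], [F(n), F(n-1)]]
    while n:
        if n & 1:
            result = matmul(result, base)
        base = matmul(base, base)  # M^2, M^4, M^8, ... by squaring
        n >>= 1
    return result[0][1]            # F(n) mod MOD

print(fib(32))  # O(log n) matrix multiplications instead of 31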
Also, FTA: “To calculate extremely large fibonacci numbers we need to avoid integer overflow. A simple way to avoid that is to just take everything modulo a given number in the matrix multiplication”
⇒ This also is NOT calculating the Fibonacci numbers; it cops out exactly when things get difficult on a GPU.
If you want to compute Fn mod some constant for 1 ≤ n ≤ some large bound fast on a GPU, I think I would compute quite a few checkpoint matrices using the fast method, then spin up some ‘threads’ to fill in the intermediate values by repeatedly multiplying those checkpoints with the ‘standard’ step matrix.
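Something like the following sketch, where each loop iteration stands in for one GPU thread, and the stride of 1024 and the modulus 9837 are arbitrary illustrative choices:

M = 9837

def fib_pair(n):
    # Fast doubling: returns (F(n), F(n+1)) mod M in O(log n) steps.
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n >> 1)        # a = F(m), b = F(m+1) with m = n // 2
    c = a * (2 * b - a) % M        # F(2m)
    d = (a * a + b * b) % M        # F(2m+1)
    return ((d, (c + d) % M) if n & 1 else (c, d))

def fib_range(limit, stride=1024):
    out = [0] * (limit + 1)
    for start in range(0, limit + 1, stride):  # one 'thread' per chunk
        a, b = fib_pair(start)                 # checkpoint via the fast way
        for i in range(start, min(start + stride, limit + 1)):
            out[i] = a
            a, b = b, (a + b) % M              # step with the standard matrix
    return out

print(fib_range(9))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]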
The matrix in question is tiny (2x2). There's no hope a GPU would ever outperform a CPU on that.
It only gets interesting if you need large matmuls, or millions of small matmuls in parallel.
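For instance, numpy (or a GPU counterpart such as CuPy or torch) will batch a leading axis of small matmuls into one call, which is the regime where a GPU can pull ahead; the batch size here is made up for illustration:

import numpy as np

# A million independent 2x2 integer matrices, entries mod 9837
batch = np.random.randint(0, 9837, size=(1_000_000, 2, 2))
prod = (batch @ batch) % 9837  # one batched matmul over the whole stack
print(prod.shape)              # (1000000, 2, 2)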
> Using the power of GPUs we are able to calculate F99999999 in only 17 milliseconds on my Consumer GPU
FWIW, on my 2020-era laptop, the following:
#include <stdio.h>

/* Iterative F(n) mod 9837, one addition per step. */
int main(void) {
    int a = 1, b = 1, c, n = 99999999;
    while (--n) {
        c = (a + b) % 9837;
        a = b;
        b = c;
    }
    printf("%d\n", a);
    return 0;
}
takes 330 ms measured as "time a.out". The problem with Fibonacci is that there are algorithmic ways to speed things up. The following (using the method at https://en.wikipedia.org/wiki/Fibonacci_sequence to compute Fibonacci numbers recursively in O(log n) arithmetic operations) takes 0.1 ms in Python to compute the same value:
import functools

@functools.cache
def F(n):
    if n <= 2:
        return 1
    m = n // 2  # rounds down
    if n % 2:
        # odd
        return (F(m+1)**2 + F(m)**2) % 9837
    else:
        return ((2*F(m+1) - F(m))*F(m)) % 9837

print(F(99999999))
Well, there's a closed formula, right? I wonder if it's faster to carry out the computation in Z[sqrt(5)].
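A sketch of what that might look like, representing a + b*sqrt(5) as a pair and reusing the modulus 9837 from the snippets above. It relies on the identity (1 + sqrt(5))^n = 2^(n-1) * (L(n) + F(n)*sqrt(5)), so the modulus must be odd for the final division by 2^(n-1) to work (and pow(x, -1, M) needs Python 3.8+):

M = 9837  # odd, so 2 is invertible mod M

def mul(x, y):
    # (a + b*sqrt(5)) * (c + d*sqrt(5)) = (ac + 5bd) + (ad + bc)*sqrt(5)
    a, b = x
    c, d = y
    return ((a*c + 5*b*d) % M, (a*d + b*c) % M)

def fib(n):
    result, base = (1, 0), (1, 1)  # 1, and 1 + sqrt(5)
    e = n
    while e:                       # exponentiation by squaring in Z[sqrt(5)]
        if e & 1:
            result = mul(result, base)
        base = mul(base, base)
        e >>= 1
    b = result[1]                  # coefficient of sqrt(5) in (1 + sqrt(5))^n
    return b * pow(pow(2, n - 1, M), -1, M) % M

print(fib(99999999))  # should agree with the two snippets above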
Maybe a better test is digits of Pi?
Or recursive hashing?
A test of what?
A test of your GPU approach/library versus the CPU.