• lb-guilherme 3 hours ago

    The algorithm here, fast doubling, is sequential (there is no parallelism) and is quite fast, with less than a hundred operations to arrive to the final result. This runs in nanosecond on a CPU and the benchmark in the article is mostly measuring the communication time rather than actual computation time.

    • Someone 3 hours ago

      They also do not make use of the fact that matrix multiplication is associative, so M⁴ = M² × M², M⁸ = M⁴ × M⁴, etc. That means one can compute F32 using 5 matrix multiplications. This uses 31.

      It wouldn’t surprise me at all if doing those 5 matrix multiplications on a CPU were faster than doing 31 on the GPU.

      Also, FTA: “To calculate extremly large fibonacci numbers we need to avoid integer overflow. A simple way to avoid that is to just take everything modulo a given number in the matrix multiplication“

      ⇒ This also is NOT calculating the Fibonacci numbers, copping out when things get difficult on a GPU.

      If you want to compute Fn mod some constant for 1 ≤ n ≤ some large constant fast on a GPU, I think I would compute quite a few of the intermediate matrices using the fast way, then spin up some ‘threads’ to compute the intermediate versions by copying those with the ‘standard’ matrix.

      • KeplerBoy 2 hours ago

        The matrix in question is tiny (2x2). There's no hope a GPU would ever outperform a CPU on that.

        It only gets interesting if you need a large matmuls or millions of small matmuls in parallel.

    • eesmith 3 hours ago

      > Using the power of GPUs we are able to calculate F99999999 in only 17 milliseconds on my Consumer GPU

      FWIW, on my 2020 era laptop, the following:

        #include <stdio.h>
        
        int main(void) {
          int a = 1, b = 1, c, n = 99999999;
          while (--n) {
            c = (a+b) % 9837;
            a = b;
            b = c;
          }
          printf("%d\n", a);
          return 0;
        }
      
      takes 330 ms measured as "time a.out".

      The problem with Fibonacci is there are algorithmic ways to speed things up. The following (using the method at https://en.wikipedia.org/wiki/Fibonacci_sequence to compute Fibonacci numbers recursively in O(log n) arithmetic operations) takes Python 0.1 ms to compute the same value:

        import functools
      
        @functools.cache
        def F(n):
            if n <= 2:
                return 1
      
            m = n // 2 # rounds down
            if n % 2:
                # odd
                return (F(m+1)**2 + F(m)**2) % 9837
            else:
                return ((2*F(m+1) - F(m))*F(m)) % 9837
      
        print(F(99999999))
      • qsort 10 minutes ago

        Well, there's a closed formula right? I wonder if it's faster to carry out the computation in Z[sqrt(5]).

        • amelius an hour ago

          Maybe a better test is digits of Pi?

          Or recursive hashing?

          • staunton an hour ago

            A test of what?

            • amelius an hour ago

              Test of your GPU approach/library versus CPU.