Comments Page - Compiling models to megakernels

Compiling models to megakernels (blog.luminal.com). Submitted by jafioti a day ago

measurablefunc 3 hours ago
There are only 4 optimizations in computer science: inlining, partial evaluation, dead code elimination, and caching. It looks like AI researchers just discovered inlining, and they already knew about caching, so eventually they'll get to partial evaluation and dead code elimination.
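A megakernel is, roughly, inlining taken to its limit: instead of running one kernel per operation, with each writing its result to memory for the next one to read back, the operations are fused into a single pass. The toy sketch below mimics that with plain Python loops standing in for GPU kernels. The function names are hypothetical, and this is an analogy, not Luminal's implementation.

```python
def scale(xs, a):
    # "Kernel" 1: a full pass that materializes an intermediate buffer.
    return [a * x for x in xs]

def add_bias(xs, b):
    # "Kernel" 2: a second full pass that reads that buffer back.
    return [x + b for x in xs]

def relu(xs):
    # "Kernel" 3: a third pass producing a third buffer.
    return [x if x > 0 else 0 for x in xs]

def unfused(xs, a, b):
    # Three separate passes and two intermediate lists.
    return relu(add_bias(scale(xs, a), b))

def fused(xs, a, b):
    # The three bodies inlined into one loop: one pass, no intermediate lists.
    out = []
    for x in xs:
        y = a * x + b
        out.append(y if y > 0 else 0)
    return out

data = [-2, -1, 0, 1, 2]
print(unfused(data, 3, 1))  # [0, 0, 1, 4, 7]
print(fused(data, 3, 1))    # [0, 0, 1, 4, 7]
```

The results are identical; what changes is memory traffic, which on a GPU is often the real cost.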
- johndough 2 hours ago
  Which categories do algorithmic optimizations fall under? For example:
  Strassen algorithm for matrix multiplication https://en.wikipedia.org/wiki/Strassen_algorithm
  FFT convolution https://dsp.stackexchange.com/a/63211
  Winograd convolution https://www.cv-foundation.org/openaccess/content_cvpr_2016/p...
  And of course optimization algorithms themselves.
  j-pb an hour ago
  Partial evaluation on the symbolic structure of the problem.
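For reference, here is Strassen's trick at its smallest: a 2×2 product computed with 7 scalar multiplications instead of 8, checked against the textbook product. The function names are illustrative. Whether this counts as "partial evaluation" is the commenter's framing; the code only shows the algebraic identity.

```python
def naive_2x2(A, B):
    # Textbook 2x2 product: 8 scalar multiplications.
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]

def strassen_2x2(A, B):
    # Strassen's formulas: 7 multiplications (m1..m7) at the cost of extra additions.
    # Scalars here; the full algorithm applies the same formulas to sub-blocks recursively.
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(strassen_2x2(A, B))  # [[19, 22], [43, 50]]
print(naive_2x2(A, B))     # [[19, 22], [43, 50]]
```

Saving one multiplication buys nothing for scalar 2×2 matrices. The payoff comes from recursion on blocks, which gives about O(n^2.81) instead of O(n^3).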
- imtringued 38 minutes ago
  Your list is so short it doesn't even include the basics such as reordering operations.
  It also feels incredibly snarky to say "they knew about caching" and that they will get to partial evaluation and dead code elimination, when those seem to be particularly useless (beyond what the CUDA compiler itself does) when it comes to writing GPU kernels or doing machine learning in general.
  You can't do any partial evaluation of a neural network because the activation functions interrupt the chain of tensor multiplications. If you remove the activation function, then you end up with two linear layers that are equivalent to one linear layer, which defeats the purpose. You could have trained a network with a single layer instead and achieved the same accuracy with correspondingly shorter training/inference time.
  Dead code elimination is even more useless since most kernels are special purpose to begin with and you can't remove tensors without altering the architecture. Instead of adding useless tensors only to remove them, you could have simply used a better architecture.
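The linear-collapse argument is easy to check: with no activation in between, W2·(W1·x + b1) + b2 = (W2·W1)·x + (W2·b1 + b2), i.e. a single layer with weight W2·W1 and bias W2·b1 + b2. The stdlib-only sketch below verifies this on small made-up integer weights, and shows that inserting a ReLU breaks the equivalence for an input where a hidden unit goes negative.

```python
def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def matmul(A, B):
    # (A @ B)[i][j] = sum over k of A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def vadd(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(v):
    return [max(0, a) for a in v]

# Made-up weights: layer 1 maps 2 -> 3 features, layer 2 maps 3 -> 2.
W1 = [[1, -2], [0, 1], [3, 1]]
b1 = [1, 0, -1]
W2 = [[2, 1, 0], [-1, 0, 1]]
b2 = [0, 3]

def two_layers(x, act=lambda h: h):
    # Default activation is the identity, i.e. no nonlinearity.
    return vadd(matvec(W2, act(vadd(matvec(W1, x), b1))), b2)

# The collapsed single layer.
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)

def one_layer(x):
    return vadd(matvec(W, x), b)

x = [1, 2]
print(two_layers(x), one_layer(x))  # [-2, 9] [-2, 9]
print(two_layers(x, relu))          # [2, 7]
```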
- fragmede 3 hours ago
  Dead code elimination is already a technique in AI when someone takes an MoE model and removes an unused "E" from it.
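A minimal sketch of that idea, assuming top-1 routing with a linear router and a hypothetical set of calibration tokens: experts the router never selects are dropped along with their router rows, and the surviving indices are remapped. The names and data are made up; real MoE pruning typically involves top-k routing, and "unused" is only measured on a finite sample.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def route(router_rows, token):
    # Top-1 routing: index of the highest-scoring router row (first one on ties).
    return max(range(len(router_rows)), key=lambda i: dot(router_rows[i], token))

def prune_unused_experts(experts, router_rows, tokens):
    # Collect the experts the router actually selects on the calibration tokens.
    used = sorted({route(router_rows, t) for t in tokens})
    old_to_new = {old: new for new, old in enumerate(used)}
    # Keep only those experts and their router rows, in the original order.
    return [experts[i] for i in used], [router_rows[i] for i in used], old_to_new

experts = ["e0", "e1", "e2", "e3"]  # stand-ins for expert weights
router_rows = [[1, 0], [0, 1], [-1, 0], [0, -1]]
calibration = [[1, 3], [-1, -3], [0, 2]]

kept, kept_rows, remap = prune_unused_experts(experts, router_rows, calibration)
print(kept)  # ['e1', 'e3']

# On the calibration tokens, the pruned router still picks the same experts.
for t in calibration:
    assert remap[route(router_rows, t)] == route(kept_rows, t)
```

Unlike true dead code elimination, this is only exact on the sample: an unseen token that would have routed to a removed expert now goes elsewhere.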
- mxkopy 3 hours ago
  AI actually has some optimizations unique to the field. You can in fact optimize a model to make it work; not many other disciplines put as much emphasis on this as AI does.
  tossandthrow 2 hours ago
  Can you list these optimizations?
  mxkopy 2 hours ago
  RLHF is one that comes to mind
  tossandthrow an hour ago
  Well, this is an entirely different category of optimization: not program performance but model performance.
  lucrbvi 22 minutes ago
  Yes, in "runtime optimization" the model is just a computation graph, so we can use a lot of well-known tricks from compilers, like dead code elimination and so on.
  tossandthrow 17 minutes ago
  We are getting closer!
  What optimizations can be used besides those that fall explicitly into the 4 categories the top commenter listed?
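Dead code elimination on a computation graph amounts to a reachability walk: start from the outputs, follow input edges backwards, and drop every node that was never reached. A small stdlib sketch, using a made-up dict-based graph format (node name mapped to the names of the nodes it reads from):

```python
def eliminate_dead_nodes(graph, outputs):
    # graph: node name -> list of node names it reads from
    live = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        stack.extend(graph[node])
    # Keep surviving nodes in their original order.
    return {name: inputs for name, inputs in graph.items() if name in live}

graph = {
    "x": [],
    "w": [],
    "matmul": ["x", "w"],
    "relu": ["matmul"],
    "debug_norm": ["matmul"],  # computed but never consumed
    "softmax": ["relu"],
}
print(list(eliminate_dead_nodes(graph, ["softmax"])))
# ['x', 'w', 'matmul', 'relu', 'softmax']
```

A real graph compiler would also treat nodes with side effects (for example, in-place updates) as roots, not just the declared outputs.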