• pama 2 hours ago

    I feel like a kid in a candy shop. Some of these tricks would take way too long to reverse-engineer correctly from the papers alone. I hope the releases this week start a renaissance in the use of MoE as a baseline for academic models.

    • helloericsf 4 hours ago

      - Efficient and optimized all-to-all communication
      - Both intranode and internode support with NVLink and RDMA
      - High-throughput kernels for training and inference prefilling
      - Low-latency kernels for inference decoding
      - Native FP8 dispatch support
      - Flexible GPU resource control for computation-communication overlapping

      X: https://x.com/deepseek_ai/status/1894211757604049133
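
      To make "all-to-all" concrete: in expert-parallel MoE, each rank sends every token's hidden state to the rank hosting its routed expert and receives the tokens routed to its own experts. A minimal sketch of that dispatch step using NCCL grouped send/recv; DeepEP ships its own NVLink/RDMA kernels rather than NCCL, so the function name and buffer layout below are illustrative assumptions, not the library's API:

        #include <cstddef>
        #include <cuda_runtime.h>
        #include <nccl.h>

        // Hypothetical layout: send_buf[r] / recv_buf[r] point to device memory
        // holding the hidden states routed to / arriving from rank r; counts are
        // element counts (tokens_for_r * hidden_dim).
        void moe_dispatch_all_to_all(float* const* send_buf, const size_t* send_counts,
                                     float* const* recv_buf, const size_t* recv_counts,
                                     int world_size, ncclComm_t comm,
                                     cudaStream_t stream) {
            // Grouping the point-to-point calls lets NCCL execute all pairs
            // concurrently, which is the all-to-all exchange dispatch performs.
            ncclGroupStart();
            for (int peer = 0; peer < world_size; ++peer) {
                ncclSend(send_buf[peer], send_counts[peer], ncclFloat, peer, comm, stream);
                ncclRecv(recv_buf[peer], recv_counts[peer], ncclFloat, peer, comm, stream);
            }
            ncclGroupEnd();
        }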

      • ofou 3 hours ago

        You gotta love these guys; they're really pushing the open-source frontier for all of us. Thanks for sharing.

        • grg0 3 hours ago

          Open AI™ (with a space)

          • hackit2 2 hours ago

            Kind of ironic that DeepSeek is more Open than ChatGPT

            • gostsamo 2 hours ago

              They do it for their own reasons, but OpenAI are straight-up liars: they are neither open nor do they give a fuck about humanity.

              • chefandy 16 minutes ago

                OpenAyyyyI swear babe, I'm gonna open it up any day. Yeah, for that greater good or whatever it is you keep yappin' about.

            • echelon 2 hours ago

              I hope you're reading this Sam Altman:

              Make Open AI open.

              Or else you'll lose to the ecosystem.

          • deyiao 19 minutes ago

            Now it includes the highly anticipated PTX! Of course, I don't understand it, but I've already clicked the star and even the fork button, which basically means I've mastered it, right? I feel incredibly powerful right now...

            • deyiao 2 hours ago

              Is the PTX that everyone was looking forward to included this time?

              • find0x90 2 hours ago

                Yes, there's some in the csrc/kernels directory. Search for 'asm' to find uses of it.
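
                For a sense of what those hits look like: inline PTX in CUDA goes through the asm() statement. A toy example (not from the repo) that reads the warp lane index from a special register:

                  // Toy example of the inline-PTX pattern that searching for
                  // 'asm' turns up; this one just reads the %laneid register.
                  __device__ __forceinline__ unsigned lane_id() {
                      unsigned lane;
                      asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
                      return lane;
                  }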

                • swyx an hour ago

                  > the PTX that everyone was looking forward to

                  explanation for the rest of us as to why this is so important?

                • Bimos 3 hours ago

                  Are the PTX instructions they talked about in the tech report the ones in this code?

                  • zardinality an hour ago

                    "For extreme performance, we discover and use a behavior-out-of-doc PTX instruction: ld.global.nc.L1::no_allocate.L2::256B. This instruction will lead to an undefined behavior: accessing volatile GPU memory with non-coherent read-only PTX modifiers .nc. But the correctness is tested to be guaranteed with .L1::no_allocate on Hopper architectures, and performance will be much better. If you find kernels not working on some other platforms, you may add DISABLE_AGGRESSIVE_PTX_INSTRS=1 to setup.py and disable this, or file an issue."

                    • rvz 2 hours ago

                      Round 2 of open-source releases from an actual "Open AI™" company, licensed under MIT.

                      Once again, DeepSeek is more open than the $157B+ one claiming to be "Open".

                      Almost no one is talking about Meta's Llama, and everyone should expect them to release Llama 4 with reasoning.

                      The objective is to not get squeezed in the middle of the race to zero.