I feel like a kid in a candy shop. Some of these tricks would take way too long to reverse engineer correctly based on the papers. I hope that the releases this week start a renaissance of the use of MoE as baseline academic models.
- Efficient and optimized all-to-all communication
- Both intranode and internode support with NVLink and RDMA
- High-throughput kernels for training and inference prefilling
- Low-latency kernels for inference decoding
- Native FP8 dispatch support
- Flexible GPU resource control for computation-communication overlapping

X: https://x.com/deepseek_ai/status/1894211757604049133
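For context on what "all-to-all" means here: in expert-parallel MoE, each rank routes its tokens to whichever ranks host the selected experts, then gathers the results back. DeepEP implements this with custom fused NVLink/RDMA kernels; purely as a mental model (not DeepEP's code), here is roughly what the dispatch step looks like when composed from stock NCCL point-to-point calls. The function name and all buffers/counts are hypothetical:

```cuda
#include <cuda_bf16.h>
#include <nccl.h>

// Hypothetical baseline MoE token dispatch as an all-to-all built from NCCL
// point-to-point calls. send_counts[p] = tokens this rank routes to rank p,
// recv_counts[p] = tokens rank p routes to us (counts exchanged beforehand);
// offsets are in tokens, each token is `hidden` bf16 values.
void dispatch_tokens(const __nv_bfloat16* send_buf, const size_t* send_offsets,
                     const size_t* send_counts,
                     __nv_bfloat16* recv_buf, const size_t* recv_offsets,
                     const size_t* recv_counts,
                     size_t hidden, int nranks,
                     ncclComm_t comm, cudaStream_t stream) {
    ncclGroupStart();  // batch all sends/recvs so NCCL can progress them together
    for (int peer = 0; peer < nranks; ++peer) {
        ncclSend(send_buf + send_offsets[peer] * hidden,
                 send_counts[peer] * hidden, ncclBfloat16, peer, comm, stream);
        ncclRecv(recv_buf + recv_offsets[peer] * hidden,
                 recv_counts[peer] * hidden, ncclBfloat16, peer, comm, stream);
    }
    ncclGroupEnd();
}
```

The point of DeepEP is that this generic pattern leaves throughput and latency on the table, which is exactly what the tuned kernels and FP8 dispatch path above address.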
You gotta love these guys; they're really pushing the open-source frontier for all of us. Thanks for sharing.
Open AI™ (with a space)
Kind of ironic that DeepSeek is more Open than ChatGPT
They do it for their own reasons, but OpenAI are straight up liars: they are neither open, nor do they give a fuck about humanity.
OpenAyyyy, I swear babe, I'm gonna open it up any day now. Yeah, for that greater good or whatever it is you keep yappin' about.
I hope you're reading this Sam Altman:
Make Open AI open.
Or else you'll lose to the ecosystem.
Now it includes the highly anticipated PTX! Of course, I don't understand it, but I've already clicked the star and even the fork button, which basically means I've mastered it, right? I feel incredibly powerful right now...
Is the PTX that everyone was looking forward to included this time?
Yes, there's some in the csrc/kernels directory. Search for 'asm' to find uses of it.
> the PTX that everyone was looking forward to
Can someone explain for the rest of us why this is so important?
The PTX instructions they mentioned in the tech report presumably correspond to the code here?
"For extreme performance, we discover and use a behavior-out-of-doc PTX instruction: ld.global.nc.L1::no_allocate.L2::256B. This instruction will lead to an undefined behavior: accessing volatile GPU memory with non-coherent read-only PTX modifiers .nc. But the correctness is tested to be guaranteed with .L1::no_allocate on Hopper architectures, and performance will be much better. If you find kernels not working on some other platforms, you may add DISABLE_AGGRESSIVE_PTX_INSTRS=1 to setup.py and disable this, or file an issue."
this might help: https://x.com/main_horse/status/1894215779521794058/photo/1
Round 2 of open source releases from an actual "Open AI™" company and licensed under MIT.
Once again, DeepSeek is more open than the $157B+ one that is claiming to be "Open".
Almost no one is talking about Meta's Llama, and everyone should expect them to release Llama 4 with reasoning.
The objective is to not be squeezed in the middle of the race to zero.