• hmate9 a minute ago

    About 600GB needed for weights alone, so on AWS you need a p5.48xlarge (8× H100), which costs $55/hour.

    • Tepix 3 hours ago

      Huggingface Link: https://huggingface.co/moonshotai/Kimi-K2.5

      1T parameters, 32B active parameters.

      License: MIT with the following modification:

      Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
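
      Read literally, the clause triggers on either threshold. A minimal sketch of that check, assuming "more than" means strictly greater; the helper name and parameters are hypothetical, not anything Moonshot ships:

```python
# Thresholds copied from the license modification quoted above.
MAU_THRESHOLD = 100_000_000          # monthly active users
REVENUE_THRESHOLD_USD = 20_000_000   # monthly revenue in USD (or equivalent)


def must_display_kimi_branding(monthly_active_users: int,
                               monthly_revenue_usd: float) -> bool:
    """Hypothetical helper: True if either threshold is exceeded."""
    return (
        monthly_active_users > MAU_THRESHOLD
        or monthly_revenue_usd > REVENUE_THRESHOLD_USD
    )
```

      For example, a product with 1,000 users but $25M in monthly revenue would trip the clause, while one sitting exactly at both limits would not under this reading.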

      • endymi0n 19 minutes ago

        One. Trillion. Even on native int4 that’s… half a terabyte of vram?!

        Technical awe aside at this marvel that cracks the 50th percentile of HLE, the snarky part of me says there’s only half the danger in giving away something nobody can run at home anyway…
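
        The back-of-envelope math, as a rough sketch: it assumes exactly 1e12 parameters all stored at the same precision, and ignores any tensors kept at higher precision as well as KV cache and activations.

```python
def weight_bytes(num_params: float, bits_per_param: float) -> float:
    """Bytes needed to hold the weights alone at a uniform precision."""
    return num_params * bits_per_param / 8


TB = 1e12  # decimal terabyte
NUM_PARAMS = 1e12  # assumption: "1T" taken as exactly one trillion

for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: {weight_bytes(NUM_PARAMS, bits) / TB:.1f} TB")
```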

        • dheera 2 hours ago

          > or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.

          Why not just say "you shall pay us 1 million dollars"?

          • viraptor 23 minutes ago

            Companies with $20M revenue will not normally have spare $1M available. They'd get more money by charging reasonable subscriptions than by using lawyers to chase sudden company-ending fees.

            • vessenes 42 minutes ago

              ? They prefer the branding. The license just says you have to say it was them if you make more than $20M a month (about $240M a year) on the model.

              • clayhacks 2 hours ago

                I assume this allows them to sue for different amounts. And not discourage too many people from using it.

              • Imustaskforhelp 2 hours ago

                Hey, have they open sourced all of Kimi K2.5 (thinking, instruct, agent, agent swarm [beta])?

                Because I feel like they mentioned that agent swarm is available on their API, and that made me feel as if it wasn't open (weights). Please let me know whether all of them are open source or not.

              • bertili an hour ago

                The "DeepSeek moment" was just one year ago today!

                Coincidence or not, let's just marvel for a second over this amount of magic/technology that's being given away for free... and how liberating and different this is than OpenAI and others that were closed to "protect us all".

                • Barathkanna 10 minutes ago

                  A realistic setup for this would be 16× H100 80GB with NVLink. That comfortably handles the active 32B experts plus KV cache without extreme quantization. Cost-wise we are looking at roughly $500k–$700k upfront or $40–60/hr on-demand, which makes it clear this model is aimed at serious infra teams, not casual single-GPU deployments. I’m curious how API providers will price tokens on top of that hardware reality.

                  • bertili a minute ago

                    The other realistic setup is for a small company that needs a private AI for coding or other internal agentic use: two Mac Studios connected over Thunderbolt 5 RDMA. $20k.

                  • jumploops 4 hours ago

                    > For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls.

                    > K2.5 Agent Swarm improves performance on complex tasks through parallel, specialized execution [..] leads to an 80% reduction in end-to-end runtime

                    Not just RL on tool calling, but RL on agent orchestration, neat!

                    • vinhnx 28 minutes ago

                      One thing that caught my eye is that, besides the K2.5 model, Moonshot AI also launched Kimi Code (https://www.kimi.com/code), which evolved from Kimi CLI. It is a terminal coding agent. I've been using it for the past month with a Kimi subscription, and it is a capable agent with a stable harness.

                      GitHub: https://github.com/MoonshotAI/kimi-cli

                      • Reubend 3 hours ago

                        I've read several people say that Kimi K2 has a better "emotional intelligence" than other models. I'll be interested to see whether K2.5 continues or even improves on that.

                        • storystarling 2 hours ago

                          Yes, though this is highly subjective - it 'feels' like that to me as well (compared to Gemini 3, GPT 5.2, Opus 4.5).

                        • monkeydust 20 minutes ago

                          Is this actually good or just optimized heavily for benchmarks? I am hopeful it's the former based on the writeup, but I need to put it through its paces.

                          • pu_pe an hour ago

                            I don't get this "agent swarm" concept. You set up a task and they boot up 100 LLMs to try to do it in parallel, and then one "LLM judge" puts it all together? Is there anywhere I can read more about it?

                            • vessenes 38 minutes ago

                              You can read about this basically everywhere - the term of art is agent orchestration. Gas town, Claude’s secret swarm mode, or people who like to use phrases like “Wiggum loop” will get you there.

                              If you’re really lazy - the quick summary is that you can benefit from the sweet spot of context length and reduce instruction overload while getting some parallelism benefits from farming tasks out to LLMs with different instructions. The way this is generally implemented today is through tool calling, although Claude also has a skills interface it has been trained against.

                              So the idea would be for software development, why not have a project/product manager spin out tasks to a bunch of agents that are primed to be good at different things? E.g. an architect, a designer, and so on. Then you just need something that can rectify GitHub PRs and bob’s your uncle.

                              Gas town takes a different approach and parallelizes on coding tasks of any sort at the base layer, and uses the orchestration infrastructure to keep those coders working constantly, optimizing for minimal human input.

                              • rvnx 44 minutes ago

                                You have a team lead that establishes a list of tasks needed to achieve your mission.

                                Then it creates a list of employees, each specialized for one task, and they work in parallel.

                                Essentially hiring a team of people who get specialized on one problem.

                                Do one thing and do it well.
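
                                A toy sketch of that fan-out pattern, assuming nothing about how Kimi actually implements it: `run_sub_agent` is a hypothetical stand-in for an LLM call with a role-specific prompt, and the 100-agent cap simply mirrors the figure quoted from the announcement.

```python
import asyncio


async def run_sub_agent(role: str, task: str) -> str:
    """Stand-in for one sub-agent; a real one would call an LLM with a role prompt."""
    await asyncio.sleep(0)  # placeholder for model and tool-call latency
    return f"[{role}] done: {task}"


async def orchestrate(plan: dict[str, str], max_parallel: int = 100) -> list[str]:
    """Fan the planned tasks out to specialized sub-agents and gather the results."""
    limit = asyncio.Semaphore(max_parallel)  # cap on concurrently running sub-agents

    async def bounded(role: str, task: str) -> str:
        async with limit:
            return await run_sub_agent(role, task)

    # gather() returns results in the same order as the plan entries.
    return await asyncio.gather(*(bounded(r, t) for r, t in plan.items()))


# The "team lead" step is hard-coded here; in practice a model would produce this plan.
plan = {
    "architect": "design the module layout",
    "designer": "draft the UI",
    "tester": "write the test plan",
}
results = asyncio.run(orchestrate(plan))
```

                                The merge step is left out; in a real system another model call (or a human) would reconcile `results` into one answer.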

                                • jonkoops an hour ago

                                  The datacenters yearn for the chips.

                                • zmmmmm 3 hours ago

                                  Curious what would be the most minimal reasonable hardware one would need to deploy this locally?

                                  • NitpickLawyer 2 hours ago

                                    I parsed "reasonable" as in having reasonable speed to actually use this as intended (in agentic setups). In that case, it's a minimum of $70-100k for hardware (8x 6000 PRO + all the other pieces to make it work). The model comes with native INT4 quant, so ~600GB for the weights alone. An 8x 96GB setup would give you ~160GB for KV caching.

                                    You can of course "run" this on cheaper hardware, but the speeds will not be suitable for actual use (i.e. minutes for a simple prompt, tens of minutes for high context sessions per turn).
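
                                    The headroom arithmetic as a rough sketch. It takes the ~600GB weight figure above as given and ignores activations, framework overhead, and fragmentation, so real usable KV cache space will be somewhat lower; the function name is just for illustration.

```python
def kv_headroom_gb(num_gpus: int, gb_per_gpu: float, weights_gb: float = 600.0) -> float:
    """Memory left over for KV cache after the weights are resident (naive estimate)."""
    return num_gpus * gb_per_gpu - weights_gb


print(kv_headroom_gb(8, 96))   # 8x 96GB cards
print(kv_headroom_gb(8, 80))   # 8x H100 80GB
print(kv_headroom_gb(16, 80))  # 16x H100 80GB
```

                                    By this estimate the 8x 96GB box has 168GB spare, an 8x H100 80GB node only 40GB, and a 16x H100 80GB setup 680GB.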

                                  • spaceman_2020 3 hours ago

                                    Kimi was already one of the best writing models. Excited to try this one out

                                    • Alifatisk 37 minutes ago

                                      To me, Kimi has been the best at writing and conversing; it's way more human-like!

                                    • Topfi 2 hours ago

                                      K2 0905, and K2 Thinking shortly after that, have done impressively well in my personal use cases and were severely slept on. They were faster, more accurate, less expensive, more flexible in terms of hosting, and available months before Gemini 3 Flash, so I really struggle to understand why Flash got such positive attention at launch.

                                      Interested in the dedicated Agent and Agent Swarm releases, especially in how that could affect third party hosting of the models.

                                      • msp26 an hour ago

                                        K2 thinking didn't have vision which was a big drawback for my projects.

                                      • striking 3 hours ago
                                        • Jackson__ 2 hours ago

                                          As your local vision nut, their claims about "SOTA" vision are absolutely BS in my tests.

                                          Sure, it's SOTA at standard vision benchmarks. But on tasks that require proper image understanding (see, for example, BabyVision[0]), it appears very much lacking compared to Gemini 3 Pro.

                                          [0] https://arxiv.org/html/2601.06521v1

                                          • pplonski86 3 hours ago

                                            There are so many models. Is there any website with a list of all of them and a comparison of their performance on different tasks?

                                            • Reubend 3 hours ago

                                              The post actually has great benchmark tables inside of it. They might be outdated in a few months, but for now, it gives you a great summary. Seems like Gemini wins on image and video perf, Claude is the best at coding, ChatGPT is the best for general knowledge.

                                              But ultimately, you need to try them yourself on the tasks you care about and just see. My personal experience is that right now, Gemini Pro performs the best at everything I throw at it. I think it's superior to Claude and all of the OSS models by a small margin, even for things like coding.

                                              • Imustaskforhelp 2 hours ago

                                                I like Gemini Pro's UI over Claude so much but honestly I might start using Kimi K2.5 if it's open source & just +/- Gemini Pro/ChatGPT/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.

                                                • wobfan 2 minutes ago

                                                  > honestly I might start using Kimi K2.5 if it's open source & just +/- Gemini Pro/ChatGPT/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.

                                                  Me too!

                                                  > I like Gemini Pro's UI over Claude so much

                                                  This I don't understand. I mean, I don't see a lot of difference between the two UIs. Quite the opposite: apart from some animations, rounded corners and color gradients, they seem to look very alike, no?

                                              • coffeeri 3 hours ago
                                                • pplonski86 2 hours ago

                                                  Thank you! Exactly what I was looking for

                                              • DeathArrow 3 hours ago

                                                Those are some impressive benchmark results. I wonder how well it does in real life.

                                                Maybe we can get away with something cheaper than Claude for coding.

                                                • oneneptune 3 hours ago

                                                  I'm curious about the "cheaper" claim -- I checked Kimi pricing, and it's a $200/mo subscription too?

                                                  • NitpickLawyer 3 hours ago

                                                    On OpenRouter, K2.5 is at $0.60/$3 per Mtok. That's Haiku pricing.
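
                                                    To put that in per-session terms, a quick sketch. It assumes the first number is the input price and the second the output price, as that notation usually means, and it ignores any caching discounts; the function name is made up for illustration.

```python
# Assumption: $0.60 per million input tokens, $3.00 per million output tokens.
INPUT_USD_PER_MTOK = 0.60
OUTPUT_USD_PER_MTOK = 3.00


def session_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of a session at the per-million-token prices above."""
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Example: an agentic session that reads 2M tokens and writes 500k.
print(f"${session_cost_usd(2_000_000, 500_000):.2f}")
```

                                                    That example works out to $1.20 of input plus $1.50 of output.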

                                                    • storystarling an hour ago

                                                      The unit economics seem tough at that price for a 1T parameter model. Even with MoE sparsity you are still VRAM bound just keeping the weights resident, which is a much higher baseline cost than serving a smaller model like Haiku.

                                                    • mrklol 2 hours ago

                                                      They also have a $20 and $40 tier.

                                                  • lrvick 3 hours ago

                                                    Actually open source, or yet another public model, which is the equivalent of a binary?

                                                    URL is down so cannot tell.

                                                    • typ an hour ago

                                                      The label 'open source' has become a reputation-reaping and marketing vehicle rather than an informative term since the Hugging Face benchmark race started. With the weights only, we cannot actually audit whether a model is a) contaminated by benchmarks, b) built with deliberate biases, or c) trained on copyrighted/private data, let alone allow other vendors to replicate the results. Anyway, people still love free stuff.

                                                      • Der_Einzige an hour ago

                                                        Just accept that IP laws don't matter and the old "free software" paradigm is dead. Aaron Swartz died so that GenAI may live. RMS and his model of "copyleft" are so Web 1.0 (not even 2.0). No one in GenAI cares AT ALL about the true definition of open source. Good.

                                                      • Tepix 3 hours ago

                                                        It's open weights, not open source.

                                                      • mangolie 4 hours ago

                                                        they cooked

                                                        • billyellow 4 hours ago

                                                          Cool

                                                          • rvz 2 hours ago

                                                            The chefs at Moonshot have cooked once again.