• ribhu97 a day ago

    How does this compare to Modal (modal.com)? Faster cold start? Easier config? Asking because I've used Modal quite a bit for everything from fine-tuning LLMs to running ETL pipelines, and it works well for me; I haven't found any real competitors that would even make me think of switching.

    • za_mike157 a day ago

      Modal is a great platform!

      In terms of cold starts, the two platforms seem very comparable, based on what users have mentioned and on tests we have run.

      Easier config/setup is feedback we have gotten from users: we don't have any special syntax or a "Cerebrium way" of doing things, which makes migration pretty easy and doesn't lock you in - something some engineers appreciate. We just run your Python code as is, with an extra .toml setup file.
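      To give a concrete picture, a minimal setup file might look roughly like this (an illustrative sketch - the field names here are indicative rather than our exact schema, so check the docs for the real thing):

        [cerebrium.deployment]
        name = "my-first-app"
        python_version = "3.11"

        [cerebrium.hardware]
        gpu = "A10"
        cpu = 2
        memory = 16.0

        [cerebrium.dependencies.pip]
        torch = "latest"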

      Additionally, we offer AWS Inferentia/Trainium nodes, which offer a great price/performance trade-off for many open-source LLMs - even compared to TensorRT/vLLM on Nvidia GPUs - and get rid of the scarcity problem. We plan to support TPUs and others in the future.

      We are listed on the AWS Marketplace, as well as others, which means you can subtract your Cerebrium cost from your committed cloud spend.

      Two things we are working on that will hopefully make us a bit different are:

      - GPU checkpointing

      - Running compute in your own cluster, to use credits / for privacy concerns

      Where Modal really shines is training/data-processing use cases, which we currently don't support too well. However, we do have this on our roadmap for the near future.

      • doctorpangloss 13 hours ago

        Why use Modal instead of SkyPilot?

        • za_mike157 4 hours ago

          I haven't used SkyPilot so I am unfamiliar with the experience and performance.

          However, some of the situations where you would want to use Cerebrium over SkyPilot are:

          - You don't want to manage your own hardware.

          - Reduced costs: with a serverless runtime and low cold starts (unclear if SkyPilot offers this, and what the performance is like if they do).

          - Rapid iteration: unclear what the deployment process on SkyPilot looks like and how long projects take to go live.

          - Observability: it looks like you would just have k8s metrics at your disposal.

      • tmshapland 21 hours ago

        We use Cerebrium for our Mixpanel for Voice AI product (https://voice.canonical.chat). Great product. So much easier to set up and more robust than other model hosting providers we've tried (especially AWS!). Really nice people on the team, too.

        • za_mike157 21 hours ago

          Thanks Tom! Excited to support you and the team as you grow.

        • android521 12 hours ago

          I had no idea what it does and just had this vague idea that they make it easy for you to deploy, host, and use models. I looked at the tutorials, was amazed by what can be done, and decided to try it. My suggestion is to have more tutorials and perhaps one-click deployment for some really cool models. Another thing: support TypeScript and you will capture a big section of the developer market that does not come from an ML background. And after I finish a tutorial demo, it would be great to get an estimate of cost so that I know if I can afford it for my software.

          • jono_irwin 12 hours ago

            Thanks for the feedback! I like the sound of all of those:

            - clearer messaging

            - more tutorials

            - one-click deploys

            - clear & upfront costing

            We have plans to add other runtimes (like TypeScript) in the future, but Python is our focus for now.

          • mdaniel 16 hours ago

            Being a toml-n00b, why is this quoted? https://github.com/CerebriumAI/examples/blob/85815f8e09e9e77...

            Related to that, it seems the syntax isn't documented https://docs.cerebrium.ai/cerebrium/environments/config-file...

            • za_mike157 15 hours ago

              Do you mean why the individual file names aren't quoted?

              You can see an example config file at the bottom of the link you attached - agreed, we should probably make it more obvious.

              • mdaniel 14 hours ago

                heh, I don't need an example in the docs - the whole repo is filled with examples - but unless you expect some poor soul to do $(grep -r ^include . | sort | uniq) and guess from there, what I'm saying is that the examples, including the bare-bones one in your documentation, do not SPECIFY what the glob syntax is. The good thing about standards is that there are so many to choose from, so: python's os.glob, golang's glob, I'm sure rust-lang has one, bash, ... I'm sure I could keep going

                As for the quoting part, it's mysterious to me why a structured file would use a quoted string for what is obviously an interior structure. Imagine if you opened a file and saw

                  fred = "{alpha: ['beta', 'charlie''s dog', 'delta']}"
                
                wouldn't you strongly suspect that there was some interior syntax going on there?

                Versus the sane encoding of:

                  fred:
                    alpha:
                    - beta
                    - charlie's dog
                    - delta
                
                in a normal markup language, no "inner/outer quoting" nonsense required

                But I did preface it with my toml n00b-ness and I know that the toml folks believe they can do no wrong, so maybe that's on purpose, I dunno
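                (Though for what it's worth, TOML does have native arrays, so a plain

                  include = ["./*", "main.py"]

                would sidestep the whole inner-syntax question entirely - assuming I'm reading the spec right, which, see above re: my n00b-ness)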

            • benjamaan a day ago

              Congrats and thank you! We’ve been a happy customer since early on. Although we don’t have much usage - our products are mostly R&D - having Cerebrium made it super easy to launch cost-effectively on tight budgets and run our own models within our apps.

              The support is next level - the team is ready to dive into any problem, responses are super fast, and they have helped us solve a bunch of dev problems that a normal platform probably wouldn't.

              Really excited to see this one grow!!

              • za_mike157 a day ago

                Thank you - appreciate the kind words! Happy to continue supporting you and the team.

                • ekojs a day ago

                  Congrats on the launch!

                  We're definitely looking for something like this, as we want to transition away from Azure's (expensive) GPUs. I'm curious how you stack up against something like RunPod's serverless offering (which seems quite a bit cheaper). Do you offer faster cold starts? How long would a ~30GB model load take?

                  • za_mike157 a day ago

                    Yes, RunPod does have cheaper pricing than us. However, they don't allow you to specify your exact resources; they charge you for the full resource (see the A100 example above). So depending on your resource requirements, our pricing could be competitive, since we charge you only for the resources you use.

                    In terms of cold starts, they mention 250ms, but I am not sure what workload that is on, or whether we measure cold starts the same way. Quite a few customers have told us we are quite a bit faster (2-4 seconds vs ~10 seconds), although we haven't confirmed this ourselves.

                    For a 30GB model, we have a few ways to speed this up, such as using the Tensorizer framework from CoreWeave and caching model files in our distributed caching layer, but I would need to test. We see reads of up to 1GB/s, so a 30GB model should stream in on the order of 30 seconds. If you tell me the model you are running (if it's open-source) I can get results to you - message me on our Slack/Discord community or email me at michael@cerebrium.ai.
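                    To give a feel for the Tensorizer approach, a rough sketch of the load path looks like this (illustrative only - the model name and file path are placeholders, not our exact pipeline):

                      # Stream pre-serialized weights straight into an uninitialized
                      # model, skipping torch.load's deserialization overhead.
                      from transformers import AutoConfig, AutoModelForCausalLM
                      from tensorizer import TensorDeserializer
                      from tensorizer.utils import no_init_or_tensor

                      config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
                      # Allocate the module without materializing its weights.
                      model = no_init_or_tensor(
                          lambda: AutoModelForCausalLM.from_config(config)
                      )
                      # "model.tensors" is a checkpoint serialized ahead of time
                      # with tensorizer's TensorSerializer.
                      deserializer = TensorDeserializer("model.tensors", device="cuda")
                      deserializer.load_into_module(model)
                      deserializer.close()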

                    • spmurrayzzz a day ago

                      > Yes, RunPod does have cheaper pricing than us. However, they don't allow you to specify your exact resources; they charge you for the full resource (see the A100 example above). So depending on your resource requirements, our pricing could be competitive, since we charge you only for the resources you use.

                      I may be misunderstanding your explanation a bit here, but RunPod's serverless "flex" tier looks like the same model (it only charges you for non-idle resources). And at that tier they are still 2x cheaper for an A100; at your price point with them you could rent an H100.

                      • za_mike157 a day ago

                        Ah, I see they recently cut their pricing by 40%, so you are correct - sorry about that. It seems we are more expensive compared to their new pricing.

                        • spmurrayzzz 21 hours ago

                          FWIW, the most expensive flex price I've ever seen for an 80GB A100 was $0.00130 back in January of this year, which is still cheaper, albeit by a smaller margin - if that's helpful at all for your own competitive market analysis.

                          (Congrats on the launch as well, by the way).

                      • risyachka 21 hours ago

                        Yeah, RunPod's cold start is definitely not 250ms, not even close. Maybe for some models, idk, but an 8B-param Hugging Face model takes like 30 seconds to cold start in their serverless "flash" configuration.

                        • za_mike157 20 hours ago

                          Thanks for confirming! Our cold start, excluding model load, is typically 2-4 seconds for HF models.

                          The only time it gets much longer is when companies have done a lot with very specific CUDA implementations.

                    • eh9 a day ago

                      Congratulations on the launch!

                      I just shared this on Slack and it looks like the site description has a typo: "A serverless AI infrastructure platform [...] customers experience a 40%+ cost savings as opposed to AWS of GCP"

                      • za_mike157 a day ago

                        Thank you - updated! My team makes fun of my spelling all the time!

                      • yuppiepuppie a day ago

                        Very nice demo!

                        When you ran it the first time, it took a while to load up. Do subsequent runs go faster?

                        And what cloud provider are you all using under the hood? We work in a specific sector that excludes us from using certain cloud providers (i.e. AWS) at my company.

                        • za_mike157 a day ago

                          You are correct! After the first request, the image is on a machine and cached for future use, which makes subsequent container startups much faster. We also route requests to machines where the image is already cached, and dedupe content between images, to speed startups up further.

                          We are running on top of AWS, but we can run on top of any cloud provider, and we are working on letting you use your own cloud. Happy to hear more about your use case and see if we can help you at all - email me at michael@cerebrium.ai.

                          PS: I will say that vLLM has shocking load times into VRAM, which we are resolving.

                        • abraxas 19 hours ago

                          Would this be a direct competitor of Paperspace? If yes, what do you feel are your strengths vis-a-vis Paperspace?

                          • jono_irwin 18 hours ago

                            There are definitely some parallels between Cerebrium and Paperspace, but I don't think they are a direct competitor. The biggest difference is that Paperspace doesn't have a serverless offering, afaik.

                            Cerebrium abstracts some functionality - like streaming and batching endpoints. I think you would need to build that yourself on Paperspace.
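                            As a rough illustration (a hypothetical handler shape, not necessarily our exact interface), a streaming endpoint can be as simple as a Python generator, with each yielded chunk flushed to the client as it's produced:

                              # Hypothetical sketch: yielding from the handler streams
                              # partial results instead of buffering one final payload.
                              def run(prompt: str):
                                  for chunk in ["streamed ", "one ", "chunk ", "at a time"]:
                                      yield chunk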

                            • abraxas 13 hours ago

                              Paperspace lets you bring your own containers and will scale them automatically. I don't know whether that would qualify as "serverless".

                              • za_mike157 5 hours ago

                                I guess the next question would then be how quickly they can start executing your container from a cold start when a workload comes in. Typically we see companies at around 30-60s.

                            • mceachen a day ago

                              Good luck on your launch! Your Loom drops audio after 4m25s.

                              • za_mike157 a day ago

                                Thanks for pointing that out!

                              • chaosinblood 9 hours ago

                                Which UI framework do you use? It's very nice.