I don't know that 37Signals counts as a "major enterprise". Their Cloud exodus can't have been more than a few dozen servers, right?
Meanwhile AWS is growing at 20%/year, Azure at 33% and GCP at 35%. That doesn't seem compatible with any kind of major cloud repatriation trend.
37signals spends more than $3M a year on cloud. So while it definitely isn't a major enterprise, it is also a lot more than a few dozen servers.
I am neither anti-cloud nor pro-cloud. My major problem with the new trend is that a lot of people are basically rediscovering the pre-"Cloud" era: VPS, dedicated servers and colocation. And people are suggesting that Hetzner, OVH and many other players are equivalent to AWS. While I don't disagree that AWS charges a lot for their offering, putting AWS next to those services isn't even a valid comparison.
It completely ignores the basics such as server / CPU / RAM / SSD quality, network quality (interconnect, redundancy) and data center quality. If you really want to do a simple price and spec comparison you might as well go to Lowendbox to find a low-cost VPS, which some people have been doing since 2008.
I really wish there were a middle ground somewhere before the hyperscalers. Neither DO nor Linode managed to reach a larger scale, and Hetzner is expanding only its cloud offering, with no dedicated servers outside the EU.
Yep, Hetzner, OVH or even DO aren't even close to offering what AWS offers. Once you start exploring all the things they have to offer you understand why so many large companies use hyperscalers.
Although to be fair, most hobbyists only need basic services like cloud servers/VMs, and hyperscalers like AWS are an awful deal compared to other options if you only need compute + storage + bandwidth. You don't need to use S3, Lambdas and Cloudfront to host a personal blog, a simple VPS will be more than enough.
It feels like most devs nowadays prefer using services that abstract away the infrastructure, at the cost of not developing SysOps skills, so I don't see a future where the Cloud is going to lose relevance.
> Hetzner, OVH or even DO aren't even close to offering what AWS offers
But I think the argument is - do they need to be? How much of the services of AWS (Google/azure/etc) are really needed by the majority of customers?
For companies that need hyperscaling services, I get it. There are definite benefits to the cloud when operating at extremes (auto scaling up and down). But for the majority of workloads, I think you could be well served by a more barebones offering.
> How much of the services of AWS (Google/azure/etc) are really needed by the majority of customers?
Very many. And none of them are EC2 (or its equivalent). Any service that comes with consumption-based charging (i.e. no 24x7 running costs whether it is used or not) and offers a clearly defined functional feature has plenty of appeal to cloud customers. Another part of the appeal is the mix-and-match nature of mature cloud platforms: customers get substantial freedom to choose from services they can instantly start using, or roll (and maintain) their own, albeit at a higher cost.
I.e. if the customer wants a queue, they get a queue, and nothing else. The cloud platform abstracts away and takes care of the underlying platform that «runs» the queue and eliminates the operational overhead that comes with having to look after the message broker that provides the said queue. There are many other examples.
Never seen the "operational overhead elimination" really happen in the wild. Sure, you lose the N Sysadmins, you gain at least N+1 SREs/Cloud/DevOps Engineers.
> are really needed by the majority of customers?
Some companies want the space to grow into it. At my job, we just started getting into video decoding. AWS has elastic video processing services, whereas with DO it would cost way more to set up those services on our own.
> you understand why so many large companies use hyperscalers.
While there are valid use cases where you get value from the extra services the hyperscalers are providing, most of the time people go for AWS “because everybody does it” or because the choice was made by a consulting company that doesn't pay the final cloud bill and is optimizing its own added value at the expense of the customer's.
I've been doing freelance work for 7 years now for roughly two dozen companies of various sizes, and I can't tell you how many massively underused AWS / Azure VMs I've seen, but they account for more than half the cloud bills of said companies (or divisions, for big companies, since I obviously only had visibility into the division I worked for and not the whole company).
> Yep, Hetzner, OVH or even DO aren't even close to offering what AWS offers.
I think it was mentioned in the article what AWS offers. Lock-in.
> Lock-in.
So do DO, OVH, Hetzner, Azure, GCP, Oracle, IBM etc. Every cloud platform comes with a hard lock-in.
Pretty sure those lockins are not comparable
He's mixing plain hosting with "advanced"/"cloud" hosting.
There is indeed a large gap in the market between outsourcing all your infrastructure to hyperscalers vs. hosting it on DIY bare metal and/or VPS providers. An open source alternative to AWS would do much to fill that gap, and we are building just that at Ubicloud (I'm one of the co-founders).
So far with Ubicloud, you get virtual machines, load balancers, private networking, managed PostgreSQL, all with encryption at rest and in-transit. The Ubicloud managed service uses Hetzner bare metal as one of its hosting providers, which cuts costs 2x - 10x compared to AWS/Azure. Would love to hear any feedback if you'd like to give it a try, or go through the repo here: https://github.com/ubicloud/ubicloud
> You can set it up yourself on these providers or you can use our managed service.
Are all the bits and pieces necessary for starting one's own managed service open source? In case somebody is interested in starting their own commercial cloud. How easy would that be to deploy?
There's OpenStack. It's a private IaaS. It has load balancers, IPv6 support, support for K8s hosting via the Magnum component (and other container orchestrators), and HA via the Masakari component. The networking is very flexible. It does not currently have functions as a service; I believe that was in the Senlin component, but that's been abandoned, though I believe a new incarnation of the idea is in the works. With something like Kolla-Ansible, a containerized OpenStack infrastructure is pretty damn easy to manage: upgrades are just making sure you make any needed changes in the global config file (just a vimdiff with the new sample one included in the release) and then literally just a kolla-ansible upgrade -i inventory-file.yml.
I'm just a home labber and I've run OpenStack via kolla-ansible for like 7 years now, and Ceph since the jewel release I think almost 8 years ago for storage. Both are pretty easy to manage.
I totally agree, but I’ve also worked on a project with 0 customers spending about $2M/year on AWS, and there was absolutely zero incentive from the stakeholders to reduce the cost. There’s a certain disconnect between boots on the ground engineers, and decision makers when it comes to infra management.
> Server / CPU / RAM / SSD quality
I've had no issues with Hetzner's components, and especially their network quality, for a decade now. Before that, yeah, there could be some hiccups and issues. But they stopped being issues a long time ago. And really, what hardware issues do you expect on this hardware:
https://www.hetzner.com/dedicated-rootserver/brands/matrix-d...
My opinion, from analyzing the 37signals cloud exit, is that it shouldn't have been done.
They didn't save a whole lot of money from it (they aren't spending a whole lot on it anyway), and now their business ever so slightly loses focus. Not to mention, as you said, the quality aspects. Cloud gives you many things "for free" (like local disk RAID, network and power redundancy, datacenter compliance, in-transit and on-disk encryption, optimized OS images, overall network security, controls around what their employees can do - that $5/month lowendbox provider from Romania is almost certainly logging into your VM and going through your files) which you lose when going to a "pre-cloud" provider.
there's a mile of difference between Romanian lowendbox.com and renting a cage in, say, an Equinix datacentre
if this approach to DC compliance/security/redundancy is good enough for the world's financial services industry then it's probably good enough for everyone else too
(but yes, then only saves about 90% of the cost instead of 95%)
Umm, millions saved in the future seems like a decent amount of money? Yeah, they paid for hardware in year 0 that roughly equals their AWS bill, but in the subsequent years that money is not spent on AWS or new servers.
Although it's private so we can never be sure, their revenue seems to be in the ballpark of $100m with about 40% margin. So even if they save a million per year, it's not worth it, especially when it's a trade-off.
> 37signals spends more than $3M a year on cloud.
Isn’t most of that just S3 storage/bandwidth?
If so, they should move to R2.
You can have multiple trends at once. Veteran cloud users leaving, international business onboarding.
And then there's me: never left the datacenter in the first place.
Wise person. Wish we hadn't. Managed to multiply costs 8x (no joke).
No way that is true if you did it properly. Practically nobody has a workload where this could be true - and it's definitely not a workload smaller than several DCs.
It doesn't work out well if you just create some long lived EC2 instances and call it a day. But that's not really using a cloud, just a VPS - and that has indeed never been cheaper than having your own servers. You need to go cloud native if you want to save money.
Any egress heavy workload can quickly cost more on cloud than on prem. Especially if you’ve got consistent egress bandwidth that can be negotiated against.
If it's so heavy that you pay 8x the price of deployment and maintenance of physical servers then you're either very small in which case I'm surprised you don't need the flexibility, or you have many options to make a deal. Don't accept the listed prices.
Can I suggest that perhaps I have extensive experience with very large AWS deployments and negotiations, and stand by what I said.
Sorry, but this claim makes me seriously question your experience in this particular regard. I'm an AWS partner and this (negotiating better prices) is what we do every week for our clients. There is no way egress causes your costs to be 8x an on-premise deployment, even if you pay the listed price, and definitely not if you pick up the phone and call the first partner in the registry.
If you said 2 times I'd think it's overestimated but okay, let's not dwell on details. 3x is bullshit and so is the rest.
Perhaps you're comparing apples and oranges - yes, it's possible to do a much less capable on-premise deployment that will obviously cost much less. But if we're comparing comparable - just the internet subscription you'd need in your DC to match the AWS offer in availability, connectivity and stability would make any egress costs pale in comparison. Saying this as someone who used to run a hosting company with 3000 servers before the clouds made it obsolete.
And lastly, yes - paying people to do stuff for you usually costs more than time and materials. If you fuck it up, it's up to you to fix it. If AWS fucks it up, you're compensated for it - part of the price are guarantees that are impossible to get with a DIY deployment. Maybe you don't need it, so choose accordingly - a cheaper hosting provider, or even the less capable on premise. But matching the cloud offer all by yourself is not going to be cheaper than the cloud unless you're on AWS scale.
There are so many blogposts about AWS egress being crazy expensive. Here is one: https://www.vantage.sh/blog/cloudflare-r2-aws-s3-comparison . Their example "image hosting" has ~$7K for AWS, vs $36 on R2, mostly due to egress costs.
Yeah, maybe an "AWS partner" can get you a discount, but I bet it'd be 10% for most, or maybe 30% tops. That won't turn $7K into $36.
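Rough back-of-the-envelope to make the gap concrete. All the per-GB rates below are assumptions plugged in for illustration (the commonly cited ~$0.09/GB S3-to-internet rate and the $0.060/GB CloudFront figure cited elsewhere in this thread), not a quote:

    # Back-of-the-envelope egress comparison -- illustrative, assumed rates only.
    egress_tb = 75                 # assumed monthly egress for an image host
    egress_gb = egress_tb * 1000

    s3_rate = 0.09                 # assumed $/GB, S3 -> internet (tiered in reality)
    cloudfront_rate = 0.06         # $/GB figure cited elsewhere in this thread
    r2_rate = 0.0                  # R2 doesn't charge for egress

    print(f"S3 direct:  ${egress_gb * s3_rate:,.0f}")          # ~$6,750
    print(f"CloudFront: ${egress_gb * cloudfront_rate:,.0f}")  # ~$4,500
    print(f"R2 egress:  ${egress_gb * r2_rate:,.0f}")          # $0; storage/ops billed separately

Even a 30% discount on the AWS numbers doesn't change the order of magnitude.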
AWS offers CloudFront as an alternative to Cloudflare. Serving traffic straight from your S3 bucket is wrong. S3 stands for Simple Storage Service and they really mean it - it's a low-level object storage service intended for programmatic usage that does exactly what you tell it, without any caching or anything else; it is not a web host. Add CloudFront and your costs will instantly drop severalfold. AWS tells you this during S3 bucket creation when you try to make it public, btw - it's not hidden.
Cloudflare's networking solution doesn't nearly match - and to be fair, they're not trying to - what AWS offers. Cloudflare is a small, focused service; AWS is a universal enterprise do-everything-and-stay-secure-and-compliant-while-doing-it platform that includes the entire Cloudflare offering, and that isn't even a big part of AWS. Don't conflate the two - use whatever is better for your use case, budget/margin, risk profile, reliability requirements etc.; each has its place and its price is justified.
Are you sure you are an AWS partner? CloudFront is not going to instantly drop costs severalfold - it's still $0.060/GB (for the US; other countries are even more expensive), so that would be at least a $6K monthly bill. That's only a reduction of a few tens of percent.
And sure, Cloudflare does not have all the breadth of Amazon's services, but I find it hard to justify a $60 vs $6000 price difference. Amazon egress is simply incredibly overpriced, and any price-sensitive company should avoid using it.
It is not overpriced, it's simply not fit for your purpose - that's all I'm saying. That's fine, use the best tool for the job - I use Cloudflare too, it's great. But there are times when the capabilities offered by AWS networking are necessary and the price is well justified for what it offers.
It's easy. Lift and shift, then fuck it up by not quite completely migrating everything to numerous badly managed Kubernetes clusters. That's what we did.
> No way that is true if you did it properly.
It's quite easier to mess up in a hyperscaling cloud because it's extremely forgiving. In a different setting you wouldn't be able to make as many mistakes and would have to stop the world and fix the issue.
there is absolutely a crossover point at which it would've made more sense to stay put.
My organisation is feeling it now and while our cloud environment isn't fully optimised it has been designed with cost in mind.
Using opex to make up for otherwise unjustifiable capex is suitable only in the beginning or if you need the latest servers every six (or whatever) months
I assume you just run everything on prem and have a high speed up/down connection to the Net? Do you have some kind of AWS/Heroku/Azure -type thing running locally or just use something like Apache or what?
But you have provided zero evidence for any of it.
How much of that is what technologists would consider "cloud" (IAAS, PAAS) versus what someone on the business side of things would consider "cloud" - office365, google gsuite, etc?
Given that AWS is doing $100B in annual revenue and still growing at 17% YoY ... and they do NOT have a collaboration suite (office/gsuite) - it'd say at least for AWS it's nearly all IaaS/PaaS.
It may not be as popular but they do have Amazon WorkDocs
> Amazon WorkDocs is a document storage, collaboration, and sharing system. Amazon WorkDocs is fully managed, secure, and enterprise scale.
https://docs.aws.amazon.com/workdocs/latest/developerguide/w...
I weep for Amazon leadership, forcing themselves to use workdocs over quip is self sabotage imo
This service has been discontinued recently, with a bunch of others that lacked adoption from customers.
I'd agree on IaaS/PaaS being the main driver. I'd guess that everyone is running away from serverless offerings from all the main cloud providers. It's just day-1 lock-in to a platform with no shared standards. It's very uncompetitive and kind of slow to innovate.
We’re migrating over a hundred apps to Azure App Service.
One has an issue with the platform-enforced HTTP timeout maximum values.
I migrated that app back to a VM in an hour.
It turns out that the “integration” for something like App Service (or CloudRun or whatever) is mostly just best practices for any kind of hosting: parameters read from environment variables, immutable binaries with external config, stateless servers, read only web app folders, monitoring with APMs, etc…
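To make that concrete, here's a minimal sketch of the pattern (the variable names are made up for illustration); the point is that it runs identically on App Service, Cloud Run, or a plain VM:

    import os

    # Hypothetical settings -- everything comes from the environment, so the same
    # immutable build runs on App Service, Cloud Run, or a plain VM.
    DATABASE_URL = os.environ["DATABASE_URL"]                  # fail fast if missing
    HTTP_TIMEOUT = int(os.environ.get("HTTP_TIMEOUT", "30"))   # platform-friendly default
    LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")

    # No local state: anything that must survive a restart goes to a database or
    # object storage, never to the app's (read-only) folder.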
Sure, you’ll experience lockin if you use Durable Functions or the similar Lambda features… but no worse than any other workflow or business rules platform.
Ask people how easy it is to get off BizTalk or MuleSoft…
Amazon loves it when you run idle EC2 instances ($$$) rather than using Lambda.
Most real workloads I've seen (at 3 startups, and several teams at Amazon) have utilization under 10%.
That's really where you see that no answer is right across the board.
I worked at a very small startup years ago that leaned heavily on EC2. Our usage was pretty bipolar; the service was along the lines of a real-time game, so we either had a very heavy workload or nothing. We stood up EC2 instances when games were live and wound them down after.
We did use Lambda for a few things, mainly APIs that were rarely used or for processing jobs in an event queue.
Serverless has its place for sure, but in my experience it has been heavily overused over the last 3-5 years.
I think the solution to that problem is usually to have fewer and smaller EC2 instances.
And you only need to get utilization up to like 15% to make reserved instances significantly better than lambda.
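Here's the rough math behind that 15% figure. The Lambda GB-second rate is the long-standing published one, but the instance price and reserved discount are assumptions I made up for the sketch, so treat it as illustrative:

    # Illustrative break-even between Lambda and a small always-on instance.
    # Instance price and reserved discount are assumptions, not quotes.
    lambda_gb_sec = 0.0000166667     # $ per GB-second (widely cited x86 rate)
    instance_ram_gb = 8              # assumed instance size
    instance_hourly = 0.08           # assumed on-demand $/hr
    reserved_discount = 0.40         # assumed discount for a 1-year commitment
    hours = 730                      # hours in a month

    util = 0.15                      # 15% of the instance's capacity actually used
    lambda_cost = instance_ram_gb * util * hours * 3600 * lambda_gb_sec
    reserved_cost = instance_hourly * (1 - reserved_discount) * hours

    print(f"Lambda-equivalent work: ${lambda_cost:,.0f}/mo")    # ~$53
    print(f"Reserved instance:      ${reserved_cost:,.0f}/mo")  # ~$35

And that ignores Lambda's per-request charge, which only tilts things further toward the instance.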
Not to naysay, but any idea if that includes their own website? Just curious. I don't think Amazon itself is the largest AWS customer anymore.
I’d suspect there is significant growth of businesses acting as intermediaries for cloud storage. I think that other software providers have also realized that ransoming users data is a great way to extract predictable, hedge-fund-owner-pleasing revenue without performing useful work.
AEC software providers all do this. ProjectWise is worse than owning or renting a plain file server in every way I can imagine, yet every consultant in transportation dutifully cuts Bentley a five-figure check or larger every year so they can hold your project files hostage and pretend to develop software.
I pray for a merciful asteroid to end it all.
I’ve worked with a few organisations that I’d call “late adopters” to the cloud, and it’s rare for them to use IAAS or even PAAS. It’s all SAAS and serverless, and while they all say they’re doing devops it’s almost always clickops.
For Azure, all of it. Microsoft clumps Azure together with their server software (e.g. Windows Server, SQL Server) licensing when reporting the revenue, but give more fine-grained information on growth rates. This is the latter. (We also know the Azure business was already massive at $34 billion in 2022, since it got revealed during one of Microsoft's ongoing antitrust cases.)
For Google, I'm not aware of a reliable way of estimating the GCP vs. Workspace numbers. But they get asked it during earnings calls, and the answer has always been that the GCP growth is substantially faster than the Workspace growth.
Afaik, MSFT shows growth in Azure and Office as separate things during earnings reports, so the % mentioned before is just Azure, and 31% is huge.
I sincerely doubt 37signals has "a few dozen servers." Every company I've been in has a huge, sprawling cloud estate that no one has governance over. New instances are stood up by individual teams in order to avoid bureaucratic delay, and these propagate indefinitely.
> Their Cloud exodus can't have been more than a few dozen servers, right?
"At the moment we have somewhere between 20-25 servers in each cab, or about 90 servers in each site. Here’s what the rack layout looks like in Chicago, for instance."
- https://dev.37signals.com/37signals-datacenter-overview/
Their "server count" is definitely much higher than what you are thinking.
"In parallel, GEICO, one of the largest automotive insurers in the United States, is actively repatriating many workloads from the cloud as part of a comprehensive architectural overhaul."
Is GEICO a major enterprise?
Per google, more than 30,000 employees, so I'd say enterprise-scale, sure. One of the biggest? No, but not tiny or even medium-sized.
Insurance companies tend to have far more capital than their employee numbers would suggest. Particularly Geico, who are famously cheap.
"Have" is an interesting word. Much of that capital is covering a bad year in Florida or California.
AWS and other hyperscalers will keep growing, no doubt. Public cloud adoption is at around 20%, so the new companies that migrate into the cloud will keep the growth going. That doesn't deny the fact that some might be repatriating, though - especially the ones that couldn't get the benefits out of the cloud.
One thing I've seen in every startup I've been in over the last decade is that cloud asset management is relatively poor. Now I'm not certain that enterprise is better or worse, but ultimately when I think back 10+ years ago resources were finite. With that limitation came self-imposed policing of utilization.
Looking at cloud infrastructure today, it is very easy for organizations to lose sight of production vs frivolous workloads. I happen to work for an automation company that has cloud infrastructure monitoring deployed, such that we get notified about the resources we've deployed and can terminate workloads via ChatOps. Even though I know that everyone in the org is continuously nagged about these workloads, I still see tons of resources deployed that I know are doing nothing or could be commingled on an individual instance. But since the cloud makes it easy to deploy, we seem to gravitate towards creating a separation of work efforts by just deploying more.
This is/was rampant in every organization I've been a part of for the last decade with respect to cloud. The percentage of actual required, production workloads in a lot of these types of accounts is, I'd gather, less than 50% in many cases. And so I really do wonder how many organizations are just paying the bill. I would gather the Big cloud providers know this based on utilization metrics and I wonder how much cloud growth is actually stagnant workloads piling up.
The major corps I've worked in that did cloud migrations spent so much time on self-sabotage.
Asset management is always poor, but that's half because control over assets ends up being wrestled away from folks by "DevOps" or "SREs" making K8s operators that completely fuck up the process. The other half is because they also want "security controls" and ensure that all the devs can't see any billing information. How can I improve costs if I can't tell you the deltas between this month and last?
Yep. Developers leave and when they go, dev resources are often left running. Sometimes, it is whole environments. This junk adds up.
> That doesn't seem compatible with any kind of major cloud repatriation trend.
Agreed. I don't think this is a real trend, at least not right now.
Also, fwiw, I'm really not a fan of these types of articles that identify like a small handful of people or organizations doing something different and calling it a "trend".
Submarine like articles trying to create a trend I suppose.
37 Signals has enterprise-scale influence in some software development circles. I'm no fan of them but they have it right on this one.
Revolutions cannot start huge, they must start small.
GEICO is moving away from the cloud because their IT is a joke. They had a horrible on-prem infrastructure, so they moved to the cloud not knowing how, and they made the same mistakes in the cloud as on-prem, plus the usual mistakes every cloud migration runs into. They are moving away from the cloud because their new VP's entire career is focused on running her own hardware. What we know about their new setup is absolutely bonkers (like, K8s-on-OpenStack-on-K8s bonkers). Look to them for what not to do.
37signals is like the poster child for NIH syndrome. They keep touting cost savings as the reason for the move, but from what I have gathered, they basically did nothing to save cost in the cloud. It is trivial to save 75% off AWS's list price. They will even walk you through it, they literally want you to save money. That, plus using specific tech in specific ways, allows you to reap major benefits of modern designs while reducing cost more. 37signals didn't seem to want to go that route. But they do love to build their own things, so servers would be a natural thing for them to DIY.
Almost every argument against the cloud - cost inefficiency, fear of vendor lock-in, etc - has easy solutions that make the whole thing extremely cost competitive, if not a way better value, than trying to become your own cloud hosting provider. It's very hard to estimate the real world costs, both known and unknown, of DIY hosting (specifically the expertise, or lack of it, and the impacts from doing it wrong, which is very likely to happen if cloud hosting isn't your core business). But it's a 100% guarantee that you will never do it better than AWS.
AI is the only place I could reasonably imagine somebody having an on-prem advantage. At the moment, we still live in a world where that hardware isn't a commodity in the way every other server is. So you might just be faster to deploy, or cheaper to buy, with AI gear. Storage is similar but not nearly as tight a market. But that will change eventually once either the hype bubble bursts, or there's more gear for cheaper for the cloud providers to sell.
> Almost every argument against the cloud - cost inefficiency, fear of vendor lock-in, etc - has easy solutions that make the whole thing extremely cost competitive, if not a way better value, than trying to become your own cloud hosting provider. It's very hard to estimate the real world costs, both known and unknown, of DIY hosting (specifically the expertise, or lack of it, and the impacts from doing it wrong, which is very likely to happen if cloud hosting isn't your core business)
Please define your concept of self-hosting here. Does it mean you need to have your very own DC? Renting a few racks that you fill yourself? Renting CPU, storage and networking, with remote hands and all the bells and whistles? Depending on the scenario, the burden of ownership changes dramatically (at a monetary cost, obviously). And depending on the size of the company and the variability of the workload, it can (or cannot) make sense to be on-prem. But being like "cloud is perfect for everyone and everything, if you tune it well enough" seems a bit too black-and-white to me.
The co-location costs are relatively minor. The bulk of the cost of self-hosting comes from quantization: needing to buy 2x (for HA) of things and then not being able to use the full capacity. Things like tape libraries, Internet routers, hardware firewalls, etc...
The multi-vendor aspect can be expensive as well. There's a lot of different end-of-life issues to track, firmware to update, specialist skills to hire, and so on.
Just backup alone can be shockingly expensive, especially if it has a decently short RTO/RPO and is geo-redundant. In Azure and AWS this is a checkbox.
It's very easy to estimate the real-world cost of on-prem or dedicated hosting - there is a wide range of providers that will quote you fixed monthly prices to manage it for you (including me) because we know what it costs us to manage various things for you.
AI is the only place I don't currently see much on-prem advantage, because buying SOTA equipment is hard, and it gets outdated too quickly.
For pretty much everything else, if you can't save 70%+ TCO, maintenance/devops included, over an optimized cloud setup, you're usually doing something very wrong, usually because the system is designed by someone who defaults to "cloud assumptions" (slow cores, too little RAM, too little fast storage, resulting in systems that are far more distributed than they need be is the typical issue).
The main problem with AWS is their outrageous pricing on some aspects like traffic. And some very unexpected pricing nuances which could burn thousands of dollars in a blink of an eye.
While AWS engineers are more competent, maybe you don't need that much competency to run a simple server or two. And the expense structure will be more predictable.
Right here - network is expensive. Another issue is cost estimation.
Servers have prices, rack space has prices, engineers have salaries.
You now literally have people hired to work on how to calculate infra costs on AWS (FinOps). Spot prices fluctuate. So what are the real savings in FTEs?
>K8s-on-OpenStack-on-K8s bonkers
Do what now???
It's actually quite reasonable if for bad reasons.
TL;DR setting up OpenStack was so horrible I think SAP started deploying it through k8s.
So if you want to set up a local "private cloud" kind of environment, it makes sense to set up OpenStack on k8s.
If you then want to provide multiple clusters cloud-style to the rest of the organization... well, it's just layered again.
In fact, at least one significantly sized European vendor in the on-prem k8s space did exactly that kind of sandwich, to my knowledge.
Openstack on k8s is basically giving up on openstack on openstack. https://wiki.openstack.org/wiki/TripleO You need some kind of orchestration for the control plane. Either you create it from scratch or use something existing.
I am totally unsurprised people gave up on Triple-O.
I haven't personally worked on OpenStack setups myself, but my coworkers in a few places did (including supporting, if not actually being contracted out to build, some of the commercial offerings in that space), and especially upgrades were always this huge project where it made more sense to tear down the entire setup and bring it up again.
That was made easier with OpenStack packaged as Docker containers, but Ansible was arguably still more painful for setting up the cluster than just using k8s.
I know nothing about Geico's IT but I find your comments surprising. GEICO is one of the most profitable insurance companies in the world which, of course, is the end goal of every company.
It's a short simple post that comes down to this:
> Weekly explains that “just running legacy applications in the cloud is prohibitively expensive,” highlighting how lift-and-shift approaches often fail to deliver expected benefits.
Yes, if you have a mature business without active development at a scale where compute/storage costs is a substantial accounting line item, then it makes sense to run on hardware that doesn't have the flexibility and cost of the cloud.
There is an in-between that makes much more sense for most though. Running on provisioned bare metal. Lots of providers offer this as a better performance/price option where you don't have to deal with provisioning hardware but do everything else from the OS+maintenance and up.
At one company we used large bare-metal machine instances provisioned for stable parts of the application architecture (e.g. database and webapp instances) and the cloud for new development where it made sense to leverage capabilities, e.g. DynamoDB with cross-region replication.
I can't tell you how often I've run into cloud deployments that were lift-and-shifts, pushed on by bean counters wanting OPEX instead of CAPEX. They then run into actual cashflow expenses, less stability, more complex security (now you get IAM on top of basic networking!), and the ability for one underpaid person to easily do a lot of damage - because you're certainly not going to hire top-tier cloud talent - these are bean counters running things after all.
It makes it really clear why you see so many data leaks via badly configured S3 buckets or Dynamo tables...
> now you get IAM on top of basic networking!
You always had IAM, even on prem. It's just that, a decade ago, IAM meant admin:admin everywhere, domain admin creds for everyone, and an NPS that nobody knew how to configure, so they beat it with a wrench until things started working.
Ah, the good old days.
It’s a bit naive to think that this sort of an org went from hiring top tier sysadmin staff to bottom of the barrel developers for cloud dev. It’s likely they were a thundering mess when they managed their own hardware too.
Very large mature businesses that don’t see IT as a core function have probably outsourced management to a third party. There’s not much daylight between that third party’s margin and just paying a hyperscaler.
There are certain workloads that have never been really economical to run in cloud. Cloud economics is based on multi-tenancy, eg if you have a lot of hardware that is sitting idle a lot of the time, then cloud may be economical for you as the cloud provider can share it between you and others.
Cloud is also good for episodic use of expensive exotic systems like HPC and GPU fleets, if you don’t need them all the time- I call this serial multi-tenancy.
Cloud is not economical for massive storage, especially if you’re not willing to use backup solutions and reduced availability. For example, AWS S3 default keeps multiple copies of uploaded data; this is not comparable to typical on-premises RAID 1 or RAID 3. You can save money with reduced redundancy storage but then you have to take on more of the reliability burden. Likewise compute is cheap if you’re buying multi-tenant instances, but if you want dedicated instances or bare metal, then the economics aren’t nearly as attractive.
Cloud is also good for experimentation and rapid development - it’s so much faster to click a few buttons than to go through the hardware acquisition processes at many enterprises.
The companies that regret cloud due to financial concerns usually make two mistakes.
First, as noted above, they pay for premium services that are not directly comparable to on-prem, or they use workloads in cloud that are not cloud economical, or both.
Second, they don’t constrain random usage enough. It is super easy for a developer doing some testing to spin up thousands of dollars of bill. And it’s even worse if they leave it at the end of the day and go home- it’s still racking up hourly usage. And it’s downright ugly if they forget it and move on to something else. You have to be super disciplined to not spin up more than you need and turn it off as soon as you’re done with it.
> but if you want dedicated instances or bare metal
Multitenant instances on AWS statically partition the hardware (CPU, RAM, network), so tenants don't really share all that much. Memory bandwidth is probably the only really affected resource.
> Second, they don’t constrain random usage enough.
AWS now has billing alerts with per-hour resolution and automatic anomaly detection. There are third-party tools that do the same.
> Multitenant instances on AWS statically partition the hardware (CPU, RAM, network), so tenants don't really share all that much.
You are missing several points:
First, density. Cloud providers have huge machines that can run lots of VMs, and AWS in particular uses hardware (”Nitro”) for hypervisor functionality so they have very low overhead.
Cloud providers also don’t do “hardware” partitioning for many instance types. AWS sells “VCPUs” as the capacity unit; this is not necessarily a core, it may be time on a core.
Cloud providers can also over-provision; like airlines can sell more seats than exist on a plane, cloud providers can sell more VCPUs than cores on a machine, assuming (correctly) that the vast majority of instances will be idle most of the time, and they can manage noisy neighbors via live migration.
And lots of other more esoteric stuff.
> Cloud providers can also over-provision
But they don't. AWS overprovisions only on the burstable T-type instances (T2, T3, T4g). The rest of the instance types don't share cores or memory between tenants.
I know, I worked with the actual AWS hardware at Amazon :) AWS engineers have always been pretty paranoid about security, so they limit the hardware sharing between tenants as much as possible. For example, AWS had been strictly limiting hyperthreading and cache sharing even before the SPECTRE/Meltdown.
AWS doesn't actually charge any premium for the bare metal instance types (the ones with ".metal" in the name). They just cost a lot because they are usually subdivided into many individual VMs.
For example, c6g.metal is $2.1760 per hour, and c6g.16xlarge is the same $2.1760. c6g.4xlarge is $0.5440
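The linear scaling is easy to check from those numbers:

    # On-demand prices quoted above, $/hr.
    c6g_4xlarge = 0.5440    # 16 vCPUs
    c6g_16xlarge = 2.1760   # 64 vCPUs
    c6g_metal = 2.1760      # whole host, 64 vCPUs

    assert abs(c6g_16xlarge - 4 * c6g_4xlarge) < 1e-9  # price scales linearly with vCPUs
    assert c6g_metal == c6g_16xlarge                    # no premium for .metal itself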
> And lots of other more esoteric stuff.
Not really. They had some plans for more esoteric stuff, but anything more complicated than EC2 Spot does not really have a market demand.
Customers prefer stability. EC2 and other foundational services like EBS and VPC are carefully designed to stay stable if the AWS control plane malfunctions ("static stability").
Seems par for the course that even AWS employees don't understand their pricing. I noticed the pricing similarity and tried to deploy to .metal instances. And that's when I got hit with additional charges.
If you turn on a .metal instance, your account will be billed (at least) $2/hr for the privilege for every region in which you do so. A fact I didn't know until I had racked up more charges than expected. So many junk fees hiding behind every checkbox on the platform.
Also former AWS :-D
S3 has two more cost saving dimensions: How long will you commit to storing these exact bytes and how long are you willing to wait to get them. Either of those will allow you to reduce S3 costs without having to chance data loss due to AZ failure.
> it’s so much faster to click a few buttons than to go through the hardware acquisition processes at many enterprises.
My company's on-prem requisition process used to be so horrifically bad that it forced ball-of-mud solutions, because nobody had time to wait 9 months for new servers. We also had scaling issues and couldn't react in any timely manner. I feel like staying on prem is a penny-wise, pound-foolish approach.
Most enterprises on prem already run VMware for virtualisation, it is the antiquated way of provisioning that affects how slow it is to spin something up on prem. And frequently these antiquated practices are carried to the cloud, negating any benefit.
> And frequently these antiquated practices are carried to the cloud, negating any benefit.
I should have brought that up too. Airlifting your stuff to the cloud and expecting cloud to run like your data center is a way to set yourself up for disappointment and expense. The cloud is something very different than your on-premise datacenter and many things that make sense on prem, do not make sense in cloud.
What I was surprised to find in some big orgs is that the processes have not evolved to be cloud-first. There is a lack of maturity: still a chain of committees, approvals, and manual processes; risk management still treats the services as a giant intranet; deployments are not scripted; designs are ad hoc. Resources are placed in vnets so that they resemble a system people already know, and that comes with all the associated risks.
This is the reality IME. I'm currently in an org that has been "in the cloud" for over ten years but is only now architecting (some) new projects in a cloud-first way. Meanwhile there is big pressure to get out of our rented cages so there is even more lift-and-shift migration happening. My guess is that we eat roughly 5x as much compute as we would need with proper scaling, and paying cloud prices for almost all of it.
Yep transition to cloud-first is still such a challenge for many big organizations
Any large scale transition actually !
Change can be tough
The transition has to be accompanied by a revamp of all the technical processes associated with IT provisioning, which is too much and too risky to do.
Kjell's Law: the cost of a platform eventually exceeds the cost of the one it replaced. But each cost is in a different budget.
We seem to have replaced cooling and power and a grumpy sysadmin with storage and architects and unhappy developers.
I've never worked in a data center that did cooling and power correctly. Everyone thinks they're doing it right, and then street power gets cut - there's significant impact, ops teams scramble to contain, and finally there's the finger-pointing.
The first time we tested cutting the power back in the day, the backup generator didn't fire! Turns out someone had pushed the big red stop button, which remains pushed in until reset.
That would have been a major problem if we'd had a nighttime power outage.
After that we ran regular switchover testing :)
The other time we ran into trouble was after someone drove a car into the local power substation. Our systems all ran fine for the immediate outage, but the power company's short term fix was to re-route power, which caused our voltage to be low enough for our UPS batteries to slowly drain without tripping over to the generator.
That was a week or two of manually pumping diesel into the generator tank so we could keep the UPS batteries topped up.
I have, at least for power. I worked in a DC with 4 very large diesel generators, each backed up by a multi-ton flywheel that managed the transition between a power cut and the generators taking over.
Area wide power cut, winter afternoon so it was already getting dark. The two signs I knew there was something wrong were that all the lights went out outside, ie other businesses, street lighting etc. And my internet connection stopped working. Nothing else in the DC was affected. Even the elevator was working.
Amazing story, and now the flywheel is living rent free in a steampunk-inspired corner of my brain. What are these called so I can look them up on the net? Like this maybe?
They're chonky devices which were not really off-the-shelf until the last decade, as far as I know. There's very few images of them, but a smaller one looks like this: https://www.pv-magazine.com/2018/03/14/pilot-project-for-fly...
Very cool. Somehow I missed the idea of flywheels used to store energy. I assumed one that small would peter out in seconds.
These were huge. In fact iirc the power that came into the building spun them, and the DC ran off the resultant generated electricity. So the risk at cutover is minimal, there is in fact no loss of electricity unless the wheel drops below the required revs.
One of these units blew at one point. We had 4 and only needed two running, so no big deal. The company who managed the whole thing (Swiss) came to replace it. Amazing job, they had to put it on small rollers, like industrial roller skates, then embed hooks in the walls at each corridor junction, and slowly winch the thing along, it was like watching the minute hand of a clock.
Then the whole process in reverse to bring in the new one. Was fascinating to watch. The guy in charge was a giant, built like a brick outhouse. They knew their stuff.
Much smaller-scale, but I worked at a company with a mini-mainframe-type computer (VAX-11/780, iirc) that had a 'motor-generator' to run it (really a motor-flywheel-generator).
The computer, storage, etc. ran off the generator, which first eliminated any risk of power spikes and surges (as the flywheel is a very effective low-pass filter), and the circuits controlling motor speed also ensured the AC frequency was better than the power company supply. This was located in a rural area, so the long power lines with few sinks (customers pulling power) spread lightning spike risk further, and the rural voltage and frequency fluctuated a lot. It seemed like a really cool system that worked flawlessly in the years I was there.
I'm realising, with my limited understanding of electronics, that the flywheel acts in these cases as a capacitor, albeit a frickin' huge mechanical one.
Elegant! Flywheel as line conditioner. Most cool.
That whole story is clutch. Thanks a ton.
> then street power gets cut
Or the electrician doing maintenance on the backup generator doesn't properly connect the bypass and no one notices until he disconnects the generator and the entire DC instantly goes quiet.
Or your DC provisions rack space without knowing which servers are redundant with which other servers, and suddenly when two services go from 10% CPU use to 100% CPU across ten servers the breaker for that circuit gives up entirely and takes down your entire business.
The colo I’m used to has survived multiple switch overs to backup and then to diesel generators without a blip that I could detect.
I say “I’m used to” because having things there has spanned more than one job.
One power outage was days to a week. Don’t recall exactly.
It’s possible to do it right.
Yes it's possible. But it's not cheap. If you buy a bunch of UPS and a few generators (you need more than one, in case it doesn't start) and don't maintain them regularly and test them regularly that's when you get some bad surprises.
I mean; it's impossible to plan for everything, and I'd argue that if you actually did plan for everything; it would be so extraordinarily overbuilt that it couldn't be considered 'correct'.
We had happy developers before? Amazing.
They are Grumpy because now they are doing Sysadmin stuff
without the grumpy sysadmin they jump out more.
It’s the same old MBA cycle we had with onshoring / offshoring. Everyone wants to build their resume so they have to change things.
In this cycle a new MBA comes in wants to make an impact so does a cloud transition. Then they move on and the next guy comes in, wants to make an impact so moves things back in house. Repeat until some new fad comes along.
You can have a 100Gb uplink on a dedicated fibre for less than 1000$/month now. Which is insanely less than cloud bandwidth. Of course there are tons of other costs, but that alone can suffice to justify moving out of the cloud for bandwidth intensive app.
We went to cloud because 1) we only need 3 infra guys to run our entire platform and 2) we can trivially scale up or down as needed. The first saves us hundreds of thousands in skilled labor and the second lets us take on new customers with thousands of agents in a matter of days without having to provision in advance.
1) You may more than pay for that labor in cloud costs, but you can also pretty easily operate rented dedicated hardware with a 3-man team if they know how to do it; the tools to scale are there, they're just different.
2) I don't know what your setup looks like, but renting a dedicated server off of Hetzner takes a few minutes, maybe hours at most.
My personal opinion is that most workloads that have a load balancer anyways would be best suited to a mix of dedicated/owned infrastructure for baseline operation and dynamic scaling to a cloud for burst. The downsides to that approach are it requires all of skillset A (systems administration, devops) and some amount of skillset B (public cloud), and the networking constraints can be challenging depending on how state is managed.
Just to clarify, AWS lets you provision bare-metal too if your goal is to just rent hardware someone else is maintaining. And things like trivially distributing load and storage across multiple datacenters/regions is another big bonus for us.
Hetzner has all of that. And cloud. It's just that their dedicated server offerings are SO attractive that people keep mentioning them. Their cloud offering, by comparison, isn't nearly as attractive.
Correct, but most of your cost in public clouds is in bandwidth, not server rental. To my knowledge, AWS also charges a hefty premium for their dedicated servers compared to competitors.
With 3 people it's basically impossible to build an HA storage solution that can scale past a certain point - and it's also impossible to keep it maintained.
Can you give a ballpark figure of what scale you have in mind?
Some distributed databases and file systems are notoriously finicky to operate; Ceph comes to mind in particular. Choice of technology and architecture matters here a lot. Content addressed storage using something like Minio with erasure codes should scale pretty easily and could be maintained by a small ops team. I personally know a couple of people that were effectively solo operations for 100PB Elasticsearch clusters, but I'd say they're more than a bit above average skill level and they actively despised Elasticsearch (and Java) coming out of it.
Just curious, where do you get 100Gb with internet transit and dedicated fiber for $1000/month? I'm in a small town in eastern Germany and looked for simple gigabit fiber access for our office without any bandwidth guarantees, and it's €1000/month for 1Gb here with the most budget provider, and that comes with only some nebulous bandwidth guarantees. I'm not talking about residential fiber, which also gets very expensive after a certain threshold. I know there is Init7 in Switzerland, but it seems to be the exception to the rule in Europe. Getting fast fiber and good transit is still expensive?
I'm in Switzerland, so maybe I am biased, I have 10Gbit/s dedicated on a 100Gbit/s link for about 600$/month. In practice I have 25Gbit/s with most datacenters in europe, 100Gbit/s with some that are close (OVH, Hetzner), and 10Gbit/s with the rest of the world.
Is this a home connection ?
No, dedicated business fiber from sunrise.
Sorry, I should have been more specific: does this offer stand if I need 200G? Is the bandwidth guaranteed (as in: I can use it all day long, as a datacenter would) or is it burst-only (so its targets are homes and offices)?
I expect the latter
I have 10Gbit/s guaranteed with 100Gbit/s burst.
Running a service takes more than a fat pipe. You need to handle power outages, you need redundant internet connections, etc., etc.
Yes, but for example a 10Gbit/s pipe is about 3PB of transfer capacity per month, which is about $150,000/month in S3 traffic. A 40kW UPS, which can handle about 2 racks (2x42U) of high-density servers, plus a generator, costs about $50k. A redundant link with your own AS so you can do BGP should cost about $5k per month (at least here in Switzerland).
Of course it really depends on the application, but if you host something like a streaming video service where bandwidth is the main factor, you can quickly reach a point where self hosting is cheaper.
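For anyone who wants to check the arithmetic, here it is spelled out; the $/GB rate is an assumption roughly matching AWS's cheapest high-volume egress tier, not a quote:

    # What a saturated 10 Gbit/s pipe is worth in cloud egress terms.
    gbps = 10
    seconds_per_month = 30 * 24 * 3600            # 2,592,000 s
    gb_per_month = gbps / 8 * seconds_per_month   # ~3.24 million GB ~= 3.2 PB

    aws_egress_rate = 0.05                        # assumed $/GB at the high-volume tier
    print(f"Transfer capacity: {gb_per_month / 1e6:.1f} PB/month")
    print(f"At ~$0.05/GB that's ${gb_per_month * aws_egress_rate:,.0f}/month")  # ~$162,000

Even at that bottom-tier rate it lands in the same ballpark as the ~$150k figure above.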
10Gbps is one "teen with a stolen credit card" DDoS event away from being unusable. If you're running a big service that someone may dislike, that's really not enough.
As you’ve already alluded to elsewhere though - you host it behind a cdn or something. A single ec2 instance is just as vulnerable to a teen with a stolen credit card attack.
That's why you put your services behind a CDN, even if it's not cacheable traffic. Then you can rate limit what's coming to you.
With the cloud, that DDoS can bankrupt you by causing you unconstrained bills instead.
Oh definitely. I would've been more clear - I meant: you still can't stop there and you'll need a third-party to take the traffic with either solution.
Yea, I call BS on 100Gb uplink for $1000. I have racked a lot of servers at different data centers. No way.
I am in Switzerland where you have 25Gbit/s for about 70$/month so I understand that this might be an exception. But even if it is 10 000$/month, it is still widely cheaper than cloud bandwidth.
Business pricelist as PDF: https://www.init7.net/de/angebot/business-internet/business_...
It's interesting as it separately lists the price for 24x7 support.
CHF 111 ≈ $129
If you don't mind me asking, to where? That is, what uplink do you see to your nearest AWS or GCloud region? In the US, advertised residential speeds don't necessarily translate to realized gains; they just push the bottleneck to the ISP.
Agreed that either way it is way cheaper than cloud bandwidth.
Chat, feeds and moderation run on AWS for us. Video on the other hand is bandwidth intensive. So we run the coordinator infra on AWS, but the SFU edge network on many different providers.
I think the cloud is good for some things, and not so great for others. S3 is fairly cost effective. RDS is expensive, bandwidth is crazy etc.
(5M a year spend on AWS atm.)
The article is incredibly thin on details.
In my experience, it comes down to two factors:
1. Egress cost. Cloud hosting providers have absolutely insane egress pricing. It's beyond stupid at this point, if you want to host anything bandwidth-intensive.
2. Storage pricing.
"Storage is cheap, but moving it ain't" is a quote a former co-worker frequently liked to remind people. The quote applied at the low level (eg between CPUs and their caches) all the way up to networking.
Anyways, cloud provider egress costs can be ridiculous. Amazon charges for egress transfer out of AWS, then quite a bit for NAT gateway transfer, and AWS network firewall on top of that (we dropped the firewall and moved our bulk traffic to a specific outer subnet because of that). Oh, and you can't give many serverless products (eg lambda) elastic IPs, so out the NAT gateway it goes...
So. Frustrating.
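To make the stacking concrete, here's a rough sketch. Every per-GB rate is an assumption plugged in for illustration - they're in the ballpark of published US-region rates, but check the current pricing pages before trusting any of it:

    # How per-GB charges stack on traffic leaving a private subnet.
    # All rates below are assumptions for illustration, not current list prices.
    egress = 0.09          # internet egress, $/GB
    nat_gateway = 0.045    # NAT gateway data processing, $/GB
    net_firewall = 0.065   # Network Firewall traffic processing, $/GB

    per_gb = egress + nat_gateway + net_firewall
    monthly_tb = 50        # assumed bulk traffic volume
    print(f"${per_gb:.3f}/GB -> ${per_gb * monthly_tb * 1000:,.0f}/month for {monthly_tb} TB")
    # Dropping the firewall and routing bulk traffic around the NAT hop removes
    # two of the three per-GB charges -- which is exactly the workaround above.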
It doesn't seem to say in the article and it's not really discussed in these "LEAVING THE CLOUDS!!" articles, but what are these orgs doing for on-prem? Given the broadcom acquisition of vmware, rebuilding massive vsphere clusters like it's 2010 doesn't seem like a good long term play. Are they moving to kubernetes? Some other hypervisor?
At least in the case of 37signals, they went with colocated servers, some type of KVM and their own tool, Kamal, for containerized deployments without the complexity of kubernetes.
You can find one post here with many links at the bottom
Possibly some amount of Triton and Oxide
Well, major companies aren't ditching the cloud and there is no evidence for a trend otherwise. And 37signals isn't a major organization for any of the big cloud providers. They are just a rounding error.
Major companies aren't paying the headline rates.
Even at 37 signals size you're paying negotiated rates.
And 37 signals may not be a "major" organization to you, but they're bigger than the vast majority of companies.
They are certainly a household name in the startup community but they are not a major organization for the big three cloud providers. Why is this important? Because the headline claims falsely that major companies are ditching the cloud providers. I have insights into the decision process for a major organization moving to the cloud and the motivation why 37 would leave cloud are in no way comparable to that of a major org.
"major" is subjective. They are larger than the vast majority of companies. That makes them "major" to me at least.
Meanwhile, from Q3 Amazon earnings:
* AWS segment sales increased 19% year-over-year to $27.5 billion.
That means AWS brought in $4.3 BILLION more dollars in Q3 2024 vs 2023.
That's a huge amount of incremental revenue growth. If the net movement of workloads were out of the cloud, then it would have to show up in the results of Intel / TSMC / Equinix et. al.
I just took a look, and Equinix quarterly revenue is $2.1B.
Here people like arguing their opinions as if they're facts instead of using evidence (public proof) to support their argument.
Oh, correlation.
Almost any story about cloud repatriation is a story about a failure of the market to act competitively, rather than someone actually being able to do it for less money than the cloud providers can. The big providers' margins are crazy - over 50%, which is normal for a software/service business, but they are essentially hardware businesses.
I prefer this https://blogs.idc.com/2024/10/28/storm-clouds-ahead-missed-e... more nuanced article.
I can see how AI workloads makes clouds look expensive.
I think control is maybe a bigger factor than cost these days. Being able to hold anyone accountable at all seems to be an operational superpower. Working with cloud vendor support is a torturous experience on a good day. It also doesn't matter how expensive the virtual machine is if there isn't one available to be provisioned.
I know it's kind of harsh, but owning the whole vertical and having the power to instantly fire anyone for giving an Azure-tier response is why these companies are doing it in my mind. Waiting on a 3rd party to find their own ass with a whole S&R team every time you need help is quite exhausting. I've never worked with an IT vendor and thought "damn these people are so responsive I can't dream of doing it better myself".
I work for a small org that is owned by a very large corp. Our spending is managed by the large corp.
If I want to buy a $10 domain, the process takes a month and requires escalating to a purchasing director. If I want to rent a new server from hetzner, same thing.
If I want to spin up a Bedrock instance for $1000/day on AWS, it's already a line item in the budget, so as long as I have a cost tag on the resource it's pre-approved. As long as something is on the software catalog on AWS it's OK to use.
In some regards, absolutely.
But remember even when you're doing everything on-prem with your own employees, you're still running software written by third parties.
So you might still have an unresponsive third party, just they might be a database vendor instead of a cloud vendor.
Depending on your size, even just having a few people on staff who know the open-source products you run is probably cheaper anyway, and it gives you pretty good control. And if it used to work, the only one who can have broken the config is you, which means you can also fix it. Sure, maybe you need to roll back a kernel update or whatever, but in the end it's on you.
> “Ten years into that journey, GEICO still hadn’t migrated everything to the cloud, their bills went up 2.5x, and their reliability challenges went up quite a lot too.”
yes this would make cloud cost a lot without any of the benefits lol
They will want cloud-like APIs on-premises and most will implement OpenStack. The second wave of migrations to the cloud will be even quicker for these companies making their way back to on premises.
Recently, I've come to realize that one real use of those clouds is to provide a good US-EU network connection. If you want to give users on both continents decent bandwidth to your service, you have no choice but to have them connect to a datacenter on their own continent. Public data transit across the Atlantic is simply miserable.
Then, because the providers probably have private Atlantic cables, you can replicate between regions at a good, reliable speed.
>Cloud repatriation is undoubtedly not for start-ups or scale-ups still on their way to profitability or product-market fit. For such companies, the cloud abstracts all the complexity of IT infrastructure and lets their teams focus on the business challenges.
Hmm, what about companies that are expected to be stronger-than-average in computer science & software engineering, and might not yet have as much competitive advantage in business momentum or financial resources to begin with?
Would it be better to leverage the strongest area of expertise or not?
Tough decision, which I would be very conservative about making.
Decade 0 of The Cloud didn't obscure very much of the heavens and it remained sunny with only a slight chance of scattered data.
Now on first pass (decade 01) it looks like the cloud is ideal if you have huge amounts of data that needs to be shared with just about anybody anywhere at any time 24/7.
I know I'm not in that league, so I can't speak from a position of expertise, but after this much dust has settled it does look like it would be most widely useful mainly for data which is not the least bit confidential.
Especially data which is completely public, or intended to be public more so than was possible any other way.
And then only as long as the ongoing cost is "virtually" insignificant compared to the fully amortized on-premises in-house alternative.
Seems like it would really make sense to do this kind of financial analysis before deciding how to best handle the data that you want the world to have access to.
Probably a good idea to consider how to best handle the other kind of data that you don't ever want to share with the world at all, which is a whole different equation. Any cloud in the way and it may be more challenging to break through the ceiling for the sky to be the limit on that one.
At least this seems to be the kind of thing that has been consistent since the overcast started rolling in.
But what do I know?
I'm just an earth-bound observer ;)
This is partially the result of cloud providers and partially business leadership. They, for whatever reason, insufficiently educated their clients on migration requirements. Lift & shift from on-premises to cloud only works as an emergency measure; the shifted resources must be converted to the cloud stack, or the cost will be multiples of on-prem costs. Business leadership was (is?) ignoring IT teams screaming about the problems with lift & shift.
Now businesses are shifting back to on-prem because they are still uneducated on how to make the cloud useful. They will just shift all non-core activities to XaaS vendors, reducing their own cloud-managed solutions.
Source: dealing with multiple non-software tech firms that are doing just that, shifting their own things back to on-prem and non-core resources to XaaS.
I keep reading "lift and shift is bad" on HN - what is the opposite of lift and shift? ("cloud native" does not mean much to me). Is it that instead of Oracle running on a rented VM you use whatever DB your cloud provider is selling you, move your monolith to a service-oriented architecture running in k8s, etc.?
The opposite of lift and shift is rearchitecting for auto-scaling. Pretty much every price advantage comes from using the fact that the cloud provider absorbs the cost of idle resources instead of you.
So you either adopt serverless patterns (the kind of serverless that means scale to zero and pay nothing when you do) or you adopt auto-scaling, where you shut down instances when traffic is lower, to minimize idle time. Or both.
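To make the scale-to-zero flavour concrete, here is a minimal sketch, assuming an AWS Lambda-style Python handler (the payload fields are just illustrative). Nothing runs, and nothing bills, unless an event actually invokes it:

```python
# Minimal sketch of a scale-to-zero handler (AWS Lambda-style, Python runtime).
# You pay per invocation and duration; when nothing calls it, nothing runs and nothing bills.
import json

def handler(event, context):
    # 'event' carries the request payload; 'context' carries runtime metadata.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```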
lift & shift = move your instances, which are expensive and not at all competitive
cloud native = use more serverless products (pay as you go with no base price)
For instance, one could implement internal DNS using instances which run bind, connect everything through some VPC, and put a load balancer in front of the instances. One could instead rework the DNS architecture and use Route 53, with private hosted zones associated with all the appropriate VPCs.
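As a rough sketch of that second option (Python with boto3 against Route 53; the zone name and VPC IDs are placeholders, not anything from the comment above):

```python
# Sketch: create a Route 53 private hosted zone and associate it with a second VPC,
# instead of running and patching your own bind instances.
import time
import boto3

route53 = boto3.client("route53")

# Create a private hosted zone attached to a first VPC (IDs are placeholders).
zone = route53.create_hosted_zone(
    Name="internal.example.com",
    VPC={"VPCRegion": "eu-west-1", "VPCId": "vpc-0123456789abcdef0"},
    CallerReference=str(time.time()),  # must be unique per request
    HostedZoneConfig={"Comment": "internal zone", "PrivateZone": True},
)

# Associate the same zone with another VPC so its instances resolve the zone too.
route53.associate_vpc_with_hosted_zone(
    HostedZoneId=zone["HostedZone"]["Id"],
    VPC={"VPCRegion": "eu-west-1", "VPCId": "vpc-0fedcba9876543210"},
)
```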
Another real example: one could have hundreds of instances running GitLab runners, running all day, waiting for some job to do. One could put those GitLab runners into an autoscaled Kubernetes cluster, where nodes are added when there are lots of jobs and deleted afterwards. One could even run the GitLab runners on Fargate, where a pod is created for each job, to run that job and then exit. No job = no pod = no money spent.
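For the autoscaled-runner idea, a deliberately naive sketch of one way the control loop could look (Python; the GitLab endpoint, project ID, Auto Scaling group name, and the cap of 20 hosts are all assumptions for illustration, and a real setup would more likely use GitLab's own autoscaling runner support or a cluster autoscaler):

```python
# Illustrative only: a naive autoscaling loop for CI runners, assuming a GitLab
# API token and an EC2 Auto Scaling group of runner hosts (all names are placeholders).
import os
import requests
import boto3

GITLAB_API = "https://gitlab.example.com/api/v4"
PROJECT_ID = "1234"
ASG_NAME = "ci-runners"

def pending_job_count() -> int:
    # GitLab exposes project jobs filtered by scope; "pending" jobs are waiting for a runner.
    resp = requests.get(
        f"{GITLAB_API}/projects/{PROJECT_ID}/jobs",
        params={"scope": "pending", "per_page": 100},
        headers={"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]},
        timeout=10,
    )
    resp.raise_for_status()
    return len(resp.json())

def scale_runners() -> None:
    pending = pending_job_count()
    desired = min(max(pending, 0), 20)  # scale to zero when idle, cap at 20 hosts
    boto3.client("autoscaling").set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=desired,
        HonorCooldown=False,
    )

if __name__ == "__main__":
    scale_runners()
```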
Of course, some work is required to extract the best value of cloud providers. If you use only instances, well, surprise: it costs a lot, and you still have to manage lots of stuff.
I ran our CI on Fargate at my last job. It was a mess. The time from the API request for an instance to it being ready to handle a job was minutes. It was about 50x slower than running on a mid-range laptop that I occasionally used for development, and to work around that we kept hot EBS volumes of caches around (which cost $$$$ and a decent amount of developer time).
Just before I left I was investigating using Hetzner instead - our compute bill would have been about 15% cheaper, we would have had no cache storage costs (which were about 5x our compute costs), and the builds would have finished before Fargate had processed the request.
Our numbers were small fry, but we spent more on that CI system than we did on every other part of our infra combined.
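Taking those figures at face value, the back-of-the-envelope comparison looks roughly like this (pure arithmetic on the numbers quoted above, not actual billing data):

```python
# Rough arithmetic on the numbers above (C = the Fargate compute bill, everything else relative to it).
C = 1.0
aws_total = C + 5 * C             # compute plus cache storage at ~5x compute
hetzner_total = 0.85 * C          # ~15% cheaper compute, no separate cache storage cost
print(aws_total / hetzner_total)  # ~7x cheaper overall under these assumptions
```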
Of course, while this move to cloud native, along the path from IaaS to PaaS to SaaS, saves money on one dimension, it also binds the customer more and more immovably to the cloud vendor...
In simple terms, yes.
The term “native” refers to adopting the vendor’s technology stack, which typically includes managed data stores, containerized microservices, serverless functions, and immutable infrastructure.
Thanks.
I work for a very large org, and cloud benefits are not obvious to me (i.e. we're large enough to absorb the cost of a team managing k8s for everyone, another team managing our own data centers around the world, etc.).
I view cloud as mutualizing costs and expertise with other people (engineers and infra), but adding a very hefty margin on top of it, along with vendor lock-in.
If you're big enough to mutualize internally, or don't need some of the specific ultra-scale cloud products, it's not an obvious fit to me (in particular, you don't want to pay the margin).
I understand that for a significant chunk of people it's useful, provided that they use as many mutualizing levers as possible, which is what going native is about.
Is my understanding correct?
Yes, the profit margin for cloud providers is very real—and quite costly.
I think one point that’s often overlooked is the knowledge gap between the engineers at cloud providers (such as systems, platform, or site reliability engineers) and those that an individual company, even a large one, is able to hire.
This gap is a key reason why some companies are willing—or even forced—to pay the premium.
If average or mediocre management skills and a moderately complex tech stack are sufficient, then on-premise can still be the most cost-effective choice today.
Thanks a lot for the detailed answer.
I agree with the gap, and I understand why people would like to pay for the premium.
Where I work currently, wherever we don't have the right people (usually because we can't find them), our cloud-like on-premise offering doesn't work, which ends up causing significant extra costs further down the chain.
I would guess that all of these companies that are moving back are throwing in the towel on their cloud migration/modernization plans under the guise of "repatriation" when it's really poor execution without any responsibility.
It was easy when everyone was spending cheap money on marketing and other vanity around moving to the cloud. But now that money costs something and everyone has to control costs, repatriation is the new hotness when you want to trade opex for capex. The cloud providers' margins become the org's savings.
The trick is to not care, and be proficient as a technologist; you make money either way riding the hype cycle wave. Shades of Three Envelopes for the CIO and whomever these decisions and budgets roll up to.
https://kevinkruse.com/the-ceo-and-the-three-envelopes/
(If you genuinely get value out of premium compute and storage at a cloud provider, you're likely going to keep doing that of course, startups, unpredictable workloads, etc)
One of the things about startups is that if you’ve got any external validation, gcp/aws/azure will give you 2-3 years worth of free credits. When money is tight, free compute goes a long way.
Our poor strategic planning for cases where migration wasn’t necessary/feasible in the first place
How can we get rid of vendor lock-in and have fair market competition drive cloud prices down?
It must be possible to make cloud more cost effective via specialization versus every company building the same infrastructure again and again.
Proposed solution: a set of neutral validators that define standard interfaces and then test any cloud wanting to get listed for compatibility and minimum included performance (including egress).
If all this data is open, we should get competition back and fix the cloud.
Disclaimer: I am working on such a system; enterprises love the idea and it does well at hackathons, but it's not production-ready on the validation standard yet. Would be happy to get critical HN feedback.
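Purely as an illustration of the general idea, and not anything from the commenter's actual system or any existing standard, a validator interface could be sketched like this (all names hypothetical):

```python
# Hypothetical sketch of a neutral "compatibility + minimum performance" validator.
# None of these names come from an existing standard or product.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ValidationResult:
    provider: str
    api_compatible: bool      # does the provider implement the standard interface?
    egress_gbps: float        # measured egress throughput
    meets_minimum: bool

class CloudValidator(ABC):
    @abstractmethod
    def check_api_compatibility(self, endpoint: str) -> bool: ...

    @abstractmethod
    def measure_egress_gbps(self, endpoint: str) -> float: ...

    def validate(self, provider: str, endpoint: str, min_egress_gbps: float) -> ValidationResult:
        # A provider gets listed only if it passes both the interface check
        # and the minimum measured performance bar.
        compatible = self.check_api_compatibility(endpoint)
        egress = self.measure_egress_gbps(endpoint)
        return ValidationResult(
            provider=provider,
            api_compatible=compatible,
            egress_gbps=egress,
            meets_minimum=compatible and egress >= min_egress_gbps,
        )
```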
All large orgs start running their own cloud infra at some point, so this has been the case for a very long time.
Cloud is great until you're spending soooo much money and the running costs are too damn high.
> For instance, companies can utilize cloud native NVMe-based storage solutions for their database or implement custom database hosting on cloud compute instances using Kubernetes, all while maintaining the cloud’s scalability and flexibility, avoiding any lock-in.
I will always dispute this. K8s is also a form of lock-in; it does not magically free you from issues, it only brings in a separate set of issues, overheads, and problems.
Telcos, and I'm sure other industries, are adopting hybrid. Many things core to the business are being yanked out of the cloud.
Seems like CIOs are finally listening to the greybeards.
It comes down to cost, especially cost predictability. And now businesses have more "expertise" to manage their servers after all these years. Obviously, not everything is migrated out of the cloud.
Cloud used to be cheap... it clearly isn't anymore.
There are a ton of companies in the cloud that do not know how to do cloud infra, so they park their administration with their integrating customers, who are then left with the avalanche of firefighting and rebuilding.
A meta-standard for deployment and infrastructure setup is needed and should be forced down the throats of the resisting patients.
I will use this opportunity to confirm that the cloud is ill-suited for almost all but niche business cases, and that the majority of users were dragged onto cloud platforms either by free credits or (my suspicion) some grey kickback schemes with C-level guys.
At my current project (a Fortune 500 SaaS company; I was there for both the on-prem-to-cloud and then a cloud-to-cloud migration):
a) Resources are terribly expensive. The usual tricks you find online (spot instances) typically cannot be applied, for specific work-related reasons. In our estimates, even compared to hw/sw list prices, the cloud is 5x-10x more expensive, depending of course on the features you are planning to use.
b) There is always a sort of "direction" the cloud provider pushes you into: in my case, the cost difference between VMs and Kubernetes is so large that we get almost weekly demands to make the conversion, even though Kubernetes doesn't make any sense for some of the scenarios we have.
c) Even though we are spending six, maybe by now seven, figures on infrastructure monthly, the priority support answers we receive are borderline comical, in line with one response we received when we asked why our DB service was down, quote: "DB has experienced some issues so it was restarted."
d) When we were on-prem, new features requested from the ops side were usually implemented or investigated in a day or so. Nowadays, in most cases, answers are available after a week or so of investigation, because each thing has its own name and lingo with different cloud providers. This could be solved with specific cloud certifications, but in the real world we cannot pause the business for 6 months until all ops are completely knowledgeable about all the inner workings of the currently popular cloud provider.
e) Performance is atrocious at times. That multi-tenancy some guys are mentioning here is for the provider's benefit, not the customer's. They cram an ungodly amount of workload onto machines; that mostly works, until it doesn't, and when it doesn't, the effects are catastrophic. Yes, you can have isolation and dedicated resources, but then see a).
f) Security and reliability features are overly exaggerated. From the observable facts, in the last year we had 4 major incidents lasting several hours strictly related to the platform (total connectivity failure, total service failure, complete loss of one of the sites, etc.).
In the end, for anyone who wants to get deeper into this, check what Ahrefs wrote about cloud.
9 cents per gigabyte egress is the core of why companies are leaving the cloud.
That's the starting point that gets them thinking about all the other ways it's a bad idea.
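To put that 9 cents per gigabyte into numbers (the 100 TB/month figure is just an example workload, not anything from the comment):

```python
# Back-of-the-envelope math on that egress rate (assumed $0.09/GB list price).
egress_tb_per_month = 100                      # example workload: 100 TB out per month
price_per_gb = 0.09                            # headline per-GB egress rate
monthly_cost = egress_tb_per_month * 1000 * price_per_gb
print(f"${monthly_cost:,.0f}/month")           # -> $9,000/month, ~$108k/year, for bandwidth alone
```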
“The cloud is where Moore’s law goes to die.”
The cloud was supposed to be the cheap one-stop shop where sheer scale keeps overall prices low. But instead, they priced themselves out of existence, slowly but surely. When you can run any of the offered services on your own for cheaper, you know their entire business model is based on entrapment and vendor lock-in, making leaving them an engineering impossibility.
Serious businesses are not doing this.
I think non-cloud is the new monolith, which is fantastic.
"Major organizations like 37signals and GEICO". Sorry, what? Citing two companies? And how does a $37bn company compare to 37signals?
Such an odd pair of companies to choose. Is DHH friends with the author?
I'd be more interested in statistics about total cloud vs on-prem spend across all companies, over time, to support the assertion that "companies are ditching the cloud".
A very poor article
The statistics can be found in the public earnings of AWS vs the companies that would get paid for on-prem workloads (Equinix, Dell/HP/IBM, Intel etc).
It is not a reputable article. Clickbait.
I don't think the article concludes that companies are ditching the cloud.. :)