Maybe this is a hot take, but I think it's because they just don't need to. The amount of money they make every year is massively more than the amount of money they need to spend to keep making that money, so they have plenty of extra to spend on this and then some. Couple that with a potential upside of either global domination or elimination of their entire workforce and it's a no-brainer even if it takes another decade or more. They don't need to worry about putting too much in until their investors demand that they stop. It's not like they aren't still turning massive profits.
It's easy to forget, but ChatGPT is barely 2 years old, and the difference between then and now is quite substantial.
Because the moment you slow down investing is the moment you have to admit all your previous investment in the technology was wasted. It's the sunk cost fallacy at work.
Anybody want to summarize for those of us who don't feel like shelling out over $500/yr for a WSJ subscription?
As the theory goes, there is a network effect forming. Early AI products with strong adoption get users to implicitly and explicitly label data, and those labels drive improvements that act as a moat: the “data flywheel.”
So there’s a “first mover” advantage, because you can’t get that usage data anywhere else - not even by paying participants - since there is a domain gap between offline and online measurement.
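To make that loop concrete, here's a toy simulation of the flywheel. Every rate below is a made-up assumption chosen only to show the compounding shape, not anything measured:

```python
# Toy simulation of the "data flywheel" feedback loop described above.
# All parameters are hypothetical assumptions, not real figures.

users = 1_000_000          # initial user base
quality = 0.50             # abstract model-quality score in [0, 1)
labels_per_user = 0.01     # fraction of users whose interactions yield useful labels
quality_gain = 5e-6        # quality improvement per labeled interaction (assumed)
growth_sensitivity = 0.2   # how strongly quality attracts new users (assumed)

for year in range(1, 6):
    labeled = users * labels_per_user          # usage produces training signal
    quality = min(0.99, quality + labeled * quality_gain)
    users *= 1 + growth_sensitivity * quality  # better product pulls in more users
    print(f"year {year}: users={users:,.0f}, quality={quality:.3f}")
```

More users produce more labels, more labels raise quality, and higher quality pulls in more users - which is why a laggard with fewer users can't easily catch up.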
There is a reasonable counterargument from “the bitter lesson,” which says that in the long run all that matters is improvements in the foundation models, not product-specific data moats.
the tl;dr is "reasoning models".
That's what costs more, but it does not answer why the companies are choosing to pursue it.
And to be more specific, the fact that reasoning models produce outputs that are orders of magnitude longer than those of other models.
It writes more text so it spends more compute.
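Back-of-the-envelope: decode compute is, to first order, proportional to the number of tokens generated, so the extra reasoning trace multiplies the bill. The token counts below are illustrative assumptions, not benchmarks:

```python
# Rough relative decode cost of a reasoning model vs a standard one.
# Token counts are illustrative assumptions, not measurements.

standard_output_tokens = 500        # direct answer only
reasoning_trace_tokens = 10_000     # hypothetical chain-of-thought length
reasoning_output_tokens = reasoning_trace_tokens + standard_output_tokens

# To first order, decode compute scales with tokens generated.
relative_cost = reasoning_output_tokens / standard_output_tokens
print(f"~{relative_cost:.0f}x the tokens, so roughly ~{relative_cost:.0f}x "
      f"the decode compute per query.")
```

Under those assumed numbers, one reasoning query costs about as much to serve as twenty ordinary ones.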