More information about the language used in the proofs: https://en.wikipedia.org/wiki/Lean_(proof_assistant)
In the first question, why do they even specify ⌊α⌋ (and ⌊2α⌋ and so on) when α is an integer?
Alpha need not be an integer; we have to prove that it is.
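For reference, here is the statement as I remember it (IMO 2024 P1), so take the exact wording as approximate: determine all real numbers α such that, for every positive integer n,

    \[ \lfloor \alpha \rfloor + \lfloor 2\alpha \rfloor + \cdots + \lfloor n\alpha \rfloor \equiv 0 \pmod{n}. \]

If I recall correctly, the answer is exactly the even integers, which is why "α is an integer" is a conclusion to be proved rather than a hypothesis.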
Should have read more carefully, thank you!
“Only 5/509 participants solved P6”
This has to come with an asterisk, which is that participants had approximately 90 minutes to work on each problem while AlphaProof computed for three days for each of the ones it solved. Looking at this problem specifically, I think that many participants could have solved P6 without the time limit.
(I think you should be very skeptical of anyone who hypes AlphaProof without mentioning this - which is not to suggest that there's nothing there to hype)
It is certainly interesting that AlphaProof needed three days. But does it matter for evaluating the importance of this result? No.
I agree that the result is important regardless. But the tradeoff of computing time/cost with problem complexity is hugely important to think about. Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.
Three days per problem is, by many standards, a 'reasonable' amount of time. However, there are still unanswered questions; notably, 'three days' is not really meaningful in and of itself. How parallelized was the computation, and what was the hardware capacity? And how optimized is AlphaProof for IMO-type problems (problems which, among other things, all have short solutions using elementary tools)? These are standard kinds of critical questions to ask.
Though, if you start solving problems that humans can't or haven't solved, then questions of capacity won't matter much. A speedup in the movement of the maths frontier would be worth many power stations.
For some time a 'superhuman math AI' could be useful for company advertising and getting the attention of VCs. But eventually it would be pretty clear that innovative math research, with vanishingly few exceptions, isn't very useful for making revenue. (I am a mathematician and this is meant with nothing but respect for math research.)
If you were to bet on solving problems like "P versus NP" using these technologies combined with human augmentation (or vice versa), what would be the probable time horizon for achieving such a solution? I think we should assume that the solution is also expressible in the current language of math/logic.
The hard part is the creation of new math to solve these problems, not the use of existing mathematics. New objects (groups, rings, fields, and so on) have to be theorized, their properties understood, and then that new machinery used to crack the existing problems. I think we will get to a place (in around 5 years) where AI will be able to solve these problems and create these new objects. I don't think the bottleneck is technological; I think it's financial. Meaning, there isn't much money to be made doing this (try to justify it for yourself), hence the lack of focus here. I think that is a red herring and there is a gold mine in there somewhere, but it will likely take someone with a lot of cash to fund it out of passion (Vlad Tenev / Harmonic, or Zuck and Meta AI, or the Google / AlphaProof guys); in the big-tech world, though, these are minnow projects in a sea of competing initiatives. And so that leaves us at the mercy of open research, which, if this is a compute-bound problem, may take 10-20 years to crack. I hope I see a solution to RH in my lifetime (and in language that I can understand).
I understand that a group of motivated individuals, even without significant financial resources, could attempt to tackle these challenges, much like the way free and open-source software (FOSS) is developed. The key ingredients would be motivation and intelligence, as well as a shared passion for advancing mathematics and solving foundational problems.
OK, but how do you get around needing a 10k or 100k H100 cluster?
It is well known that cloud services like Google Cloud subsidize some projects, and we don't even know what efficiency improvements the next few years will bring.
Probably a bad example: P vs NP is the Millennium Prize problem most likely to be unsolvable, so the answer may be "never".
I'll bet the most technical open problems will be the ones to fall first. What AIs lack in creativity they make up for in ability to absorb a large quantity of technical concepts.
Thank you for the response. I have a follow-up question: Could these AIs contribute to advancements in resolving the P vs NP problem? I recall that the solution to Fermat’s Last Theorem relied on significant progress in elliptic curves. Could we now say that these AI systems might play a similar role in advancing our understanding of P vs NP?
Just my guess as a mathematician. But if LLMs are good for anything it will be for finding surprising connections and applying our existing tools in ways beyond human search. There's a huge space of tools and problems, and human intuition and brute force searching can only go so far. I can imagine that LLMs might start to find combinatorial proofs of topological theorems, maybe even novel theorems. Or vice versa. But I find it difficult to imagine them inventing new tools and objects that are really useful.
No one is focused on those. They're much more focused on more rote problems.
You might find them used to accelerate research math by helping mathematicians with lemmas, checking for errors, and formalizing proofs. That seems realistic in the next couple of years.
There are some AI guys like Christian Szegedy who predict that AI will be a "superhuman mathematician," solving problems like the Riemann hypothesis, by the end of 2026. I don't take it very seriously, but that kind of prognostication is definitely out there.
Why have they still not released a paper, aside from a press release? I have to admit I still don't know how auspicious it is that running Google hardware for three days apiece was able to find half-page-long solutions, given that the promise has always been to solve the Riemann hypothesis with the click of a button. But of course I do recognize that it's a big achievement relative to previous work in automatic theorem proving.
I don't know why so few people realize this, but by solving any of the problems at all, its performance is superhuman by most reasonable definitions of "human".
Talking about things like solving the Riemann hypothesis in so many years assumes a little too much about the difficulty of problems that we can't even begin to conceive of a solution for. A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.
Well, multiplying two large numbers instantly is a superhuman feat that a calculator can do. I would hope we are going for a higher bar, like "useful". Let's see if this can provide proofs of novel results.
That superhuman capability of "multiplying two large numbers instantly" has transformed human society like not even plumbing has. I really can't see how you could not consider this useful.
The ability to multiply two numbers superhumanly has already transformed society.
It's worth emphasizing that it's been possible for years to use an automatic theorem prover to prove novel results. The whole problem is to get novel interesting results.
We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs. So, in order to evaluate AlphaProof's achievements, we'd need to know how much of a shortcut AlphaProof achieved. A good proxy for that would be the total energy usage for training and running AlphaProof. A moderate proxy for that would be the number of GPUs / TPUs that were run for 3 days. If it's somebody's laptop, it would be super impressive. If it's 1000s of TPUs, then less so.
> We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs.
Which computer science theorem is this from?
It's a direct consequence of the format of a proof. Proofs are finite-length sequences over a finite alphabet of symbols, so of course they're enumerable (the same algorithm you use to count works to enumerate them).
If you want a computer to be able to tell that it has found a correct proof once it enumerates it, you need a bit more than that (though really, the existence of automated proof checkers is enough).
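To make that concrete, here is a minimal sketch of the enumeration idea in Python. Everything in it is illustrative: ALPHABET is a toy symbol set, and is_valid_proof is a hypothetical stand-in for a real automated proof checker (say, the Lean kernel), not an actual API.

    from itertools import count, product

    # Toy symbol set standing in for the (finite) alphabet of a formal proof language.
    ALPHABET = ["0", "1"]

    def is_valid_proof(candidate: str, theorem: str) -> bool:
        """Hypothetical stand-in for an automated proof checker (e.g. the Lean
        kernel): returns True iff `candidate` is a well-formed proof whose final
        statement is `theorem`. Checking this is purely mechanical."""
        raise NotImplementedError("plug in a real checker here")

    def search_proof(theorem: str) -> str:
        """Enumerate every finite string over ALPHABET, shortest first.
        If a proof of `theorem` exists at all, this loop eventually reaches it;
        the only practical question is how long that takes."""
        for length in count(1):
            for symbols in product(ALPHABET, repeat=length):
                candidate = "".join(symbols)
                if is_valid_proof(candidate, theorem):
                    return candidate

Correctness of each candidate is mechanically checkable, so completeness is guaranteed in principle; the entire difficulty, as noted above, is that the search space grows exponentially with proof length.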
I guess it is tautological from the definition of "provable". A theorem is by definition provable if there is a finite sequence of well-formed formulas that has the theorem as its consequence (https://en.wikipedia.org/wiki/Theorem, section "Theorem in logic").
It’s just an obvious statement. If a proof exists, you will eventually get to it.
> A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.
What does this have to do with a hypothetical automatic theorem prover?
Logic is pretty much absent from our culture and daily life, but that could be due to its limited supply.
Being logical in social life is pretty much completely different from being logical in a mathematical argument, especially in a formal theorem proving environment. (Just try to write any kind of cultural proposition in a formal language!)
Google stopped publishing its interesting AI work once OpenAI took the AI lead from them, largely using tech that Google had pioneered but not monetised, like transformers.
I imagine they are under pressure not to make this mistake again.