• KingMob a day ago

    As a former neuroscientist, it's both amusing and sad to see AI people run into the same issues neuroscientists have grappled with for decades, as LLMs start to approach biological systems in complexity. (And mind you, LLMs are still smaller/simpler than the human brain.)

    My hope is that the greater accessibility of neural networks over actual neurons will lead to some insights into biology. But I suspect they're still different enough, both in size, but more importantly in organization, that it won't be realized anytime soon.

    The Golden Gate Claude is reminiscent of searches for the "grandmother" neuron. We know from direct brain stimulation that certain neurons are strongly associated with certain outputs...but we also know that a lot of knowledge representation is widely distributed. The whole thing is a mix of specialization and redundancy.

    • nickm12 a day ago

      I was, for a time, a neuroscience major and have had this same thought. I'm concerned that we're treating these systems as engineered systems when they are closer to evolved or biological systems. They are at least different in that we can study them much more precisely than biological systems, because their entire state is visible and we can run them on arbitrary inputs and measure responses.

      • SEGyges 19 hours ago

        I agree, what we do is much closer to growing them than to engineering them. We basically engineer the conditions for growth, and then check the results and try again.

        My best argument that insights from neuroscience will transfer to neural networks, and vice versa:

        For sufficiently complex phenomena (e.g., language), there should only be one reasonably efficient solution to the problem, and small variations on that solution. So there should be some reversible mapping between any two tractable solutions to the problem that is pretty close to lossless, provided both solutions actually solve the problem.

        And, yeah, the main advantage of neural networks is that they're white-box. You can also control your experiments in a way you can't in the real world.

    • bob1029 19 hours ago

      > You are training the neural network (a big block of matrices) using gradient descent (a trick that modifies the big block of matrices) to do something.

      Things get even more mysterious (and efficient) if you introduce non-linearities, recurrence and a time domain. These mainstream ANN architectures are fairly pedestrian by comparison. You can visualize and walk through the statistical distributions. Everything is effectively instantaneous within each next token generation. It's hard to follow an action potential around in time because it's constantly being integrated with other potentials and its influence is only relevant for a brief moment.
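
      To make the contrast concrete, here is a toy leaky integrate-and-fire neuron (constants arbitrary, purely illustrative): its output depends on when inputs arrive and when it last fired, which is exactly the time-domain behaviour a feed-forward matrix multiply never has to represent.

        import numpy as np

        dt, tau, v_thresh, v_reset = 1.0, 20.0, 1.0, 0.0   # arbitrary illustrative constants
        T = 100
        input_current = np.zeros(T)
        input_current[10:60] = 0.06                        # a burst of input mid-window

        v, spikes = 0.0, []
        for t in range(T):
            v += dt * (-v / tau + input_current[t])        # membrane potential leaks and integrates over time
            if v >= v_thresh:                              # a threshold crossing fires an action potential...
                spikes.append(t)
                v = v_reset                                # ...and the potential resets
        print("spike times:", spikes)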

      • d4rkn0d3z 17 hours ago

          Of course we understand how they work, don't panic. Some of us are not even surprised at the results thus far: the good, the bad, and the ugly. We have understood how neural networks function for 50 years or more. I think the interdisciplinary nature of the technology is significant; NNs draw from many areas of scientific study, and if you are missing any piece of the puzzle then some mysteriousness arises for you. But take heart: as I said, there are people who are not surprised one iota by what is happening.

          Dozens or perhaps hundreds of people have considered embarking on the brute-force approach now being pursued; nobody decided to try until a few years back, because it is mathematically limited, costly, and might just make our problems much worse before it makes them better in any important way.

          What is very salient is that ten of the most powerful corporations and the leaders of the largest countries have decided on this route, and they could force all software engineers to wear extra long, wide red ties if they wanted to, so buckle up. It looks like we are riding this train all the way, in the hope that cash + time will solve the truncation of the science.

        • hodgehog11 16 hours ago

          As someone who works in the theory of deep learning, I'm curious to know what you mean by "understand how they work". Sure, we understand the broad strokes, and those tend to be good enough to chip away at making improvements and vague predictions of what certain things will do, but we still have a severe lack of understanding of their implicit biases which allow them to perform well and give rise to the scaling laws.

          We know a lot more about basically every other machine learning approach.

          • d4rkn0d3z 15 hours ago

            Well, I'm thumb bashing on my phone so I can't get too far into the details as I see them. I'd read and respond to any papers you have?

            • hodgehog11 14 hours ago

              Are you asking about open problems? Here are a couple:

              There are strong arguments that deep learning generalization (and robustness in a slightly different sense) can be successfully explained in terms of PAC-Bayes theory, for example: https://arxiv.org/abs/2503.02113

                This requires a strong understanding of the appropriate prior, which effectively quantifies implicit bias in the training setup / architecture setup. Early analyses suggest the bias is that of a minimal gradient norm (https://arxiv.org/pdf/2101.12176), but this is almost certainly not the case, e.g. (https://arxiv.org/abs/2005.06398). There are good empirical arguments that models are trained to achieve maximal compression (https://arxiv.org/abs/2211.13609). But we only know this in a vague sense.

                How are deep learning training procedures and architectures implicitly exhibiting this form of sparsity and compressing information? Why is this more effective for some optimizers/architectures than others? Why would minimum description length circuits be so effective at mimicking human intelligence? How does this form of regularisation induce the exponents in the neural scaling laws (so we can figure out how to best optimize for these exponents)?

                If we don't know the implicit bias and how it relates to the training procedure, we're just doing alchemy and effectively got lucky. Change the standard approach even slightly and things easily break. We only really know what to do by process of elimination.
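
                For a toy picture of what "implicit bias" means: on an underdetermined linear least-squares problem, plain gradient descent started from zero quietly converges to the minimum-L2-norm interpolant, even though nothing in the loss asks for that. This is only a linear analogue of the general notion, not the gradient-norm or PAC-Bayes claims in the papers above.

                  import numpy as np

                  rng = np.random.default_rng(0)
                  X = rng.normal(size=(5, 20))          # 5 examples, 20 parameters: infinitely many exact fits
                  y = rng.normal(size=5)

                  w = np.zeros(20)
                  for _ in range(20000):
                      w -= 0.01 * X.T @ (X @ w - y)     # vanilla gradient descent on squared error

                  w_min_norm = X.T @ np.linalg.solve(X @ X.T, y)   # closed-form minimum-norm interpolant
                  print(np.linalg.norm(w - w_min_norm))            # ~0: GD picked this solution "for free"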

              Then there are the LLM specific questions related to in-context learning (https://arxiv.org/abs/2306.00297), to help explain why next token prediction through transformer designs works so well. Does it implicitly act as an optimization procedure (with bias) relative to the context? What are the biases here? What is the problem being solved?

              These may sound specific, but they are just examples of some approaches to answer the big question of "why does this standard procedure work so well, and what can we do to make it work better". Without this, we're just following the heuristics of what came before without a guiding framework.

              Post-hoc, we can do a lot more with interpretability frameworks and linear probes and the like. But without the implicit biases, we are only getting a local picture, which doesn't help us understand what a model is likely to do before we run it. We need global information.

              These are just a few, and don't include the philosophical and theory of mind questions, and others from neuroscience...

        • markisus a day ago

          I’m not sure that LLMs are solely autocomplete. The next token prediction task is only for pretraining. After that I thought you apply reinforcement learning.

          • porridgeraisin 20 hours ago

            Nevertheless it is next token prediction. Each token it predicts is an action for the RL setup, and context += that_token is the new state. Solutions are either human-labelled (RLHF) or come from math/code problems with deterministic answers, and prediction == solution is used as the reward signal.

            Policy gradient approaches in RL are just supervised learning, when viewed through some lenses. You can search for Karpathy's more fleshed-out argument for the same; I'm on mobile now.
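
            A minimal sketch of that framing, with a toy one-layer "policy" standing in for a real LM (everything here is invented for illustration, not anyone's actual training pipeline): the loss is still just a next-token log-prob, and the reward only decides which sampled tokens get reinforced.

              import torch

              vocab_size, ctx_len = 16, 8
              policy = torch.nn.Linear(ctx_len, vocab_size)     # stand-in for a real language model
              opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

              prompt = torch.randn(ctx_len)                     # toy "math problem"
              solution = torch.tensor(3)                        # its deterministic answer token

              for step in range(200):
                  dist = torch.distributions.Categorical(logits=policy(prompt))
                  token = dist.sample()                         # the "action" is just the next token
                  reward = 1.0 if token.item() == solution.item() else 0.0   # prediction == solution
                  loss = -reward * dist.log_prob(token)         # REINFORCE: push up log-probs of rewarded tokens
                  opt.zero_grad(); loss.backward(); opt.step()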

            • SEGyges 20 hours ago

              My short explanation would be that even for RL, you are training on a next token objective; but the next token is something that has been selected very very carefully for solving the problem, and was generated by the model itself.

              So you're amplifying existing trajectories in the model by feeding the model's outputs back to itself, but only when those outputs solve a problem.

              This elides the KL penalty and the odd group scoring, which are the same in the limit but vastly more efficient in practice.
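
              Roughly, the group scoring plus KL leash looks something like this (a hand-wavy sketch with made-up numbers; real recipes such as GRPO differ in many details): each completion in a group is scored against its siblings, and a penalty keeps the policy's log-probs close to a frozen reference model.

                import torch

                # One prompt, a group of 4 sampled completions, scored 1 if they solved the problem.
                rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
                logp_policy = torch.tensor([-12.3, -15.1, -14.0, -11.8], requires_grad=True)  # summed token log-probs
                logp_ref    = torch.tensor([-12.0, -14.8, -14.2, -12.1])                      # frozen reference model

                adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)   # group-relative advantage
                beta = 0.05                                                 # strength of the KL leash
                kl_proxy = logp_policy - logp_ref                           # crude per-sample stand-in for the KL term
                loss = (-(adv * logp_policy) + beta * kl_proxy).mean()
                loss.backward()                                             # gradients still flow only through next-token log-probs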

              • astrange 19 hours ago

                Inference often isn't next token prediction though, either weakly (because of speculative decoding/multiple token outputs) or strongly (because of tool usage like web search).

                • porridgeraisin 16 hours ago

                  Well, I may be misunderstanding you, but speculative decoding is just using next token predictions from a few models (or many samples from one model) instead of just one sample. It is still next token prediction.

                  Tool usage is also just next token prediction. You have it predict the next token of the syntax needed for tool use, and then it is fed the result of that in context which it then predicts the next token of.
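
                  A bare-bones sketch of that loop (generate and run_tool are hypothetical stand-ins, not any particular API): the model only ever emits next tokens, and the tool's output is just more text appended to the context before the next round of prediction.

                    def chat(prompt, generate, run_tool, max_turns=5):
                        context = prompt
                        for _ in range(max_turns):
                            reply = generate(context)            # ordinary next-token prediction up to a stop point
                            context += reply
                            if "<tool>" in reply:                # the model predicted the tool-call syntax
                                query = reply.split("<tool>")[1].split("</tool>")[0]
                                context += "\n<result>" + run_tool(query) + "</result>\n"
                                continue                         # keep predicting, now with the result in context
                            return context                       # no tool call: the answer is done
                        return context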

                  • astrange 16 hours ago

                    The text returned by the tool itself makes it not "next token prediction". Aside from having side effects, the reason it's helpful is that it's out of distribution for the model. So it changes the properties of the system.

                    • porridgeraisin 15 hours ago

                      Ah ok, understood what you meant.

            • ninetyninenine a day ago

              I've been saying this to people. Tons of people don't realize that we have no idea how these things work. So questions like "Is it AGI?" or "Is it conscious?", where the questions themselves contain words with fuzzy and ill-defined meanings, are pointless.

              The core of the matter is: We don't know!

              So when someone says ChatGPT could be conscious, or when someone says it can't be conscious: we don't actually know! And we aren't even fully sure about the definition of consciousness.

              My problem with HN is that a ton of people here take the stance that we know how LLMs work and that we know definitively they aren't conscious and the people who say otherwise are alarmist and stupid.

              I think the fact of the matter is, if you're putting your foot down and saying LLMs aren't intelligent... you're wildly illogical and misinformed about the status quo of artificial intelligence, and a huge portion of the mob mentality on HN thinks this way. It's like there are these buzz phrases getting thrown around saying that the LLM is just a glorified autocomplete (which it is), and people latch on to these buzz phrases until their entire understanding of LLMs becomes basically a composition of buzz concepts like "transformers" and "LLMs" and "chain of thought", when in actuality real critical thinking about what's going on in these networks tells you we don't UNDERSTAND anything.

              Also the characterization in the article is mistaken. It says we understand LLMs in a limited way. Yeah, sure. It's as limited as our understanding of the human brain. You know scientists found a way to stimulate the reward centers of the brain using electrodes and were able to cause a person to feel the utmost pleasure? The whole Golden Gate Bridge thing is exactly that. You perturb something in the network and it causes a semi-predictable output. In the end we still generally don't get wtf is going on.

              We literally understand nothing. What we do understand is so minuscule compared with what we don't understand that it's negligible.

              • sirwhinesalot 20 hours ago

                This is such a weird take to me. We know exactly how LLMs and neural networks in general work.

                They're just highly scaled up versions of their smaller curve fitting cousins. For those we can even make pretty visualizations that show exactly what is happening as the network "learns".

                I don't mean "we can see parts of the brain light up", I mean "we can see the cuts and bends each ReLU is doing to the input".
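
                For instance (weights random, purely illustrative), you can read those cuts and bends off directly: each ReLU unit cuts the input line at one point and bends the function there, and wherever the pattern of active units changes, the network switches to a different linear piece.

                  import numpy as np

                  rng = np.random.default_rng(0)
                  W1, b1 = rng.normal(size=(3, 1)), rng.normal(size=3)   # 1 input -> 3 ReLU units
                  w2 = rng.normal(size=3)                                # -> 1 output

                  for x in np.linspace(-3, 3, 13):
                      pre = W1[:, 0] * x + b1                 # each unit "cuts" the line at x = -b/w
                      post = np.maximum(pre, 0.0)             # ...and "bends" it: zero on one side, linear on the other
                      pattern = (pre > 0).astype(int)         # which linear piece we are on
                      print(f"x={x:+.2f}  active={pattern}  f(x)={w2 @ post:+.3f}")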

                We built these things, we know exactly how they work. There's no surprise beyond just how good prediction accuracy gets with a big enough model.

                Deep Neural Networks are also a very different architecture from what is going on in our brains (which work more like Spiking Neural Networks) and our brains don't do backpropagation, so you can't even make direct comparisons.

                • SEGyges 19 hours ago

                  Fortunately I wrote an entire post about the difference between the parts of this that are easy to make sense of and the parts that are prohibitively difficult to make sense of, and it was posted on Hacker News.

                  • sirwhinesalot 19 hours ago

                    Your article, unlike the bizarre desperate take from the poster above, is actually very good. We do not understand the features the neural net learns, that's 100% true (and really the whole point of them in the first place).

                    For small image recognition models we can visualize them and get an intuition for what they are doing, but it doesn't really matter.

                    For even smaller models, we can translate them to a classical AI model (like a mixed integer program as an example) and actually do various "queries" on the model itself to, e.g., learn that the network recognizes the number "8" by just checking 2 pixels in the image.
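
                    As a sketch of what such a translation can look like (a tiny invented ReLU net encoded as a mixed integer program via the standard big-M trick, using PuLP; not the exact pipeline behind that "8" example), you can then ask the solver questions like "which input in the box maximizes the output?":

                      import pulp

                      W1, b1 = [[1.0, -2.0], [0.5, 1.5]], [0.0, -1.0]   # invented 2-2-1 ReLU network
                      W2, b2 = [1.0, -1.0], 0.2

                      prob = pulp.LpProblem("max_output_over_input_box", pulp.LpMaximize)
                      x = [pulp.LpVariable(f"x{i}", lowBound=-1, upBound=1) for i in range(2)]
                      h = [pulp.LpVariable(f"h{j}", lowBound=0) for j in range(2)]
                      z = [pulp.LpVariable(f"z{j}", cat="Binary") for j in range(2)]

                      for j in range(2):
                          a = pulp.lpSum(W1[j][i] * x[i] for i in range(2)) + b1[j]   # pre-activation
                          lo = b1[j] - sum(abs(w) for w in W1[j])                     # bounds over the input box
                          hi = b1[j] + sum(abs(w) for w in W1[j])
                          prob += h[j] >= a                                           # big-M encoding of h = max(0, a)
                          prob += h[j] <= a - lo * (1 - z[j])
                          prob += h[j] <= hi * z[j]

                      prob += pulp.lpSum(W2[j] * h[j] for j in range(2)) + b2         # objective: the network's output
                      prob.solve()
                      print([v.value() for v in x], pulp.value(prob.objective))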

                    None of this changes the fact that we know what these things are and how they work, because we built them. Any comparisons to our lack of knowledge of the human brain are ridiculous. LLMs are obviously not conscious, they don't even have real "state", they're an approximated pure function f(context: List<Token>) -> Token, that's run in a loop.
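
                    In code, that "pure function run in a loop" picture is about this small (next_token is a hypothetical stand-in for the model): nothing persists between calls except the token list you hand back in.

                      from typing import Callable, List

                      Token = int
                      EOS = 0

                      def run(next_token: Callable[[List[Token]], Token], context: List[Token], max_new: int = 32) -> List[Token]:
                          out = list(context)
                          for _ in range(max_new):
                              tok = next_token(out)    # same tokens in -> same distribution out; no hidden state survives
                              out.append(tok)
                              if tok == EOS:
                                  break
                          return out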

                    The only valid alarmist take is that we're using black box algorithms to make decisions with serious real-world impact, but this is true of any black box algorithm, not just the latest and greatest ML models.

                    • dpoloncsak 13 hours ago

                      It's a complex adaptive system, right? Isn't that the whole idea? We know how each part of the system works by itself. We know all inputs, and can measure outputs.

                      I still (even if I actually understood the math) cannot tell you "if you prompt 'x', the model will return 'y'" with 100% confidence.

                      • sirwhinesalot 12 hours ago

                        > If you prompt 'x', the model will return 'y' with 100% confidence.

                        We can do this for smaller models. Which means it's a problem of scale/computing power rather than a fundamental limitation. The situation with the human brain is completely different. We know neurons exchange information and how that works, and we have a pretty good understanding of the architecture of parts of the brain like the visual cortex, but we have no idea of the architecture as a whole.

                        We know the architecture of an LLM. We know how the data flows. We know what it is the individual neurons are learning (cuts and bends of a plane in a hyperdimensional space). We know how the weights are learned (backpropagation). We know the "algorithm" the LLM as a whole is approximating (List<Token> -> Token). Yes there are emergent properties we don't understand but the same is true of a spam filter.

                        Comparing this to our lack of understanding of the human brain and discussing how these things might be "conscious" is silly.

                        • ninetyninenine 11 hours ago

                          >Comparing this to our lack of understanding of the human brain and discussing how these things might be "conscious" is silly.

                          Don't call my claim silly. I'm sick of your attitude. Why can't you have a civil discussion?

                          Literally we don't know. You can't make a claim that it's silly when you can't even define what consciousness is. You don't know how human brains work, you don't know how consciousness forms, you don't know how emergence in LLMs work. So your claim here is logically just made up out of thin air.

                          Sure, we "understand" LLMs from the curve-fitting perspective. But the entirety of why we use LLMs and what we use them for arises from the emergence, which is what we don't understand. Curve fitting is like 1% of the LLM; it is the emergent properties we completely don't get (99%) and take advantage of on a daily basis. Curve fitting is just a high level concept that allows us to construct the algorithm, which is the actual thing that does the hard work of wiring up the atomic units of the network.

                          >Yes there are emergent properties we don't understand but the same is true of a spam filter.

                          Yeah and so? Your statement proves nothing. It just illustrates a contrast in sentiment. The spam filter is a trivial thing, the human brain is not.

                          We don't understand the spam filter. And the most interesting part of it all is that the SAME scaling problem that prevents us from understanding the spam filter can be characterized as the reason that prevents us from understanding BOTH the LLM and the human brain.

                          Your statement doesn't change anything. It's just using sentiment to try to re-characterize a problem in a different light.

                      • ninetyninenine 12 hours ago

                        https://youtu.be/qrvK_KuIeJk?t=284

                        I don’t appreciate your comments. Especially rude to call me desperate and bizarre.

                        Take a look at the above video, where Geoffrey Hinton, basically the godfather of AI, directly contradicts your statement.

                        I sincerely hope you self reflect and are able to realize that you’re the one completely out of it.

                        Realistically the differences get down to sort of a semantic issue. We both agree that there are things we don't understand and things that we do understand. It's just that the overall aggregate generalization of this, in your opinion, comes down to "we overall do understand", while mine is "we don't understand shit".

                        Again. Your aggregate is wrong. Utterly. Preeminent experts are on my side. If we did understand LLMs we'd be able to edit the individual weights of each neuron to remove hallucinations. But we can't. Like, literally, we know a solution to the hallucination problem exists. It's in the weights. We know a certain configuration of weights can remove the hallucination. But even for a single prompt-and-answer pair we do not know how to modify the weights such that the hallucinations go away. We can't even quantify, formally define, or model what a hallucination is. We describe LLMs in human terms and we manipulate the thing through prompts and vague psychological methods like "chain of thought".

                        You think planes can work like this? Do we psychologically influence planes to sort of fly correctly?

                        Literally. No other engineered system like this exists on earth where sheer lack of understanding is this large.

                        • sirwhinesalot 12 hours ago

                          Sorry for the remark, it was indeed unnecessarily rude and I apologise.

                          That said your appeal to authority means nothing to me. I can simply counter with another appeal to authority like Yann LeCun who thinks LLMs are an evolutionary dead end (and I agree).

                          It matters not that we cannot comprehend them (in the sense of predicting what output it'll give for a given input). Doing that is their job. I also can't comprehend why a support vector machine ends up categorizing spam the way it does.

                          In both cases we understand the algorithms involved which is completely different from our understanding of the brain and its emergent properties.

                          • ninetyninenine 12 hours ago

                            Apology not accepted. Your initial take was to cast me as out of touch. With my appeal to authority you now realize that my stance occupies a very valid place even though you disagreed. Like it wasn’t just rude, statements like “bizarre” are inaccurate given that Geoffrey agrees with me. So even if I’m not offended by your rudeness it’s not a bizarre take at all. It’s a valid take.

                            That being said, Yann LeCun is not in agreement with you either. His only claims are that LLMs are not AGI and that hallucinations in LLMs can never be removed.

                            The debate here isn’t even about that. The debate here is that we don’t understand LLMs. Whether the LLM is agi or whether we can remove or never remove the hallucinations is COMPLETELY orthogonal.

                            So you actually can’t counter with another appeal to authority. Either way I didn’t “just” appeal to authority. I literally logically countered every single one of your statements as well.

                            • sirwhinesalot 12 hours ago

                              There's a huge jump from "we cannot predict the output of an LLM given its input" to "we don't understand LLMs", or that they might be conscious or that this is in any way equivalent to our lack of understanding of the human brain.

                              We also don't understand (in that sense) any other ML model of sufficient size. It learning features we humans cannot come up with is its job. We can understand (in that sense) sufficiently small models because we have enough computational power to translate them to a classical AI model and query it.

                              That means it is a problem of scale, not of some fundamental property unique to LLMs.

                              The bizarre take is being spooked by this. It's been true of simpler models for a very long time. Not a problem.

                              • ninetyninenine 11 hours ago

                                >There's a huge jump from "we cannot predict the output of an LLM given its input" to "we don't understand LLMs", or that they might be conscious or that this is in any way equivalent to our lack of understanding of the human brain.

                                No it's not. There are huge similarities between artificial neural networks and the human brain. We not only understand atoms, we understand individual biological neurons. So the problem of understanding the human brain is in actuality ALSO a scaling problem. Granted, I realize the human brain is much more complex in terms of network connections and how it rewires dynamically, but my point still stands.

                                Additionally, we can't even characterize the meaning of consciousness. Like, you're likely thinking consciousness is some sort of extremely complex or very powerful concept. But the word is so loaded, and we know so little, that we actually don't know this. Consciousness could be a very trivial thing; we actually have no idea.

                                I agree that the brain is much more complex and much harder to understand and we understand much less. But this does not detract from the claim above that we fundamentally don't understand the LLM to such a degree that we can't even make a statement about whether or not an LLM is conscious. To reiterate, PART of this comes from the fact that we ALSO don't understand what consciousness itself is.

                                >The bizarre take is being spooked by this. It's been true of simpler models for a very long time. Not a problem.

                                This is a hallucination by you. I'm not spooked at all. I don't know where you're getting that from. My initial post's tone was one of annoyance, not "spooked". I'm annoyed by all the claims from people like you saying "we completely understand LLMs".

                                I mean, doesn't this show how similar you are to an LLM? You hallucinated that I was spooked when I indicated no such thing. I think here's a more realistic take: You're spooked. If what I said was categorically true, then you'd be spooked by the implications, so part of what you do is to choose the most convenient reality that's within the realm of possibility such that you aren't spooked.

                                Like I understand that classifying consciousness as this trivial thing that can possibly come about as an emergent side effect in an LLM could be a spooky thing. But think rationally. Given how much we don't know both about LLMs, human brains and consciousness, we in ACTUALITY don't know if this is what's going on. We can't make a statement either way. And this is the most logical explanation. It has NOTHING to do with being "spooked" which is an attribute that shouldn't be part of any argument.

                                • sirwhinesalot 9 hours ago

                                  Hackernews really isn't a good place for a serious discussion, so I'll just clarify my position.

                                  I think you're spooked for the same reason I think that all the "AI alarmists" whose alarmism is based on our lack of understanding of LLMs are spooked. That because we "lack understanding" it follows that AI is "out of our control" or is on the verge of becoming "conscious" or "intelligent", whatever that means.

                                  Except this isn't true to me. Yes, we can't predict how inputs will map to outputs, but that's nothing unexpected? This has been true of nearly every ML model in practical use (not just those based on neural nets) for a very long time.

                                  I don't perceive this as a "lack of understanding", in the same way I don't consider it a "lack of understanding" the inability to predict the output of a Support Vector Machine classifying email as spam, or not being able to predict how the coefficients of a radial basis function end up accurately approximating the behavior of a complex physical system. To me they're all a "lack of interpretability", which is a different thing.

                                  This is, to me, qualitatively different from our lack of understanding of the human brain. We know the algorithm an LLM is executing, because we set it up. We know how it learns, because we invented the algorithm that does it. We understand pretty well what's happening between the neurons because it's just a scaled up version of smaller models, whose behavior we have visualized and understand pretty well. We know how it "reasons" (in the sense of "thinking" models) because we set it up to "reason" in that manner from how we trained it.

                                  Our understanding of the human brain is not even close to this. We can't even understand the most basic of brains.

                                  Even postulating that LLMs are conscious, whatever that actually is in reality, is nonsensical. They're not even alive! What would "consciousness" even entail for a pure function? There's no reason to even bring that up other than to hype these things as more than what they are (be it positively or negatively).

                                  > I think the fact of the matter is, if you're putting your foot down and saying LLMs aren't intelligent... you're wildly illogical and misinformed about the status quo of Artificial intelligence

                                  They're just as intelligent as a chess engine is intelligent. They're algorithms.

                                  > Also the characterization in the article is mistaken. It says we understand LLMs in a limited way. Yeah sure. It's as limited as our understanding of the human brain.

                                  We understand enough about how they work that we know just forcing them to output more tokens leads to better results and we have a good intuition as to why (see: Karpathy's video on the subject). It's why when asked a math question they spit out a whole paragraph rather than the answer directly, and why "reasoning" is surprisingly effective (we can see from open models that reasoning often just spits out a giant pile of nonsense). More tokens = more compute = more accuracy. A bit similar to the number of noise removal steps in a diffusion model.

                                  • ninetyninenine 8 hours ago

                                    >I think you're spooked for the same reason I think that all the "AI alarmists" whose alarmism is based on our lack of understanding of LLMs are spooked. That because we "lack understanding" it follows that AI is "out of our control" or is on the verge of becoming "conscious" or "intelligent", whatever that means.

                                    Yeah, well you made this shit up out of thin fucking air. I'm not spooked. We lack understanding of it, but it doesn't mean we can't use it. It doesn't mean there's going to be a skynet-level apocalypse. You notice I didn't say anything like that? There's literally no evidence for it AND I never said anything like that. Here's what I think: I don't know how an airplane works. And I'm fine with it, I can still ride an airplane. I also don't know how an LLM works. I'm also fine with it. It just so happens nobody knows how an LLM works. I'm also fine with that.

                                    This spooked bullshit came out of your own hallucination. You made that shit up. My initial post is NOT meant to be alarmist. It's meant to be fucking annoyed at people who insist that everything is totally simple and we totally get it. The fact of the matter is, we may not understand it, but I don't think anything catastrophic is going to emerge from the fact we don't understand it. Even if the LLM is sentient I don't think there's much to fear.

                                    However this doesn't mean that what the alarmists say isn't real. We just don't know.

                                    >Except this isn't true to me. Yes, we can't predict how inputs will map to outputs, but that's nothing unexpected? This has been true of nearly every ML model in practical use (not just those based on neural nets) for a very long time.

                                    Doesn't change the fact we don't fucking know what's going on. Like I said, this is something you're spooked about IF what I said was true. I'm not spooked about it, period. You're adding shit to the topic that's OFF topic.

                                    >I don't perceive this as a "lack of understanding", in the same way I don't consider it a "lack of understanding" the inability to predict the output of a Support Vector Machine classifying email as spam, or not being able to predict how the coefficients of a radial basis function end up accurately approximating the behavior of a complex physical system. To me they're all a "lack of interpretability", which is a different thing.

                                    This isn't a perception problem. It's not as if perceiving something in a different way suddenly makes your perception valid. NO. We categorically DO NOT understand it. Stop playing with words. Lack of understanding IS lack of interpretability. It's the same fucking thing. If you can't interpret what happened, you don't understand what happened.

                                    Maybe what you're trying to say here is that we understand LLMs enough in such a way that you aren't spooked. Since you made up all that bullshit about me being spooked, I'm guessing that's what you mean. But the fact of the matter remains the same: what we DON'T understand about LLMs is far greater than what we currently know.

                                    >This is, to me, qualitatively different from our lack of understanding of the human brain. We know the algorithm an LLM is executing, because we set it up. We know how it learns, because we invented the algorithm that does it. We understand pretty well what's happening between the neurons because it's just a scaled up version of smaller models, whose behavior we have visualized and understand pretty well. We know how it "reasons" (in the sense of "thinking" models) because we set it up to "reason" in that manner from how we trained it.

                                    Sure there are differences. That's obvious. But the point is we STILL don't understand LLMs in essence. That is still true despite your comparison here.

                                    >Our understanding of the human brain is not even close to this. We can't even understand the most basic of brains.

                                    If we understand 1 percent of LLMs but only 0.1% of the human brain. That's a 10x dramatic increase in our understanding of LLMs OVER the brain. But it still doesn't change my main point: Overall we. don't. understand. how. LLMs. work. This is exactly the way I would characterize our overall understanding holistically.

                                    >Even postulating that LLMs are conscious, whatever that actually is in reality, is nonsensical. They're not even alive! What would "consciousness" even entail for a pure function? There's no reason to even bring that up other than to hype these things as more than what they are (be it positively or negatively).

                                    Your statement is in itself nonsensical because you don't even know what consciousness or being alive even means. Like, there are several claims here made about things you don't know about (LLMs and human brains), using words you can't even define ("alive" and "consciousness"). Like, the rational, point-by-point thing you need to realize is that you're not rationally constructing your claim from logic. You're not saying we know A therefore B must be true. <--- that is how you construct an argument.

                                    While I'm saying you're making claim A, using B, C and E but we don't know anything about B, C and E so your claim is baseless. You get it? We don't know.

                                    >They're just as intelligent as a chess engine is intelligent. They're algorithms.

                                    But you don't understand how the emergent effects of the algorithm work, so you can't make the claim that they are as intelligent as a chess engine. See? You make claim A, and I said your claim A is based on fact B, but B is something you don't know anything about. Can you counter this? No.

                                    >We understand enough about how they work that we know just forcing them to output more tokens leads to better results and we have a good intuition as to why (see: Karpathy's video on the subject). It's why when asked a math question they spit out a whole paragraph rather than the answer directly, and why "reasoning" is surprisingly effective (we can see from open models that reasoning often just spits out a giant pile of nonsense). More tokens = more compute = more accuracy. A bit similar to the number of noise removal steps in a diffusion model.

                                    This is some trivial ball park understanding that is clearly equivalent to overall NOT understanding. You're just describing something like curve fitting again.

                                    • sirwhinesalot 7 hours ago

                                      > Maybe what you're trying to say here is that we understand LLMs enough in such a way that you aren't spooked. Since you made up all that bullshit about me being spooked, I'm guessing that's what you mean.

                                      Correct. I bundled you with the alarmists who speak in similar ways, as pattern-matching brains tend to do. Not a hallucination in the LLM sense, just standard probability.

                                      > If we understand 1 percent of LLMs but only 0.1% of the human brain. That's a 10x dramatic increase in our understanding of LLMs OVER the brain. But it still doesn't change my main point: Overall we. don't. understand. how. LLMs. work. This is exactly the way I would characterize our overall understanding holistically.

                                      And it's not how I characterize it at all. What algorithm is your brain running right now? Any idea? We have no clue. We know the algorithm the LLM is executing: it's a token prediction engine running in a loop. We wrote it. We know enough about how it works to know how to make it better (e.g., Mixture of Experts, "Reasoning").

                                      This is not a "0.1x" or "10x" or whatever other quantitative difference, it's a qualitative difference in understanding. Not being able to predict the input-output relationship of any sufficiently large black-box algorithm does not give one carte-blanche to jump to conclusions regarding what they may or may not be.

                                      How large does a black-box model need to be before you entertain that it might be "conscious" (whatever that may actually be)? Is a sufficiently large spam filter conscious? Is it even worth entertaining such an idea? Or is it just worth entertaining for LLMs because they write text that is sufficiently similar to human-written text? Does this property grant them enough "weight" that questions regarding "consciousness" are even worth bringing up? What about a StarCraft-playing bot based on reinforcement learning? Is it worth bringing up for one? We "do not understand" how they work either.

                                      • ninetyninenine 5 hours ago

                                        >Correct. I bundled you with the alarmists who speak in similar ways, as pattern matching brains tend to do. Not an hallucination in the LLM sense, just standard probability.

                                        First off what you did here is common for humans.

                                        Second off it's the same thing that happens for LLMs. You don't know fact from fiction, neither does the LLM, so it predicts something probable given limited understanding. It is not different. You made shit up. You hallucinated off of a probable outcome. LLMs do the same.

                                        Third, as a human, it's on you when you don't verify the facts. If you make shit up on accident, that's your fault and your reputation on the line. It's justified here to call you out for making crap up out of thin air.

                                        Either make better guesses or don't guess at all. For example this guess: "Maybe what you're trying to say here is that we understand LLMs enough in such a way that you aren't spooked." was spot on by your own admission.

                                        >And it's not how I characterize it at all. What algorithm is your brain running right now? Any idea? We have no clue. We know the algorithm the LLM is executing: it's a token prediction engine running in a loop. We wrote it. We know enough about how it works to know how to make it better (e.g., Mixture of Experts, "Reasoning").

                                        This has nothing to do with quantification; that's just an artifact of the example I'm using, and it is only there to illustrate relative differences in the amount we know.

                                        Your characterization is that we know MUCH more about the LLM than we do the brain. So I'm illustrating that, while, yeah, EVEN though your characterization is true, THE amount we know about the LLM is still minuscule. Hence the 10x improvement from 0.1% to 1%. In the end we still don't know shit; it's still at most 1% of what we need to know. Quantification isn't the point, it wasn't your point, and it's not mine. It's here to illustrate proportion of knowledge, WHICH was indeed your POINT.

                                        >How large does a black-box model need to be before you entertain that it might be "conscious" (whatever that may actually be). Is a sufficiently large spam filter conscious?

                                        I don't know. You don't know either. We both don't know. Because like I said we don't even know what the word means.

                                        >Is it even worth entertaining such an idea?

                                        Probably not for a spam box filter. But technically speaking We. don't. actually. know.

                                        However, qualitatively speaking, it is worth entertaining the idea for an LLM, given how similar it is to humans. We both understand WHY plenty of people are entertaining the idea. Right? You and I totally get it. What I'm saying is that GIVEN that we don't know either way, we can't dismiss what other people are entertaining.

                                        Also, your method of rationalizing all of this is flawed. Like, you use comparisons to justify your thoughts. You don't want to think a spam filter is sentient, so you think if the spam filter is comparable to an LLM then we must think an LLM isn't sentient. But that doesn't logically flow, right? How is a spam filter similar to an LLM? There are differences, right? Just because they share similarities doesn't make your argument suddenly logically flow. There are similarities between spam filters and humans too! We both use neural nets? Therefore since spam filters aren't sentient, humans aren't either? Do you see how this line of reasoning can be fundamentally misapplied everywhere?

                                        I mean, the comparison logic is flawed, ON top of the fact that we don't even know what we're talking about... I mean... what is consciousness? And we don't in actuality understand the spam filter enough to know if it's sentient. I mean, if ONE aspect of your logic made sense we could possibly say I'm just being pedantic, that certain assumptions are given... but your logic is broken everywhere. Nothing works. So I'm not being pedantic.

                                        >Or is it just worth entertaining for LLMs because they write text that is sufficiently similar to human written text?

                                        Yes. Many people would agree. It's worth entertaining. This "worth" is a human measure. Not just qualitative, but also opinionated, and it is my opinion and many people's opinion that "yes", it is worth it. Hence why there's so much debate around it. Even if you don't feel it's worth "entertaining", at least you have the intelligence to understand why so many people think it's worth discussing.

                                        >What about a starcraft playing bot based on reinforcement learning? Is it worth bringing up for one? We "do not understand" how they work either.

                                        Most people are of the opinion that "no", it is not worth entertaining. It is better to ask this question of the LLM. Of course you bring up these examples because you think the comparison chains everything together. You think if it's not worth it for the spam filter, it's not worth it in your mind to consider sentience for anything that is in your opinion "comparable" to it. And like I deduced earlier, I'm saying you're wrong; this type of logic doesn't work.

                      • polotics 16 hours ago

                        Could you provide a link so we can follow your thread of thought on this? It appears your article got submitted by a user other than you.

                      • ninetyninenine 13 hours ago

                        >This is such a weird take to me. We know exactly how LLMs and neural networks in general work.

                        Why do you make such crazy statements when the article you're responding to literally says the opposite of this? Like, your statement is categorically false, and both the article and I are in total contradiction to it. The rest of your post doesn't even get into the nuances of what we do and don't understand.

                        > They're just highly scaled up versions of their smaller curve fitting cousins. For those we can even make pretty visualizations that show exactly what is happening as the network "learns".

                        This proves nothing. Additionally, it is in direct contradiction to what the OP and I wrote. You describe the minuscule aspects that we do understand but neglect to describe the fact that we overall don't understand. Saying we understand curve fitting is like saying we understand the human brain because we completely understand how atoms work and the human brain is made up of atoms, therefore we understand the human brain. No. We understand atoms. We don't understand the human brain even though we know the brain is just a bunch of atoms. The generalization of this is that the amount we don't understand eclipses by a massive amount the parts we do understand.

                        Let me spell it out to your brain what this means overall: we don’t understand shit.

                        > I don't mean "we can see parts of the brain light up", I mean "we can see the cuts and bends each ReLU is doing to the input".

                        Even with direct insight into every single signal propagating through the feed-forward network, we still don't understand it. Like the OP's article says, it's a scaling problem. Again, I don't understand how come you're only referring to the minuscule aspects we do understand and not even once mentioning the massive aspects we don't understand.

                        > We built these things, we know exactly how they work. There's no surprise beyond just how good prediction accuracy gets with a big enough model.

                        No we don’t. Sometimes I find basic rationality and common sense just doesn’t work with people. What ends up working is citing experts in the field saying the exact fucking opposite of you: https://youtu.be/qrvK_KuIeJk?t=284

                        That’s Geoffrey Hinton, the guy who made ML relevant again in the last decade. Like literally word for word saying exactly opposite of what you’re saying. The interviewer even said “we built the thing, we know how it works” and Geoffrey is like “no we don’t”. Bro. Wake up.

                        > Deep Neural Networks are also a very different architecture from what is going on in our brains (which work more like Spiking Neural Networks) and our brains don't do backpropagation, so you can't even make direct comparisons.

                        Bro we don’t understand deep neural networks. Throwing big words like back propagation and even understanding that it’s just the chain rule from calculus doesn’t mean shit. We overall don’t understand it because it’s a scaling problem. There’s no theory or model that characterizes how a signal propagates through a trained network. It’s like saying you know assembly language and therefore you understand all of Linux. No we understand assembly but these learning algorithms are what built Linux and thus we don’t understand it because we didn’t build it. We don’t understand the human brain and we don’t understand deep neural networks.

                        You know what. Just take a look at what Geoffrey Hinton says. Like if you feel my take is so bizarre and felt the need to comment on it I hope his take rewires your brain and helps you realize you’re the one out of touch. Rationality rarely changes people’s minds, but someone with a higher reputation often is capable of actually getting a person out of their own biased headspace. So listen to what he says then couple it with the explanation here and completely flip your opinion. Or don’t. I rarely see people just do an opinion flip. Humans are just too biased.

                    • Nevermark 18 hours ago

                      Here is a slightly sideways take on the question, since some misunderstandings crop up over and over.

                      Things that are not true about neural networks and LLMs:

                      1. They are not stochastic parrots. Statistics can be used to judge model performances, as with any type of model whatsoever.

                      But that doesn’t make a model statistical. Neural network components are not statistically driven, they are gradient driven. They don’t calculate statistics, they learn topological representations of relationships.

                      2. LLMs are not just doing word prediction or text completion. Yes, that is the basic task they are trained on, but the “just” (that is often stated or implied) trivializes what the network actually has to learn to perform well.

                      Task type, and what must be learned to achieve success at that task, are two entirely different things.

                      To predict the kinds of human reasoning documented in writing in training sets requires that kind of reasoning be learned, not just some compressed generalization of people's particular responses.

                      Simple proof that LLMs are not just compressing a lot of human behavior comes easily. Just ask an LLM to do something involving several topics unlikely to have been encountered together before, and their one-shot answer might not be God's word on the issue, but it is a far cry from the stumbling that a mere mimic would produce. (Example task: ask for a Supreme Court brief arguing for rabbits' rights based on sentient-animal and native rights, with serious arguments, but written in Dr. Seuss prose by James Bond.)

                      3. LLMs do reason. Not like us or as well as us. But also, actually better than us in some ways.

                      LLMs are far superior to us at very wide, somewhat shallow reasoning.

                      They are able to one-shot weave together information from tens of thousands of disparate topics and ideas, on demand.

                      But they don’t notice their own blind spots. They miss implications. Those are things humans do quickly in continuous narrower deeper reasoning cycles.

                      They are getting better.

                      And some of their failures should be attributed to the strict constraint we put on them: one or a few shots at a well-written response.

                      We don’t hold humans to such strict standards. And if we did, we would also make a lot more obvious errors.

                      Wide & shallow reasoning in ways humans can't match is not a trivial success.

                      4. LLMs are very creative.

                      As with their reasoning, not like us: very wide in terms of instantly and fluidly weaving highly disparate information together. Somewhat shallow (but gaining) in terms of iteratively recognizing and self-correcting their own mistakes and oversights.

                      See random original disparate topic task above.

                      First, spend several hours performing that task oneself. Or take a day or two if need be.

                      Then, give the task to an LLM and wait a few seconds.

                      Compare. Ouch.

                      TL;DR: somewhat off topic and high level, but when trying to understand models it helps to avoid these seemingly endlessly repeated misunderstandings.

                      • cwmoore 10 hours ago

                        “LLMs are far superior to us at very wide, somewhat shallow reasoning.”

                        So how much will it cost to autocomplete Wikipedia?