• btown a day ago

    I think an important bit of context here is that computers are very, very good at speculative happy-path execution.

    The examples in the article seem gloomy: how could a JIT possibly do all the checks to make sure the arguments aren’t funky before adding them together, in a way that’s meaningfully better than just running the interpreter? But in practice, a JIT can create code that does these checks, and modern processors will branch-predict the happy path and effectively run it in parallel with the checks.
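
To make that concrete, here is a rough sketch, in plain Python, of the shape of the guarded code a JIT emits (the helper names are made up; real engines emit machine code and use hidden classes, not Python source):

```python
import operator

def generic_add(a, b):
    # deoptimized fallback: full dynamic dispatch through the type system
    return operator.add(a, b)

def add_guarded(a, b):
    # The guard below compiles to a couple of cheap compares; the branch
    # predictor learns the happy path and speculatively runs past it.
    if type(a) is int and type(b) is int:
        return a + b            # specialized, unboxed path in real JIT output
    return generic_add(a, b)    # anything unusual bails out

print(add_guarded(2, 3), add_guarded("a", "b"))
```

The point is that the guard is almost free when it keeps predicting "yes", which is why speculative execution makes such checks cheaper than they look on paper.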

    JavaScript, too, has complex prototype chains and common use of boxed objects - but v8 has made common use cases extremely fast. I’m excited for the future of Python.

    • jerf 20 hours ago

      That makes it so that in absolute terms, Python is not as slow as you might naively expect.

      But we don't measure programming language performance in absolute terms. We measure it in relative terms, generally against C. And while your Python interpreter is speculating about how this Python object will be unboxed, where its methods are, how to unbox its parameters, and what methods will be called on those, compiled code is speculating on actual code the programmer has written, running that in parallel, such that by the time the Python interpreter has finished speculating successfully on how some method call will resolve with actual objects, the compiled language is done with ~50 lines of code of similar grammatical complexity. (Which is a sloppy term, since this is a bit of a sloppy conversation, but consider a series of "p.x = y"-level statements in Python versus C as the case I'm looking at here.)
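
The gap is visible with the standard library's dis module: one attribute-store statement expands into a pile of interpreted opcodes, each of which hides further dynamic checks (exact opcode names vary by CPython version):

```python
import dis

def assign(p, y):
    p.x = y   # one "line of work" in source form

# Every opcode below goes through the interpreter's dispatch loop, and
# STORE_ATTR alone hides dict lookups, descriptor-protocol checks, etc.
ops = [ins.opname for ins in dis.get_instructions(assign)]
print(ops)
```

In C, the equivalent statement is typically a single store instruction to a known offset.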

      There's no way around it. You can spend your amazingly capable speculative parallel CPU on churning through Python interpretation or you can spend it on doing real work, but you can't do both.

      After all, the interpreter is just C code too. It's not like it gets access to special speculation opcodes that no other program does.

      • Demiurge 17 hours ago

        I love this “real work”. Real work, like writing linked lists, array bounds checking, all the error handling for opening files, etc, etc? There is a reason Python and C both have a use case, and it’s obvious Python will never be as fast as C at doing “1 + 1”. The real “real work” is getting stuff done, not just making sure the fewest CPU cycles are used to accomplish some web form generation.

        Anyway, I think you’re totally right, in your general message. Python will never be the fastest language in all contexts. Still, there is a lot of room for optimization, and given it’s a popular language, it’s worth the effort.

        • jerf 16 hours ago

          I can't figure out what your first paragraph is about. The topic under discussion is Python performance. We do not generally try to measure something as fuzzy as "real work", as you seem to be using the term, in performance discussions, because what even is that? There's a reason my post referenced "lines of code", still a rather fuzzy thing (which I already pointed out in my post), but it gets across the idea that while Python has to do a lot of work for "x.y = z" to cover all the things that "x.y" might mean, including the possibility that the user has changed what it means since the last time this statement ran, compiled languages generally do over an order of magnitude less "work" in resolving that.

          This is one of the issues with Python I've pointed out before, to the point I suggest that someone could make a language around this idea: https://jerf.org/iri/post/2025/programming_language_ideas/#s... In Python you pay and pay and pay and pay and pay for all this dynamic functionality, but in practice you aren't actually dynamically modifying class hierarchies and attaching arbitrary attributes to arbitrary instances with arbitrary types. You pay for these features but you benefit from them far less often than the number of times Python is paying for them. Python spends rather a lot of time spinning its wheels double-checking that it's still safe to do the thing it thinks it can do, and that's hard to remove even in a JIT, because it is extremely difficult to prove that those checks can be eliminated.
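
A minimal example of the dynamism being paid for: rebinding a method on a live class is legal at any moment, which is why the runtime keeps re-validating its assumptions:

```python
class Point:
    def __init__(self, x):
        self.x = x

    def double(self):
        return self.x * 2

p = Point(21)
before = p.double()          # resolves to the original method

# Legal at any time, so the runtime can never assume the method is stable:
Point.double = lambda self: self.x * 3
after = p.double()           # same call site, different behavior

print(before, after)
```

Most programs never do this, but every attribute lookup pays for the possibility.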

          • Demiurge 14 hours ago

            I understand what you're saying. In a way, my comment is actually off-topic to most of your comment. What I was saying in my first paragraph is that the words you use in the context of a language runtime's inefficiency can also be used to describe why these inefficiencies exist, in the context of higher-level processes, like business efficiency. I find your choice of words amusing, given the juxtaposition of these contexts, even saying "you pay, pay, pay".

            • mmcnl 15 hours ago

              You claimed churning through Python interpretation is not "real work". You now correctly ask the question: what is "real work"? Why is interpreting Python not real work, if it means I don't have to check for array bounds?

              • coldtea 10 hours ago

                >Why is interpreting Python not real work, if it means I don't have to check for array bounds?

                Because other languages can do that for you too, much much faster...

            • btown 17 hours ago

              To put it another way, I choose Python because of its semantics around dynamic operator definition, duck typing etc.

              Just because I don’t write the bounds-checking and type-checking and dynamic-dispatch and error-handling code myself, doesn’t make it any less a conscious decision I made by choosing Python. It’s all “real work.”

              • kragen 16 hours ago

                Type checking and bounds checking aren't "real work" in the sense that, when somebody checks their bank account balance on your website or applies a sound effect to an audio track in their digital audio workstation, they don't think, "Oh good! The computer is going to do some type checking for me now!" Type checking and bounds checking may be good means to an end, but they are not the end, from the point of view of the outside world.

                Of course, the bank account is only a means to the end of paying the dentist for installing crowns on your teeth and whatnot, and the sound effect is only a means to the end of making your music sound less like Daft Punk or something, so it's kind of fuzzy. It depends on what people are thinking about achieving. As programmers, because we know the experience of late nights debugging when our array bounds overflow, we think of bounds checking and type checking as ends in themselves.

                But only up to a point! Often, type checking and bounds checking can be done at compile time, which is more efficient. When we do that, as long as it works correctly, we never† feel disappointed that our program isn't doing run-time type checks. We never look at our running programs and say, "This program would be better if it did more of its type checks at runtime!"

                No. Run-time type checking is purely a deadweight loss: wasting some of the CPU on computation that doesn't move the program toward achieving the goals we were trying to achieve when we wrote it. It may be a worthwhile tradeoff (for simplicity of implementation, for example) but we must weigh it on the debit side of the ledger, not the credit side.

                ______

                † Well, unless we're trying to debug a PyPy type-specialization bug or something. Then we might work hard to construct a program that forces PyPy to do more type-checking at runtime, and type checking does become an end.

                • rightbyte 14 hours ago

                  > and the sound effect is only a means to the end of making your music sound less like Daft Punk or something

                  What do you mean? Daft Punk is not daft punk. Why single them out :)

                  • kragen 13 hours ago

                    Well, originally I wrote "more like Daft Punk", but then I thought someone might think I was stereotyping musicians as being unoriginal and derivative, so I swung the other way.

              • Calavar 16 hours ago

                I believe they are talking about the processor doing real work, not the programmer.

                • Demiurge 15 hours ago

                  Yeah, I get it, but I found the choice of words funny, because these words can apply in the larger context. It's like saying, Python transfers work from your man hours to cpu hours :)

              • CraigJPerry 15 hours ago

                > And while your Python code is speculating about how this Python object will be unboxed

                This is wrong, I think? The GP is talking about JIT'd code.

                • dragonwriter 17 hours ago

                  > After all, the interpreter is just C code too.

                  What interpreter? We’re talking about JITting Python to native code.

                  • qaq 16 hours ago

                    Welp, there is Mojo, so it looks like soon you will not really need to care that much. It'll probably get better performance than C too.

                    • jerf 16 hours ago

                      I've been hearing promises about "better than C" performance from Python for over 25 years. I remember them on comp.lang.python, back on that Usenet thing most people reading this have only heard about.

                      At this point, you just shouldn't be making that promise. Decent chance that promise is already older than you are. Just let the performance be what it is, and if you need better performance today, be aware that there are a wide variety of languages of all shapes and sizes standing by to give you ~25-50x better single threaded performance and even more on multi-core performance today if you need it. If you need it, waiting for Python to provide it is not a sensible bet.

                      • qaq 11 hours ago

                        I am a bit older than Python :). I imagine the creator of Clang and LLVM has a fairly good grasp of making things performant. Think of Mojo as Rust with better ergonomics and a more advanced compiler that you can mix and match with regular Python.

                        • sevensor 10 hours ago

                          I maintain a program written in Python that is faster than the program written in C that it replaces. The C version can do a lot more operations, but it amounts to enumerating 2^N alternatives when you could enumerate N alternatives instead.

                          Certainly my version would be even faster if I implemented it in C, but the gains of going from exponential to linear completely dominate the language difference.
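
A toy illustration of that point, with a made-up workload that just counts iterations: the 2^N-shaped search loses to the N-shaped one by a factor that no language choice can recover:

```python
import time

N = 20

def enumerate_exponential(n):
    # shape of the old C program's search: all 2^n alternatives
    count = 0
    for _ in range(2 ** n):
        count += 1
    return count

def enumerate_linear(n):
    # shape of the rewrite: just n alternatives
    count = 0
    for _ in range(n):
        count += 1
    return count

t0 = time.perf_counter()
exp_count = enumerate_exponential(N)
t_exp = time.perf_counter() - t0

t0 = time.perf_counter()
lin_count = enumerate_linear(N)
t_lin = time.perf_counter() - t0

print(exp_count, lin_count, t_exp > t_lin)
```

Even a 100x interpreter penalty is a rounding error next to the 2^N/N ratio once N grows.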

                          • patmorgan23 44 minutes ago

                            So you're saying two different programs implementing two different algorithms perform differently, and that lets you draw a conclusion about how the underlying languages/compilers/interpreters behave?

                            Have you ever heard of a controlled variable?

                            • vhantz 3 hours ago

                              Yeah, let's just compare apples to oranges

                            • hnfong 14 hours ago

                              You're probably right; Mojo seems to be more "Python-like" than actually source-compatible with Python. A bunch of features, notably classes, are missing.

                              • qaq 11 hours ago

                                Give 'em a bit of time, it's a pretty young language

                            • lenkite 16 hours ago

                              Mojo feels less like a real programming language for humans and more like a language primarily for AIs. The docs for the language immediately dive into chatbots and AI prompts.

                              • qaq 11 hours ago

                                I mean, that's the use case they care about, for obvious reasons, but it's not the only use case

                          • DanielHB 21 hours ago

                            The main problem is when the optimizations silently fail because of seemingly innocent changes and suddenly your performance tanks 10x. This is a problem with any language really (CPU cache misses are a thing, after all, and many non-dynamic languages have boxed objects), but it is much, much worse in dynamic languages like Python, JS and Ruby.

                            Most of the time it doesn't matter; most high-throughput Python code just invokes C/C++, where these concerns are not as big of a problem. Most JS code just invokes C/C++ browser DOM objects. As long as the hot path is not in those languages, you are not at such high risk of an "innocent change tanked performance" surprise.

                            Even server-side, most JS/Python/Ruby code is just simple HTTP stack handlers, invoking databases, and shuffling data around. And often a large part of the process of handling a request (encoding JSON/XML/etc., parsing HTTP messages, etc.) can be written in lower-level languages.

                            • fpoling 19 hours ago

                              Although JS supports prototype mutations, the with statement, and other constructs that make optimization harder, typical JS code does not use them. Thus the JIT can add a few checks for the presence of problematic constructs to direct execution to a slow path, while optimizing a not-particularly-big set of common patterns. And then the JS JIT does not need to care much about calling arbitrary native code, as the browser internals can be adjusted/refactored to suit the JIT's needs.

                              With Python that does not work. There are simply more optimization-unfriendly constructs and popular libraries use those. And Python calls arbitrary C libraries with fixed ABI.

                              So optimizing Python is inherently more difficult.

                              • josefx 19 hours ago

                                > but v8 has made common use cases extremely fast. I’m excited for the future of Python.

                                Isn't v8 still entirely single threaded with limited message passing? Python just went through a lot of work to make multithreaded code faster, it would be disappointing if it had to scrap threading entirely and fall back to multiprocessing on shared memory in order to match v8.

                                • zozbot234 18 hours ago

                                  Multithreaded code is usually bottlenecked by memory bandwidth, even more so than raw compute. C/C++/Rust are great at making efficient use of memory bandwidth, whereas scripting languages are rather wasteful of it by comparison. So I'm not sure that multithreading will do much to bridge the performance gap between binary compiled languages and scripting languages like Python.

                                  • loeg 18 hours ago

                                    JS is single-threaded. Python isn't.

                                  • mcdeltat 18 hours ago

                                    I wonder if branch prediction can still hide the performance loss when the happy-path checks become large/complex. Branch prediction is a very low-level optimisation, and even if the predictor is right you don't get everything for free. The CPU must still evaluate the condition, which takes resources, albeit no longer on the critical path. I'd also think the CPU would stall if it got too far ahead of the condition's execution (ultimately all the code must execute before the program completes). Perhaps given the nature of Python, the checks would be so complex that in a tight loop they'd exert significant resource pressure?

                                    • nxobject 21 hours ago

                                      To be slightly flip, we could say that the Lisp Machine CISC-supports-language full stack design philosophy lives on in how massive M-series reorder buffers and ILP supports JavaScriptCore.

                                    • dgan 21 hours ago

                                      "Rewrite the hot path in C/C++" is also a landmine because of how inefficient the boundary crossing is, so you really need to "dispatch as much as possible at once" instead of continuously calling into the native code

                                      • IshKebab 16 hours ago

                                        And it's not just inefficiency. Even with fancy FFI generators like PyO3 or SWIG, adding FFI adds a ton of work, complexity, makes debugging harder, distribution harder, etc.

                                        In my opinion in most cases where you might want to write a project in two languages with FFI, it's usually better not to and just use one language even if that language isn't optimal. In this case, just write the whole thing in C++ (or Rust).

                                        There are some exceptions but generally FFI is a huge cost and Python doesn't bring enough to the table to justify its use if you are already using C++.

                                        • pavon 16 hours ago

                                          One use of Python as a "glue language" I've seen that actually avoids the performance problems of those bindings is GNU Radio. That is because its architecture basically uses python as a config language that sets up the computation flow-graph at startup, and then the rest of runtime is entirely in compiled code (generally C++). Obviously that approach isn't applicable to all problems, but it really shaped my opinion of when/how a slow glue language is acceptable.

                                          • slt2021 14 hours ago

                                            This. Use Python only for control flow, and offload data flow to a library that is better suited for it: written in C, using packed structs, cache friendly, etc.

                                            if you want multiprocessing, use the multiprocessing library, scatter and gather type computation, etc

                                          • didip 17 hours ago

                                            These days it's "rewrite in Rust".

                                            Typically Python is just the entry and exit point (with a little bit of massaging), right?

                                            And then the overwhelming majority of the business logic is done in Rust/C++/Fortran, no?

                                            • 01HNNWZ0MV43FF 16 hours ago

                                              With computer vision you end up wanting to read and write to huge buffers that aren't practical to serialize and are difficult to share. And even allocating and freeing multi-megabyte framebuffers at 60 FPS can put a little strain on the allocator, so you want to reuse them, which means you have to think about memory safety.

                                              That is probably why his demo was Sobel edge detection with Numpy. Sobel can run fast enough at standard resolution on a CPU, but once that huge buffer needs to be read or written outside of your fast language, things will get tricky.

                                              This also comes up in Tauri, since you have to bridge between Rust and JS. I'm not sure if Electron apps have the same problem or not.

                                              • jononor 12 hours ago

                                                The "numpy" Sobel code is not that good, unfortunately - all the iteration is done in Python, so there is not much benefit from involving numpy. If one used, say, scipy.signal.convolve2d on a numpy.array, it would be much faster.
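
For illustration, here is one way to vectorize the Sobel-x response with NumPy slicing alone (assuming NumPy is installed; no scipy needed), so that no Python-level loop ever touches a pixel:

```python
import numpy as np

def sobel_x(img):
    # All per-pixel arithmetic happens inside NumPy's C loops.
    out = np.zeros_like(img, dtype=np.float64)
    out[1:-1, 1:-1] = (
        -1.0 * img[:-2, :-2] + 1.0 * img[:-2, 2:]
        - 2.0 * img[1:-1, :-2] + 2.0 * img[1:-1, 2:]
        - 1.0 * img[2:, :-2] + 1.0 * img[2:, 2:]
    )
    return out

# A horizontal ramp has a constant x-gradient of 1 per pixel, so every
# interior response is (1 + 2 + 1) * 2 = 8.
img = np.tile(np.arange(8, dtype=np.float64), (8, 1))
print(sobel_x(img)[1:-1, 1:-1])
```

The slices are views, not copies, so this also avoids the allocation churn of a per-pixel approach.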

                                                • aeroevan 15 hours ago

                                                  In the data science/engineering world, Apache Arrow is the bridge between languages, so you don't actually need to serialize into language-specific structures, which is really nice

                                              • aragilar 21 hours ago

                                                Isn't this just a specific example of the general rule of pulling out repeated use of the same operation in a loop? I'm not sure calls out to C are specifically slow in CPython (given many operations are really just calling C underneath).

                                                • Twirrim 19 hours ago

                                                  The serialisation cost of translating data representations between Python and C (or whatever compiled language you're using) is notable. Instead of having the compiled code sit in the centre of a hot loop, it's significantly better to have the loop in the compiled code and call into it once.

                                                  https://pythonspeed.com/articles/python-extension-performanc...
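
The same effect is visible inside pure CPython by comparing a bytecode-level loop with the builtin sum(), whose loop runs in the interpreter's own C code:

```python
import time

data = list(range(1_000_000))

# Loop in Python: one bytecode-dispatch round trip per element.
t0 = time.perf_counter()
total_py = 0
for x in data:
    total_py += x
t_py = time.perf_counter() - t0

# Loop in C: one call; the iteration happens inside sum()'s C implementation.
t0 = time.perf_counter()
total_c = sum(data)
t_c = time.perf_counter() - t0

print(total_py == total_c, round(t_py / t_c, 1))
```

Same result, same data, but the boundary is crossed once instead of a million times.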

                                                  • kragen 16 hours ago

                                                    You don't have to serialize data or translate data representations between CPython and C. That article is wrong. What's slow in their example is storing data (such as integers) the way CPython likes to store it, not translating that form to a form easily manipulated in C, such as a native integer in a register. That's just a single MOV instruction, once you get past all the type checking and reference counting.

                                                    You can avoid that problem to some extent by implementing your own data container as part of your C extension (the article's solution #1); frobbing that from a Python loop can still be significantly faster than allocating and deallocating boxed integers all the time, with dynamic dispatch and reference counting. But, yes, to really get reasonable performance you want to not be running bytecodes in the Python interpreter loop at all (the article's solution #2).

                                                    But that's not because of serialization or other kinds of data format translation.

                                                    • morkalork 19 hours ago

                                                      The overhead of copying and moving data around in Python is frustrating. When you are CPU bound on a task, you can't use threads (which do have shared memory) because of the GIL, so you end up using whole processes and then waste a bunch of cycles communicating stuff back and forth. And yes, you can create shared memory buffers between Python processes but that is nowhere near as smooth as say two Java threads working off a shared data structure that's got synchronized sprinkled on it.
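
For reference, the stdlib does expose raw shared buffers via multiprocessing.shared_memory; this single-process sketch just shows the API (a second process would attach to the same segment by name):

```python
from multiprocessing import shared_memory

# Create a 1 KiB segment that other processes could map; the payload is
# never pickled or copied between them.
shm = shared_memory.SharedMemory(create=True, size=1024)
try:
    shm.buf[:5] = b"hello"       # producer side writes into the buffer
    # a consumer process would run:
    #   peer = shared_memory.SharedMemory(name=shm.name)
    #   bytes(peer.buf[:5])
    data = bytes(shm.buf[:5])
finally:
    shm.close()
    shm.unlink()                 # free the segment when done

print(data)
```

It works, but as the comment says, coordinating access is far less smooth than `synchronized` on a shared Java object.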

                                                    • KeplerBoy 20 hours ago

                                                      The key is to move the entire loop to a compiled language instead of just the inner operation.

                                                      • dgan 20 hours ago

                                                        They are specifically slow. There was a project that measured FFI cost in different languages, and Python's is awfully bad

                                                      • ActorNightly 18 hours ago

                                                        >how inefficient the boundary crossing is

                                                        For 99.99% of the programs that people write, modern M.2 NVMe drives are plenty fast, and that's the laziest way to load data into a C extension or process.

                                                        Then there are Unix pipes, which are sufficiently fast.

                                                        Then there is shared memory, which basically involves no loading.

                                                        As with Python, all depends on the setup.

                                                        • zahlman 17 hours ago

                                                          The problem isn't loading the data, but marshalling it (i.e., transforming it into a data structure that makes sense for the faster language to operate on, and back again). Or if you don't transform it (or the data is special-cased enough that no transformation makes sense), then the available optimizations become much more limited.

                                                          • jononor 13 hours ago

                                                            There are several data structures for numeric data that do not need marshalling and are suitable for very efficient interoperation between Python and C/C++/Rust etc. Examples include array.array (in the standard library), numpy.array, and PyArrow.
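
For instance, array.array from the standard library stores its elements in one contiguous C buffer, which extensions reach through the buffer protocol with no copying or per-element conversion:

```python
from array import array

a = array("d", range(5))      # five C doubles, not five boxed PyFloat objects
view = memoryview(a)          # zero-copy view over the same buffer
view[0] = 42.0                # writes straight through to the C storage

print(a[0], a.itemsize)       # 8 bytes per element, packed contiguously
```

numpy.array and PyArrow buffers work the same way, just with richer shape and type metadata.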

                                                            • ActorNightly 16 hours ago

                                                              That's all just design. Nothing to do with any particular language.

                                                        • nu11ptr 20 hours ago

                                                          The primary focus here is good and something I hadn't considered: Python memory being so dynamic leads to poor cache locality. Makes sense. I will leave that to others to dig into.

                                                          That aside, I was expecting some level of a pedantic argument, and wasn't disappointed by this one:

                                                          "A compiler for C/C++/Rust could turn that kind of expression into three operations: load the value of x, multiply it by two, and then store the result. In Python, however, there is a long list of operations that have to be performed, starting with finding the type of p, calling its __getattribute__() method, through unboxing p.x and 2, to finally boxing the result, which requires memory allocation. None of that is dependent on whether Python is interpreted or not, those steps are required based on the language semantics."

                                                          The problem with this argument is that the user isn't trying to do these things; they are trying to do multiplication. So the fact that the language has to do all these things in the end DOES mean it is slow. Why? Because if these things weren't done, the end result could still be achieved. They are pure overhead, for no value in this situation. In other words, if Python had a sufficiently intelligent compiler/JIT, these things could be optimized away (in this use case, but certainly not all). The argument is akin to: "Python isn't slow, it is just doing a lot of work". That might be true, but you can't leave it there. You have to ask if this work has value, and in this case, it does not.

                                                          By the same argument, someone could say that any interpreted language that is highly optimized is "fast" because the interpreter itself is optimized. But again, this is the wrong way to think about it. You always have to start by asking: "What is the user trying to do? And (in comparison to what is considered a fast language) is it fast to compute?" If the answer is "no", then the language isn't fast, even if it meets the expected objectives. Playing games with things like this is why users get confused about "fast" vs "slow" languages. Slow isn't inherently "bad", but call a spade a spade. In this case, I would say the proper way to talk about this is to say: "It has a fast interpreter". The last word tells any developer with sufficient experience what they need to know (since they understand that statically compiled/JIT and interpreted languages are in different speed classes and shouldn't be directly compared for execution speed).

                                                          • dragonwriter 9 minutes ago

                                                            > They are pure overhead, for no value in this situation. Iow, if Python had a sufficiently intelligent compiler/JIT, these things could be optimized away (in this use case, but certainly not all).

                                                            Hence, Numba.

                                                            • ActivePattern 19 hours ago

                                                              A “sufficiently smart compiler” can’t legally skip Python’s semantics.

                                                              In Python, p.x * 2 means dynamic lookup, possible descriptors, big-int overflow checks, etc. A compiler can drop that only if it proves they don’t matter or speculates and adds guards—which is still overhead. That’s why Python is slower on scalar hot loops: not because it’s interpreted, but because its dynamic contract must be honored.

                                                              • pjmlp 19 hours ago

                                                                In Smalltalk, p x * 2 has that flow as well, and even worse: let's assume the value returned by the p x message send does not understand the * message. It will then break into the debugger, the developer will add the * method to the object via the code browser, hit save, and exit the debugger with a redo, ending the execution with success.

                                                                Somehow Smalltalk JIT compilers handle it without major issues.

                                                                • ActivePattern 19 hours ago

                                                                  Smalltalk JITs make p x * 2 fast by speculating on types and inserting guards, not by skipping semantics. Python JITs do the same (e.g. PyPy), but Python’s dynamic features (like __getattribute__, unbounded ints, C-API hooks) make that harder and costlier to optimize away.

                                                                  You get real speed in Python by narrowing the semantics (e.g. via NumPy, Numba, or Cython) not by hoping the compiler outsmarts the language.

                                                                  • afiori 8 hours ago

                                                                    Python's JIT could do the same: it could check whether __getattribute__() is the default implementation and replace the call with a direct attribute lookup. This would work only for classes that have not been modified at runtime and that do not implement a custom __getattribute__
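
That guard is even expressible from Python itself; this is a simplified version of the check (real JITs use versioned type caches rather than this literal comparison):

```python
class Plain:
    x = 1

class Hooked:
    x = 1
    def __getattribute__(self, name):
        # every attribute access on instances is funneled through this hook
        return object.__getattribute__(self, name)

# Is the class still using object's default attribute machinery?
plain_default = Plain.__getattribute__ is object.__getattribute__
hooked_default = Hooked.__getattribute__ is object.__getattribute__
print(plain_default, hooked_default)
```

A JIT could specialize attribute access on `Plain` and must deoptimize (or never specialize) for `Hooked`.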

                                                                    • pjmlp 18 hours ago

                                                                    People keep forgetting about image-based development, the debugger, meta-classes, messages like becomes:, ...

                                                                    That is to say, everything dynamic that can be used as an excuse for Python, Smalltalk and Self have it, doubled up.

                                                                      • cma 18 hours ago

                                                                        Edit-and-continue is available in lots of JIT-runtime languages

                                                                      • nu11ptr 19 hours ago

                                                                        First, we need to add the word "only": "not ONLY because it's interpreted, but because its dynamic contract must be honored." Interpreted languages are slow by design. This isn't bad; it's just a fact.

                                                                        Second, at most this describes WHY it is slow, not that it isn't, which is my point. Python is slow. Very slow (esp. for computation heavy workloads). And that is okay, because it does what it needs to do.

                                                                      • andylei 20 hours ago

                                                                        The previous paragraph is

                                                                        > Another "myth" is that Python is slow because it is interpreted; again, there is some truth to that, but interpretation is only a small part of what makes Python slow.

                                                                        He concedes it's slow, he's just saying it's not related to how interpreted it is.

                                                                        • nu11ptr 20 hours ago

                                                                          I would argue this isn't true. It is a big part of what makes it slow. The fastest interpreted languages are one to two orders of magnitude slower than for example C/C++/Rust. If your language does math 20-100 times slower than C, it isn't fast from a user perspective. Full stop. It might, however, have a "fast interpreter". Remember, the user doesn't care if it is a fast for an interpreted language, they are just trying to obtain their objective (aka do math as fast as possible). They can get cache locality perfect, and Python would still be very slow (from a math/computation perspective).

                                                                          • nyrikki 18 hours ago

                                                                            The 20-100 times slower figure is a bit cherry-picked, but use case does matter.

                                                                            Typically from a user perspective, the initial starting time is either manageable or imperceptible in the cases of long running services, although there are other costs.

                                                                            If you look at examples that make the above claim, they are almost always tiny toy programs where the cost of producing byte/machine code isn't easily amortized.

                                                                            This quote from the post is an oversimplification too:

                                                                            > But the program will then run into Amdahl's law, which says that the improvement for optimizing one part of the code is limited by the time spent in the now-optimized code

                                                                            I am a huge fan of Amdahl's law, but also realize it is pessimistic and most realistic with parallelization.

                                                                            It runs into serious issues when you are multiprocessing vs parallel processing, due to preemption, etc.

                                                                            Yes, you still have the costs of abstractions, etc., but in today's world, zero pages on AMD, 16k pages and a large number of mapped registers on ARM, barrel shifters, etc. make that much more complicated, especially with C being forced into trampolines.

                                                                            If you actually trace the CPU operations, the actual operations for 'math' are very similar.

                                                                            That said modern compilers are a true wonder.

                                                                            Interpreted languages are often all that is necessary and sufficient, especially when you have network, database and other aspects of the system that also restrict the benefits of the speedups, due to... Amdahl's law.

                                                                            • nu11ptr 17 hours ago

                                                                              I'm not so much cherry picking as I am specifically talking about compute (not I/O or stdlib) performance. However, when measured for general-purpose tasks that involve compute plus things like I/O, stdlib performance, etc., Python on the whole is typically NOT 20-100x slower for a given task. Its I/O layer is written in C like many other languages', so the moment you are waiting on I/O you have leveled the playing field. Likewise, Python has a very fast dict implementation in C, so when doing heavy map work, you also amortize the time between the (brutally slow) compute and the very fast maps.

                                                                              In summary, it depends. I am talking about compute performance, not I/O or general purpose task benchmarking. Yes, if you have a mix of compute and I/O (which admittedly is a typical use case), it isn't going to be 20-100x slower, but more likely "only" 3-20x slower. If it is nearly 100% I/O bound, it might not be any slower at all (or even faster if properly buffered). If you are doing number crunching (w/o a C lib like NumPy), your program will likely be 40-100x slower than doing it in C, and many of these aren't toy programs.

                                                                              • nyrikki 16 hours ago

                                                                                Even with compute performance it is probably closer than you expect.

                                                                                Python isn't evaluated line-by-line, even in MicroPython, which is about the only common implementation that doesn't work the same way as CPython.

                                                                                The CPython VM compiles the source to opcodes, and binary operations just end up popping operands off a stack; or you can use a JIT like PyPy.

                                                                                How efficiently you can keep the pipeline fed is more critical than computation costs.

                                                                                     int a = 5;
                                                                                     int b = 10;
                                                                                     int sum = a + b;
                                                                                
                                                                                Is compiled to:

                                                                                     MOV EAX, 5
                                                                                     MOV EBX, 10
                                                                                     ADD EAX, EBX
                                                                                     MOV [sum_variable], EAX
                                                                                
                                                                                In the PVM, binary operations remove the top of the stack (TOS) and the second top-most stack item (TOS1), perform the operation, and push the result back on the stack.
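You can watch that stack discipline with the stdlib dis module (opcode names vary by version; 3.11+ folds the arithmetic opcodes into a single BINARY_OP):

```python
import dis

def add(a, b):
    return a + b

# a + b pops both operands off the evaluation stack and pushes the result:
# roughly LOAD_FAST a, LOAD_FAST b, BINARY_OP (or BINARY_ADD pre-3.11),
# then RETURN_VALUE.
ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)
```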

                                                                                That pop, pop isn't much more expensive on modern CPUs, and some C compilers will use a stack depending on many factors. And even in C you have to use structs of arrays etc. depending on the use case. Stalled pipelines and fetch costs are the huge difference.

                                                                                It is the setup costs, GC, GIL, etc. that make Python slower in many cases.

                                                                                While I am not suggesting it is as slow as Python, Java is also bytecode, and often its assumptions and design decisions are even better than, or at least nearly equal to, C in the general case unless you highly optimize.

                                                                                But the actual equivalent computations are almost identical; it is the optimizations the compilers make that differ.

                                                                            • andylei 14 hours ago

                                                                              i'll answer your argument with the initial paragraph you quoted:

                                                                              > A compiler for C/C++/Rust could turn that kind of expression into three operations: load the value of x, multiply it by two, and then store the result. In Python, however, there is a long list of operations that have to be performed, starting with finding the type of p, calling its __getattribute__() method, through unboxing p.x and 2, to finally boxing the result, which requires memory allocation. None of that is dependent on whether Python is interpreted or not, those steps are required based on the language semantics.
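The stdlib dis module makes that list of steps visible: even before __getattribute__ runs, the bytecode for something like p.x * 2 already contains a generic attribute load plus a generic binary operation, each of which goes through the dynamic machinery described above.

```python
import dis

class P:
    def __init__(self):
        self.x = 21

def double_x(p):
    return p.x * 2

# LOAD_ATTR triggers the full dynamic attribute lookup; the multiply is a
# generic opcode (BINARY_MULTIPLY pre-3.11, BINARY_OP since) that must
# unbox, compute, and box a fresh result object.
dis.dis(double_x)
ops = [ins.opname for ins in dis.get_instructions(double_x)]
```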

                                                                              • immibis 14 hours ago

                                                                                Typically a dynamic language JIT handles this by observing what actual types the operation acts on, then hardcoding fast paths for the one type that's actually used (in most cases) or a few different types. When the type is different each time, it has to actually do the lookup each time - but that's very rare.

                                                                                i.e.

                                                                                     if (a->type != int_type || b->type != int_type) abort_to_interpreter();
                                                                                     result = ((intval*)a)->val + ((intval*)b)->val;

                                                                                The CPU does have to execute both lines, but it does them in parallel so it's not as bad as you'd expect. Unless you abort to the interpreter, of course.

                                                                          • rstuart4133 12 hours ago

                                                                            > The problem with this argument is the user isn't trying to do these things,

                                                                            I'd argue differently. I'd say the problem isn't that the user is doing those things, it's that the language doesn't know what he's trying to do.

                                                                            Python's explicit goal was always ergonomics, and it was always ergonomics over speed or annoying compile-time error messages. "Just run the code as written dammit" was always the goal. I remember when the new class model was introduced, necessitating the introduction of __getattribute__. My first reaction as a C programmer was "gee, you took a speed hit there". A later reaction was to use it to twist the new system into something its inventors possibly never thought of. It was an LR(1) parser that let you write the grammars as regular Python statements.

                                                                            While they may not have thought of abusing the language in that particular way, I'm sure the explicit goal was to create a framework that lets any idea be expressed with minimal code. Others also used the hooks they provided into the way the language builds objects to create things like pydantic and spyne. Spyne, for example, lets you express the on-the-wire serialisation formats used by RPC as Python class declarations, and then compile them into JSON, XML, SOAP or whatever. SQLAlchemy lets you express SQL using Python syntax, although in a more straightforward way.

                                                                            All of them are very clever in how they twist the language. Inside those frameworks, "a = b + c" does not mean "add b to c, and place the result in a". In the LR(1) parser, for example, it means "there is a production called 'a', that is a 'b' followed by a 'c'". 'a' in that formulation holds references to 'b' and 'c'. Later the LR(1) parser will consume that, compiling it into something very different. The result is a long way from two's complement addition.
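A toy version of that trick (the class names here are invented for illustration, not taken from the parser the commenter describes): overloading __add__ makes "a = b + c" build a data structure instead of doing arithmetic.

```python
class Sym:
    """A grammar symbol; '+' records sequencing instead of adding."""
    def __init__(self, name):
        self.name = name
    def __add__(self, other):
        return Seq(self, other)  # "self followed by other"

class Seq:
    """An AST node holding references to its two operands."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def __repr__(self):
        return f"Seq({self.left.name}, {self.right.name})"

b, c = Sym("b"), Sym("c")
a = b + c     # no addition happens: 'a' is a Seq node
print(a)      # Seq(b, c)
```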

                                                                            It is possible to use a powerful type system in a similar way. For example, I've seen FPGA designs expressed in Scala. However, because Scala's type system insists on knowing what is going on at compile time, Scala had a fair idea of what the programmer was building. The compiled result isn't going to be much slower than any other code. Python achieved the same flexibility by abandoning type checking at compile time almost entirely, pushing it all to run time. Thus the compiler has no idea of what is going to be executed in the end (the + operation in the LR parser only gets executed once, for example), which is what I said above: "it's that the language doesn't know what the programmer is trying to do".

                                                                            You argue that since it's an interpreted language, it's the interpreter's job to figure out what the programmer is trying to do at run time. Surely it can figure out that "a = b + c" really is adding two 32-bit integers that won't overflow. That's true, but that creates a lot of work to do at run time. Which is a roundabout way of saying the same thing as the talk: electing to do it at run time means the language chose flexibility over speed.

                                                                            You can't always fix this in an interpreter. Javascript has some of the best interpreters around, and they do make the happy path run quickly. But those interpreters come with caveats, usually of the form "if you muck around with the internals of classes, by say replacing function definitions at run time, we abandon all attempts to JIT it". People don't typically do such things in Javascript, but as it happens, Python's design, with its meta-classes, dynamic types created with "type(...)", and "__new__(...)", could almost be said to encourage that coding style. That is, again, a language design choice, and it's one that favours flexibility over speed.
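The dynamism being described, in miniature (a contrived sketch, not from any real framework): classes built with type() and patched after the fact, which is exactly the kind of mutation that forces a JIT to discard its specialized code.

```python
# Build a class entirely at runtime, with no class statement.
Point = type("Point", (), {"x": 0})
p = Point()
p.x = 21

def doubled(self):
    return self.x * 2

Point.doubled = doubled           # patch a method in after instances exist
assert p.doubled() == 42          # existing instances see it immediately

Point.doubled = lambda self: -1   # ...and replace it again at runtime
assert p.doubled() == -1          # any JIT-specialized call site is now stale
```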

                                                                          • Mithriil 17 hours ago

                                                                            > His "sad truth" conclusion is that "Python cannot be super-fast" without breaking compatibility.

                                                                            A decent case of Python 4.0?

                                                                            > So, maybe, "a JIT compiler can solve all of your problems"; they can go a long way toward making Python, or any dynamic language, faster, Cuni said. But that leads to "a more subtle problem". He put up a slide with a trilemma triangle: a dynamic language, speed, or a simple implementation. You can have two of those, but not all three.

                                                                            This trilemma keeps getting me back towards Julia. It's less simple than Python, but much faster (mitigated by pre-compilation time), and almost as dynamic. I'm glad this language didn't die.

                                                                            • zahlman 17 hours ago

                                                                              > A decent case of Python 4.0?

                                                                              I think "Python 4.0" is going to have to be effectively a new language by a different team that simply happens to bear strong syntactic similarities. (And at least part of why that isn't already happening is that everyone keeps getting scared off by the scale of the task.)

                                                                              Thanks for the reminder that I never got around to checking out Julia.

                                                                              • olejorgenb 15 hours ago

                                                                                Isn't that kinda what Mojo is?

                                                                                • wraptile 5 hours ago

                                                                                  A closed-source proprietary language will never be able to succeed Python here.

                                                                                  • zahlman 15 hours ago

                                                                                    I haven't tried it, but that matches my understanding, yeah.

                                                                                    Personally I'd be more interested in designing from scratch.

                                                                                • rybosome 14 hours ago

                                                                                  Yeah, this is a case of "horses for courses", as you suggest.

                                                                                  I love Python. It's amazing with uv; I just implemented a simple CLI this morning for analyzing data with inline dependencies that's absolutely perfect for what I need and is extremely easy to write, run, and tweak.

                                                                                  Based on previous experience, I would not suggest Python should be used for an API server where performance - latency, throughput - and scalability of requests is a concern. There's lots of other great tools for that. And if you need to write an API server and it's ok not to have super high performance, then yeah Python is great for that, too.

                                                                                  But it's great for what it is. If they do make a Python 4.0 with some breaking changes, I hope they keep the highly interpreted nature such that something like Pydantic continues to work.

                                                                                  • Alex3917 17 hours ago

                                                                                    > A decent case of Python 4.0?

                                                                                    I definitely agree with this eventually, but for now why not just let developers set `dynamic=False` on objects and make it opt in? This is how Google handles breaking Angular upgrades, and in practice it works great because people have multiple years to prepare for any breaking changes.
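Python has no dynamic=False flag today, but __slots__ is arguably the closest existing opt-in: it fixes the attribute set at class creation time, trading away some of the dynamic contract.

```python
class Fixed:
    # __slots__ removes the per-instance __dict__, so the attribute set
    # is frozen at class creation time (a partial "dynamic=False").
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Fixed(1, 2)
try:
    p.z = 3  # no __dict__, so new attributes are rejected
except AttributeError as e:
    print("rejected:", e)
```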

                                                                                    • rirze 16 hours ago

                                                                                      If Julia fixes its package manager problems (does it still take a while to load imports?), I think it could become popular.

                                                                                      • casparvitch 3 hours ago

                                                                                        I think you're referring to the TTFP (time to first plot) issue (the package manager is top notch). TTFP has been drastically improved with a bunch of optimisations, and then you can pre-compile your project to keep it fast e.g. between running your script with different params.

                                                                                    • ehsantn 12 hours ago

                                                                                      The article highlights important challenges regarding Python performance optimization, particularly due to its highly dynamic nature. However, a practical solution involves viewing Python fundamentally as a Domain Specific Language (DSL) framework, rather than purely as a general-purpose interpreted language. DSLs can effectively be compiled into highly efficient machine code.

                                                                                      Examples such as Numba JIT for numerical computation, Bodo JIT/dataframes for data processing, and PyTorch for deep learning demonstrate this clearly. Python’s flexible syntax enables creating complex objects and their operators such as array and dataframe operations, which these compilers efficiently transform into code approaching C++-level performance. DSL operator implementations can also leverage lower-level languages such as C++ or Rust when necessary. Another important aspect not addressed in the article is parallelism, which DSL compilers typically handle quite effectively.

                                                                                      Given that data science and AI are major use cases for Python, compilers like Numba, Bodo, and PyTorch illustrate how many performance-critical scenarios can already be effectively addressed. Investing further in DSL compilers presents a practical pathway to enhancing Python’s performance and scalability across numerous domains, without compromising developer usability and productivity.

                                                                                      Disclaimer: I have previously worked on Numba and Bodo JIT.

                                                                                      • echoangle 12 hours ago

                                                                                        Was this comment written by an LLM?

                                                                                      • Ulti 21 hours ago

                                                                                        Feel like Mojo is worth a shoutout in this context https://www.modular.com/mojo It solves the issue by being a superset of Python in syntax, where "fn" functions (instead of "def") are assumed statically typed and compilable with Numba-style optimisations.

                                                                                        • _aavaa_ 21 hours ago

                                                                                          Mojo NOT being open-source is a complete non-starter.

                                                                                          • Ulti 20 hours ago

                                                                                            More of a question of /will/ Mojo eventually be entirely open source, chunks of it already are. The intent from Modular is eventually it will be, just not everything all at once and not whilst they're internally doing loads of dev for their own commercial entity. Which seems fair enough to me. Importantly they have open sourced lots of the stdlib which is probably what anyone external would contribute to or want to change anyway? https://www.modular.com/blog/the-next-big-step-in-mojo-open-...

                                                                                            • _aavaa_ 20 hours ago

                                                                                              When it has become open source I will consider building up expertise and a product on it. Until it has happened there are no guarantees that it will.

                                                                                              • Ulti 20 hours ago

                                                                                                Well the "expertise" is mostly just Python thats sort of the value prop. But yeah building an actual AI product ontop I'd be more worried about the early stage nature of Modular rather than the implementation is closed source.

                                                                                                • _aavaa_ 20 hours ago

                                                                                                  Sure, that’s the value prop of numba too. But reality is different.

                                                                                            • alankarmisra 21 hours ago

                                                                                              Genuinely curious; while I understand why we would want a language to be open-source (there's plenty of good reasons), do you have anecdotes where the open-sourceness helped you solve a problem?

                                                                                              • yupyupyups 21 hours ago

                                                                                                Not the OP, but I have needed to patch Qt due to bugs that couldn't be easily worked around.

                                                                                                I have also been frustrated while trying to interoperate with expensive proprietary software because documentation was lacking, and the source code was unavailable.

                                                                                                In one instance, a proprietary software had the source code "exposed", which helped me work around its bugs and use it properly (also poorly documented).

                                                                                                There are of course other advantages of having that transparency, like being able to independently audit the code for vulnerabilities or unacceptable "features", and fix those.

                                                                                                Open source is oftentimes a prerequisite for us to be able to control our software.

                                                                                                • _aavaa_ 20 hours ago

                                                                                                  It has helped prevent problems. I am not worried about Python suddenly adding a clause stating that I can't release a ML framework…

                                                                                                  • Philpax 17 hours ago

                                                                                                    In the earlier days of rustc, it was handy to be able to look at the context for a specific compiler error (this is before the error reporting it is now known for). Using that, I was able to diagnose what was wrong with my code and adjust it accordingly.

                                                                                              • nromiun a day ago

                                                                                                I really hope PyPy gets more popular so that I don't have to argue Python is pretty fast for the nth time.

                                                                                                Even if you have to stick to CPython, tools like Numba, Pythran, etc. can give you amazing performance for minimal code changes.

                                                                                                • mrkeen a day ago

                                                                                                  I didn't read with 100% focus, but this lwn account of the talk seemed to confirm those myths instead of debunking.

                                                                                                  • postexitus 21 hours ago

                                                                                                    A more careful reading of the article is required.

                                                                                                    The first myth is "Python is not slow" - it is debunked, it is slow.

                                                                                                    The second myth is ""it's just a glue language / you just need to rewrite the hot parts in C/C++" - it is debunked, just rewriting stuff in C/Rust does not help.

                                                                                                    The third myth is " Python is slow because it is interpreted" - it is debunked, it is not slow only because it is interpreted.

                                                                                                    • akkad33 13 hours ago

                                                                                                      > The first myth is "Python is not slow" - it is debunked, it is slow

                                                                                                      This is strange. Most people in the programming community know Python is slow. If it has any reputation, it's that it is quite slow.

                                                                                                      • IshKebab 14 hours ago

                                                                                                        In fairness I wouldn't really call those "myths", just bad defences of Python's slowness. I don't think the people saying them really believe it - if it came to life or death. They just really like Python and are trying to avoid the cognitive dissonance of liking a really slow language.

                                                                                                        Like, I wouldn't say it's a "myth" that Linux is easy to use.

                                                                                                        • ActorNightly 17 hours ago

                                                                                                          >just rewriting stuff in C/Rust does not help.

                                                                                                          Except it does. The key is to figure out which part you actually need to go fast, and write it in C. If most of your use case is dominated by network latency, the rest doesn't need to be fast anyway.

                                                                                                          Overall, people seem to miss the point of Python. The best way to develop software is "make it work, make it good, make it fast" - the first part gets you to an end to end prototype that gives you a testable environment, the second part establishes the robustness and consistency, and the third part lets you focus on optimizing the performance with a robust framework that lets you ensure that your changes are not breaking anything.

                                                                                                          Python's focus is on the first part. The idea is that you spend less time making it work. Once you have it working, then it's much easier to do the second part (adding tests, type checking, whatever else), and then the third part. Now with LLMs, it's actually pretty straightforward to take a Python file and translate it to .c/.h files, especially with agents that do additional "thinking" loops.

                                                                                                          However, even given all of that, in practice you often don't need to move away from Python. For example, I have a project that datamines Strava Heatmaps (i.e. I download png tiles for the entire US). The amount of time that it took me to write it in Python, in addition to running it (which takes about a day), is much shorter than it would have taken me to write it in C++/Rust and then run it with the speedup in processing.

                                                                                                          • mrkeen 18 hours ago

                                                                                                            Thanks! As a Python outsider, I was primed for a Python insider to be trying to change my views, not confirm them, and I did indeed misread.

                                                                                                            • zahlman 17 hours ago

                                                                                                              My impression is that GvR conceded a long time ago that Python is slow, and doesn't particularly care (and considers it trolling to keep bringing it up). The point is that in the real world this doesn't matter a lot of the time, at least as long as you aren't making big-O mistakes — and easier-to-use languages make it easier to avoid those mistakes.

                                                                                                              For that matter, I recently saw a talk in the Python world that was about convincing people to let their computer do more work locally in general, because computers really are just that fast now.

                                                                                                          • diegocg a day ago

                                                                                                            Yep, for me it confirms all the reasons why I think Python is slow and not a good language for anything that goes beyond a script. I work with it every day, and I have learned that I can't even trust tooling such as mypy, because it's full of corner cases. It turns out that not having a clear type design in the language is not something that can be fundamentally fixed by external tools. Tests are the only thing that can make me trust code written in this language.

                                                                                                            • jdhwosnhw 19 hours ago

                                                                                                              > Yep, for me it confirms all the reasons why I think python is slow

                                                                                                              Yes, that is literally the explicit point of the talk. The first myth of the article was “Python is not slow”.

                                                                                                          • quantumspandex a day ago

                                                                                                            So we are paying 99% of the performance just for the 1% of cases where it's nice to code in.

                                                                                                            Why do people think it's a good trade-off?

                                                                                                            • Krssst 20 hours ago

                                                                                                              Performance is worthless if the code isn't correct. It's easier to write correct code reasonably quickly in Python in simple cases (integers don't overflow like in C, don't wrap around like in C#, no absurd implicit conversions like in other scripting languages).

                                                                                                              Also you don't need code to be fast a lot of the time. If you just need some number crunching that is occasionally run by a human, taking a whole second is fine. Pretty good replacement for shell scripting too.
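
                                                                                                              The overflow point is easy to show concretely. Python ints are arbitrary precision, so arithmetic that would silently overflow a 64-bit C int just keeps working:

```python
# The largest value a signed 64-bit C integer (int64_t) can hold.
INT64_MAX = 2**63 - 1

result = INT64_MAX + 1      # would be undefined behavior in C
print(result)               # 9223372036854775808
print(result.bit_length())  # 64 -- Python simply grew past the machine word
```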

                                                                                                              • pavon 16 hours ago

                                                                                                                But many of the language decisions that make Python so slow don't make code easier to write correctly. Like monkey patching; it is very powerful and can be useful, but it can also create huge maintainability issues, and its existence as a feature hinders making the code faster.
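
                                                                                                                A minimal sketch of why monkey patching hinders optimization: a method can be swapped out at runtime, so no call site can safely cache where a lookup will resolve.

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
assert g.greet() == "hello"

# Patch the class at runtime; every existing instance changes
# behavior at once, so the interpreter can't cache the old lookup.
Greeter.greet = lambda self: "patched"
assert g.greet() == "patched"
```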

                                                                                                                • Spivak 18 hours ago

                                                                                                                  I mean you can see it with your own experience: folks will post a 50-line snippet of ordinary C code in a blog post which looks like you're reading a long-dead ancient language littered with macros, and then be like "this is a lot to grok, here's the equivalent code in Python / Ruby" and it's 3 lines and completely obvious.

                                                                                                                  Folks on HN are so weird when it comes to why these languages exist and why people keep writing in them. For all their faults (dynamism, GC, lack of static typing), in the real world with real devs you get code that is more correct, written faster, when you use a higher-level language. It's Go's raison d'être.

                                                                                                                • nromiun a day ago

                                                                                                                  Because it's nice to code in. Not everything needs to scale or be fast.

                                                                                                                  Personally I think it is more crazy that you would optimize 99% of the time just to need it for 1% of the time.

                                                                                                                  • nomel 12 hours ago

                                                                                                                    That’s why Python is the second best language for everything.

                                                                                                                    The amount of complexity you can code up in a short time, that most everyone can contribute to, is incredible.

                                                                                                                    • BlackFly 20 hours ago

                                                                                                                      It isn't an either/or choice. The people interested in optimizing performance are typically different people than those interested in implementing syntactic sugar. It is certainly true that growing the overall codebase risks introducing tensions for some feature sets, but that is just a consideration you take on when diligently adding to the language.

                                                                                                                    • pjmlp a day ago

                                                                                                                      Because many never used Smalltalk, Common Lisp, Self, Dylan,... so they think CPython is the only way there is; plus they already have their computer resources wasted by tons of Electron apps anyway, so they hardly question CPython's performance, or lack thereof.

                                                                                                                      • wiseowise 19 hours ago

                                                                                                                        Has it ever crossed your mind that they just like Python?

                                                                                                                        • laichzeit0 3 hours ago

                                                                                                                          I use Python 90% of my day and I can't say I like it or hate it or care about it at all. I use it because it has all the libraries I need, and LLMs seem to know it pretty well too. It's a great language for people that don't actually care about programming languages and just want to get stuff done.

                                                                                                                          • pjmlp 19 hours ago

                                                                                                                            And slow code, yes, it has crossed my mind.

                                                                                                                            Usually they also call "Python" libraries that are 95% C code.

                                                                                                                            • Fraterkes 18 hours ago

                                                                                                                              The hypocrisy gets even worse: the C code then gets compiled to assembly!

                                                                                                                      • jonathrg a day ago

                                                                                                                        It's much more than 1%, it is what enables commonly used libraries like pytest and Pydantic.

                                                                                                                        • bluGill 20 hours ago

                                                                                                                          Most of the time you are waiting on a human, or at least on something other than the CPU. And most of the time, more time is spent by the programmer writing the code than by all the users combined waiting for the program to run.

                                                                                                                          Between those two, performance is most often just fine to trade off.

                                                                                                                          • lmm a day ago

                                                                                                                            Because computers are more than 100x faster than they were when I started programming, and they were already fast enough back then? (And meanwhile my coding ability isn't any better, if anything it's worse)

                                                                                                                            • Hilift a day ago

                                                                                                                              It isn't. There are many tasks Python isn't up to. However, it has been around forever, in some influential niche verticals like cybersecurity Python was as useful as or more useful than native tooling, and it works on multiple platforms.

                                                                                                                              • Mawr 18 hours ago

                                                                                                                                I don't think anyone aware of this thinks it's a good tradeoff.

                                                                                                                                The more interesting question is why the tradeoff was made in the first place.

                                                                                                                                The answer is, it's relatively easy for us to see and understand the impact of these design decisions because we've been able to see their outcomes over the last 20+ years of Python. Hindsight is 20/20.

                                                                                                                                Remember that Python was released in 1991, before even Java. What we knew about programming back then vs what we know now is very different.

                                                                                                                                Oh and also, these tradeoffs are very hard to make in general. A design decision that you may think is irrelevant at the time may in fact end up being crucial to performance later on, but by that point the design is set in stone due to backwards compatibility.

                                                                                                                                • dgfitz a day ago

                                                                                                                                  I can say with certainty I’ve never paid a penny. Have you?

                                                                                                                                • ic_fly2 21 hours ago

                                                                                                                                  It’s a good article on speed.

                                                                                                                                  But honestly the thing that makes any of my programs slow is network calls. And there a nice async setup goes a long way. And then k8s for the scaling.

                                                                                                                                  • nicolaslem 20 hours ago

                                                                                                                                    This. I maintain an ecommerce platform written in Python. Even with Python being slow, less than 30% of our request time is spent executing code, the rest is talking to stuff over the network.

                                                                                                                                    • stackskipton 19 hours ago

                                                                                                                                      SRE here: that horizontal scaling with Python has costs, as it means more connections to the database and so forth, so you are impacting things even if you don't see it.

                                                                                                                                      • wussboy 6 hours ago

                                                                                                                                        That’s why we use Firestore

                                                                                                                                        • ic_fly2 17 hours ago

                                                                                                                                          Meh, even with basic async I’ve been able to overload the memory capacity of Azure’s premium AMQP offering.

                                                                                                                                          But yes, managing DB connections is a pain. I don’t think it’s any better in Java (my only other reference at this scale), though.

                                                                                                                                        • gen220 16 hours ago

                                                                                                                                          I think articles like this cast too wide a net when they say "performance" or "<language> is fast/slow".

                                                                                                                                          A bunch of SREs discussing which languages/servers/runtimes are fast/slow/efficient in comparable production setups would give more practical guidance.

                                                                                                                                          If you're building an http daemon in a traditional three-tiered app (like a large % of people on HN), IME, Python has quietly become a great language in that space, compared to its peers, over the last 8 years.

                                                                                                                                        • NeutralForest a day ago

                                                                                                                                          Cool article! I think a lot of these issues are not Python-specific, so it's a good overview of what others can learn from a now 30-year-old language. I think we'll probably go down the JS/TS route, where another compiler (PyPy or mypyc or something else) will work alongside CPython, but I don't see Python 4 happening.

                                                                                                                                          • tweakimp a day ago

                                                                                                                                            I thought we would never see the GIL go away and yet, here we are. Never say never. Maybe Python4 is Python with another compiler.

                                                                                                                                            • pjmlp 21 hours ago

                                                                                                                                              It required Facebook and Microsoft to change the point of view on it, and now the Microsoft team is no more.

                                                                                                                                              So let's see what remains of CPython's performance efforts.

                                                                                                                                            • ngrilly 12 hours ago

                                                                                                                                              I’m not sure I understand the reference to JS/TS: TS is only a type checker and has zero effect on runtime performance.

                                                                                                                                            • adsharma 19 hours ago

                                                                                                                                              The most interesting part of this article is the link to SPy, an attempt to find a subset of Python that could be made performant.

                                                                                                                                              • ajross 18 hours ago

                                                                                                                                                Honestly that seems Sisyphean to me. The market doesn't want a "performant subset". The market is very well served by performant languages. The market wants Python's expressivity. The market wants duck typing and runtime-inspectable type hierarchies and mutable syntax and decorators. It loves it. It's why Python is successful.

                                                                                                                                                My feeling is that numba has exactly the right tactic here. Don't try to subset python from on high, give developers the tools[1] so that they can limit themselves to the fast subset, for the code they actually want. And let them make the call.

                                                                                                                                                (The one thing numba completely fails on though is that it insists on using its own 150+MB build of LLVM, so it's not nearly as cleanly deployable as you'd hope. Come on folks, if you use the system libc you should be prepared to use the system toolchain.)

                                                                                                                                                [1] Simple ones, even. I mean, to first approximation you just put "@jit" on the stuff you want fast and make sure it sticks to a single numeric type and numpy arrays instead of python data structures, and you're done.
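
                                                                                                                                                A sketch of that numba pattern (numba is assumed to be installed for the actual speedup; the pure-Python fallback below just keeps the sketch runnable without it):

```python
import numpy as np

try:
    from numba import njit  # assumed installed for the real speedup
except ImportError:
    def njit(fn):  # fallback: run the plain Python version
        return fn

@njit
def total(xs):
    # One numeric type and a NumPy array instead of Python data
    # structures -- the "fast subset" the decorator handles well.
    acc = 0.0
    for x in xs:
        acc += x
    return acc

print(total(np.arange(1_000, dtype=np.float64)))
```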

                                                                                                                                                • adsharma 16 hours ago

                                                                                                                                                  My cursory reading is that SPy is generous in what it accepts.

                                                                                                                                                  The subset I've been working with is even narrower. Given my stance on pattern matching, it may not even be a subset.

                                                                                                                                                  https://github.com/py2many/py2many/blob/main/doc/langspec.md

                                                                                                                                                  • zozbot234 18 hours ago

                                                                                                                                                    > The market wants duck typing and runtime-inspectable type hierarchies and mutable syntax and decorators. It loves it.

                                                                                                                                                    These features have one thing in common: they're only useful for prototype-quality throwaway code, if at all. Once your needs shift to an increased focus on production use and maintainability, they become serious warts. It's not just about performance (though it's obviously a factor too), there's real reasons why most languages don't do this.

                                                                                                                                                    • ajross 18 hours ago

                                                                                                                                                      > These features have one thing in common: they're only useful for prototype-quality throwaway code, if at all.

                                                                                                                                                      As a matter of practice: the python community disagrees strongly. And the python community ate the world.

                                                                                                                                                      It's fine to have an opinion, but you're not going to change python.

                                                                                                                                                      • Philpax 17 hours ago

                                                                                                                                                        The existence of several type-checkers and Astral's largely-successful efforts to build tooling that pulls Python out of its muck seems to suggest otherwise.

                                                                                                                                                        Better things are possible, and I'm hoping that higher average quality of Python code is one of those things.

                                                                                                                                                        • adsharma 16 hours ago

                                                                                                                                                          That assumes python is one monolithic thing and everyone agrees what it is.

                                                                                                                                                          True, the view you express here has strong support in the community and possibly in the steering committee.

                                                                                                                                                          But there are differing ideas on what python is and why it's successful.

                                                                                                                                                          • ajross 15 hours ago

                                                                                                                                                            > That assumes python is one monolithic thing and everyone agrees what it is.

                                                                                                                                                            It's exactly the opposite! I'm saying that python is BIG AND DIVERSE and that attempts like SPy to invent a new (monolithic!) subset language that everyone should use instead are doomed, because it won't meet the needs of all the yahoos out there doing weird stuff the SPy authors didn't think was important.

                                                                                                                                                            It's fine to have "differing ideas on what python is", but if those ideas don't match those of all of the community, and not just what you think are the good parts, it's not really about what "python" is, is it?

                                                                                                                                                  • teo_zero 20 hours ago

                                                                                                                                                    I don't know Python so well as to propose any meaningful contribution, but it seems to me that most issues would be mitigated by a sort of "final" statement or qualifier, that prohibits any further changes to the underlying data structure, thus enabling all the nice optimizations, tricks and shortcuts that compilers and interpreters can't afford when data is allowed to change shape under their feet.
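
                                                                                                                                                    Python already ships partial forms of that qualifier, though neither goes as far as the parent suggests: typing.Final is enforced only by static checkers (the runtime ignores it), while __slots__ genuinely fixes an instance's attribute layout. A small sketch:

```python
from typing import Final

# A checker-only "final": mypy/pyright would reject reassignment,
# but CPython itself does not enforce it.
N: Final = 10

class Point:
    __slots__ = ("x", "y")  # fixed shape: no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
try:
    p.z = 3  # adding new attributes is now rejected outright
    blocked = False
except AttributeError:
    blocked = True
print(blocked)  # True
```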

                                                                                                                                                    • Fraterkes 18 hours ago

                                                                                                                                                      I assume people dislike those kinds of solutions because the extreme dynamism is used pretty rarely in a lot of meat-and-potatoes Python scripts. So a lot of “regular” Python scripts would have to just plaster “final” everywhere to make them as fast as they can be.

                                                                                                                                                      At that point you’d maybe want some broader way to signify which parts of your script are dynamic. But then you’d have a language that can be dynamic even in how dynamic it is…

                                                                                                                                                    • taeric 17 hours ago

                                                                                                                                                      It's amusing to see the top comment on the site be about how Common Lisp approached this. And it's hard not to agree with it.

                                                                                                                                                      I don't understand it: we had super dynamic systems decades ago that were easier to optimize than people care to admit. Heaven help folks if they ever get a chance to use Mathematica.

                                                                                                                                                      • hansvm 19 hours ago

                                                                                                                                                        In the "dynamic" section, it's much worse than the author outlines. You can't even assume that the constant named "10" will point to a value which behaves like you expect the number 10 to behave.

                                                                                                                                                        • zahlman 16 hours ago

                                                                                                                                                          I guess you mean "N". 10 is a literal, not a name. The part "N cannot be assumed to be ten, because that could be changed elsewhere in the code" implies well enough that the change could be to a non-integer value. (For that matter, writing `N: int = 10` does nothing to fix that.)

                                                                                                                                                          • hansvm 16 hours ago

                                                                                                                                                            No, I mean the literal. CPython is more flexible than it has any right to be, and you're free to edit the memory pointed to by the literal 10.

                                                                                                                                                            • zahlman 16 hours ago

                                                                                                                                                              Care to show how you believe this can be achieved, from within Python?

                                                                                                                                                              • hansvm 16 hours ago

                                                                                                                                                                  import ctypes

                                                                                                                                                                  # CPython interns small ints and id() returns an object's
                                                                                                                                                                  # address, so this reaches into the shared object behind
                                                                                                                                                                  # the literal 10.
                                                                                                                                                                  ten = 10
                                                                                                                                                                  addr = id(ten)

                                                                                                                                                                  # Field layout of CPython's int object before 3.12
                                                                                                                                                                  # (3.12 replaced ob_size with a tag word, so these
                                                                                                                                                                  # offsets are version-specific).
                                                                                                                                                                  class PyLongObject(ctypes.Structure):
                                                                                                                                                                      _fields_ = [
                                                                                                                                                                          ("ob_refcnt", ctypes.c_ssize_t),
                                                                                                                                                                          ("ob_type", ctypes.c_void_p),
                                                                                                                                                                          ("ob_size", ctypes.c_ssize_t),
                                                                                                                                                                          ("ob_digit", ctypes.c_uint32 * 1),
                                                                                                                                                                      ]
                                                                                                                                                                  long_obj = PyLongObject.from_address(addr)

                                                                                                                                                                  # From here on, every use of the literal 10
                                                                                                                                                                  # sees the value 3.
                                                                                                                                                                  long_obj.ob_digit[0] = 3
                                                                                                                                                                  assert 10 == 3
                                                                                                                                                                  
                                                                                                                                                                  # using an auxiliary variable to prevent any inlining
                                                                                                                                                                  # done at the interpreter level before actually querying
                                                                                                                                                                  # the value of the literal `10`
                                                                                                                                                                  x = 3
                                                                                                                                                                  assert 10 * x == 9
                                                                                                                                                                  assert 10 + x == 6
                                                                                                                                                                • zahlman 15 hours ago

                                                                                                                                                                  Okay, but this is going out of one's way to view the runtime itself as a C program and connect to it with the FFI. For that matter, the notion that the result of `id` (https://docs.python.org/3/library/functions.html#id) could sensibly be passed to `from_address` is an implementation detail. This is one reason the language suffers from not having a formal specification: it's unclear exactly how much of this madness alternative implementations like PyPy are expected to validate against. But I think people would agree that poking at the runtime's own memory cannot be expected to give deterministic results, and thus the implementation should in fact consider itself free to assume that isn't happening. (After all, we could take that further; e.g. what if we had another process do the dirty work?)

                                                                                                                                                                  • hansvm 15 hours ago

                                                                                                                                                                    Except, that sort of thing is important in places like gevent, pytest, and numba, and that functionality isn't easy to replace without a lot of additional language/stdlib work (no sane developer would reach for it if other APIs sufficed).

                                                                                                                                                                    The absurd example of overwriting the literal `10` is "obviously" bad, but your assertion that the interpreter should be able to assume nobody is overwriting its memory isn't borne out in practice.

                                                                                                                                                                    • zahlman 15 hours ago

                                                                                                                                                                      > Except, that sort of thing is important in places like gevent, pytest, and numba

                                                                                                                                                                      What, mutating the data representation of built-in types documented to be immutable? For what purpose?

                                                                                                                                                        • pu_pe 19 hours ago

                                                                                                                                                          Python and other high-level languages may actually decrease in popularity with better LLMs. If you are not the one programming it, might as well do it in a more performant language from the start.

                                                                                                                                                          • richard_todd 18 hours ago

                                                                                                                                                            In my workflows I already tend to tell LLMs to write scripts in Go instead of python. The LLM doesn't care about the increased tediousness and verbosity that would drive me to Python, and the result will be much faster.

                                                                                                                                                        • abhijeetpbodas 21 hours ago

                                                                                                                                                          An earlier version of the talk is at https://www.youtube.com/watch?v=ir5ShHRi5lw (I could not find the EuroPython one).

                                                                                                                                                        • coldtea 10 hours ago

                                                                                                                                                          One big Python performance myth is the promise made several years ago that Python would get 5x faster in the next 5 years. So far the related changes have not brought even 2x gains.

                                                                                                                                                          • actinium226 15 hours ago

                                                                                                                                                            A lot of the examples he gives, like the numpy/calc function, are easily converted to C/C++/Rust. The article sort of dismisses this at the start, and that's fine if we want to focus on the speed of Python itself, but it seems like both the only solution and the obvious solution to many of the problems specified.

                                                                                                                                                            • 1vuio0pswjnm7 13 hours ago

                                                                                                                                                              "He started by asking the audience to raise their hands if they thought "Python is slow or not fast enough";"

                                                                                                                                                              Wrong question

                                                                                                                                                              Maybe something like, "Python startup time is as fast as other interpreters"

                                                                                                                                                              Comparatively, Python (startup time) is slow(er)

                                                                                                                                                              • lkirk 15 hours ago

                                                                                                                                                                For me, in my use of Python as a data analysis language, it's not Python's speed that is an annoyance or pain point, it's the concurrency story. Julia's built-in concurrency primitives are much more ergonomic in my opinion.

                                                                                                                                                                • pabe 19 hours ago

                                                                                                                                                                  The SPy demo is really good in showing the difference in performance between Python and its derivative. Well done!

                                                                                                                                                                  • writebetterc a day ago

                                                                                                                                                                    Good job on dispelling the myth of "compiler = fast". I hope SPy will be able to transfer some of its ideas to CPython with time.

                                                                                                                                                                    • nromiun a day ago

                                                                                                                                                                      You would think LuaJIT would have convinced people by now. But most people still think you need a static language and an AOT compiler for performance.

                                                                                                                                                                      • pjmlp 21 hours ago

                                                                                                                                                                        Also Smalltalk (Pharo, Squeak, Cincom, Dolphin), Common Lisp (SBCL, Clozure, Allegro, LispWorks), Self,....

                                                                                                                                                                        But yeah.

                                                                                                                                                                      • mrkeen a day ago

                                                                                                                                                                        Where was this dispelled?

                                                                                                                                                                      • fumeux_fume 18 hours ago

                                                                                                                                                                        Slow or fast ultimately matters in the context in which you need to use it. Perhaps these are only myths and fairy tales for an incredibly small subset of people who value execution speed as the highest priority, but choose to use Python for implementation.

                                                                                                                                                                        • ntoll 21 hours ago

                                                                                                                                                                          Antonio is a star. He's also a very talented artist.

                                                                                                                                                                          • Redoubts 16 hours ago

                                                                                                                                                                            Wonder if Mojo has gotten anywhere further, since they’re trying to bring speed while not sacrificing most of the syntax

                                                                                                                                                                            https://docs.modular.com/mojo/why-mojo/#a-member-of-the-pyth...

                                                                                                                                                                            • pjmlp a day ago

                                                                                                                                                                              Basically, leave Python for OS and application scripting tasks, and as BASIC replacement for those learning to program.

                                                                                                                                                                              • aragilar 20 hours ago

                                                                                                                                                                                And yet, most of what people end up doing ends up being effectively OS and application scripting. Most ML projects are really just setting up a pipeline and telling the computer to go and run it. Cloud deployments are "take this yaml and transform it into some other yaml". In as much as I don't want to use Fortran to parse a yaml file, I don't really want to write an OS (or a database) in Python. Even something like django is mostly deferring off tasks to faster systems, and is really about being a DSL-as-programming-language while still being able to call out to other things (e.g. ML code).

                                                                                                                                                                                • pjmlp 20 hours ago

                                                                                                                                                                                  I would rather use Fortran actually, not all of us are stuck with Fortran 77.

                                                                                                                                                                                  Ironically Fortran support is one of the reasons CUDA won over OpenCL.

                                                                                                                                                                                  Having said that, plenty of programming languages with JIT/AOT toolchains have nice YAML parsers, I don't see the need to bother with Python for that.

                                                                                                                                                                              • meinersbur a day ago

                                                                                                                                                                                Is it just me or does the talk actually confirm all its Python "myths and fairy tales"?

                                                                                                                                                                                • xg15 18 hours ago

                                                                                                                                                                                  Well, the fairy tale was that Python was fast, or "fast enough" or "fast if we could compile it and get rid of the GIL".

                                                                                                                                                                                  • daneel_w 21 hours ago

                                                                                                                                                                                    It confirms that Python indeed has poor execution performance.

                                                                                                                                                                                  • crabbone 19 hours ago

                                                                                                                                                                                    Again and again, the most important question is "why?" not "how?". Python isn't made to be fast. If you want a language that can go fast, you need to build that into the language from the start: give developers tools to manage memory layout, give developers tools to manage execution flow, hint the compiler about situations that present potential for optimization, restrict dispatch and polymorphism, restrict semantics to fewer interpretations.

                                                                                                                                                                                    Python has none of that. It's a hyper-bloated language with extremely poor design choices all around. Many ways of doing the same thing, many ways of doing stupid things, no way of communicating programmer's intention to the compiler... So why even bother? Why not use a language that's designed by a sensible designer for this specific purpose?

                                                                                                                                                                                    The news about performance improvements in Python just sound to me like spending useful resources on useless goals. We aren't going forward by making Python slightly faster and slightly more bloated, we just make this bad language even harder to get rid of.

                                                                                                                                                                                    • Danmctree 18 hours ago

                                                                                                                                                                                      The frustrating thing is that the math and AI support in the python ecosystem is arguably the best. These happen to also be topics where performance is critical and where you want things to be tight.

                                                                                                                                                                                      c++ has great support too but often isn't usable in communities involving researchers and juniors because it's too hard for them. Startup costs are also much higher.

                                                                                                                                                                                      And so you're often stuck with Python.

                                                                                                                                                                                      We desperately need good math/AI support in faster languages than python but which are easier than c++. c#? Java?

                                                                                                                                                                                    • game_the0ry 20 hours ago

                                                                                                                                                                                      I know I am going to get some hate for this from the "Python-stans" but..."python" and "performance" should never be associated with each other, and same for any scripting/interpreted programming language. Especially if it has a global interpreter lock.

                                                                                                                                                                                      While performance (however you may mean that) is always a worthy goal, you may need to question your choice of language if you start hitting performance ceilings.

                                                                                                                                                                                      As the saying goes - "Use the right tool for the job." Use case should dictate tech choices, with few exceptions.

                                                                                                                                                                                      Ok, now that I have said my piece, now you can down vote me :)

                                                                                                                                                                                      • throwaway6041 19 hours ago

                                                                                                                                                                                        > the "Python-stans"

                                                                                                                                                                                        I think the term "Pythonistas" is more widely used

                                                                                                                                                                                        > you may need to question your choice of language if you start hitting performance ceilings.

                                                                                                                                                                                        Developers should also question if a "fast" language like Rust is really needed, if implementing a feature takes longer than it would in Python.

                                                                                                                                                                                        I don't like bloat in general, but sometimes it can be worth spinning up a few extra instances to get to market faster. If Python lets you implement a feature a month earlier, the new sales may even cover the additional infrastructure costs.

                                                                                                                                                                                        Once you reach a certain scale you may need to rewrite parts of your system anyway, because the assumptions you made are often wrong.

                                                                                                                                                                                        • game_the0ry 15 hours ago

                                                                                                                                                                                          > Developers should also question if a "fast" language like Rust is really needed...

                                                                                                                                                                                          Agreed.

                                                                                                                                                                                        • danielrico 19 hours ago

                                                                                                                                                                                          That's used by some people as an excuse to write the most inefficient code.

                                                                                                                                                                                          Ok, you are not competing with C++, but you also shouldn't be redoing all the calculations because you haven't figured out the data access pattern.

                                                                                                                                                                                          • ahoka 19 hours ago

                                                                                                                                                                                            Have you read the fine article?

                                                                                                                                                                                            • ActorNightly 16 hours ago

                                                                                                                                                                                              >"Use the right tool for the job."

                                                                                                                                                                                              Python + C covers pretty much anything you really ever need to build, unless you are doing something with game engines that require the use of C++/C#. Rust is even more niche.

                                                                                                                                                                                              • wiseowise 19 hours ago

                                                                                                                                                                                                Do you get off from bashing on languages or what?

                                                                                                                                                                                              • 2d8a875f-39a2-4 a day ago

                                                                                                                                                                                                Do you still need an add-on library to use more than one core?
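                                                                                                                                                                                              For reference, a minimal sketch of the standard-library answer to the question above: `multiprocessing` (shipped since Python 2.6) uses multiple cores without any add-on library, though the GIL still constrains threads within a single process.

```python
# Sketch: map work across worker processes using only the stdlib.
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(2) as pool:                     # two worker processes
        results = pool.map(square, range(5))  # runs outside the parent's GIL
    print(results)                            # [0, 1, 4, 9, 16]
```

The free-threaded (no-GIL) builds introduced experimentally in Python 3.13 aim to make plain threads usable for CPU-bound work as well.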

                                                                                                                                                                                              • tuna74 16 hours ago

                                                                                                                                                                                                In computing terms, saying something is "slow" is kind of pointless. Saying something is "efficient" or "low latency" provides much more information.

                                                                                                                                                                                                • robmccoll 21 hours ago

                                                                                                                                                                                                  Python as a language will likely never have a "fast" implementation and still be Python. It is way too dynamic to be predictable from the code alone, or even from an execution stream, in a way that allows you to simplify the actual code that will be executed at runtime, whether through AOT or JIT compilation. The language itself is also quite large in terms of syntax and built-in capability at this point, which makes new feature-complete implementations that don't make major trade-offs quite challenging. Given how capable LLMs are at translating code, it seems like the perfect time to build a language with similar syntax, but better scoped behavior, stricter rules around typing, and tooling to make porting code and libraries automated and relatively painless. What would existing candidates be, and why won't they work as a replacement?
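                                                                                                                                                                                                  A minimal illustration of the dynamism described above: even a simple method call site can never be fully pinned down at compile time, because classes are mutable while the program runs.

```python
# Sketch: a call site that looks monomorphic can be retargeted at runtime,
# so a compiler must guard (or re-check) every dispatch.
class Greeter:
    def hello(self):
        return "hello"

g = Greeter()
assert g.hello() == "hello"

# Patch the class after instances already exist; the same call site
# now resolves to a different function.
Greeter.hello = lambda self: "bonjour"
assert g.hello() == "bonjour"
```

JITs like PyPy's handle this with guards that deoptimize when such a patch invalidates compiled code; the cost is the guard checks and the machinery to back out.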

                                                                                                                                                                                                  • BlackFly 21 hours ago

                                                                                                                                                                                                    The secret, as stated, is the complexity of a JIT: that dynamism just isn't used much in practice, particularly on optimization targets. The JIT analyses the code paths, sees that no writes to the target are possible, and so treats it as a constant.

                                                                                                                                                                                                    Java has similar levels of dynamism (with invokedynamic especially, but already with plain dynamic dispatch); in practice the JIT monomorphises a call site to a single class even though classes default to non-final in Java, and there may even be multiple implementations known to the JVM when it monomorphises. Such is the strength of the knowledge a JIT has compared to an ahead-of-time compiler.

      • pjmlp 19 hours ago

        Yes, Java syntax might look like C++, but the execution semantics are closer to Objective-C and Smalltalk, which is why adopting the Strongtalk JIT for Java HotSpot was such a win.

      • acmj 20 hours ago

        PyPy is 10x faster and is compatible with most CPython code. IMHO it was a big mistake not to adopt a JIT during the 2-to-3 transition.

        • cestith 18 hours ago

          That “most” is doing a big lift there. At some point you might consider that you’re actually programming in the language of PyPy and not pure Python. It’s effectively a dialect of the language, like Turbo Pascal vs. ISO Pascal, or RPerl instead of Perl.
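          A classic example of that dialect boundary is behavioral rather than syntactic: CPython's reference counting runs finalizers the instant the last reference dies, and plenty of "pure Python" quietly relies on that timing, which PyPy's GC does not guarantee.

```python
log = []

class Resource:
    def __del__(self):
        log.append("closed")

def use():
    r = Resource()   # last reference dies when the function returns

use()
# CPython: the refcount hits zero at return, __del__ runs immediately.
# PyPy: __del__ may not run until a later GC cycle, so this assertion
# (and real code like `open(path).read()` that leans on prompt cleanup)
# can behave differently there.
assert log == ["closed"]
```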

          • cma 18 hours ago

            That "most" is still more CPython code than Python 3 was compatible with. And porting the broken code was likely much easier than it would have been if the language had moved to a JIT at the same time.

          • rirze 16 hours ago

            Isn't there an incoming JIT in 3.14?

          • pjmlp 21 hours ago

            Self and Smalltalk enter the room.

            As for the language with similar syntax, do you want Nim, Mojo or Scala 3?