Making Python Programs Blazingly Fast (martinheinz.dev)
223 points by gilad 5 months ago | 215 comments

This is a terrible article. How this has gotten this kind of traction is unexplainable to me.

>This is the program I will be using for demonstration purposes

>Never comes up again for the rest of the post.


>Let's show you how to profile code.

>Also, here's a bunch of unprofiled suggestions with such precise and helpful comments as "slow" and "fast".
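For what it's worth, a "slow" versus "fast" label can be checked with the standard library's timeit module. A minimal sketch, assuming two hypothetical functions (join_loop and join_builtin are illustrative names, not taken from the article); which one wins, and by how much, is left for the measurement to show:

```python
import timeit

def join_loop(words):
    # Builds the string by repeated concatenation.
    result = ""
    for word in words:
        result += word
    return result

def join_builtin(words):
    # Builds the string with a single str.join call.
    return "".join(words)

def measure(func, words, repeat=5, number=100):
    # Best-of-N timing in seconds per call; taking the minimum
    # reduces the influence of unrelated system noise.
    timings = timeit.repeat(lambda: func(words), repeat=repeat, number=number)
    return min(timings) / number

if __name__ == "__main__":
    words = ["x"] * 10_000
    for func in (join_loop, join_builtin):
        seconds = measure(func, words)
        print(f"{func.__name__}: {seconds * 1e6:.1f} microseconds per call")
```

Reporting the measured numbers next to each suggestion is what would turn "slow"/"fast" comments into something a reader can verify.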


>Python haters always say, that one of reasons they don't want to use it, is that it's slow. Well, whether specific program - regardless of programming language used - is fast or slow is very much dependant on developer who wrote it and their skill and ability to write optimized and fast programs.

This is so ridiculous it's honestly laughable. It's such an obvious falsehood that the only explanation is that either the person is truly this clueless, or else they are wilfully spewing bullshit. A bare metal language like C/C++ will of course let you do things faster than a heavy dynamic language like Python.

The mental gymnastics people do to justify not learning another tool. You know what they say, if all you have is a hammer, everything looks like a nail.

>First rule of optimization is to not do it.

If this person is representative, this explains why computers are hundreds of times faster but most software feels slower than in 1999.

>If this person is representative, this explains why computers are hundreds of times faster but most software feels slower than in 1999.

I think they are representative of a lot of developers. With the continued pace of chip development for the last 35 years, there hasn't been a continuing need to program for performance - in general, unless programmers did something very dumb or were dealing with large amounts of data, they could just write the way they wanted and let the hardware handle making their program fast.

Contrast this with early computer games - to get the best performance some games would actually boot your computer without an OS, sacrificing some convenience to get the last few percent of speed needed out of the system because it was the only way to outperform the competition.

One reason there's such opportunity in the present state of CPU technology (clock speeds have halted at about 4 GHz in favor of more cores) is that few people remember how to program for performance, and those that do are handicapped by a bloated OS built for profit rather than value.

The real reason is in numbers - how many people that could program a PC starting from bootstrap in assembly are out there, and again how many programmers that can paste Java(EE|script)? code together to make something work are out there?

The world needs tons of software, and the vast majority of that software just needs to do some things right some of the time, and an average Java EE developer toiling away in a cubicle is good enough to deliver it.

Writing efficient software is a HARD problem, and it doesn't make economic sense to actually write efficient software; it makes sense to write just good enough software and throw hardware at it. For the price of a developer-year you can provision hundreds of machines to run that piece of code.

I was rubbing my hands together thinking "oh boy, I already know about speeding up with PyPy, I wonder what other tricks I don't know?" The article was sorely disappointing.

> So, let's prove some people wrong and let's see how we can improve performance of our Python programs and make them really fast!

> [sets the stage with a program that takes 11 seconds to run]

> This is more about general ideas and strategies, which when used, can make a huge impact on performance, in some cases up to 30% speed-up.

...that's it? up to 30% speed-up is "blazingly fast"? On a program that takes 11 seconds to run, that still takes 8 seconds... I was expecting speed-ups that took execution to milliseconds or microseconds.

Seeing as the interpreter takes about 60-70 ms to just start up and do nothing, I don't think we're going to be seeing any benchmarks in the microseconds.

    $ time python -c pass

    real    0m0.062s
    user    0m0.011s
    sys     0m0.042s

That was the quickest of three runs on one of my servers.

What's going on with your server? My 11-year-old Mac Pro runs python in 32 ms!

Kidding aside, even just running an empty C program that does nothing still takes 5ms, so not just a Python problem...

Title buzzword optimization, people upvote things without reading any further than the link name.

FWIW, I upvote things like this on occasion because I often learn more from comments/debate on articles I disagree with than from those that I solidly agree with already. Sadly HN's ability to have productive debate seems to be in decline, but that's another topic.

I don't know why but Python attracts some real zealots. I think everyone has had this conversation about Python's speed with someone (I have, unfortunately, had it many times)...if your program is slow, it is always your fault, Python is perfectly optimised and there is no reason to use any other language...I don't get it. Why does Python need to be all things to all people (srs, I want to understand)?

Perhaps Python doesn't need to be all things to all people, but it does need to be performant because people really shouldn't start projects with Python unless they are sure their project will never have performance requirements that Python can't deliver on, and they understand the limits of the "just rewrite it in C!" or "just use multiprocessing!" advice. Since it's generally unpredictable whether a project's performance requirements will exceed Python's ability or whether the eventual bottlenecks will be amenable to the aforementioned advice, using Python is generally a bad idea; however, many people don't understand the significant caveats to C/multiprocessing and therefore continue to happily start projects in Python (and paint themselves into corners that are expensive to back out of).

> The mental gymnastics people do to justify not learning another tool. You know what they say, if all you have is a hammer, everything looks like a nail.

This is so important to listen to.

Some people just can't accept there is a right tool for a job. And filing that hammer down to a needle is not something to be proud of.

The title is bad; the article doesn't deliver on the promise of making Python programs "blazingly" fast.

The first example given (the exponential function) is basically the worst scenario, because it's a purely numerical computation expressed in pure Python code. Whereas Python's performance is okay-ish for I/O or calling C modules.

From doing Project Euler solutions, I have ample evidence that for pure numerics (e.g. int, float, array), Java is anywhere from 10× to 30× faster than pure Python code executed in CPython. https://www.nayuki.io/page/project-euler-solutions#benchmark...

I believe it is basically impossible for Python to win back all that performance loss without adopting radical and jarring features like static typing, machine-sized integers, and no more "every number is a full-fledged object".

Lisp, Smalltalk, Dylan, JavaScript prove otherwise regarding performance of dynamic languages.

The problem is that projects like PyPy remain on the sidelines, instead of being fully embraced by the community.

That's because PyPy has poor support for C-API extensions, which is what all the performance-oriented Python packages rely on.

Most people doing serious numeric work don't care about the speed of the Python interpreter because all the heavy lifting is done by optimized libraries like Numpy and TensorFlow.

Then maybe the statement should have been "I believe it is basically impossible for Python to win back all that performance loss without adopting radical and jarring features like dropping the legacy C interface to Python objects."

The fastest dynamic languages I see are those which have no C interface at all. How many of the performance optimizations in a modern JavaScript engine would be possible if it had to support accessing every property of every object as a C string at any point in a program? JS has exactly none of the features that comment claims would be necessary for performance.

> dropping the legacy C interface to Python objects.

This is nonsense. No language ever will have linear algebra or numeric implementations faster than Fortran/C implementations of BLAS, LAPACK etc. Not being able to make ffi calls makes python essentially unusable for most of its niche uses.

> JS has exactly none of the features that comment claims would be necessary for performance.

I don't see people doing ML, scientific computing etc in JS.

You're confusing "no FFI whatsoever" with "a sane FFI". Python's legacy interface is insane--it exposes virtually all details of the CPython interpreter to C extensions such that any deviation from CPython's implementation is effectively a breaking change.

C++ is pretty close to it.

Which will never change, because instead of improving Python, one writes C instead.

I could even use them from TCL if bothered to do so, thus Python's benefits as programming language are meaningless.

Tensorflow has multiple language support for example.

> Python's benefits as programming language are meaningless.

Being the most flexible and user-friendly interface to the most powerful libraries written in other languages isn't meaningless.

That isn't unique to Python.

Because of the web, Google and other companies have poured tons of resources into improving the speed of JavaScript. I wonder where Python would be if the same level of resources had been devoted to it.

There was no Web for pouring resources into Lisp, Scheme, Smalltalk, or even Dylan considering its short life.

Python already has it, PyPy, but it tends to be ignored by many.

Interestingly you didn't mention Lua. The best Common Lisp, Smalltalk, and JS systems are always about a factor of 2 to 10 slower than C. This is not very good evidence for your assertion. (I suspect that's true of Dylan too, but haven't tried it.) By contrast, LuaJIT often beats GCC on performance. This is excellent evidence for your assertion.

My point was about being fast enough, faster than CPython currently is by a factor of hundreds, not about beating C with its UB optimization tricks.

LuaJIT is stuck on a 2017 release, and an old Lua version, is it ever going to be updated?

Typically I measure CPython at about 40× slower than C, and SBCL, V8, and the like at about 2–10× slower than C. What cases are you seeing where those runtimes are hundreds of times faster than CPython?

People don't normally update to new versions of Lua, because they're not backwards compatible; it's not like Python or JS. WoW is still using Lua 5.1, as is MediaWiki. It's unlikely that this will ever change.

Probably at some point someone will continue LuaJIT development. It seems as likely that they will diverge in a different direction as follow PUC.

When those CPython applications are actually pure Python, without any call to C helper libraries.

Chez scheme is just as dynamic as python and is about as fast as C# running on mono IIRC. My scheme was probably worse than my C# was when I did project Euler though.

Chez does unboxed integer arithmetic (but not floats) and does not have to do any OO-like dispatch, and is also probably one of the best language implementations there are.

"Chez scheme is just as dynamic as python"

Is it? Python is really, really dynamic, which contributes to its slowness. You can directly change an instance's __class__ attribute. You can add properties to classes dynamically, changing the fundamentals of how attributes get looked up at run time. You can write a new class, using a new metaclass, and then set an existing instance to the new class.
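To make that concrete, here is a small sketch of the three things listed above. All class names (Duck, Robot, Loud, Siren) are made up for illustration:

```python
class Duck:
    def speak(self):
        return "quack"

class Robot:
    def speak(self):
        return "beep"

pet = Duck()
first = pet.speak()          # resolved through Duck

# 1. Change an existing instance's class at run time.
pet.__class__ = Robot
second = pet.speak()         # the same object now resolves through Robot

# 2. Add a property to a class after it was defined; every instance,
#    old or new, picks it up on the next attribute lookup.
Robot.volume = property(lambda self: 11)
third = pet.volume

# 3. Write a new class with a new metaclass, then move the instance to it.
class Loud(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.created_by = mcls.__name__   # the metaclass customises class creation
        return cls

class Siren(metaclass=Loud):
    def speak(self):
        return "WEEOO"

pet.__class__ = Siren
fourth = pet.speak()
```

Because any of these can happen between two executions of the same line, the interpreter cannot assume that pet.speak means the same thing twice in a row without some form of guard or cache invalidation.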

A great deal of why Python is so slow is that it is really too dynamic. A language doesn't really want to be "as dynamic as Python".

Nothing that Common Lisp or Smalltalk aren't capable of doing as well, and they have quite capable JITs.

As an example, Smalltalk's become: message completely replaces one object representation by another one.

Chez lacks a built in OOP system, but there is nothing prohibiting you from adding something like CLOS which does all python does and more (much faster than python).

Most modern lisp compilers do a lot of different things to make CLOS fast, though, prefilling caches and all that for you. Not only that, you can connect to a running program and redefine it while it is running.

"Chez lacks a built in OOP system,"

In that case, the answer is actually no to my question. Yes, of course you could program in that level of dynamicness, because you could in any language, but it will then slow you down. No sensible CLOS would be as dynamic as Python.

Like I said, in a lot of ways, you don't want to be as dynamic as Python, and I advise against language advocates seeing the phrase "Python is more dynamic than your language" as a cue to jump up and start insisting that they are just as dynamic as Python. Even in hindsight, I'd say the level of dynamicness in Python was a mistake. You don't need it to have a nice, usable, dynamic language, but it has been a ball & chain around its legs in terms of performance for decades.

To be clear, this isn't a criticism of dynamic languages as a concept. I have criticisms, but these aren't it. This is a criticism of Python specifically. A dynamic language can be pretty nice with, let's say, two or three layers of dynamicness, but Python has four or five. If you follow the full process that Python has to go through to resolve "x.y", including all possible points where you might have done something to affect the result, it's crazy overkill. In Guido's defense, when he was writing it way back when, that wasn't clear. There wasn't a lot of highly-relevant prior art to look at for that style language.
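For anyone who hasn't walked through it, this is roughly the order the default lookup for "x.y" follows on an ordinary instance. It is a simplified sketch: it ignores a custom __getattribute__, metaclasses and __slots__, and the class names are invented for the demo:

```python
class DataDescriptor:
    # Defines __get__ and __set__, so it wins over the instance dict.
    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")

class NonDataDescriptor:
    # Defines only __get__, so the instance dict can shadow it.
    def __get__(self, obj, objtype=None):
        return "non-data descriptor"

class Demo:
    a = DataDescriptor()
    b = NonDataDescriptor()
    c = NonDataDescriptor()
    d = "class attribute"

    def __getattr__(self, name):
        # Called only after the normal lookup has failed.
        return f"__getattr__ fallback for {name!r}"

x = Demo()
# Write straight into the instance dict, bypassing the descriptors.
x.__dict__.update(a="instance dict", b="instance dict")

results = {name: getattr(x, name) for name in "abcde"}
# a: data descriptor on the class beats the instance dict
# b: instance dict beats a non-data descriptor
# c: non-data descriptor is used when the instance dict has no entry
# d: plain class attribute
# e: nothing found anywhere, so __getattr__ is the last resort
```

Each of those steps is a point where user code can intervene, which is the "four or five layers" problem in miniature.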

CLOS is just as dynamic as Python _and_ fast, at least in SBCL and LispWorks. CLOS is probably the most expressive object system you can find, and if you wanted it fast you would restrict it somewhat to allow for at least some of the dispatch to happen at compile time :)

It’s hard to find a Scheme implementation that isn’t significantly faster than Python... all while being at least as dynamic and more expressive.

I think a lot of CS students had their brains melted by tough classes where Scheme was the vehicle. Thus, you have a population of users that either are anti-evangelists because they have PTSD, or you have evangelists who exist on a plane above the lumpenproletariat like me, which can contribute to the false notion that Scheme is intrinsically esoteric.

I never got a formal CS education, so I don’t have PTSD from Scheme. Also, I am not a super brain, so I guess I haven’t felt compelled to become expert at the hard concepts that Scheme enables.

People fixate on the S-expression syntax (“all those parentheses!”; counter argument “way fewer commas!”). But I think the real issue for Scheme is the lack of libraries that do hard things for “normals” like me.

If I’m strictly honest, I’m more productive in Python than Scheme. This is not because Python is easier. It’s because the Python community has attracted the CS grads who grokked enough of the hard stuff to make libraries that abstract away stuff.

There’s no reason people can’t write Scheme like they write Python. That is, people don’t need to do all the possible stuff in Scheme all the time. Truthfully, Scheme is at least as easy as Python.

Scheme just needs more smart normals writing libraries for mediocre normals like me for Scheme to become popular. Maybe take a domain approach. I feel like adapting R’s tidyverse to Chez is an easy target. Scheme could be the data scientist’s goto. Maybe show people how easy it is to build self-contained serverless apps in the cloud.

If there were a Scheme community that earnestly tackled any domain with the idea of making it accessible to practitioners within the domain, I think it could get real traction.

And it would be fast. Much faster than Python.

A nice show of Chez Scheme speed is Idris 2. Edwin is very impressed with the performance; it is very fast. And imho it is a lovely language with, as you rightfully say, one of the best implementations out there. Extremely portable (cpu/OS). And quite readable too.

Scheme is relentlessly monomorphic, and most implementations don't provide much reflection, preferring compile-time macros. (I don't know what Chez’s reflection facilities are like; can you do the equivalent of def __getitem__ or x.__setattr__ = y?) These attributes make Scheme much easier than Python to implement efficiently.
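For readers who don't write Python, this is the kind of reflection being asked about. A minimal sketch with made-up class names (Squares, Point); note that the hook is installed on the class, since Python looks special methods up on the type rather than on the instance:

```python
class Squares:
    # Defining __getitem__ makes instances indexable with obj[n].
    def __getitem__(self, n):
        return n * n

class Point:
    pass

log = []

def logging_setattr(self, name, value):
    # Record the assignment, then fall back to the default behaviour.
    log.append((name, value))
    object.__setattr__(self, name, value)

p = Point()

# Replace attribute assignment for every Point at run time,
# including instances that already exist.
Point.__setattr__ = logging_setattr
p.x = 3

seventh_square = Squares()[7]
```

An implementation has to allow for the possibility that indexing or attribute assignment has been redefined like this at any point, which is part of what makes Python harder to compile efficiently than a monomorphic Scheme.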

My experience is that simple programs usually take about twice as much code in Scheme. This is related to the polymorphism thing, but also I think Python supports imperative programming better. Python is built around autogrowing hash tables (“dicts”) and autogrowing arrays (“lists”); standard Scheme provides neither, preferring alists and cons lists.

In guile I would do

    (use-modules (oop goops))  ; GOOPS provides define-method, <list>, <integer>

    (define-method (get (lst <list>) (n <integer>))
      (list-ref lst n))

In Chez I would probably use chez soop, which is hideously underdocumented and probably bitrotten by now, or any of the other object systems available.

It would of course slow things down since I doubt Chez optimizes that, so you get runtime dispatch. SBCL however has an amazing object system that does all of Python's OOP with pretty good speed.

You are right that Python is better at imperative programming, but that is also very much a matter of taste. I always get a sour feeling in my mouth when using Python because it is almost exclusively about mutability.

(hashtables are a part of r6rs, btw. Not very pleasant to use since: (hashtable-ref h key default-value))

C# running on mono is in my experience 10 times slower than equivalent C++ even for simple things like adding numbers in an array.

That story of mine is old, but I have some newer anecdata: porting all my old Project Euler solutions to SBCL and getting help from some better lispers than me to optimize it made SBCL come out close, but almost universally slower than Chez. If someone really wants, I can see if I can find the code on any of my external HDDs.

A number quoted is that SBCL performs roughly on par with, but slightly slower than, Java. For what it is worth, the computer benchmarks game confirms that: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

That was with optimize level 3, which burns on (car 1), and then you might as well use C, but the language at least allows for those speeds.

Ooh! I don't think I've ever seen someone claim Chez Scheme produced faster code than SBCL before. That's interesting.

It is hard to claim, because there are hardly any benchmarks. I just had the opportunity to move between the implementations and found chez to be faster for the kind of things I was doing, which was numerical stuff and loops.

SBCL is also an amazing project, and lately the GC story has gotten better IIRC.

So will Racket become super super fast ?

"… Racket CS available, a beta version of the Racket on Chez Scheme implementation."


Probably not, but the code is a hell of a lot cleaner! I looked through it and thought that even I could work on it. The new macro expander is even possible to follow :)

You could just use Julia which basically gives you a Python like language but with C like performance. Julia will kick Java to the curb on numerics because it was actually built with that in mind.

And you get all of this WITHOUT static typing and yes in Julia everything is still a full-fledged object. There is no difference between numbers and other objects in Julia. But numerics is still fast.

I tried really hard to like Julia and just couldn't get behind it. We used it for a research paper in the area of optimization and by the end of the project I really regretted it. Performance and built in numerics libraries were great, but I feel the language ergonomics are pretty bad. I hate Python just as much as the next developer, but I do find it much easier to use than Julia for research type work.

My biggest qualm with Julia (and maybe this speaks to my inexperience with the language) is that it isn't always obvious when Julia is going to make a copy. We spent about an hour working through some code that was very slow (props to Julia's profiling tools) but couldn't figure out _why_ it was slow. It turned out that despite our best efforts, Julia was still copying a vector despite us using pre-allocated scratch space for the work.

From my point of view, if I am comparing algorithms then Python's performance doesn't really matter and its ergonomics win. If performance matters I'd just use C++ or Rust.

Interesting how we can have such vastly different experiences.

When Julia makes a copy is pretty straightforward and natural IMHO. I would have been curious to see an example of the code you used where a copy was made without you knowing.

I started with Python, but I find Julia better in almost every single way I can think of. Like even if Julia was slower than Python I would have picked it because I find it so much nicer to use.

I wrote an article here about some of the observations I had about using Python after coming back to it from Julia:


There are some exchanges further down. Would have been interesting to hear your feedback on some of those things.

> My biggest qualm with Julia (and maybe this speaks to my inexperience with the language) is that it isn't always obvious when Julia is going to make a copy.

If you're talking about slices of an array, those always create a copy unless created with the @views macro (or the equivalent function call).
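The same copy-versus-view distinction exists in Python, which may help anyone coming from that side. This is only a rough analogy using the standard library (bytearray slicing copies, memoryview slicing shares the buffer); it says nothing about how Julia implements either:

```python
data = bytearray(b"abcdef")

copied = data[1:4]             # slicing a bytearray copies the bytes
view = memoryview(data)[1:4]   # slicing a memoryview shares the buffer

data[1] = ord("X")             # mutate the original in place

copy_after = bytes(copied)     # unchanged: the copy has its own storage
view_after = bytes(view)       # changed: the view reads the original storage
```

In a hot loop the copying form allocates on every iteration, which is the kind of cost that only shows up once you profile.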

Julia is dead, but few realize it.

It's not as nice as Python, nor as fast as C++. And much less supported (tools/libraries/...) than both.

So it sits in this awkward middle between Python and C++, basically sucking at both and excelling at none.

This is serious nonsense! Julia is way nicer than Python. If Python was faster than Julia, I would still have picked Julia, unless I really needed the performance.

With Julia I get first class meta programming. I get awesome multiple dispatch. I get environments and package system really well integrated. I get awesome integration with the shell. Better module system. More natural syntax for arrays. Much better system for closures. Better named functions.

REPL programming in Julia is just light years ahead of anything in Python. The OOP design of Python really kills the REPL experience.

Unless you are a very skilled C++ programmer, Julia is going to outperform you as the program gets larger. C++ programmers are going to get themselves tangled up when trying to run multi-threaded code, running on multiple machines on GPUs and specialized hardware. Julia does this effortlessly.

C++ cannot do JIT, hence as soon as you deal with complicated machine learning algorithms with custom kernels, C++ is going to tie itself into a knot.

Why do you think large Astronomy projects like Celeste and the next major climate models are built in Julia and not C++? Because developers realized that when you need to run massive calculations on super computers on hundreds of thousands of cores, C++ is going to get in the way.

As for libraries and tools. All the Python tools I have tried to match my Julia tools have just sucked. Julia tools often excel over much older Python tools.

Library development moves much faster on Julia than Python. It is not hamstrung by relying on complicated C++ code bases. Also Julia libraries integrate very well, while Python libraries are often their own deserted island. That means a few Julia libraries can do what must be accomplished with dozens of Python libraries.

I don't think it's dead. You are right on the other points. I think that Julia could have its day, but it's having a lot of trouble getting traction. I mean I remember 20 years ago being laughed at for using Python for a web app, but now look at Python.

I wouldn't claim that it's dead - it just fails to deliver on its main promise and the developer experience is still quite poor.

In which way does it 'fail to deliver on its main promise' and how is the developer experience 'quite poor'?

Does Julia have the number of libraries that Python does?

I think for pure numerics using numpy basically recovers all performance loss. It's when you get tricky control-flow that you run into real trouble.

Typically straightforward Numpy gets me from 40× slower than straightforward C to 5× slower. Tricky Numpy (output arguments, conversions to lower precision, weird SciPy functions) gets me another factor of 2. C SIMD (intrinsics or GCC’s portable vector types) gets me a factor of 4 faster than straightforward C. Presumably CUDA would buy me another factor of 100 but I haven't tried it and I haven't tried Numba either.

You're probably expecting a bit much from CUDA. If you have heavily optimized CPU code running on a high-core count Xeon, it's probably more like 2-3x. The reason why CUDA is so popular is because it makes that comparatively easy to achieve. Optimizing x86 to the last 10% is a dark art only very few programmers are capable of, while writing decent GPU code is IMO an order of magnitude easier, i.e. just a craft.

Main difference being: High memory bandwidth vs. heavy usage of cache, unified programming model for vectorization and multicore parallelism

Btw GPU has its own cache and all problems related to it. You still have to watch what/how you do things on GPU to be cache friendly

Yes, but it's much less emphasized on GPU I think. If you have a data parallel algorithm, as long as you design the array ordering to allow coalesced access, the memory architecture will usually already allow better performance than what you can hope to get from CPU even with heavily cache optimized code that's basically unmaintainable (as it will likely perform much differently on the next architecture).

Thanks! But what if I don't have any kind of Xeon, but do have an NVIDIA card?

Without lots of CPU cores and with a high-end NVIDIA card, your speedup expectations can be a bit higher. Typically 100x when comparing GPU-friendly algos to unoptimized (but native) CPU code or 10x when comparing it to decently optimized code running on slower CPUs.

Generally I think a performance/cost comparison is more useful: Take the price of the GPU and compare it to something with equivalent cost in CPU+RAM.

You don’t need any kind of Xeon - most recent i5/i7/i9 cores have AVX and even AVX2 support.

> Typically straightforward Numpy gets me from 40x slower than straightforward C to 5x slower.

I find this hard to believe. What kind of numerical work are you doing? Even something as simple as matrix-matrix multiplication should be hard to beat with C, unless your C code is using a cache efficient algorithm.

Branch heavy code, for example trading order book updating.

People always say "use numpy", but that is only possible if your algorithm can be described in terms of vectorized operations. For many kinds of processing, the only alternative is C/C++ (through Cython)

That was my hunch.

> People always say "use numpy", but that is only possible if your algorithm can be described in terms of vectorized operations. For many kinds of processing, the only alternative is C/C++ (through Cython)


I think using numpy is always a good first step after just trying to improve the algorithm. Numpy will be less effort than going to Cython. After that Cython is a good next step. I seriously don't know any situation where I would do the kind of micro-optimizations mentioned in the article.

> (through Cython)

My personal experience is that you can actually get another factor of 2 or 3 speed-up by ditching Cython and using actual C instead (I think it's because optimizers have a hard time cleaning up the C that Cython produces), even if you've turned off things like bounds checking.

> I find this hard to believe

I guess you haven't tried it, then. But your lack of knowledge is not a reasonable justification for attacking my integrity.

> Even something as simple as matrix-matrix multiplication

That's the best case for Numpy, not the worst. SGEMM is indeed just as fast when invoked from Numpy as when invoked from C, at least for large matrices.

Using GPUs involves significant tradeoffs that go way beyond writing SIMD code. (mostly batching and off device transfers)

CPU SIMD code can be trivially mixed with non SIMD code but mixing GPU and CPU code may negate the benefits.

Indeed, numpy is awesome. I used to do my computational experimenting (some of it including neural networks) the plain way (using classic data structures, the languages' built-in arithmetic operators and functional facilities) in many different languages. Once I tried Python with numpy my mind was blown, it's so much faster than anything. Now I feel like I enjoyed writing functional code more but given the performance difference I can hardly imagine coming back. So the very reason I use Python is performance.

> I believe it is basically impossible for Python to win back all that performance loss without adopting radical and jarring features like static typing, machine-sized integers, and no more "every number is a full-fledged object".

PyPy is often quite a bit faster than CPython for real-world programs so clearly some improvement is possible.

It is a shame that Python became so popular. There are a number of choices that are just as “easy” (or better) and an order of magnitude more efficient.

Two options you can try guiding a budding python user to are Nim and F#

Both rather dead because you get to chase Microsoft, and even Mono cannot achieve that.

Short: they're not portable beyond Windows.

Nim is not related to Microsoft at all, and runs anywhere GCC can run.

F# runs on Windows/Linux/Mac standard since .NET Core, and Mono runs it anywhere: Nintendo Switch, iOS, Android, PlayStation.

> So, let's prove some people wrong and let's see how we can improve performance of our Python programs and make them really fast!

I have to say, the desperate lengths Python programmers will go to to use it for things it was not meant for rather than learn or use other languages is one of the aspects I most dislike about it. However fast you make it, the same effort would have made it that much faster again in a performant language.

I love Python as a glue language. So much heavy lifting done in numpy or opencv or whatnot. But Python as the interface makes it trivial to explore, experiment, and glue together a workflow, especially when the solution is unclear.

Then at some point if Python isn't needed because you know exactly what you want your software to do, rewrite it in C++ or whatever.

Also with CFFI and other interoperable libraries, it's really quite easy to write some heavy work in a more appropriate language and call into it.
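As a rough illustration of calling into C from Python: CFFI is a third-party package, so this sketch swaps in the standard library's ctypes instead, and calls strlen from the C library that is already on the system. It assumes a POSIX-like platform where the C library can be located; c_strlen is just an illustrative wrapper name:

```python
import ctypes
import ctypes.util

# Locate the C standard library. On POSIX systems, if find_library
# returns None, CDLL(None) falls back to the symbols already loaded
# into the current process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(data: bytes) -> int:
    # The bytes object is passed to C as a NUL-terminated char pointer.
    return libc.strlen(data)

length = c_strlen(b"hello")
```

Your own compiled shared library would be loaded the same way, with ctypes.CDLL pointing at its path and the argtypes/restype declared for each function.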

For that kind of workflow you would be far better off with e.g. Julia. You get the same advantages as Python as having a language you can experiment with until you find a solution. Only difference is the optimization step later does not involve having to rewrite in another language.

If you already know Python, and Python packages already do all you will ever need then sure stick with that. But I don't get why people would go to such lengths to avoid using a new language. Being proficient in Julia is a lot less work than maintaining proficiency in Python and C++.

The last time I checked, using Julia was clunky at best, with ridiculously high JIT-compile times, packages that refused to build on my machine, etc. What is more, many of the "best" Julia libraries were seemingly just linked-in Python code.

I don't mean to discredit the advantages Julia clearly has over Python, but these are just the kinds of problems that make people like me stick with tried and tested last-gen languages like Python.

Did you ever check after 1.0 was released? In the earlier days there were a lot of problems with packages. Totally agree. JIT compile times are much better now.

A lot of the issues are simply that people have not learned a sensible workflow with Julia. Python guys have a lot of habits that don't translate well to Julia. I know because I work daily with two hardcore python guys. I notice all the time how we approach problems in very different ways.

Python guys seem to love making lots of separate little programs they launch from the shell. Or they just relaunch whole programs all the time.

In Julia in contrast you focus on packages from the get go and you work primarily inside the Julia REPL. You run Revise.jl package which picks up all the changes you make to your Julia package.

I guess it just depends on the workflows you are used to. For me it is the opposite. Whenever I have to jump into our Python code base I absolutely hate it. It is very unnatural for me to work in the Python way. I also find Python code kind of hard to read compared to Julia code.

But I know Python coders have the opposite problem. Basically Python guys look a lot at module names when reading code. Julia developers look more at types. The difference makes some sense since you don't really write types in Python code.

I found that the new Python type annotation system helped me feel at home in Python.

OMG. Until now I didn't realize Cobol is more popular than Haskell and Delphi. I had just read there are ~2.5 Cobol programmers left in the world, who are old like Gandalf, enjoy ridiculously high salaries and can't retire because nobody learns Cobol any more while the world needs somebody to maintain those legacy systems that still run its economy. While Delphi used to be the PHP of the desktop world until very recently (and could hardly be expected to decline so fast) and Haskell seems to be the computer science lingua franca.

>The more a language tutorial is searched, the more popular the language is assumed to be. It is a leading indicator. The raw data comes from Google Trends.

That's the problem, nobody is actively learning Cobol these days and nobody knows how much is running out there in the wild, because none of the big banks or credit companies will actually admit it.

You joke but the jobs are still out there, still being posted and companies are still hiring. https://www.wellsfargojobs.com/job/irving/apps-systems-engin...

For the pretty much perfect language, try Erlang or Elixir. The runtime guarantees are like no other languages, and it can easily drop down to C, C++, Rust and several others when needed.

Neither of those is arguably very expressive as a language though. It's the runtime and the built-in libraries around it (OTP) that holds the power.

What do you mean by "expressive"?

Able to abstract concisely at a high level, provides decent facilities for modeling business logic in a succinct yet correct manner etc.

Elixir and Erlang both lack a type system and maintain a focus on keeping the language constructs simple rather than providing various high level abstractions some other more expressive languages have.

static typing (especially HKT), macros, or laziness? The BEAM is amazing, but Erlang is kind of meh. Also the BEAM is very slow so you end up writing C to get any real work done. I haven't used Elixir, but I suspect it suffers from the same problems. When the JVM finally supports preemptive multi-tasking, Erlang and the BEAM will become completely unnecessary.

But this is a benefit not of python but of the ecosystem around it. We should be making those libraries, as easy to use, in other languages.

> [...]the desperate lengths Python programmers will go to to use it for things it was not meant for rather than learn or use other languages[...]

I don't think it's an issue of not wanting to learn other languages.

If you really like screwdrivers and go to screwdriving events, have a collection of screwdrivers, you may end up living in an echo chamber where you and your peers are convinced a screwdriver is a good tool to plant a nail.

I think this is exactly how R accreted features to make it do things that it was never designed to do. When the "tool you use" becomes the "kind of developer you are," things get a little restrictive.

It's also how languages like Scala become "everything and the kitchen sink" multi-paradigm monsters. People making the tool want it to do everything, so that they can get all the developers, so they make it a functional object-oriented imperative procedural declarative hodgepodge of 15 different barely-mutually-operable sub-languages, and you get a mess.

You are right, the article doesn't deliver and its title is just clickbait. This is the kind of shallow content that's flooding dev.to nowadays.

I love Python and you are 100% correct.

> However fast you make it, the same effort would have made it that much faster again in a performant language.

If you're trying to make something go 'really fast' these days, that means either (a) some kind of vectorization, or (b) pushing work onto a GPU. In either case Python is unlikely to perform meaningfully worse than any other language, since the host language isn't doing much anyway.

> First rule of optimization is to not do it.

This is an unfortunately common misunderstanding of the phrase: "premature optimization is the root of all evil."

Optimization is a crucial part of developing successful software. It can be harmful to get overzealous with certain types of optimization, however basic wins like using string builder primitives or formatted strings from the outset are hardly premature. Some optimizations can only be realized at the early conceptual stages too; going for those early on isn't always premature.

Yeah, I see that a lot. People always leave off the last part of the quote, which was the actual point.

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

So annoying. I would get people telling me not to make an obvious performance improvement that adds no complexity to the code. Yet some basically insist on using the least performant solution possible as somehow being good software engineering. It is insane how rule-bound people can get. No wonder religion exists. People just love inventing rules and forcing others to follow them.

I think you’re right, programmers (and people) can be blinded by rules. Knuth never meant to suggest people should use the least performant solution available. He’d be horrified by some of the things people justify with his quote. The idea was always to make good engineering choices from the start, including using tools and techniques that are known to give good performance, but to wait to get low-level and measure and count cycles until the code is mostly done being written and not going to change significantly later. Choosing a less performant option because it’s slower is a bad choice. Choosing a slower option that is easier to refactor as you go, when it’s clear refactoring will happen, over a faster solution that will make refactoring harder, that is a perfectly acceptable engineering decision.

there's tons of dogma in programming. likely more than in most professions because it's not really a scientifically driven field. e.g. I'd love to just put all Emacs and VI zealots together in a room and show them Engelbart's 1968 demo, Clockwork Orange style.

Hahaha that is a good one. I remember as a C++ programmer there were several occasions where I saw a goto statement would have given the cleanest and most maintainable code (typically exiting deeper loops). Yet I always picked more convoluted solutions because I knew what an immense shit-storm I would have caused if I had checked in code with just a single goto statement.

It would not have mattered that I could have provided a rational explanation for why that was a rational choice in that instance. They would have just kept reciting scripture and called me a heretic.

Meanwhile people will let you commit the worst, most unmaintainable code, as long as it doesn't break any of the 10 commandments of coding or whatever the equivalent would be.

I actually almost wanted to mention goto as an example of this kind of dogma.

Good to include the whole quote.

I wonder if using string builders is really a critical 3%... and how many people who do practice premature optimization actually measure if their optimization of choice is in their program's critical 3%.

String builders are often not a critical optimization, however the additional cognitive burden on the reader is nearly zero. In some languages, string builders can even overload operator += which makes the type of the object the only visible distinction outside of the final conversion to string.

In languages that have immutable strings, a chain of `+=` operators is basically O(n^2) vs O(n) for a string builder. For how easy the optimization is, there's little excuse to not use them for any bulk append operations.

The standard approach to this in Python is

    rv = []
    for x in y:
        rv.append(f(x))
        if g(x):
            rv.append(h(x))
    return ''.join(rv)
This gives you the same O(N²) to O(N) speedup you would get from a StringBuilder.
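If you want to check that claim on your own interpreter, a minimal timeit sketch could look like the following. The function names and the piece sizes are arbitrary choices for illustration. Note that CPython can sometimes resize a string in place for `+=`, so the gap may be much smaller than the big-O argument suggests; that is exactly why it is worth measuring.

```python
import timeit


def build_concat(pieces):
    # Repeated += on an immutable str: in the worst case each step copies
    # everything accumulated so far.
    out = ''
    for p in pieces:
        out += p
    return out


def build_join(pieces):
    # Collect the pieces in a list and join once at the end.
    rv = []
    for p in pieces:
        rv.append(p)
    return ''.join(rv)


if __name__ == '__main__':
    pieces = ['x' * 10] * 100_000
    assert build_concat(pieces) == build_join(pieces)
    for fn in (build_concat, build_join):
        seconds = timeit.timeit(lambda: fn(pieces), number=10)
        print(f'{fn.__name__}: {seconds / 10 * 1e3:.2f} ms per call')
```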

More recently, though, I've often been preferring the following construction instead:

    for x in y:
        yield f(x)
        if g(x):
            yield h(x)
This is sometimes actually faster (when you can pass the result to somefile.writelines, for example, which does not append newlines to the items despite its name) and is usually less code. If you want to delegate part of this kind of string generation to another function, in Python 3.3+, you can use `yield from f(x)` rather than `for s in f(x): yield s` or just the `yield f(x)` you use if `f` returns a string, and the delegation is cleaner and more efficient than if you're appending to a list and the other function is internally joining a list to give you a string.
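A small self-contained sketch of the generator style, for concreteness. The names f, g and h come from the snippet above, but their bodies here are placeholders I made up, as are pieces_for and render; an in-memory StringIO stands in for somefile.

```python
import io


def f(x):
    # Placeholder: the main line for each item.
    return f'{x}\n'


def g(x):
    # Placeholder predicate: emit an extra line for even items.
    return x % 2 == 0


def h(x):
    # Placeholder: the extra line.
    return f'  ({x} is even)\n'


def pieces_for(x):
    # Yields one or two strings per item.
    yield f(x)
    if g(x):
        yield h(x)


def render(y):
    for x in y:
        # Delegate to the sub-generator instead of looping over it by hand.
        yield from pieces_for(x)


buf = io.StringIO()
buf.writelines(render(range(3)))  # writelines adds no newlines of its own
text = buf.getvalue()

if __name__ == '__main__':
    print(text, end='')
```

The same generator can be fed to ''.join(...) when you do want a single string.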

However, if you're optimizing a deeply nested string generator, you're better off using the list approach and passing in the incomplete list to callee functions so they can append to it. Despite the suggestive syntax, at least last time I checked, `yield from` doesn't directly delegate the transmission of the iterated values; on this old netbook, it costs about 240 ns per item per stack level of `yield from`. (By comparison, a simple Python function call and return takes about 420 ns on the same machine.)
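The list-passing variant might be sketched like this. All three function names are hypothetical; the point is only that every level appends to the one shared list and the join happens exactly once at the top.

```python
def emit_item(out, x):
    # Innermost level: append straight onto the caller's list.
    out.append(str(x))


def emit_row(out, row):
    # Middle level: passes the same list further down, no intermediate strings.
    out.append('[')
    for i, x in enumerate(row):
        if i:
            out.append(', ')
        emit_item(out, x)
    out.append(']')


def emit_table(rows):
    # Top level: owns the list and joins once.
    out = []
    for row in rows:
        emit_row(out, row)
        out.append('\n')
    return ''.join(out)


if __name__ == '__main__':
    print(emit_table([[1, 2], [3]]), end='')
```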

But if you really wanted your code to run fast you wouldn't have written it in Python anyway. You'd've used JS, LuaJIT, or Golang. Or maybe Scheme. Or C or Rust. But not Python.

Okay, so I checked this for JavaScript, and it's not actually true -- in Chrome, a vanilla += is faster than pushing into an array and joining.


This is why you really should always benchmark. In my view, "premature optimization" is not so much about optimizing too early in a project, it's about writing code a particular way you assume will make it faster without testing first.

So that means JS strings aren't truly immutable in a modern environment (which is fine). The runtime environment is internally using an approach similar to a string builder, which is a good optimization.

I agree that you shouldn't operate on assumptions alone for a decision like whether or not you should use a string builder. That's where prior experience should come in to play to guide your decisions. For instance, I am not a JS developer, so I have no prior experience to inform a decision to use a builder vs concat in JS.

I cited that case in particular since the slowness of concatenation was called out in the article, and in some languages it actually does make a huge difference at a very small complexity cost.

In the martial arts movie trope, the student becomes a master upon realizing that "the rules" don't always apply to all situations.

I don't know what language you are using, but what performance gain does using string builders give you?

when concatenating strings, stringbuffer can be orders of magnitude faster. one c# example from google search "stringbuilder vs string benchmark" from codinggame.com measures 2.5 minutes for string += and 99ms for stringbuilder.

Maybe for Python a stringbuilder is faster, but for JavaScript, += is faster.


Don't assume a certain way of coding is faster because you read it on the Internet, actually profile your code.

> Maybe for Python a stringbuilder is faster, but for JavaScript, += is faster.

I am amazed and saddened, that in 2020, concatenating strings, regardless of the form or language, is not blazing fast across all environments.

sry i meant stringbuilder, not stringbuffer.

They allow you to change the complexity from O(n^2) to O(n).

None of the performance tuning suggestions are benchmarked, and I find it hard to believe these would ever make a substantial difference. They could make a statistically significant difference, maybe, but local variables vs class attributes? You should show how much of a time saver this is, because I can't envision a realistic scenario where this is worth the developer time.

The runtime cost of instance attribute access rather than local variable access can account for a quarter of a program’s run time; I just tried it on my phone:

    Python 3.7.4 (default, Jul 28 2019, 22:33:35)           
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.                                                                                                     
    In [1]: class X:                                           
       ...:     def y(z):                                      
       ...:         return z.a + z.a + z.a + z.a + z.a
       ...:     def w(z):     
       ...:         a = z.a                                    
       ...:         return a+a+a+a+a

    In [2]: x = X()

    In [3]: x.a = 3                                     

    In [4]: x.y()                                           
    Out[4]: 15                                          

    In [5]: x.w()                                  
    Out[5]: 15
    In [6]: %timeit x.y()                               
    1.7 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)                                                                                              

    In [7]: %timeit x.w()                                  
    1.11 µs ± 5.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is often surprising to novices in Python, but attribute access involves a hash table lookup.

Note that here we are comparing two instance attribute accesses against six, not zero against five: each call also looks up the method on x. Evidently each of them costs about 148 ns (590 ns spread over the four extra accesses), so if we could reduce them to zero, the method call and return and four additions would cost only about 815 ns, which is closer to half the runtime than ¾.

Moral: benchmark before pooh-poohing a hotspot.
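For anyone without IPython handy, roughly the same measurement can be sketched with the standard timeit module. The class is copied from the session above; ns_per_call is a helper name I invented, and the loop counts are arbitrary. Absolute numbers will differ by machine and Python version.

```python
import timeit


class X:
    def y(z):
        # Five attribute lookups on the instance.
        return z.a + z.a + z.a + z.a + z.a

    def w(z):
        # One attribute lookup, then local variable reads.
        a = z.a
        return a + a + a + a + a


x = X()
x.a = 3


def ns_per_call(stmt, number=200_000, repeat=5):
    # Best of several runs, converted to nanoseconds per execution of stmt.
    # The statement is a string so the x.y / x.w method lookup is timed too.
    best = min(timeit.repeat(stmt, globals=globals(), number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == '__main__':
    print(f"x.y(): {ns_per_call('x.y()'):.0f} ns")
    print(f"x.w(): {ns_per_call('x.w()'):.0f} ns")
```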

Also though note that several thousand instructions is a pretty heavy price to pay for four integer additions.

> can account for a quarter of a program’s run time

It can but it will more likely account for much much less than that, unless all your programs are massive loops that do little more than access the same attribute repeatedly.

Compared to __slots__ (also Python 3.7.4)

Using your definition of class X

  %timeit x.w()
  313 ns ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Add __slots__

  class X:
    __slots__ = ('a')
    def w(z):
      a = z.a
      return a+a+a+a+a

  %timeit x.w()
  271 ns ± 7.13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
About 14% less time.

Your computer is evidently faster than my phone; how fast was y() for you?

Also you are missing a comma in your would-be tuple.
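To make the comma point concrete, a tiny sketch (class names are made up). Parentheses alone do not make a tuple, so ('a') is just the string 'a'. As far as I know, Python treats a string __slots__ as a single slot name, so the two classes should end up behaving the same here, but the tuple form says what you mean.

```python
class NoComma:
    __slots__ = ('a')    # parentheses only: this is the string 'a'


class WithComma:
    __slots__ = ('a',)   # trailing comma: a one-element tuple


p = WithComma()
p.a = 3

q = NoComma()
q.a = 3

if __name__ == '__main__':
    print(type(NoComma.__slots__).__name__, type(WithComma.__slots__).__name__)
    # Slotted instances have no per-instance __dict__.
    print(hasattr(p, '__dict__'), hasattr(q, '__dict__'))
```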

For what it's worth I got about 196 and 131 for the original y and w, and after adding __slots__ (with comma) I got 186 and 123.

The fact that the difference is 10 ns and 8 ns respectively suggests that the speedup of attribute access isn't what's showing up in your measurements. In one case we access the slot "a" once; in the other case we access it five times. How can that be a 20% difference?

Yes, but TFA begins with a discussion of benchmarking methods and it would just take some copy and pasting to figure out if you're right.

Edit: I tested it and there is a difference

I picked the example you mentioned, defined the regex constant as "s" and the line constant as "asas", increased the iteration count from 10^4 to 10^6 to make the difference more noticeable (got no noticeable difference w/o that change) and measured the programs with time in Termux on my Nexus 5.

Real time for fast: 3.150, real time for slow: 3.623. Times average of three runs ("time python fast|slow.py"). I ran once with same setup and threw out results before the runs.

Edit2: actually you mentioned a different example, my bad. I didn't measure any others because vim with a touchscreen keyboard is a PITA, no idea if the one you're referring to is true.

Edit3: times in seconds

Maybe coming from a diff place? It seemed fair: I spent an hour or two today basically undoing some canonical high-level Python/Pandas abstractions into canonical numpy+C, which was exactly this kind of stuff. We normally do GPU offloading (rapids.ai / blazingsql), though today's kernel was the uncomfortable inbetween where latency mattered but the task is too small for the offloading communication hit. We do the same kind of stuff (and a bit cleaner / more predictable) in JS with typed arrays.

I wouldn't call that blazingly fast python though, that's barely approaching C, which is also slow. Fast is maxing out SIMD + cores + GPUs + bandwidth, so should aim for ~20X+ faster than regular C...

Yeah, that annoys. Shows many optimization techniques, doesn't even apply them to original problem and presents no benchmark.

You can't even know if you are dealing with micro-optimization or getting any evidence that stuff like that helps.

Interesting article. While I definitely think you should be profiling your code to figure out the hot spots, cProfile has some limitations for profiling: cProfile doesn't give you line numbers, doesn’t work with threads, and significantly slows your program down.

I wrote a tool py-spy (https://github.com/benfred/py-spy) that is worth checking out if you’re interested in profiling python programs. Not only does it solve those problems with cProfile - py-spy also lets you generate a flamegraph, profile running programs in production, works with multiprocess python applications, can profile native python extensions etc.

Have you looked at Yappi[0]? I use it in combination with kcachegrind[1] (call graph viewer) and the combination has been extremely useful in eliminating bottlenecks across entire programs.

Side note: I also used pyreverse, now part of pylint, to diagram entire projects and get a class hierarchy. It helped tremendously in refactoring and decoupling code through whole projects, finding redundancies, and have a better architecture.

I'll have a look at py-spy. Thanks for that.

[0]: https://pypi.org/project/yappi/

[1]: https://kcachegrind.github.io/html/Home.html

Shoutout to Ben, py-spy is an amazing profiler. I believe cProfile has certain limitations and doesn't fully understand deep call stacks. py-spy does not have that limitation. It also offers multiple output formats (especially flamegraph and speedscope format, https://www.speedscope.app/) which make it so much nicer to identify bottlenecks.

At our company, py-spy has helped us a lot for our line-of-business application. I'm not affiliated with Ben in any way, but he deserves some praise for his work on py-spy.

> Python haters always say

Stopped here immediately. I have been writing software for more than 20 years, mainly in C++ and Python. No professional would start this kind of discussion with this childish attitude (apart from the fact, that content-wise the problem was beaten to death for decades).

There kind of are haters of all categories though. But then again you also have Python fanboys who will do just about anything to avoid using something that isn't Python.

As a Julia developer I see this a lot. You point out Julia advantages and the Python guy will respond with: Oh I can do that in Python too if I use package X, Y, Z combined with feature A, B, C. Basically their response to a simple well engineered feature is a complete mess of a solution. But hey they prefer that because they can still stick the label Python on top of it.

I admit I also get set in my ways, but at least I like to think that when I dismiss another language it is not for purely silly reasons.

This happens with every language that reaches popularity. That’s because it’s typically easier for individuals to engineer solutions with tools they know well, even if suboptimal, than it is to become proficient with new ones that might or might not deliver better results in the end. No community is immune to this, even outside IT.

I’m pretty sure you’ll also occasionally bang nails for which Julia is a poor hammer, you just don’t realize it.

Oh I definitely know there are things Julia is not good at. It is just that Julia does not get in my way as frequently as many other languages.

But I kind of keep a collection of favorite languages under my belt which cover different areas. My favorites are probably Julia, Go, Swift, Python, Lua and LISP in that order.

If I need more low level style coding I would go with Go (pun not intended). Swift is nice if you actually want to make GUI applications and something that is quite robust. The type system in Swift is quite good at catching many problems.

There is a cost to switching and introducing new technologies.

Otherwise you end up with a project which uses all of Python/Perl/Java/Julia/MATLAB/R/C++/Fortran/Rust/Go, because hey, for this particular problem X has the best solution so let's use that.

I think it is valid though to start transitioning to a better language for a project when initial experiments show it is superior for the task.

People are often WAY WAY too reluctant to rewrite code. Instead they spend years maintaining garbage.

I remember rewriting an iOS app from Objective-C to Swift. Everybody thought it was a waste of time and should not be done. People tend to only think about what is of immediate benefit.

I only rewrote the most important parts. About 30% remained in Objective-C. Once it was in Swift lots of developers suddenly started getting interested in joining. They loved working with Swift and made lots of contributions.

But then they hit the Objective-C parts and were bummed out. All the guys who had said rewriting to Swift was a waste of time were now complaining about the existence of Objective-C code and that we had to get rid of it.

So I rewrote the rest. The point is that people seldom realize how much benefit a better language can be until they actually start working on a code base written in a better language. Then they will often start hating the very code base they had previously defended.

Think of the millions of lines of Cobol code stuck on mainframes which is almost impossible to maintain today. We are stuck with that because at every juncture where there was a chance to upgrade and switch to a more modern technology, somebody made a variant of your argument.

Sure, you cannot have a free-for-all. But it must be possible to have a sensible process where you experiment with some alternatives, evaluate the pros and cons, and then switch to the better choice.

I think you underestimate infrastructure (API-wise/tooling/people etc.) here. As it happens, I am myself a beginner in Julia. The language is not the hard part; gluing it into my current working environment is. And as a beginner I don't speak or even dare to think about commercial projects. This is quite non-trivial and a wonderful sophisticated one-liner quickly becomes far costlier, considering all constraints - even in the long run.

> I like to think that when I dismiss another language it is not for purely silly reasons.

One should also choose a suitable language, if you do not want to risk falling victim to the same attribute.

Ugh, the whole “python is slow - but it’s great for piping C libraries” trade off has been discussed a gazillion times before.

This article is written by someone who obviously doesn’t know much about CS.

Please HN community, try to not upvote these, it’s a waste of time for all of us.

The only Python programs that can be called "blazingly" fast compared to equivalent programs in performant languages are either spending all their time in I/O, or spending all their time in C. Python is a nice language and with some tricks you might speed it up by a factor 2-10, but writing the same program in, say, Java, will often be 50-100x faster.

Exactly. And then after the "blazingly" fast clickbait title we read:

"I'm (mostly) not going to show you some hacks, tricks and code snippets that will magically solve your performance issues. This is more about general ideas and strategies, which when used, can make a huge impact on performance, in some cases up to 30% speed-up."

Python CPU-bound programs are e.g. 30 times slower than C or Java, and "up to 30% speedup" makes them still 20 times slower which is really far from "blazingly".

An example (best scores):


Java 6.83

Python 259.50

Yep, and if you want a dynamic language:

Node 6.72

Or just a modern language with nice quality of life features:

Rust 1.72

(All of these are seconds)

C has 1.64, even better, I’ve intentionally omitted that, but that is the reference for cases “it doesn’t have to be new fashion but has to be fast.”

Yes, but writing C is a completely different, and much more onerous experience. You have no sensible strings or hashmaps in the standard library, and using external libraries is a massive pain. That's not even to mention Undefined Behaviour and memory safety issues.

For most data-munging programs, python, node, java, and rust code will be roughly similar (Java and Rust will make you annotate types). I've been amazed at the performance you can get from Rust code that looks practically identical to the equivalent JavaScript.

See the C source; I don’t find it less readable than the Rust version.


> ...or all spending all their time in C...

Python performance varies the most in pl benchmark game.


Yeah and if you look at the Python program with the best performance compared to C you'll see that it spends all its time in gmpy2, which is exactly the same library C uses. Python still manages to be 2x slower.

Difficult to show bi-modal in a box plot.

The article has some embarrassing errors, and its advice is not going to make your Python programs blazingly fast, but it's a good start.

Resuming a generator in CPython is a lot faster than creating a whole new function call, and especially a whole new method call, contrary to what the article said. But often enough it's faster to just eagerly materialize a list result.

Some other good tips: %timeit, ^C, sort -nk3, Numpy, Pandas, _sre, PyPy, native code. In more detail:

• For benchmarking, use %timeit in IPython. It's much easier and much more precise than time(1). For super lazy benchmarking use %%time instead.

• The laziest profiler is to interrupt your program with ^C. If you do this twice and get the same stack trace, it's a good bet that's where your hotspot is. cProfile is better, at least for single-threaded programs. Others here suggest line_profiler.

• If you have output from the profile or cProfile module saved in a file, you can use the pstats module to re-sort it by different fields. But you probably don't, you have some text it output. The shell command `sort -nk3` will re-sort it numerically by column 3, which is close enough. In Vim you can highlight the output and type !sort -nk3, while in Emacs it's M-| sort -nk3.

• You can probably speed up a pure Python program by a factor of 10 with Numpy or Pandas. If it's not a numerical algorithm, it may not be obvious how, but it's usually feasible. It requires sort of turning the whole problem sideways in your mind. You may not appreciate the effort when you are attempting to modify the code.

• The _sre module is blazingly fast for finite state machines over Unicode character streams. It can be worth it to transmogrify your problem into a regular expression if you can.

• PyPy is probably faster. Use it if you can.

• The standard advice is to rewrite your hotspots in C once you've found them. Maybe this should be updated; Cython, Rust, and C++ are all reasonable alternatives, and for invoking the C etc., you have available cffi and ctypes now. In Jython this is all much simpler because you can easily invoke code in Java, Kotlin, or Clojure from Jython. An underappreciated aspect of this is that using native code can save you a lot of memory as well as instructions, and that may be more important. Consider trying __slots__ first if you suspect this may be the case.
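For the pstats bullet above, a minimal sketch of re-sorting one profile by different fields. The workload (slow_sum, work) and the report helper are invented for illustration, and the profile is kept in memory here; pstats.Stats also accepts the filename of a saved profile.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately slow pure-Python loop so it shows up in the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def work():
    return [slow_sum(20_000) for _ in range(20)]


# Collect a profile of one call to work().
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()


def report(sort_key, limit=5):
    # Render the same profile data sorted by the given field, as text.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(sort_key).print_stats(limit)
    return out.getvalue()


if __name__ == '__main__':
    print(report('cumulative'))  # time including callees
    print(report('tottime'))     # time spent in the function itself
```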

> The laziest profiler is to interrupt your program with ^C. If you do this twice and get the same stack trace, it's a good bet that's where your hotspot is.

I do that sometimes, but it has some pitfalls. If most of the time is spent inside a C module (for instance in numpy), then the interrupt won't be caught before the C module is exited, which can lead to a wrong stacktrace.

Excellent point!

> If it's not a numerical algorithm, it may not be obvious how, but it's usually feasible. It requires sort of turning the whole problem sideways in your mind.

At this stage why are you even using python anyway? The code isn’t going to be very pythonic or readable and the effort would in my opinion be better spent on C++ or Rust.

You might be right.

These are all trivial micro-optimizations.

“If you want your code to run faster, you should probably just use PyPy.” — Guido van Rossum


Until you need a module that requires the C api: at which point pypy becomes useless

They're developing a replacement API to address those issues.


That is when you see the light and switch to Julia ;-)

Cython is an alternative option for speeding up specific parts of code. There are also the numba and hope modules providing JIT decorators.

Personally, I’ve tried pypy without issues, out of curiosity, but in about 15 years of using python never ran into python code as being the performance bottleneck. There are too highly performant modules for everything.

Do you have advice for finding code snippets that would most benefit from being re-written in C and called with Cython? I know how to find slow functions in Python, but obviously not every slow function will be a good candidate for rewriting.

In my experience, most pure python code will be 10 to 100x faster if it’s rewritten in C++. So I just profile it as usual using cProfile, try to make algorithmic improvements (eg caching) and then, if I need another order of magnitude, rewrite it in C++.

So basically anything where the hot path is in pure Python, rather than a standard library method.

Fourth time this has been posted in 12 days. My comment from 12 days ago is at https://news.ycombinator.com/item?id=21930569 . I pointed out that kernprof profiling shows that 99+% of the time is spent in

    s += num / fact
so none of the techniques described gives a blinding speedup. I also suggest pre-compiling the regex.

Pre-compiling regex doesn't actually give you any real performance benefit as Python3 caches it internally anyway. See https://docs.python.org/3/library/re.html#re.compile

Sure. My original, linked-to comment ended:

> Now, re.findall() does cache the last 100 or so regexps, so it probably won't re-evaluate the regex each time. But really, pre-compute that regex with "_my_pattern = re.compile(regex) ... _my_pattern.findall()" and avoid even that cache lookup.

cpburns2009 says it's 512 these days, which doesn't change the essence of my comment.

The cache is of a limited unknown size. It's more reliable to explicitly precompile your regular expressions.

EDIT: The cache appears to be 512 on Python 3.6 so maybe precompiling isn't necessary unless you frequently use a large number of regular expressions.
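A minimal sketch of the explicit precompile, for reference. The pattern and the names `_WORD` and `count_words` are made up for illustration; `re.compile` and the pattern object's `findall` are the standard library calls.

```python
import re

# Compiled once at import time, so later calls never depend on the
# module-level cache inside `re`. The pattern is only an example.
_WORD = re.compile(r"\b\w+\b")


def count_words(lines):
    """Count word-like tokens across an iterable of strings."""
    return sum(len(_WORD.findall(line)) for line in lines)


print(count_words(["making python fast", "blazingly"]))
```

Whether this is measurably faster than `re.findall(...)` depends on how many distinct patterns the program uses; the main gain is that behaviour no longer depends on the cache size.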

The lrucache recommendation could help if this was turned into a recursive / dynamically programmed approach.
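Roughly what that might look like, as a sketch only. The recursive `factorial` and `estimate_e` below are my own illustration of the idea, not the article's program.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def factorial(n):
    """Recursive factorial; each value is computed once and then reused."""
    return 1 if n < 2 else n * factorial(n - 1)


def estimate_e(terms):
    """Approximate e as the sum of 1/k! for k < terms."""
    return sum(1 / factorial(k) for k in range(terms))


print(estimate_e(20))
print(factorial.cache_info())
```

Without the decorator, every term would recompute the whole chain of smaller factorials; with it, each new term costs one multiplication plus a cache hit.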

I regret reading this article and I think the title is clickbait. I was hoping for something like PyPy or Unladen Swallow, etc. The equivalent programs in TFA will be blazingly faster if ported simply to other languages.

No need to port, just run them with PyPy to make them multiple times faster. As usual.

> Don't Access Attributes (example `import re; re.findall(...)` vs `from re import findall; findall(...)`

I find it a good habit to always import modules and almost never (sane exclusions apply) import individual functions from them. If I use something frequently, I'd alias it for clarity (`import sqlalchemy as sa`)

The reason is that otherwise, patching with mocks becomes somewhat tricky, as you'll have to patch functions in each individual importer module separately. Here's an example: https://stackoverflow.com/a/16134754/116546

Maybe that's wrong but my idea is that I don't want to assume which module calls some specific function but just mock the thing (e.g. make sure Stripe API returns a mock subscription - no matter where exactly it's called from). Then, if I refactor things and move a piece of code around (e.g. extract working with Stripe to a helper module), my unit tests just continue to work.
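To make the difference concrete, here is a small sketch using `os.getcwd` as a stand-in for the Stripe call (the two wrapper functions are hypothetical). Patching the attribute on the module affects the module-style caller but not the name that was bound by `from ... import`.

```python
import os
from os import getcwd  # binds a second name to the same function object
from unittest import mock


def cwd_via_module():
    return os.getcwd()  # attribute is looked up on the module at call time


def cwd_via_name():
    return getcwd()  # uses the name bound at import time


with mock.patch("os.getcwd", return_value="/fake"):
    patched_module_style = cwd_via_module()  # sees the mock
    patched_name_style = cwd_via_name()      # still calls the real function

print(patched_module_style, patched_name_style)
```

In the `from ... import` case you would have to patch the name inside each importing module instead, which is the tricky part mentioned above.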


> Based on recent tweet from Raymond Hettinger, the only thing we should be using is f-string, it's most readable, concise AND the fastest method.

I love f-strings, but to the best of my knowledge, one can't use f-strings for i18n/l10n, so all end-user-facing texts still have to use `%` or `format`. E.g. `_("Hello, {name}").format(name=name)`.
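A toy illustration of why, assuming a dict-backed `_` as a hypothetical stand-in for gettext (the catalog and its contents are invented): the translation lookup needs the untranslated template as its key, and an f-string has already been filled in by the time `_` sees it.

```python
# Hypothetical stand-in for a gettext catalog, keyed by the template string.
CATALOG = {"Hello, {name}": "Bonjour, {name}"}


def _(message):
    """Return the translated template, or the message itself if unknown."""
    return CATALOG.get(message, message)


name = "Ada"
translated = _("Hello, {name}").format(name=name)  # look up first, fill in after
not_translated = _(f"Hello, {name}")               # key is already "Hello, Ada"

print(translated)
print(not_translated)
```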

FWIW I came to the same conclusion as yours for the exact same reason (mocking). So there are at least two of us :p

A 30% speed-up in Python is still dog-slow. This is a terrible article; he doesn't even talk about his "example." It's like he gave up 1/10th of the way through the post.

The article does not seem to work for me. I only get "undefined" as content. Looking at the network debugger in Firefox, the call to load the article seems to be blocked due to CORS. (It tries to make a call on port 1234 for some reason.)

Just read the three top comments and their threads. There was absolutely no meaningful discussion or worthwhile contributions in any of them, just fans of less popular languages mostly venting their resentment.

The weirdest thing is that they aren't even using python, nor does it seem that they're being forced to use it currently, making all this... Ranting (there's literally no other word for this) all the more inexplicable.

I don't understand it; I've been using Go for a year now at work. I hate pretty much everything about it, yet I haven't ranted about it in an article about the language for about that time. There's just no point to it.

This pretty much sums up the whole thread. Lots of new people are doubting Python in cases where Python is being heavily used in megacorps. Python is special for its community, libraries and the huge amount of work that it's built on top of. And wake me up when some of these obscure languages that are being mentioned here take over python.

But Python zealots can be annoying. That's true for any language. Personally I don't like python's asynchronous programming paradigm. Objectively Go does it better than Python.

Anybody with experience able to chime in on a question? So, at a high-level, I am looking at using Python at my workplace. We are a weird amalgamation of a Java and Microsoft shop, using Java and Kotlin for 'critical' systems, while heavily relying on SQL Server/SSIS/SSRS for all our back-office processing (batch jobs, reporting, ETL etc). This is the stuff my team is responsible for, and we are constantly hitting the limitations of this stack. My feeling is that Python brings enough to the table as a general purpose language to be a good fit for our use-cases. Simple automation of file io, analytics and reporting, small footprint web frameworks (flask), big data tools like Spark, libraries like Pandas, PyTorch etc. Also, I don't have time to learn idiomatic Scala. It's not about laziness, it's just that I feel Python brings enough to the table to be useful, while still being productive and readable. Then I read threads like this and start second-guessing myself. I see some red-flags for sure, but I'm just looking for some validation here. Basically, we have a lot that needs fixing, we need to do it quickly, and I'm wondering if Python can work. We are certainly in the realm of 'big-data', and are currently handling everything with procedural SQL, some Java apps that need refactoring, Perl scripts and scheduled tasks on Win Server, and a bloated, poorly implemented Java Web App to provide a front-end to our poorly maintained, non-normalized database.

Back in my TCL days, I learned never again to rely on a language without a JIT/AOT toolchain.

So unless you are into adopting PyPy, you will be better off with JVM and .NET stacks.

Plenty of languages to choose from, while benefiting from their performance and tooling.

I should note, I'm not particularly concerned with performance. We already have fairly optimized DB code, views, sprocs, indexes etc. This layer is currently sufficient for our needs. So ideally, we would still continue to leverage the SQL-Server. What we need, is to extract business logic from the DB, into application code which is testable. All of this processing is 'batch', we also have options for deploying (Azure, PCF) which can handle issues of scale. I'm more concerned with getting it right, than making it fast. I'm not very experienced with C#, but have experience with Java/Spring web development, and have yet to find any frameworks that allow for rapid development akin to flask or rails. Java/Kotlin is great for back-end dev with spring-boot, but full-stack... not so much. Also, I don't want to manage the complexities of any front-end JS framework-du-jour. I know React, Angular and some Vue. I'm very much of the YAGNI philosophy when it comes to front-end (at least for Enterprise apps). PyPy is a viable option, as I don't see any immediate need to call into C (although this assumption is likely to come back to bite me).

Grails, although it has gone out of fashion.

meh. I’m not trying to sound cultish, but if you’re not at least familiar with some of the packages I mentioned... Python is different than TCL. Python isn’t growing in popularity for nothing. At the end of the day, I just want tools that get out of my way, while keeping the loc I’m responsible for maintaining to be small and easy to grok.

Python is growing in fashion because those Fortran and C++ GPGPU libraries happen to have Python bindings out of the box, whereas other languages are only getting them now.

That, and it has replaced Java in many introduction to programming courses.

Which is good; when learning to program, performance isn't a concern as such.

I have known Python since Zope was the only reason to use it, so around Python 1.5 or something.

Other than replacing what I used Perl for, regarding UNIX shell scripting, I never used Python in any scenario where performance might come into play.

There are plenty of options that beat Python's LOC, while providing an AOT/JIT toolchain out of the box.

I personally dislike the use of caching to increase performance. It is very easy to slap on caching and then the benchmarks say the problem is fixed but you will end up with unpredictability. You can no longer know how much memory your program is using and you don't know if a given function call is the source of a bottleneck or not. Your profiler will show a single hot function when the cache is empty but all the other calls that happen after caching become invisible.
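If one does reach for `functools.lru_cache` anyway, bounding it and checking `cache_info()` at least makes part of this visible. A sketch, with a deliberately tiny `maxsize` and a made-up `slow_square` function so the eviction shows up:

```python
from functools import lru_cache


@lru_cache(maxsize=2)  # tiny bound, chosen only to make eviction visible
def slow_square(n):
    """Made-up stand-in for an expensive call."""
    return n * n


for n in (1, 2, 1, 3, 1):
    slow_square(n)

info = slow_square.cache_info()
print(info)  # hits, misses, maxsize and current size of the cache
```

This caps the number of cached entries and exposes the hit rate, but it doesn't address the profiling concern: calls served from the cache still vanish from the hot-function view.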

There are some interesting things in here I wasn't aware of. That being said, you should really be timing individual functions by using line_profiler, otherwise even if you find a slow function you won't have any idea what part is making it slow. Often it's extremely counter intuitive. E.g. compiling regular expressions can be hundreds of times slower than executing them.

I’m currently working on a lib that allows choosing the best implementation of a method based on the current browser/os.

Performance varies wildly for basic coding decisions across platforms. Especially diff combinations of browser + os.

I'm still deciding on a name; I was thinking of concepts like ‘popular’ from the song by Nada Surf, or photo finish (horse racing), or something like unfortunate/wheel of unfortune, poking fun at the need to have this lib.

Here's a messy example that shows this issue (try it in diff browsers).


>Generators are not inherently faster as they were made to allow for lazy computation, which saves memory rather than time. However, the saved memory can be cause for your program to actually run faster. How? Well, if you have large dataset and you don't use generators (iterators), then the data might overflow CPUs L1 cache, which will slow down lookup of values in memory significantly.

Can someone chime in about the L1 cache? The claim is made without measurements, so I am skeptical.
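Not an answer on the L1 cache part, which would need real measurements, but the memory half of the claim is easy to check. A sketch, with an arbitrary `N`:

```python
import sys

N = 100_000  # arbitrary size, chosen only for illustration

as_list = [i * i for i in range(N)]  # every value exists in memory at once
as_gen = (i * i for i in range(N))   # values are produced one at a time

list_bytes = sys.getsizeof(as_list)  # the list object itself, not the ints it holds
gen_bytes = sys.getsizeof(as_gen)    # small, and independent of N

same_total = sum(as_list) == sum(as_gen)

print(list_bytes, gen_bytes, same_total)
```

Whether the smaller footprint translates into faster lookups on a given CPU is exactly the part that isn't shown here.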

I think you'd be better off offloading the hot part to C++.


Honestly the quality bar for most things written in Python is pretty low, so anything that can help people improve is fine. So kudos to the author.

The only way to make Python programs "blazingly" fast is to not use the Python interpreter at all in the hot path.

Almost everything the Python interpreter does is ridiculously slow, even for an interpreted language. The language design[1] prevents fast implementations[2].

[1] Restricted subsets of Python do not count

[2] No, PyPy is not fast. It is slow, even for a JIT.

> The language design prevents fast implementations.

Apparently the fact that the complete world may change at every given moment, and every single operation requires method calls, doesn't prevent the existence of reasonably good JIT compilers for Smalltalk; in fact they are the genesis of Java JITs.

Interesting post, but the code examples are completely unreadable on firefox + windows, because of the CSS color: #333 on .hljs class.


> Use Functions. This might seem counter intuitive, as calling function will put more stuff onto stack and create overhead from function returns, but it relates to previous point. If you just put your whole code into one file without putting it into function, it will be much slower because of global variables. Therefore you can speed up your code just by wrapping whole code in main function and calling it once, like so.

Wow, this is the one I didn't expect. I always wrap scripts in a main function out of pure perfectionism (or perhaps that's OCD), but the fact that a script without it is going to run slower seems counter-intuitive and should really be among the first things taught.

> should really be among the first things taught.

No, it shouldn't. You don't teach a language by discussing micro-optimizations, especially when you're talking about Python.
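For anyone curious why the quoted tip holds on CPython, the bytecode shows it. This is a CPython-specific sketch (opcode names vary between versions, hence the loose matching), and `SOURCE` and `main` are just illustrative: module-level variables go through name lookups, while function locals use the "fast" slots.

```python
import dis

# The same loop, once as module-level code and once inside a function.
SOURCE = "total = 0\nfor i in range(10):\n    total += i\n"


def main():
    total = 0
    for i in range(10):
        total += i
    return total


module_code = compile(SOURCE, "<script>", "exec")
module_ops = {ins.opname for ins in dis.get_instructions(module_code)}
function_ops = {ins.opname for ins in dis.get_instructions(main)}

# Module level: LOAD_NAME / STORE_NAME (dictionary lookups by name).
uses_name_lookups = any(op.endswith("_NAME") for op in module_ops)
# Inside a function: LOAD_FAST / STORE_FAST or their variants (indexed slots).
uses_fast_locals = any("FAST" in op for op in function_ops)

print(uses_name_lookups, uses_fast_locals)
```

How much time that saves in a real script is a separate question that only a measurement can answer.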

Some of these optimisations are very similar to what you used to do in JavaScript with slower JS engines, such as caching a value in a variable rather than constantly accessing a property.
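The Python version of that trick, as a sketch with made-up function names: bind the bound method to a local once instead of looking up the attribute on every iteration.

```python
def squares_plain(n):
    out = []
    for i in range(n):
        out.append(i * i)  # attribute lookup on every iteration
    return out


def squares_cached(n):
    out = []
    append = out.append    # look the method up once, keep it in a local
    for i in range(n):
        append(i * i)
    return out


print(squares_plain(5) == squares_cached(5))
```

Both return the same list; any speed difference is small and worth timing with `timeit` on your own interpreter before relying on it.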

Blazingly fast? Not the words I would use..

Write CPython with an emphasis on C. Then get the speed gains you need.

Aren't there some rules of thumb for writing fast python code?

Surprised PyPy was not mentioned.

never got past the spinner. is this an inside joke? the spinner was pretty fast
