For scientists who are looking to take their Python code to the next level, options are abound. I’ve decided to put down in writing a short summary of what I’ve learned so far.
Now that the disclaimers are out of the way, here’s the shortlist, in no particular order:
Great if you have only pure Python code, have to deal with things in Python objects, and import other modules that are only pure Python.
My use cases: Agent-based simulations, re-implementing random draws from a variety of distributions.
Upsides: Speedups can be massive; in my hands anywhere between 8-100X. PyPy is a really simple drop-in replacement for the regular CPython that we’re all used to.
Downsides: Have to download an alternate Python interpreter. I have yet to successfully use PyPy as my
jupyter backend. Need to tweak environment variables to get
pip to install packages/modules such that they’re accessible by
Great if you’re writing things that are, by nature, arrays of stuff, with operations that can be vectorized across all rows (or columns).
My use cases: Matrix math.
scipy are well-supported, community-favoured, thoroughly tested and all-round great standard scientific computing libraries to use. Easily installable via
Downsides: Very few, except that Python overhead is incurred every time a function is called, so one has to get used to avoid calling same function over and over in a
Exciting stuff from Continuum Analytics developer team! Basically a JIT compiler (not unlike PyPy) but for numeric code instead. Can speed up
numpy code, according to their claims.
My use cases: Matrix math and numeric code.
Upsides: Uses the CPython interpreter that we’re all used to. Speedups gained by simply adding a function decorator. Also easily installable by
Downsides: Doesn’t work with Python objects very well right now.
Only for the brave of heart! A simple way to continue writing Python-ish code that then gets compiled to C, which is blazing fast.
My use cases: Not sure yet. It is good, don’t get me wrong.
Upsides: I’ve done a head-to-head
numpy vs. Cython version of what I’m trying to do. 25X speed-up. And again, easily installable by
conda. (It’s probably clear by now that I’m a Continuum Analytics fanboi.)
Downside: Have to learn a new-ish language with some intricacies in order to really get the massive speed-ups. I initially could only write Cython code that didn’t break, but would end up with little speed-up (1-1.5X max). Going faster needed a lot more tweaking, probably lots more practice.
How did I pick up each of these? It was some mixture of following online tutorials, trying to modify examples myself, reading Stack Overflow, and coding with another advanced person who was willing to teach me stuff. I found the last one to be the most effective of them all, as the back-and-forth is great for learning.
Hope this post helps you decide what to try next for speeding up your code!