written by Eric J. Ma on 2016-06-07 | tags: python programming optimization
I highlight the use of PyPy, numba, scipy, numpy, and Cython as strategies for speeding up your Python code.
For scientists looking to take their Python code to the next level, options abound. I've decided to put down in writing a short summary of what I've learned so far.
Disclaimers:
Now that the disclaimers are out of the way, here’s the shortlist, in no particular order:
PyPy
Great if you have only pure Python code, have to deal with things in Python objects, and import other modules that are only pure Python.
My use cases: Agent-based simulations, re-implementing random draws from a variety of distributions.
Upsides: Speedups can be massive; in my hands anywhere between 8-100X. PyPy is a really simple drop-in replacement for the regular CPython that we’re all used to.
Downsides: Have to download an alternate Python interpreter. I have yet to successfully use PyPy as my jupyter backend. Need to tweak environment variables to get pip to install packages/modules such that they're accessible by pypy.
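The kind of code PyPy shines on looks like the hypothetical sketch below: an object- and loop-heavy pure-Python simulation, with no compiled extensions involved. The appeal is that the file needs no changes at all; you simply run it with pypy instead of python.

```python
import random

def random_walk(n_steps, seed=42):
    """A pure-Python random walk: the loop-heavy, object-heavy
    code that PyPy's JIT accelerates without any modification."""
    random.seed(seed)
    position = 0
    history = []
    for _ in range(n_steps):
        position += random.choice((-1, 1))  # random draw each step
        history.append(position)
    return history

walk = random_walk(10_000)
```

Saved as, say, walk.py, the identical file runs under both interpreters: `python walk.py` or `pypy walk.py`.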
numpy and scipy
Great if you’re writing things that are, by nature, arrays of stuff, with operations that can be vectorized across all rows (or columns).
My use cases: Matrix math.
Upsides: numpy and scipy are well-supported, community-favoured, thoroughly tested, and all-round great standard scientific computing libraries to use. Easily installable via conda!
Downsides: Very few, except that Python overhead is incurred every time a function is called, so one has to get used to avoiding calling the same function over and over inside a for loop.
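To make the per-call-overhead point concrete, here is a minimal sketch contrasting the slow and fast patterns (the function names are mine, purely for illustration):

```python
import numpy as np

x = np.linspace(0.0, 10.0, 1_000_000)

def python_loop_sum_of_squares(arr):
    # Slow pattern: a Python-level loop pays interpreter and
    # function-call overhead on every single iteration.
    total = 0.0
    for v in arr:
        total += v * v
    return total

def vectorized_sum_of_squares(arr):
    # Fast pattern: one vectorized expression; the loop over
    # elements runs in compiled C inside numpy.
    return np.sum(arr * arr)

assert np.isclose(python_loop_sum_of_squares(x), vectorized_sum_of_squares(x))
```

Both give the same answer; the vectorized version simply moves the loop out of the interpreter.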
numba
Exciting stuff from the Continuum Analytics developer team! Basically a JIT compiler (not unlike PyPy), but for numeric code. Can speed up numpy code, according to their claims.
My use cases: Matrix math and numeric code.
Upsides: Uses the CPython interpreter that we're all used to. Speedups gained by simply adding a function decorator. Also easily installable by conda!
Downsides: Doesn’t work with Python objects very well right now.
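The "just add a decorator" workflow looks roughly like this sketch. It assumes numba is installed (e.g. via conda); I've added a fallback to a no-op decorator, purely so the snippet still runs where numba is absent:

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles plain numeric functions
except ImportError:
    def njit(func):
        # Fallback: without numba, just run the function uncompiled.
        return func

@njit
def pairwise_sum(a, b):
    # A plain numeric loop: exactly the shape of code that
    # numba's JIT compiles down to fast machine code.
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]
    return out

result = pairwise_sum(np.arange(5.0), np.arange(5.0))
```

The decorated function is compiled on first call; subsequent calls run at compiled speed.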
Cython
Only for the brave of heart! A simple way to continue writing Python-ish code that then gets compiled to C, which is blazing fast.
My use cases: Not sure yet, but it is good, don't get me wrong.
Upsides: I've done a head-to-head numpy vs. Cython version of what I'm trying to do: 25X speed-up. And again, easily installable by conda. (It's probably clear by now that I'm a Continuum Analytics fanboi.)
Downsides: Have to learn a new-ish language, with some intricacies, in order to really get the massive speed-ups. I initially could only write Cython code that didn't break, but it yielded little speed-up (1-1.5X max). Going faster needed a lot more tweaking, and probably lots more practice.
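The tweaking mostly comes down to static type declarations. A hypothetical sketch of what a typed Cython function looks like (the function name is mine; compile the .pyx file with cythonize):

```cython
# sum_sq.pyx -- a hypothetical example of typed Cython.
# The cdef type declarations are what unlock the speed-up;
# without them, Cython emits Python-object code that runs
# barely faster than CPython (my 1-1.5X experience above).
def sum_of_squares(double[:] arr):
    cdef double total = 0.0
    cdef Py_ssize_t i
    for i in range(arr.shape[0]):
        total += arr[i] * arr[i]
    return total
```

The typed memoryview argument (double[:]) and the cdef-declared locals let the loop compile to plain C arithmetic instead of Python object operations.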
How did I pick up each of these? It was some mixture of following online tutorials, trying to modify examples myself, reading Stack Overflow, and coding with another advanced person who was willing to teach me stuff. I found the last one to be the most effective of them all, as the back-and-forth is great for learning.
Hope this post helps you decide what to try next for speeding up your code!
@article{
ericmjl-2016-how-code,
author = {Eric J. Ma},
title = {How to speed up scientific Python code},
year = {2016},
month = {06},
day = {07},
howpublished = {\url{https://ericmjl.github.io}},
journal = {Eric J. Ma's Blog},
url = {https://ericmjl.github.io/blog/2016/6/7/how-to-speed-up-scientific-python-code},
}