written by Eric J. Ma on 2019-05-11 | tags: pycon software development sprint open source
This year was the first year that I decided to lead a sprint! The sprint I led was for
pyjanitor, a package that I developed with my colleague, Zach Barry, and a remote collaborator in NYC, Sam Zuckerman (whom I've never met in person!). This being the first sprint I've ever led, I think I was lucky to stumble upon a few ideas that made for a productive, collaborative, and most importantly, fun sprint.
I'm going to deliver a talk on
pyjanitor later in the year, so I'll save the details for that talk. The short description of
pyjanitor is that if you have
pandas one-liners that are difficult to remember, they should become a function in
pyjanitor; if you have a 10-liner that you always copy/paste from another source, they should become a function in
Code sprints are a part of PyCon, and it's basically one to four days of intense and focused software development on a single project.
Project sprint leads first pitch their projects at the "Sprintros", where they indicate what days they will be sprinting, and at what times. The next day, we indicate on which rooms our projects will be sprinting. Volunteers from the conference, who would like to make an open source project contribution, then identify projects they would like to come sprint with.
In some senses, there's no way for a sprint leader to know how popular their sprint will be a priori. We have to be prepared to handle a range of scenarios from sprinting with just one other person to sprinting with a crowd.
In preparation for the sprint, I absorbed many lessons learned over the years of sprinting on others' projects.
The most obvious one was to ensure that every sprinter had something to do right from the get-go. Having a task from the get-go keeps sprinters, especially newcomers, engaged right from the beginning. This motivated the requirement to make a doc fix before making a code fix. (You can read more below on how we made this happen.) I wrote out this requirement in a number of places, and by the time the sprint day rolled by, this rolled off pretty naturally.
The second thing that I did to prep was to triage existing issues and label them as being beginner- or intermediate-friendly, and whether they were doc, infrastructure, or code contributions.
Those two things were the my highest priority preparation for the sprint, and I think that helped a ton.
This sprint, I gave the structure some thought, and settled on the following: Before making a code contribution, I required a docs contribution.
Docs contributions could be of any scale:
I think this worked well for the following reasons:
pyjanitoron docs is an egalitarian way of on-boarding every newcomer, beginner and experienced individuals alike. Nobody gets special treatment.
For each individual's contribution, I asked them to first raise an issue on the GitHub issue tracker describing the contribution that they would like to make, and then clearly indicate in the comments that they would like to work on it. Then, they would go through the process of doing the documentation fix, from forking the repository, cloning it locally, creating a new branch, making edits, committing, pushing, and PR-ing.
If two people accidentally ended up working on the same docs issue, I would assess the effort of the later one, and if it was substantial enough, I would allow them to consider it done, and move onto a different issue.
Going forth, as the group of contributors expands, I will enforce this "docs-first" requirement only for newcomer sprinters, and request experienced ones to help manage the process.
Once the docs contributions were done, sprinters were free to either continue with more docs contributions, or provide a code contribution.
Code contributions could be of one of the following:
The process for doing this was identical to docs: raise an issue, claim it, and then make a new branch with edits, and finally PR it.
For both days, we had more than 10 people sprint on
pyjanitor. Many who were present on the 1st day (and didn't have a flight to catch) came back on the 2nd day. As such, I actually didn't get much coding done. Instead, I took on the following roles:
If I get to sprint with other experienced contributors at the sprints, I would definitely like to have some help with the above.
I tried out a few ideas, which I hope made the sprints just that little bit more human-friendly.
I think the applause is the most encouraging part of the process. Having struggled through a PR, however big or small, and having group recognition for that effort, is super encouraging, especially for first-time contributors. I think we need to encourage this more at sprints.
The only things about the pyjanitor project that I'm unwilling to give up on are: good documentation of what a function does, that a function should do one thing well, and that it be method-chainable. Everything else, including functionality, is an open discussion that we can have!
One PR I particularly enjoyed was that from Lucas, who PR'd in a logo for the project on the docs page. He had the idea to take the hacky broomstick I drew on the big sticky note (as a makeshift logo), redraw it as a vector graphic in Inkscape, and PR it in as the (current) official logo on the docs.
More broadly, I deferred to the sprinters' opinions on docs, because I recognized that I'd have old eyes on the docs, and wouldn't be able to easily identify places where the docs could be written more clearly. Eventually, a small, self-organizing squad of 3-5 sprinters ended up becoming the unofficial docs squad, rearranging the structure of the docs, building automation around it, and better organizing and summarizing the information on the docs.
In more than a few places, if there were a well-justified choice for the API (which really meant naming the functions and keyword arguments), I'd be more than happy to see the PR happen. Even if it is evolved away later, the present codebase and PRs that led to it provided the substrate for better evolution of the API!
This year, I switched from Travis CI to Azure Pipelines. In particular, I was attracted to the ability to build on all three major operating systems, Windows, macOS, and Linux, on the free tier.
Microsoft had a booth at PyCon, in which Steve Dowell led an initiative to get us set up with Azure-related tools. Indeed, as a major sponsor of the conference, this was one of the best swag given to us. Super practical, relationship- and goodwill-building. Definitely not lesser than the Adafruit lunchboxes with electronics as swag!
Naturally, not everything was smooth sailing throughout. I did find myself a tad expressing myself in an irate fashion at times with the amount of context switching that I was doing, especially switching between talking to different sprinters one after another. (I am very used to long stretches hacking on one problem.) One thing future sprinters could help with, which I will document, is to give me enough ramp-up context around their problem, so that I can quickly pinpoint what other information I might need.
The other not-so-smooth-sailing thing was finding out that Azure sometimes did not catch errors in a script block! My unproven hypothesis at this point is that if I have four commands executed in a script block, and if any of the first three fail but the last one passes, the entire script block will behave as if it passes. This probably stems from the build system looking at only the last exit code to determine exit status. Eventually, after splitting each check into individual steps, linting and testing errors started getting caught automatically! (Automating this is much preferred to me running the
black code formatter in my head.)
Though the above issue is fixed, I think I am still having issues getting
black to work on the Windows builds. Definitely looking forward to hearing from Azure devs what could be done here!
I'm sure there's ways I could have made the sprint a bit better. I'd love to hear them if there's something I've missed! Please feel free to comment below.
I would like to thank all the sprinters who joined in this sprint. Their GitHub handles are below:
And as always, big thanks to my collaborators on the repository:
I send out a newsletter with tips and tools for data scientists. Come check it out at Substack.
If you would like to receive deeper, in-depth content as an early subscriber, come support me on Patreon!