Grammatically composing network visualizations

written by Eric J. Ma on 2021-04-02 | tags: data visualization open source python matplotlib

This past month, I've been revamping the nxviz package, a package I created in graduate school to build network visualizations on top of NetworkX and Matplotlib. (You can see the work being done in this PR branch!)

It's been a bit neglected! The biggest reason, I think, is the lack of a grammar for composing network visualizations, which leads to hairballs being created. Sometimes the hairballs are beautiful; most of the time they are uninformative. The lack of a grammar also left the architecture of the codebase a bit messy; one example is annotations being too tightly coupled to a particular plot object.

Network visualization workflow

To revamp the nxviz API, I drew on what I know about the grammar of graphics and structured the API around the following workflow for constructing graph visualizations.

Firstly, because the core idea of nxviz is rational graph visualizations that prioritize node placements, the refactor started there, with node layout algorithms that group and sort nodes according to their properties.
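To make the idea of grouping and sorting concrete, here is a minimal sketch of a circular layout in plain Python. The function name `circular_layout`, its parameters, and the toy node metadata are all made up for illustration; this is not nxviz's actual implementation.

```python
import math

# Hypothetical node metadata: node ID -> properties.
nodes = {
    "a": {"group": "x", "value": 3},
    "b": {"group": "y", "value": 1},
    "c": {"group": "x", "value": 1},
    "d": {"group": "y", "value": 2},
}


def circular_layout(nodes, group_by, sort_by, radius=1.0):
    """Place nodes evenly on a circle, ordered by group and then by value."""
    ordered = sorted(nodes, key=lambda n: (nodes[n][group_by], nodes[n][sort_by]))
    step = 2 * math.pi / len(ordered)
    return {
        n: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, n in enumerate(ordered)
    }


# Nodes in group "x" come first, and within each group nodes are sorted by value.
pos = circular_layout(nodes, group_by="group", sort_by="value")
```

Because the ordering is decided before any coordinates are computed, the same grouping and sorting step could feed a different geometry (a line, an arc, a matrix axis) without changes.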

Once we have node placement figured out, we can proceed to node styling. Mapping data to aesthetic properties of glyphs is a core activity in data visualization; in nxviz, the node styles that can be controlled by data are size (radius), color, and transparency.
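As an illustration of mapping data to aesthetics, the sketch below maps a categorical property to color and a quantitative property to radius and transparency. The helper names `color_by` and `scale_by`, the palette, and the output ranges are assumptions chosen for the example, not part of nxviz.

```python
# Hypothetical node metadata: node ID -> properties.
nodes = {
    "a": {"group": "x", "value": 3},
    "b": {"group": "y", "value": 1},
    "c": {"group": "x", "value": 2},
}

PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # arbitrary categorical colors


def color_by(nodes, key):
    """Map each distinct category to one palette color."""
    categories = sorted({attrs[key] for attrs in nodes.values()})
    lookup = {cat: PALETTE[i % len(PALETTE)] for i, cat in enumerate(categories)}
    return {n: lookup[attrs[key]] for n, attrs in nodes.items()}


def scale_by(nodes, key, smallest=0.2, largest=1.0):
    """Linearly rescale a quantitative property into [smallest, largest]."""
    values = [attrs[key] for attrs in nodes.values()]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    return {
        n: smallest + (attrs[key] - lo) / span * (largest - smallest)
        for n, attrs in nodes.items()
    }


colors = color_by(nodes, "group")  # categorical -> color
radii = scale_by(nodes, "value")  # quantitative -> radius
alphas = scale_by(nodes, "value", smallest=0.3, largest=1.0)  # -> transparency
```

Each helper returns a dictionary keyed by node ID, so the styles can be looked up alongside node positions when the glyphs are drawn.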

Following that, we need algorithms for drawing edges. Do we simply draw lines? Or do we draw Bezier curves instead? If we draw curves, what are the anchor points that we need? Sane defaults are provided for each of the plot families in nxviz.
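To show what "anchor points" means for a curved edge, here is a small sketch that samples a quadratic Bezier curve between two node positions. Using the circle's center as the control point is an assumption that suits a circular layout; it is not necessarily the default nxviz uses, and `quadratic_bezier` is a hypothetical helper.

```python
def quadratic_bezier(start, control, end, n_points=21):
    """Sample points along a quadratic Bezier curve from start to end."""
    points = []
    for i in range(n_points):
        t = i / (n_points - 1)
        # Standard quadratic Bezier weights for start, control, and end.
        w0, w1, w2 = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        x = w0 * start[0] + w1 * control[0] + w2 * end[0]
        y = w0 * start[1] + w1 * control[1] + w2 * end[1]
        points.append((x, y))
    return points


# Two nodes on a unit circle; anchoring the curve at the circle's center
# bends the edge inward instead of cutting straight across.
curve = quadratic_bezier(start=(1.0, 0.0), control=(0.0, 0.0), end=(0.0, 1.0))
```

A straight-line edge is the special case where the control point lies on the segment between the two nodes, so one function can cover both styles.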

As with nodes, we then add aesthetic styling that is mapped to a given edge's metadata. In nxviz, the edge styles that can be controlled by data are line width, color, and transparency.

We then add annotations to the graph. Are there groups of nodes? If so, for each node layout type, we can add group annotations in sane locations. Are colors mapped to a quantitative or qualitative variable? Legends or color bars can be added easily.

Finally, in some cases, it may be helpful to highlight a node, a particular edge, or the edges attached to a node; we might use such highlights in a presentation, for example. The final part of the refactor was to add functions that make each of these highlights easy to apply.

Summarized, the flow looks something like this:

  1. Lay out nodes.
  2. Style nodes according to node properties.
  3. Connect nodes by drawing in edges.
  4. Style edges according to edge properties.
  5. Annotate visualization with auxiliary information.
  6. Add highlights.

Handling node layouts with cloned node axes

Some node layouts involve cloning the axis on which the nodes live. Examples of this include the matrix plot, in which we lay out the nodes on the x-axis, and then clone them onto the y-axis, canonically in the exact same order (though not always). These were much easier to write once the components underneath were laid out in a sane fashion.
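A rough sketch of the cloning idea for a matrix plot follows. The function `matrix_layout` and its `y_order` parameter are hypothetical names used only here; the sketch assumes an undirected graph, so each edge fills two mirrored cells.

```python
def matrix_layout(ordered_nodes, edges, y_order=None):
    """Return the filled (column, row) cells of a matrix plot."""
    x_index = {n: i for i, n in enumerate(ordered_nodes)}
    # Clone the x-axis ordering onto the y-axis unless a different one is given.
    y_nodes = ordered_nodes if y_order is None else y_order
    y_index = {n: i for i, n in enumerate(y_nodes)}
    cells = set()
    for u, v in edges:
        cells.add((x_index[u], y_index[v]))
        cells.add((x_index[v], y_index[u]))  # undirected: mirror the cell
    return cells


cells = matrix_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

Passing a different `y_order` covers the "though not always" case, where the cloned axis is sorted differently from the original.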

Supporting data structures

Some data structures were immensely important in the API redesign, and turned out to be foundational for almost everything.

Every graph's node set and edge set can each be represented as a data table. In the node table, nodes are the index, while the metadata are the columns. In the edge table, two special columns, source and target, indicate the nodes between which an edge exists, and the rest of the columns are the metadata.
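One way to build such tables from a NetworkX graph is sketched below, assuming pandas is available. The toy graph and the variable names `node_table` and `edge_table` are illustrative; nxviz's own helpers may construct these tables differently.

```python
import networkx as nx
import pandas as pd

# A toy graph with metadata on both nodes and edges.
G = nx.Graph()
G.add_node("a", group="x", value=3)
G.add_node("b", group="y", value=1)
G.add_edge("a", "b", weight=2.0)

# Node table: node IDs form the index, metadata keys form the columns.
node_table = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")

# Edge table: "source" and "target" columns, plus one column per metadata key.
edge_table = nx.to_pandas_edgelist(G)
```

With both tables in hand, mapping a column to an aesthetic becomes an ordinary column operation rather than a loop over graph attributes.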

A node position dictionary that maps node ID to (x, y) coordinates was also extremely important. It is calculated by the node layout algorithm and used by the edge drawing algorithms and elsewhere in the library. Worrying about the positions of nodes is a page taken directly out of NetworkX's drawing functionality, and I have to give the developers much of the credit for that.

The global stateful nature of Matplotlib turned out to be a blessing in disguise. Rather than having a user pass around an axes object all of the time, in nxviz we simply assume that there is a globally available axes object.
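The pattern can be sketched with Matplotlib's `plt.gca()`, which returns the current axes. The functions `draw_nodes` and `draw_edges` below are hypothetical stand-ins for drawing steps, not nxviz functions; the point is only that neither takes an axes argument.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def draw_nodes(pos, radius=0.05):
    """Draw one circle per node on whatever axes is current."""
    ax = plt.gca()
    for xy in pos.values():
        ax.add_patch(plt.Circle(xy, radius=radius))
    return ax


def draw_edges(edges, pos):
    """Draw one straight line per edge on whatever axes is current."""
    ax = plt.gca()
    for u, v in edges:
        (x0, y0), (x1, y1) = pos[u], pos[v]
        ax.plot([x0, x1], [y0, y1], color="gray", zorder=0)
    return ax


pos = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.5, 1.0)}
edges = [("a", "b"), ("b", "c")]

fig, ax = plt.subplots()  # this axes becomes the current one
nodes_ax = draw_nodes(pos)
edges_ax = draw_edges(edges, pos)
ax.set_aspect("equal")
```

Since both functions resolve the axes at call time, they compose in any order on the same figure, and a user who does want a specific axes can select it with `plt.sca(ax)` before calling them.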

Optimizing for composability and workflow

In redesigning the nxviz API, I wanted to make sure that the workflow of creating a network visualization, done in a principled fashion, was something that the API would support properly. As such, the API is organized basically around the workflow described above. A lesson I learn over and over in writing software is that once the workflow is clear, the API that's needed to support it also becomes quite clear. And when does the workflow become clear? Usually, that clarity comes when the important logical steps that are composable with one another are well-defined, and the core data structures that support that workflow are present.

Bringing a grammar to network visualization

The Grammar of Graphics, ggplot, Seaborn, Altair, and Holoviews were all inspirations for nxviz's functional and declarative API. Network visualization appears to be an under-developed area of the data visualization ecosystem, and I'm hoping that this grammar of network visualization helps bring clarity!

With thanks

I wanted to give a heartfelt shoutout to my Patreon and GitHub supporters, Hector, Carol, Brian, Fazal, Eddie, Rafael, and Mridul, who have all had an early preview of the new nxviz API.