Principles of Network Visualization

written by Eric J. Ma on 2016-07-16 | tags: data visualization data science nxviz graph visualization network science

I use networks as part of my research, and visualization is a very important part of it. The problem with network visualizations is that most people default to node-link diagrams that end up looking like hairballs.

Thankfully, if we put some thought into visualizing networks rationally, we can get network visualizations that are both beautiful and informative, and much more appealing than hairballs. My time at SciPy 2016, hearing the keynote by Brian Granger about Altair and talking with Aric Hagberg, has given me a ton of inspiration on how rational network visualizations can be done right.

What does it mean to be drawn "rationally"? I think it means following the principles of good data visualization - in which size, shape, location, colour and transparency are all used to convey meaning. As applied to network visualizations, the key is in placing and styling the nodes in some "rational" or "logical" fashion:

  1. Spatial location (i.e. (x, y) coordinates) follows some regular organization (e.g. in a shape, or along lines), based on grouping or ordering.
  2. Node colour follows some grouping, or ordering, or scaling. (Note that this is in order of preference: colour works well for grouping, less well for ordering, and poorly for quantitative scale.)
  3. Highlights, such as borders, can be used to accentuate nodes of interest.
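
As a minimal sketch of the first two principles, the snippet below places nodes on a circle so that members of the same group sit next to each other, and assigns one colour per group. The node metadata, the function names and the palette are all made up for illustration; nothing here is taken from an existing library.

```python
import math

# Hypothetical node metadata: node id -> attributes (values are made up).
nodes = {
    "a": {"group": "x"},
    "b": {"group": "y"},
    "c": {"group": "x"},
    "d": {"group": "y"},
}


def circular_layout(nodes, groupby):
    """Place nodes evenly on a unit circle, with each group kept contiguous."""
    # Sort by group first, then by node id, so the ordering is deterministic.
    ordered = sorted(nodes, key=lambda n: (nodes[n][groupby], n))
    step = 2 * math.pi / len(ordered)
    return {n: (math.cos(i * step), math.sin(i * step)) for i, n in enumerate(ordered)}


def group_colours(nodes, groupby, palette=("blue", "orange", "green")):
    """Give every node the colour assigned to its group."""
    groups = sorted({attrs[groupby] for attrs in nodes.values()})
    lookup = {g: palette[i % len(palette)] for i, g in enumerate(groups)}
    return {n: lookup[attrs[groupby]] for n, attrs in nodes.items()}


positions = circular_layout(nodes, "group")
colours = group_colours(nodes, "group")
```

Because position and colour are both derived from the same metadata key, the two visual channels reinforce each other rather than compete.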

Once the (x, y) coordinates for each node are computed, drawing edges is a much easier matter. Meaning can be attached to edges by:

  1. Filtering out edges that are unnecessary to display.
  2. Adjusting line thickness to convey quantitative data.
  3. Colouring lines to convey grouping of edges.
  4. Highlights, such as using different line styles, thickness, or colour (as long as they are not used to convey other things), to accentuate edges of interest.
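
The four ideas above can be sketched as a single styling pass over an edge list. This is only an illustration under assumed inputs: the edge data, the `style_edges` function, its thresholds and the colour mapping are hypothetical.

```python
# Hypothetical edge list: (source, target, metadata). All values are made up.
edges = [
    ("a", "b", {"weight": 0.2, "kind": "friend"}),
    ("a", "c", {"weight": 0.9, "kind": "family"}),
    ("b", "d", {"weight": 0.6, "kind": "friend"}),
]


def style_edges(edges, min_weight=0.5, max_width=4.0, highlight=None):
    """Filter edges, then derive a line width, colour and line style for each."""
    kind_colours = {"friend": "grey", "family": "red"}  # edge grouping -> colour
    highlight = highlight or set()

    # 1. Filter out edges that are unnecessary to display.
    kept = [e for e in edges if e[2]["weight"] >= min_weight]
    if not kept:
        return []

    heaviest = max(attrs["weight"] for _, _, attrs in kept)
    styled = []
    for u, v, attrs in kept:
        styled.append({
            "source": u,
            "target": v,
            # 2. Line thickness conveys the quantitative weight.
            "width": attrs["weight"] / heaviest * max_width,
            # 3. Colour conveys the grouping of the edge.
            "colour": kind_colours[attrs["kind"]],
            # 4. Line style is reserved for highlighting edges of interest.
            "linestyle": "dashed" if (u, v) in highlight else "solid",
        })
    return styled


styled = style_edges(edges, highlight={("b", "d")})
```

Each visual channel carries exactly one kind of information here, which is the point of item 4: a highlight only works if its channel is not already in use.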

Finally, it may be better to show panels of subsets of the graph, rather than shove the entire graph into one visualization. This is because extracting subsets of edges (and even just the relevant nodes) and displaying them on separate plots reduces visual clutter.
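
One way to prepare such panels, sketched under assumed inputs, is to split the edge list by a metadata key and keep only the nodes each subset touches. The edge data and the `split_into_panels` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge list: (source, target, metadata). All values are made up.
edges = [
    ("a", "b", {"kind": "friend"}),
    ("a", "c", {"kind": "family"}),
    ("b", "d", {"kind": "friend"}),
]


def split_into_panels(edges, key):
    """Group edges by a metadata key, keeping only the nodes each group uses."""
    panels = defaultdict(lambda: {"nodes": set(), "edges": []})
    for u, v, attrs in edges:
        panel = panels[attrs[key]]
        panel["edges"].append((u, v))
        panel["nodes"].update((u, v))
    return dict(panels)


panels = split_into_panels(edges, "kind")
```

Each entry in `panels` could then be drawn on its own subplot, so every panel shows one kind of relationship and nothing else.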

Hive Plots and Circos Plots are two examples of network visualizations done well, for they conform to these principles. They will both be available in the nxviz package that I will be co-developing, and I hope to use nxviz to highlight these details. I think there are other creative layouts as well, such as Arc Plots and Parallel Line Plots that can help us gain a more rational view of the structure of our network data.

From an API design perspective, a "grammar"-based API that specifies how things are grouped and coloured makes a ton of sense. For example, a HivePlot API may look like this:

h = HivePlot(nodes, edges, node_groupby=groupby_key, node_colorby=colorby_key, edge_colorby=colorby_key)

The user need not specify how individual nodes are grouped, as long as the groupby_key is present in the node metadata. The same holds for node colours and edge colours, which are looked up from their respective colorby keys.
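
To make the idea concrete, here is a rough sketch of how such keyword arguments could be resolved against metadata. `PlotSpec` is a hypothetical stand-in written for this post, not the nxviz API; the palette, the helper and the sample data are likewise assumptions.

```python
from dataclasses import dataclass

PALETTE = ("blue", "orange", "green", "red")


def colour_lookup(values):
    """Assign one palette colour to each distinct metadata value."""
    distinct = sorted(set(values))
    return {v: PALETTE[i % len(PALETTE)] for i, v in enumerate(distinct)}


@dataclass
class PlotSpec:
    """Hypothetical grammar-style plot description (not the nxviz API)."""

    nodes: dict  # node id -> metadata dict
    edges: list  # (source, target, metadata dict) tuples
    node_groupby: str
    node_colorby: str
    edge_colorby: str

    def node_groups(self):
        """Bucket node ids by the value stored under node_groupby."""
        groups = {}
        for node, attrs in self.nodes.items():
            groups.setdefault(attrs[self.node_groupby], []).append(node)
        return groups

    def node_colours(self):
        """Look up each node's colour from the value under node_colorby."""
        lookup = colour_lookup(a[self.node_colorby] for a in self.nodes.values())
        return {n: lookup[a[self.node_colorby]] for n, a in self.nodes.items()}

    def edge_colours(self):
        """Look up each edge's colour from the value under edge_colorby."""
        lookup = colour_lookup(a[self.edge_colorby] for _, _, a in self.edges)
        return [lookup[a[self.edge_colorby]] for _, _, a in self.edges]


# Made-up sample data.
nodes = {
    "a": {"role": "student", "year": 1},
    "b": {"role": "teacher", "year": 2},
    "c": {"role": "student", "year": 2},
}
edges = [("a", "b", {"kind": "advises"}), ("a", "c", {"kind": "friend"})]

spec = PlotSpec(nodes, edges, node_groupby="role", node_colorby="year", edge_colorby="kind")
```

The caller only names metadata keys; every per-node and per-edge decision is derived from the data, which is what makes the interface feel like a grammar.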

Implementing this into nxviz, I think, is going to be a really challenging, fun and rewarding thing to do!
