Before We Start!

  1. Look at the instructions on the whiteboard.
  2. GitHub repository for these notebooks: github.com/ericmjl/Network-Analysis-Made-Simple/
    1. Please clone the repository if you'd like to do the hands-on coding activities.
  3. Some of the coding activities are going to be hard! Be ready to discuss problems with your fellow Pythonistas or, even better, to pair code.
  4. If you are using legacy Python (Python 2), you may wish to pair up.

Quiz!

In the list comprehension:

[s for s in my_fav_things if s['name'] == 'raindrops on roses']

What are plausible data structures for s and my_fav_things?

In [1]:
my_fav_things = []
my_fav_things.append({'name': 'raindrops on roses', 'line': 1})
my_fav_things.append({'name': 'whiskers on kittens', 'line': 1})
my_fav_things.append({'name': 'bright copper kettles', 'line': 2})

[s for s in my_fav_things if s['name'] == 'raindrops on roses']
Out[1]:
[{'line': 1, 'name': 'raindrops on roses'}]

Prerequisites

Use the checkenv.py script provided in the repository to determine whether you need to install any new dependencies. You may do so while we quickly go through some background information.

Network Basics

All your relational problems are belong to networks.

Networks, a.k.a. graphs, are an immensely useful tool for modelling complex relational problems.

Networks are composed of two main entities:

  • Nodes: commonly represented as circles. In the academic literature, nodes are also known as "vertices".
  • Edges: commonly represented as lines between circles.

Edges denote relationships between the nodes.
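To make the two entities concrete, here is a minimal sketch in plain Python, assuming a graph is stored as a set of node labels plus a set of edge tuples. The node names are hypothetical; NetworkX's `Graph` class stores the same information behind a richer API.

```python
# A minimal sketch: a graph stored as a set of nodes plus a set of edges.
# The node names below are hypothetical, chosen only for illustration.
nodes = {"alice", "bob", "carol"}

# Each edge is a pair of nodes; the pair denotes a relationship between them.
edges = {("alice", "bob"), ("bob", "carol")}

# Every edge should only reference nodes that exist in the node set.
assert all(u in nodes and v in nodes for u, v in edges)

print(len(nodes), "nodes and", len(edges), "edges")  # 3 nodes and 2 edges
```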

The heart of a graph lies in its edges, not in its nodes. (John Quackenbush, Harvard School of Public Health)

In a network, if two nodes are joined together by an edge, then they are neighbors of one another.
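One common way to look up a node's neighbors is an adjacency mapping from each node to the set of nodes it shares an edge with. The sketch below assumes each edge joins its two endpoints symmetrically and uses a hypothetical edge list.

```python
from collections import defaultdict

# Hypothetical edge list; each edge joins its two endpoints symmetrically.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]

# Build an adjacency mapping: each node maps to the set of its neighbors.
neighbors = defaultdict(set)
for u, v in edges:
    # An edge makes each endpoint a neighbor of the other.
    neighbors[u].add(v)
    neighbors[v].add(u)

print(sorted(neighbors["bob"]))  # ['alice', 'carol']
```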

There are generally two types of networks: directed and undirected. In undirected networks, edges have no directionality associated with them. In directed networks, they do: an edge from node A to node B does not imply an edge from B to A.
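The difference shows up in how an edge is recorded. In this minimal sketch, the helper `build_adjacency` and the airport codes are hypothetical: a directed edge is stored only from source to target, while an undirected edge is stored both ways. NetworkX makes the same distinction with its `Graph` and `DiGraph` classes.

```python
def build_adjacency(edges, directed):
    """Return a dict mapping each node to the set of nodes it points to."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set())  # make sure v appears even with no outgoing edges
        if not directed:
            # Undirected: the relationship holds in both directions.
            adjacency[v].add(u)
    return adjacency

# Hypothetical flights between airports (codes are illustrative only).
flights = [("BOS", "JFK"), ("JFK", "SFO")]

directed = build_adjacency(flights, directed=True)
undirected = build_adjacency(flights, directed=False)

print("BOS" in directed["JFK"])    # False: there is no JFK -> BOS edge
print("BOS" in undirected["JFK"])  # True: the edge has no direction
```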

Examples of Networks

  1. Facebook's network: individuals are nodes, and edges are drawn between individuals who are FB friends with one another. This is an undirected network.
  2. Air traffic network: airports are nodes, and flights between airports are the edges. This is a directed network.

Can you think of any others?

Take-Homes

My hope is that, when you leave this tutorial, you will be practically equipped to:

  • Use NetworkX to construct graphs in the Jupyter environment.
  • Visualize network data using node-link diagrams, heat maps, Circos plots and Hive plots.
  • Write basic algorithms to find structures and paths in a graph.
  • Compute network statistics.
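As a small preview of what a "basic algorithm to find paths" can look like, here is a minimal breadth-first search sketch over a plain adjacency dict. The function name `shortest_path` and the example graph are hypothetical; NetworkX ships its own `nx.shortest_path` for real work.

```python
from collections import deque

def shortest_path(adjacency, source, target):
    """Breadth-first search for a path with the fewest edges from source to target.

    Returns a list of nodes, or None if target is unreachable.
    """
    previous = {source: None}  # records how each visited node was reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through `previous` to reconstruct the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        # Sorting the neighbors makes the result deterministic for string nodes.
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical undirected graph stored as an adjacency dict.
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
print(shortest_path(graph, "a", "e"))  # ['a', 'b', 'd', 'e']
```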

Take-Homes (cont'd)

From a broader perspective, I hope you will be able to:

  • Think in terms of "interactions" between entities, and not just think about the entities themselves.
  • Think through statistical problems in network analysis.

Tutorial Format

  • Student notebooks for coding exercises.
  • Instructor versions for reference.
  • Feel free to skip ahead of me if I'm going too slowly for you.

Credits

Much of this work is inspired by Prof. Allen Downey (Olin College of Engineering) and Prof. Jukka-Pekka Onnela (Harvard School of Public Health).

The statistical methods are inspired by Dr. Jake VanderPlas (UW).

Hive plots and Circos plots were originally invented by Martin Krzywinski of the BC Genome Sciences Centre.

Circos plots were implemented with help from Justin Zabilansky (MIT).

Many thanks to the PyCon Rehearsal class for providing feedback on the material prior to PyCon 2015.

Thank you to everyone who attended previous iterations of this tutorial at:

  • SciPy 2016 & 2017 (Austin, TX)
  • PyCon 2016 & 2017 (Portland, OR)
  • PyCon 2015 (Montreal)
  • Data Science for Social Good (Boston)
  • PyData NYC 2015 (New York City)

The Data

In this tutorial, we use a number of data sets that have been downloaded from the KONECT network analysis repository.
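Edge-list files of this kind are usually simple text. The sketch below assumes a KONECT-style layout in which lines starting with `%` hold metadata and every other line lists whitespace-separated integer node IDs, source then target, optionally followed by extra columns. The sample contents and the helper `read_edge_list` are made up for illustration and are not the tutorial's actual loading code.

```python
import io

# A tiny, made-up sample in the assumed KONECT-style layout:
# lines starting with '%' are metadata/comments; other lines hold
# whitespace-separated "source target [extra columns...]".
sample = io.StringIO(
    "% sym unweighted\n"
    "% 3 4 4\n"
    "1 2\n"
    "2 3\n"
    "3 4\n"
)

def read_edge_list(lines):
    """Parse (source, target) pairs, skipping comment and blank lines."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        fields = line.split()
        # Keep only the first two columns; any weights or timestamps are ignored here.
        edges.append((int(fields[0]), int(fields[1])))
    return edges

edges = read_edge_list(sample)
print(edges)  # [(1, 2), (2, 3), (3, 4)]
```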

In [2]: