SciPy 2019 Network Analysis Made Simple
Title: Network Analysis Made Simple
Abstract:
Using the NetworkX API, tutorial participants will learn the basics of graph theory and its use in applied network science. Starting with a computationally oriented definition of a graph and its associated methods, we will build out into progressively more advanced concepts (path and structure finding, and graph theory's relation to linear algebra), and close with an overview of scalable alternatives to NetworkX.
Keywords:
- data science
- network analysis
- network science
- graph theory
- graph analytics
Tutorial Topic: Applied Network Science
Detailed Description of Tutorial:
Content Updates from Prior Years:
- Updated last section to highlight the use of linear algebra in applied network science, particularly how linear algebra is used in deep learning applications, and how linear algebra can be used to accelerate certain graph operations.
- We will also discuss cuGraph as one possible alternative for scalability.
Outline:
Part 1: Introduction (30 min)
- Networks of all kinds: biological, transportation
- Representation of networks, NetworkX data structures
- Basic quick-and-dirty visualizations
Part 2: Hubs and Paths (40 min)
- Finding important nodes; applications
- Pathfinding algorithms and their applications
- Hands-on: implementing path-finding algorithms
- Visualize degree and betweenness centrality distributions
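
As an illustration of the hands-on path-finding exercise in Part 2, here is a minimal sketch of breadth-first search in plain Python. The function name `shortest_path`, the dict-of-sets graph representation, and the toy graph are assumptions made for this example; the tutorial's own exercises may be structured differently.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Return one shortest path from source to target, or None if unreachable.

    `adjacency` maps each node to the set of its neighbors (unweighted graph).
    """
    # parents doubles as the visited set and lets us rebuild the path.
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parents to reconstruct the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy graph: "e" is isolated from the rest.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b"},
    "e": set(),
}

path_a_to_d = shortest_path(adjacency, "a", "d")
path_a_to_e = shortest_path(adjacency, "a", "e")
```

Because breadth-first search explores nodes in order of distance from the source, the first time the target is dequeued the reconstructed path is a shortest one.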
Part 3: Cliques, Triangles & Structures (40 min)
- Definition of cliques
- Triangles as the simplest complex clique, applications
- Using path-finding algorithms to find structures in a graph
- Open triangles as recommender systems
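
The open-triangle idea in Part 3 can be sketched as follows: if a node is connected to two neighbors that are not connected to each other, that missing edge is a candidate recommendation. The helper name `open_triangle_recommendations` and the toy graph are hypothetical, chosen only for illustration; only standard NetworkX calls (`Graph.neighbors`, `Graph.has_edge`) are used.

```python
from itertools import combinations

import networkx as nx


def open_triangle_recommendations(G, node):
    """Return pairs of `node`'s neighbors that are not yet connected."""
    recommendations = set()
    for u, v in combinations(G.neighbors(node), 2):
        if not G.has_edge(u, v):
            # frozenset makes the pair order-independent.
            recommendations.add(frozenset((u, v)))
    return recommendations


# Hypothetical toy graph: a-b-c form a closed triangle, d hangs off a.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")])

recs = open_triangle_recommendations(G, "a")
```

Here the pairs (b, d) and (c, d) are recommended, since each shares the neighbor "a" but has no direct edge.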
Part 4: Bipartite Graphs (30 min)
- Definition of bipartite graphs, applications
- Constructing bipartite graphs in NetworkX
- Summary statistics of bipartite graphs
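
A minimal sketch of the Part 4 material, assuming a hypothetical person-to-club membership graph. The node names and the `"person"`/`"club"` labels stored under the `bipartite` node attribute are illustrative choices, not taken from the tutorial's datasets.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Build a bipartite graph, tagging each node with its partition.
B = nx.Graph()
B.add_nodes_from(["alice", "bob"], bipartite="person")
B.add_nodes_from(["club1", "club2", "club3"], bipartite="club")
B.add_edges_from(
    [
        ("alice", "club1"),
        ("alice", "club2"),
        ("bob", "club2"),
        ("bob", "club3"),
    ]
)

people = [n for n, d in B.nodes(data=True) if d["bipartite"] == "person"]

# Sanity check: no edge joins two nodes of the same partition.
is_two_sided = nx.is_bipartite(B)

# Project onto people: two people are linked if they share a club.
person_graph = bipartite.projected_graph(B, people)

# Bipartite degree centrality normalizes by the size of the opposite set.
centrality = bipartite.degree_centrality(B, people)
```

In this toy graph, alice and bob become connected in the projection because both belong to club2, and alice's centrality is 2/3 because she belongs to two of the three clubs.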
Part 5: Linear Algebra and Graphs (40 min)
- Graphs as matrices: adjacency and node feature matrices
- Message passing operations and how they are used in graph deep learning
- Speed vs. code readability tradeoffs when using matrix operations
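
To make the Part 5 bullets concrete, here is a small NumPy sketch under stated assumptions: a hypothetical three-node path graph (0-1-2) and a one-column node feature matrix. Multiplying the adjacency matrix by the feature matrix sums each node's neighbor features, which is the simplest form of message passing; squaring the adjacency matrix counts walks of length two.

```python
import numpy as np

# Adjacency matrix of the path graph 0 - 1 - 2 (undirected, unweighted).
A = np.array(
    [
        [0, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
    ]
)

# Node feature matrix: one scalar feature per node.
F = np.array([[1.0], [2.0], [3.0]])

# Message passing: each row becomes the sum of that node's neighbor features.
messages = A @ F

# (A @ A)[i, j] counts the walks of length 2 from node i to node j.
two_step_walks = A @ A
```

The matrix form replaces an explicit loop over nodes and neighbors with a single product, which is typically faster on large graphs but hides the per-node logic that a loop spells out.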
Student’s Python Knowledge Level: Intermediate
Short Bio + Teaching Experience:
Eric is a data scientist at the Novartis Institutes for Biomedical Research. There, he conducts biomedical data science research, with a focus on using Bayesian statistical methods in the service of making medicines for patients. Prior to Novartis, he was an Insight Health Data Fellow in the summer of 2017, and defended his doctoral thesis in the spring of 2017.
Eric is also an open source software developer, and has led the development of nxviz, a visualization package for NetworkX, and pyjanitor, a clean API for cleaning data in Python. In addition, he has made contributions to a range of open source tools, including PyMC3, matplotlib, bokeh, and CuPy.
His personal life motto is found in Luke 12:48.
This tutorial has been taught at prior SciPy, PyData, and PyCon conferences.
- PyCon 2018: https://www.youtube.com/watch?v=HkbMUrgzwMs
- SciPy 2018: https://www.youtube.com/watch?v=K5xiFDClgjo
- PyData NYC 2015: https://www.youtube.com/watch?v=wcrwASR5DCQ
Setup Instructions:
All setup instructions can be found on the GitHub repository: https://github.com/ericmjl/Network-Analysis-Made-Simple
For those who do not want to fiddle with setup, a pre-built Binder is available.
Skills Needed:
- NumPy
- Matplotlib
Summary for Topic:
Have you ever wondered how data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. In this tutorial, we will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks. At the end of the tutorial, allow yourself to be entertained, and hopefully also enlightened, by the link between applied network science and linear algebra!