Chapter 1: Introduction to Graphs
from IPython.display import YouTubeVideo YouTubeVideo(id="k4KHoLC7TFE", width="100%")
In our world, networks are an immensely useful data modelling tool to model complex relational problems. Building on top of a network-oriented data model, they have been put to great use in a wide variety of settings.
A formal definition of networks
Before we explore examples of networks, we want to first give you a more formal definition of what networks are. The reason is that knowing a formal definition helps us refine our application of networks. So bear with me for a moment.
In the slightly more academic literature, networks are more formally referred to as graphs.
Graphs are comprised of two sets of objects:
- A node set: the "entities" in a graph.
- An edge set: the record of "relationships" between the entities in the graph.
For example, if a node set n is comprised of elements:
Then, the edge set e would be represented as tuples of pairs of elements:
If you extracted every node from the edge set e, it should form at least a subset of the node set n. (It is at least a subset because not every node in n might participate in an edge.)
If you draw out a network, the "nodes" are commonly represented as shapes, such as circles, while the "edges" are the lines between the shapes.
Examples of Networks
Now that we have a proper definition of a graph, let's move on to explore examples of graphs.
One example I (Eric Ma) am fond of, based on my background as a biologist, is a protein-protein interaction network. Here, the graph can be defined in the following way:
- nodes/entities are the proteins,
- edges/relationships are defined as "one protein is known to bind with another".
A more colloquial example of networks is an air transportation network. Here, the graph can be defined in the following way:
- nodes/entities are airports
- edges/relationships are defined as "at least one flight carrier flies between the airports".
And another even more relatable example would be our ever-prevalent social networks! With Twitter, the graph can be defined in the following way:
- nodes/entities are individual users
- edges/relationships are defined as "one user has decided to follow another".
Now that you've seen the framework for defining a graph, we'd like to invite you to answer the following question: What examples of networks have you seen before in your profession?
Go ahead and list it out.
Types of Graphs
As you probably can see, graphs are a really flexible data model for modelling the world, as long as the nodes and edges are strictly defined. (If the nodes and edges are sloppily defined, well, we run into a lot of interpretability problems later on.)
If you are a member of both LinkedIn and Twitter, you might intuitively think that there's a slight difference in the structure of the two "social graphs". You'd be absolutely correct on that count!
Twitter is an example of what we would intuitively call a directed graph. Why is this so? The key here lies in how interactions are modelled. One user can follow another, but the other need not necessarily follow back. As such, there is a directionality to the relationship.
LinkedIn is an example of what we would intuitively call an undirected graph. Why is this so? The key here is that when two users are LinkedIn connections, we automatically assign a bi-directional edge between them. As such, for convenience, we can collapse the bi-directional edge into an undirected edge, thus yielding an undirected graph.
If we wanted to turn LinkedIn into a directed graph, we might want to keep information on who initiated the invitation. In that way, the relationship is automatically bi-directional.
Edges define the interesting part of a graph
While in graduate school, I (Eric Ma) once sat in a seminar organized by one of the professors on my thesis committee. The speaker that day was John Quackenbush, a faculty member of the Harvard School of Public Health. While the topic of the day remained fuzzy in my memory, one quote stood out:
The heart of a graph lies in its edges, not in its nodes. (John Quackenbush, Harvard School of Public Health)
Indeed, this is a key point to remember! Without edges, the nodes are merely collections of entities. In a data table, they would correspond to the rows. That alone can be interesting, but doesn't yield relational insights between the entities.