# Chapter 1: Introduction to Graphs

## Introduction

```
from IPython.display import YouTubeVideo
YouTubeVideo(id="k4KHoLC7TFE", width="100%")
```

In our world, networks are an immensely useful *data modelling tool*
to model complex *relational* problems.
Building on top of a network-oriented data model,
they have been put to great use in a wide variety of settings.

## A *formal* definition of networks

Before we explore examples of networks,
we want to first give you a more formal definition
of what networks are.
The reason is that knowing a *formal* definition
helps us refine our application of networks.
So bear with me for a moment.

In the slightly more academic literature,
networks are more formally referred to as **graphs**.

Graphs are comprised of two *sets* of objects:

- A
**node set**: the "entities" in a graph. - An
**edge set**: the record of "relationships" between the entities in the graph.

For example, if a **node set** n is comprised of elements:

Then, the **edge set** e would be represented as tuples of *pairs* of elements:

If you extracted every node from the edge set e,
it should form *at least a subset* of the node set n.
(It is at least a subset because not every node in n might participate in an edge.)

If you draw out a network, the "nodes" are commonly represented as shapes, such as circles, while the "edges" are the lines between the shapes.

## Examples of Networks

Now that we have a proper definition of a graph, let's move on to explore examples of graphs.

One example I (Eric Ma) am fond of, based on my background as a biologist, is a protein-protein interaction network. Here, the graph can be defined in the following way:

- nodes/entities are the proteins,
- edges/relationships are defined as "one protein is known to bind with another".

A more colloquial example of networks is an air transportation network. Here, the graph can be defined in the following way:

- nodes/entities are airports
- edges/relationships are defined as "at least one flight carrier flies between the airports".

And another even more relatable example would be our ever-prevalent social networks! With Twitter, the graph can be defined in the following way:

- nodes/entities are individual users
- edges/relationships are defined as "one user has decided to follow another".

Now that you've seen the framework for defining a graph,
we'd like to invite you to answer the following question:
**What examples of networks have you seen before in your profession?**

Go ahead and list it out.

## Types of Graphs

As you probably can see, graphs are a really flexible data model
for modelling the world,
as long as the nodes and edges are strictly defined.
(If the nodes and edges are *sloppily* defined,
well, we run into a lot of interpretability problems later on.)

If you are a member of both LinkedIn and Twitter,
you might intuitively think that there's a *slight* difference
in the structure of the two "social graphs".
You'd be absolutely correct on that count!

Twitter is an example of what we would intuitively call a **directed** graph.
Why is this so?
The key here lies in how interactions are modelled.
One user can follow another, but the other need not necessarily follow back.
As such, there is a *directionality* to the relationship.

LinkedIn is an example of what we would intuitively call an **undirected** graph.
Why is this so?
The key here is that when two users are LinkedIn connections,
we *automatically* assign a bi-directional edge between them.
As such, for convenience, we can collapse the bi-directional edge
into an *undirected* edge,
thus yielding an undirected graph.

If we wanted to turn LinkedIn into a directed graph, we might want to keep information on who initiated the invitation. In that way, the relationship is automatically bi-directional.

## Edges define the interesting part of a graph

While in graduate school, I (Eric Ma) once sat in a seminar organized by one of the professors on my thesis committee. The speaker that day was John Quackenbush, a faculty member of the Harvard School of Public Health. While the topic of the day remained fuzzy in my memory, one quote stood out:

The heart of a graph lies in its edges, not in its nodes. (John Quackenbush, Harvard School of Public Health)

Indeed, this is a key point to remember!
Without edges, the nodes are merely collections of entities.
In a data table, they would correspond to the rows.
That alone can be interesting,
but doesn't yield *relational insights* between the entities.