
Introduction

In this notebook, we will take a quick look at the "collider" effect.

Let's say we have the following causal graph:

$$a \rightarrow b \leftarrow c$$

It turns out that if we "condition" on b, then a and c become correlated, even though they are marginally independent.

import numpy as np
from causality_notes import noise
import pandas as pd
import seaborn as sns

%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Generate Data

Let's assume we have a causal model that follows the equations below:

$$a \sim N(0, 1)$$
$$c \sim N(0, 1)$$
$$b = 20a - 20c + \epsilon, \quad \epsilon \sim N(0, 1)$$
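Because a, b and c are jointly Gaussian in this model, we can also work out by hand how strong the induced correlation should be. The derivation below is a sketch that assumes the noise term ε is standard normal and independent of a and c, and applies the standard formula for the conditional covariance of a multivariate normal:

$$\operatorname{Var}(b) = 20^2 + 20^2 + 1 = 801, \qquad \operatorname{Cov}(a, b) = 20, \qquad \operatorname{Cov}(c, b) = -20$$
$$\operatorname{Cov}(a, c \mid b) = \operatorname{Cov}(a, c) - \frac{\operatorname{Cov}(a, b)\,\operatorname{Cov}(c, b)}{\operatorname{Var}(b)} = 0 - \frac{(20)(-20)}{801} = \frac{400}{801}$$
$$\operatorname{Var}(a \mid b) = \operatorname{Var}(c \mid b) = 1 - \frac{20^2}{801} = \frac{401}{801}$$
$$\operatorname{Corr}(a, c \mid b) = \frac{400/801}{401/801} = \frac{400}{401} \approx 0.998$$

So conditioning on an exact value of b should make a and c almost perfectly positively correlated. Slicing out a range of b, rather than a single value, should give a weaker but still clearly positive correlation.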

This is expressed in the code below.

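size = 1000
a = noise(size)  # a ~ N(0, 1)
c = noise(size)  # c ~ N(0, 1), drawn independently of a
b = 20*a - 20*c + noise(size)  # b is the collider: a common child of a and c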
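If you don't have the `causality_notes` package installed, here is a self-contained sketch of the same data-generating process. It assumes that `noise(size)` draws standard normal samples (consistent with a ~ N(0, 1) above) and substitutes NumPy's `default_rng(...).standard_normal`; the seed is arbitrary. It also checks the sample variance of b against the value of 801 implied by the equations.

```python
import numpy as np

# Assumption: `noise(size)` from causality_notes draws standard normal samples;
# here we substitute NumPy's Generator.standard_normal with a fixed seed.
rng = np.random.default_rng(42)

size = 1000
a = rng.standard_normal(size)                    # a ~ N(0, 1)
c = rng.standard_normal(size)                    # c ~ N(0, 1), independent of a
b = 20 * a - 20 * c + rng.standard_normal(size)  # collider: child of both a and c

# Under the model, Var(b) = 20**2 + 20**2 + 1 = 801.
print(f"sample var(b) = {b.var():.1f} (theory: 801)")
print(f"sample corr(a, c) = {np.corrcoef(a, c)[0, 1]:.3f} (theory: 0)")
```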

We now make it into a pandas DataFrame.

df = pd.DataFrame({'a': a, 'b': b, 'c': c})

Let's view a pair plot to see the pairwise correlation (dependency) between the variables.

sns.pairplot(df)

As the causal graph implies, a and c are independent of one another, so there is no trend in their joint distribution. By contrast, b increases with a and decreases with c, as its equation dictates.
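The pair plot can be backed up with numbers. Below is a self-contained sketch that regenerates the data (again assuming standard normal draws stand in for `noise`, with an arbitrary seed) and computes the Pearson correlation matrix with `DataFrame.corr`.

```python
import numpy as np
import pandas as pd

# Assumption: standard normal draws stand in for `noise` from causality_notes.
rng = np.random.default_rng(0)
size = 1000
a = rng.standard_normal(size)
c = rng.standard_normal(size)
b = 20 * a - 20 * c + rng.standard_normal(size)
df = pd.DataFrame({'a': a, 'b': b, 'c': c})

# Pearson correlation matrix: the numeric counterpart of the pair plot.
corr = df.corr()
print(corr.round(2))
```

Under the model, corr(a, b) should be close to 20/√801 ≈ 0.71, corr(c, b) close to -0.71, and corr(a, c) close to 0.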

Conditioning

When we "condition" on a variable, we are essentially taking a "slice" of that variable's values and looking at how the other variables are distributed within that slice. I illustrated this on my blog.

In our problem, this means slicing out a range of values of b. Here we keep only the rows where b lies between its 25th percentile and its mean:

df_new = df[(df['b'] < df['b'].mean()) & (df['b'] > np.percentile(df['b'], 25))]
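To quantify what the slice does, we can compare the correlation between a and c over all rows with the correlation inside the slice of b. This is a self-contained sketch that assumes standard normal draws in place of `noise` (arbitrary seed); the names `in_slice`, `corr_full` and `corr_slice` are my own.

```python
import numpy as np
import pandas as pd

# Assumption: standard normal draws stand in for `noise` from causality_notes.
rng = np.random.default_rng(0)
size = 1000
a = rng.standard_normal(size)
c = rng.standard_normal(size)
b = 20 * a - 20 * c + rng.standard_normal(size)
df = pd.DataFrame({'a': a, 'b': b, 'c': c})

# Same slice as above: b between its 25th percentile and its mean.
in_slice = (df['b'] > np.percentile(df['b'], 25)) & (df['b'] < df['b'].mean())
df_new = df[in_slice]

corr_full = df['a'].corr(df['c'])
corr_slice = df_new['a'].corr(df_new['c'])
print(f"corr(a, c), all rows:       {corr_full:.2f}")
print(f"corr(a, c), within b slice: {corr_slice:.2f}")
```

The overall correlation should be near zero, while the correlation within the slice should be strongly positive: inside a narrow band of b, the quantity 20a - 20c is nearly fixed, so a and c have to move together.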

Now, let's visualize the relationship between a and c, conditioned on b.

ax = df_new.plot(kind='scatter', x='a', y='c')
ax.set_aspect('equal')
ax.set_title('conditioned on b')

We can also look at the full joint distribution of a and c, colouring each point by its value of b, to illustrate what would happen if we conditioned on particular values of b.

ax = sns.scatterplot(data=df, x='a', y='c', hue='b')
ax.set_aspect('equal')
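The coloured plot suggests conditioning on many values of b at once. A minimal sketch of that idea splits b into ten equal-count bins with `pd.qcut` and computes the correlation between a and c inside each bin. As before, standard normal draws are assumed in place of `noise`, the seed is arbitrary, and the choice of ten bins is my own.

```python
import numpy as np
import pandas as pd

# Assumption: standard normal draws stand in for `noise` from causality_notes.
rng = np.random.default_rng(0)
size = 1000
a = rng.standard_normal(size)
c = rng.standard_normal(size)
b = 20 * a - 20 * c + rng.standard_normal(size)
df = pd.DataFrame({'a': a, 'b': b, 'c': c})

# Split b into 10 equal-count bins; each bin approximates conditioning on a value of b.
bins = pd.qcut(df['b'], q=10)

# Correlation between a and c, computed separately inside each bin of b.
within_bin_corr = pd.Series({
    interval: group['a'].corr(group['c'])
    for interval, group in df.groupby(bins, observed=True)
})
print(within_bin_corr.round(2))
```

Every bin should show a positive correlation even though the overall correlation is near zero. The two tail bins span a wider range of b, so their correlation tends to be weaker than in the middle bins.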

Conclusion

Here, we see that in a collider situation, conditioning on the child variable induces a spurious correlation between its parents, even though they are independent of one another.
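Slicing is not the only way to condition. Including b as a covariate in a linear regression ("adjusting for" b) also conditions on it, and produces the same spurious association. A minimal sketch using `np.linalg.lstsq`, again assuming standard normal draws in place of `noise` (arbitrary seed):

```python
import numpy as np

# Assumption: standard normal draws stand in for `noise` from causality_notes.
rng = np.random.default_rng(0)
size = 1000
a = rng.standard_normal(size)
c = rng.standard_normal(size)
b = 20 * a - 20 * c + rng.standard_normal(size)

ones = np.ones(size)  # intercept column

# Regress a on c alone: the coefficient on c should be close to 0.
X_marginal = np.column_stack([ones, c])
coef_marginal, *_ = np.linalg.lstsq(X_marginal, a, rcond=None)

# Regress a on c and b, i.e. adjust for the collider b.
X_adjusted = np.column_stack([ones, c, b])
coef_adjusted, *_ = np.linalg.lstsq(X_adjusted, a, rcond=None)

print(f"coef on c, without b: {coef_marginal[1]:.3f}")
print(f"coef on c, with b:    {coef_adjusted[1]:.3f}")
```

Under the model above, the coefficient on c should be close to 0 without b, and close to 400/401 ≈ 0.998 once b is included, matching the conditional Gaussian calculation E[a | b, c] = (20/401) b + (400/401) c.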