Introduction
Following up on d-separation, my colleagues and I chatted about how to find the confounding set of variables in a causal graph. This is another graph search problem. Let's see how this can be applied.
from causality_notes import rule1, rule2, rule3, path_nodes
import networkx as nx
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
From Judea Pearl's book, there is a diagram in chapter 4, Figure 4.7
. Let's reproduce it here.
G = nx.DiGraph()
edges = [
('D', 'A'), ('D', 'C'), ('F', 'C'),
('A', 'B'), ('C', 'B'), ('C', 'Y'),
('F', 'X'), ('F', 'Y'), ('C', 'E'),
('A', 'X'), ('E', 'X'), ('E', 'Y'),
('B', 'X'), ('X', 'Y'), ('G', 'X'),
('G', 'Y')
]
G.add_edges_from(edges)
pos = {
'D': (0, 0),
'A': (1, 0.5),
'C': (1, -1),
'F': (1, -2),
'B': (2, -0.3),
'E': (2, 1),
'X': (4, 0.5),
'G': (4.5, -2),
'Y': (5, 0.5)
}
nx.draw(G, pos=pos, with_labels=True)
To reveal the answer, the minimum confounding set is \{A, B, E, F, G\}.
What we would like to know is what is the set of confounders that we need to control for in order to correctly estimate the effect of X on Y.
To do this, we use the following logic:
- Find all undirected paths between X and Y.
- Traverse each node in the undirected paths.
- Check to see if, in the directed graph, the node blocks the path between X and Y if it were in the conditioning set.
- If yes, then it should be included as a confounder. Break out and continue on to next path.
- If no, it should not be included as a confounder.
Gpath = G.to_undirected()
confounders = set()
n1 = 'X'
n2 = 'Y'
for i, path in enumerate(nx.all_simple_paths(Gpath, n1, n2)):
for n in path:
if n is not n1 and n is not n2:
pass1 = rule1(n, [n], G, path)
pass2 = rule2(n, [n], G, path)
pass3 = rule3(n, [], G, path)
if pass1 or pass2 or pass3:
confounders.add(n)
# We break, because as soon as we find a good
# blocking node, there is no need to continue
# looking at other nodes.
break
confounders
We did it!