An Attempt At Demystifying Bayesian Deep Learning

Eric J. Ma

ericmjl

PyData NYC 2017

• Demystify Deep Learning
• Demystify Bayesian Deep Learning

In short: explain the intuition clearly, with minimal jargon.

## Take-Home Point 1

Deep learning is nothing more than compositions of functions on matrices.

## Take-Home Point 2

Bayesian deep learning is grounded in learning a probability distribution for each parameter.

## Outline

1. Linear Regression 3 Ways
2. Logistic Regression 3 Ways
3. Deep Nets 3 Ways
4. Going Bayesian
5. Example Neural Network with PyMC3

## Linear Regression
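To make the "functions on matrices" view concrete, here is a minimal pure-Python sketch (the names `matvec`, `linear`, and `mse` are illustrative, not from the talk). Linear regression is a single function of the input matrix `X`, a weight vector `w`, and a bias `b`, and learning means choosing `w` and `b` to minimize a loss such as mean squared error.

```python
def matvec(X, w):
    """Multiply an n x d matrix (list of rows) by a length-d vector."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]


def linear(X, w, b):
    """Linear regression as a function: y_hat = X w + b."""
    return [z + b for z in matvec(X, w)]


def mse(y_hat, y):
    """Mean squared error: the loss that fitting w and b would minimize."""
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)


X = [[1.0, 2.0],
     [3.0, 4.0]]
w = [0.5, -1.0]
b = 2.0
y_hat = linear(X, w, b)
print(y_hat)                    # [0.5, -0.5]
print(mse(y_hat, [1.0, 0.0]))   # 0.25
```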

## Logistic Regression
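Logistic regression adds one more function to the composition: the linear output is squashed through a sigmoid to give a probability. A minimal sketch with illustrative helper names and made-up weights:

```python
import math


def matvec(X, w):
    """Multiply an n x d matrix (list of rows) by a length-d vector."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]


def sigmoid(z):
    """Map any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic(X, w, b):
    """Logistic regression = sigmoid(linear(X)): a composition of two functions."""
    return [sigmoid(z + b) for z in matvec(X, w)]


X = [[0.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0]]
w = [2.0, -2.0]
b = 0.0
probs = logistic(X, w, b)
labels = [int(p >= 0.5) for p in probs]   # threshold for a hard class prediction
print([round(p, 3) for p in probs])       # [0.5, 0.881, 0.119]
print(labels)                             # [1, 1, 0]
```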

## Deep Neural Networks
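A deep net extends the chain with more links: multiply by a matrix, apply a nonlinearity, repeat, and finish with a softmax over classes. The sketch below (pure Python, hypothetical helper names, made-up weights) follows the same tanh, tanh, softmax structure as the network architecture used later in the talk, with no bias terms.

```python
import math


def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def tanh(M):
    """Element-wise tanh nonlinearity."""
    return [[math.tanh(v) for v in row] for row in M]


def softmax(M):
    """Row-wise softmax: each row becomes a probability vector."""
    out = []
    for row in M:
        m = max(row)                      # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def deep_net(X, W1, W2, W_out):
    """softmax(tanh(tanh(X W1) W2) W_out): nothing but composed functions."""
    h1 = tanh(matmul(X, W1))
    h2 = tanh(matmul(h1, W2))
    return softmax(matmul(h2, W_out))


X = [[0.5, -1.0],
     [2.0, 0.0]]                          # 2 samples, 2 features
W1 = [[0.1, -0.3, 0.2],
      [0.4, 0.0, -0.5]]                   # 2 features -> 3 hidden units
W2 = [[0.3, 0.1, -0.2],
      [-0.1, 0.2, 0.4],
      [0.5, -0.4, 0.1]]                   # 3 hidden -> 3 hidden
W_out = [[1.0, -1.0, 0.0],
         [0.0, 0.5, -0.5],
         [-0.2, 0.3, 0.6]]                # 3 hidden -> 3 classes
probs = deep_net(X, W1, W2, W_out)
for row in probs:
    print([round(p, 3) for p in row])
```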

Key idea: learn a probability density over parameter space.
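One way to see "a probability density over parameter space" is a grid approximation for a single parameter. The sketch below assumes a toy model y = w·x + noise with a known noise standard deviation, a Normal(0, 10) prior on the slope w, and made-up data. It evaluates prior × likelihood at every grid point and normalizes, so the result is a distribution over w rather than one best value.

```python
import math

# Made-up toy data, roughly y = 2x plus noise (illustration only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 2.1, 3.9, 6.2]
sigma = 0.5      # assumed known noise standard deviation
prior_sd = 10.0  # Normal(0, prior_sd) prior on the slope w


def log_unnormalized_posterior(w):
    """log prior + log likelihood, dropping constants that do not depend on w."""
    log_prior = -0.5 * (w / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - w * x) / sigma) ** 2 for x, y in zip(xs, ys))
    return log_prior + log_lik


grid = [i * 0.01 for i in range(401)]               # candidate slopes 0.00 .. 4.00
log_post = [log_unnormalized_posterior(w) for w in grid]
peak = max(log_post)
weights = [math.exp(lp - peak) for lp in log_post]  # subtract the peak to avoid underflow
total = sum(weights)
posterior = [wt / total for wt in weights]          # sums to 1 over the grid

post_mean = sum(w * p for w, p in zip(grid, posterior))
post_sd = math.sqrt(sum((w - post_mean) ** 2 * p for w, p in zip(grid, posterior)))
print(f"posterior mean {post_mean:.3f}, sd {post_sd:.3f}")
```

A grid only works for one or two parameters; with the thousands of weights in a neural network, sampling or variational methods take its place.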

## Bayesian Linear Regression
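For linear regression with Gaussian noise of known scale and Gaussian priors on the weights, the posterior is itself Gaussian and has a closed form. With design matrix X, prior [b, w] ~ Normal(0, τ²I), and noise standard deviation σ, the posterior covariance is S = (I/τ² + XᵀX/σ²)⁻¹ and the posterior mean is m = S Xᵀy / σ². A minimal pure-Python sketch of that update, assuming made-up data and the σ and τ values shown:

```python
import math

# Made-up toy data, roughly y = 1 + 2x plus noise (illustration only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]
sigma = 0.5   # assumed known noise standard deviation
tau = 10.0    # prior: intercept b and slope w each ~ Normal(0, tau)

# Design matrix with a column of ones for the intercept
X = [[1.0, x] for x in xs]

# X^T X (2 x 2) and X^T y (length 2)
XtX = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(2)]

# Posterior precision A = I / tau^2 + X^T X / sigma^2
A = [[XtX[i][j] / sigma**2 + (1.0 / tau**2 if i == j else 0.0) for j in range(2)]
     for i in range(2)]

# Posterior covariance S = A^{-1} (explicit 2 x 2 inverse)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
S = [[A[1][1] / det, -A[0][1] / det],
     [-A[1][0] / det, A[0][0] / det]]

# Posterior mean m = S X^T y / sigma^2 (the prior mean is zero)
m = [sum(S[i][j] * Xty[j] for j in range(2)) / sigma**2 for i in range(2)]
sd = [math.sqrt(S[i][i]) for i in range(2)]

print(f"intercept b: {m[0]:.2f} +/- {sd[0]:.2f}")
print(f"slope     w: {m[1]:.2f} +/- {sd[1]:.2f}")
```

Each weight now comes with a mean and a spread instead of a single number.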

## Bayesian Logistic Regression
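Logistic regression has no closed-form posterior, so in practice one draws samples from it. The sketch below swaps in a plain random-walk Metropolis sampler (simpler than the NUTS sampler PyMC3 uses by default for continuous models) over an intercept b and slope w, with Normal(0, 5) priors and made-up binary data. The resulting list of (b, w) draws *is* the learned distribution.

```python
import math
import random

# Made-up binary data: larger x tends to mean class 1 (illustration only)
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
prior_sd = 5.0  # intercept b and slope w each ~ Normal(0, prior_sd)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_posterior(b, w):
    """Unnormalized log posterior: log prior + Bernoulli log likelihood."""
    lp = -0.5 * (b**2 + w**2) / prior_sd**2
    for x, y in zip(xs, ys):
        z = b + w * x
        lp += y * z - softplus(z)  # log sigmoid(z) if y == 1, log(1 - sigmoid(z)) if y == 0
    return lp


def metropolis(n_steps=6000, step=0.5, burn_in=1000, seed=0):
    """Random-walk Metropolis over (b, w); returns post-burn-in draws and acceptance rate."""
    rng = random.Random(seed)
    b, w = 0.0, 0.0
    current = log_posterior(b, w)
    draws, accepted = [], 0
    for t in range(n_steps):
        b_new = b + rng.gauss(0.0, step)
        w_new = w + rng.gauss(0.0, step)
        proposed = log_posterior(b_new, w_new)
        # Accept with probability min(1, p_new / p_old); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < proposed - current:
            b, w, current = b_new, w_new, proposed
            accepted += 1
        if t >= burn_in:
            draws.append((b, w))
    return draws, accepted / n_steps


samples, acc_rate = metropolis()
w_draws = [w for _, w in samples]
w_mean = sum(w_draws) / len(w_draws)
p_positive = sum(w > 0 for w in w_draws) / len(w_draws)
print(f"posterior mean of w: {w_mean:.2f}, P(w > 0): {p_positive:.2f}, acceptance: {acc_rate:.2f}")
```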

## Bayesian Deep Nets

PyMC3: Probabilistic Programming in Python. It provides:

• statistical distributions
• sampling algorithms
• syntax for specifying models

## Predict Forest Cover Type

### Problem Overview

• UCI ML Repository: Covertype Dataset
• Input: 66 cartographic variables
• Output: one of 7 forest cover types
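The likelihood in the model below is a `Multinomial` with `n=1`, so each observed row must be a one-hot vector over the 7 cover types. A minimal sketch of that encoding, assuming the integer labels run from 1 to 7 as in the UCI Covertype data (`one_hot` is a hypothetical helper name):

```python
def one_hot(labels, n_classes=7, first_label=1):
    """Turn integer class labels into one-hot rows (lists of 0/1)."""
    rows = []
    for label in labels:
        index = label - first_label
        if not 0 <= index < n_classes:
            raise ValueError(f"label {label} out of range")
        row = [0] * n_classes
        row[index] = 1
        rows.append(row)
    return rows


print(one_hot([1, 7, 3]))
# [[1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1], [0, 0, 1, 0, 0, 0, 0]]
```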

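### Network Architecture

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt  # pymc devs are discussing new backends

# Assumed to be prepared beforehand (not shown on the slides):
#   ann_input:  NumPy array, shape (n_samples, n_features), standardized inputs
#   ann_output: NumPy array, shape (n_samples, 7), one-hot cover types
n_features = ann_input.shape[1]
n_classes = ann_output.shape[1]
n_hidden = 20

# Random starting values for each weight matrix
init_1 = pm.floatX(np.random.randn(n_features, n_hidden))
init_2 = pm.floatX(np.random.randn(n_hidden, n_hidden))
init_out = pm.floatX(np.random.randn(n_hidden, n_classes))

with pm.Model() as nn_model:
    # Input -> Layer 1: every weight gets a Normal(0, 1) prior
    weights_1 = pm.Normal('w_1', mu=0, sd=1,
                          shape=(n_features, n_hidden),
                          testval=init_1)
    acts_1 = pm.Deterministic('activations_1',
                              tt.tanh(tt.dot(ann_input, weights_1)))

    # Layer 1 -> Layer 2
    weights_2 = pm.Normal('w_2', mu=0, sd=1,
                          shape=(n_hidden, n_hidden),
                          testval=init_2)
    acts_2 = pm.Deterministic('activations_2',
                              tt.tanh(tt.dot(acts_1, weights_2)))

    # Layer 2 -> Output Layer: softmax turns scores into class probabilities
    weights_out = pm.Normal('w_out', mu=0, sd=1,
                            shape=(n_hidden, n_classes),
                            testval=init_out)
    acts_out = pm.Deterministic('activations_out',
                                tt.nnet.softmax(tt.dot(acts_2, weights_out)))

    # Define likelihood: one categorical draw per sample over the classes
    out = pm.Multinomial('likelihood', n=1, p=acts_out,
                         observed=ann_output)

with nn_model:
    # The slide used an undefined `inference`; mean-field ADVI is assumed here.
    # It fits an approximate posterior distribution over every weight.
    inference = pm.ADVI()
    approx = pm.fit(100000, method=inference)
    # Draw 5000 samples of all weights from the fitted approximation
    trace = approx.sample(5000)
```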



[Figures: learned 1st-layer, 2nd-layer, and output weights]

[Figures: class predictions ("point estimate"), class probabilities ("probabilistic estimate"), and class uncertainties ("with uncertainties!")]
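A point estimate, a probability, and an uncertainty can all be read off the same set of posterior draws. The talk does not show how its plots were computed, so the sketch below is one plausible recipe under stated assumptions. Given class probabilities computed from several posterior weight samples (made-up numbers here), it averages them, takes the most probable class as the prediction, and uses the standard deviation of that class's probability across draws as the uncertainty. `summarize` is a hypothetical helper name.

```python
import math

# Hypothetical posterior draws of class probabilities (made-up numbers):
# prob_samples[d][i][k] = P(class k | input i, weights from posterior draw d)
prob_samples = [
    [[0.7, 0.2, 0.1], [0.4, 0.35, 0.25]],
    [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]],
    [[0.75, 0.15, 0.1], [0.3, 0.4, 0.3]],
]


def summarize(prob_samples):
    """Return (predicted class, mean probabilities, sd of predicted class) per input."""
    n_draws = len(prob_samples)
    n_points = len(prob_samples[0])
    n_classes = len(prob_samples[0][0])
    results = []
    for i in range(n_points):
        mean = [sum(d[i][k] for d in prob_samples) / n_draws for k in range(n_classes)]
        pred = max(range(n_classes), key=lambda k: mean[k])   # point estimate
        var = sum((d[i][pred] - mean[pred]) ** 2 for d in prob_samples) / n_draws
        results.append((pred, mean, math.sqrt(var)))          # uncertainty = spread
    return results


for pred, mean, sd in summarize(prob_samples):
    print(pred, [round(p, 3) for p in mean], round(sd, 3))
```

The second input gets both a less confident mean probability and a larger spread across draws: that is the extra information a Bayesian net provides over a single point estimate.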

## Take-Home Point 1

Deep learning is nothing more than compositions of functions on matrices.

## Take-Home Point 2

Bayesian deep learning is grounded in learning a probability distribution for each parameter.

Teachers

• David Duvenaud
• Michelle Fullwood
• Thomas Wiecki

• David MacKay
• Yarin Gal