## An Attempt At Demystifying Bayesian Deep Learning

Eric J. Ma

ericmjl

PyData NYC 2017

### On your laptop

https://ericmjl.github.io/bayesian-deep-learning-demystified


## My (Modest) Goals

• Demystify Deep Learning
• Demystify Bayesian Deep Learning

Basically, explain the intuition clearly with minimal jargon.

## Take-Home Point 1

Deep Learning is nothing more than compositions of functions on matrices.
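A minimal sketch of that idea in NumPy (the names and layer sizes here are invented for illustration): a feed-forward net is nothing but matrix products composed with elementwise nonlinearities.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: 5 samples, 3 features.
X = rng.normal(size=(5, 3))

# Parameters of a two-layer network (arbitrary sizes).
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))

def f1(x):
    return np.tanh(x @ W1)  # layer 1: matrix product + nonlinearity

def f2(h):
    return h @ W2           # layer 2: another matrix product

# "Deep learning" output = a composition of functions on matrices.
y = f2(f1(X))
print(y.shape)  # (5, 2)
```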

## Take-Home Point 2

Bayesian deep learning is grounded in learning a probability distribution for each parameter.

## Outline

1. Linear Regression 3 Ways
2. Logistic Regression 3 Ways
3. Deep Nets 3 Ways
4. Going Bayesian
5. Example Neural Network with PyMC3

## Going Bayesian

Key Idea: Learn probability density over parameter space.
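A tiny NumPy sketch of the contrast (all numbers invented): instead of a single point value per weight, keep many samples of each weight; the predictions then form a distribution too, and their spread is the model's uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([1.0, 2.0])             # one input with two features

# Point-estimate view: a single weight vector.
w_point = np.array([0.5, -0.3])
y_point = x @ w_point                # one prediction

# Bayesian view: a distribution over each weight, represented here
# by 1000 samples (pretend these came from posterior inference).
w_samples = rng.normal(loc=[0.5, -0.3], scale=0.1, size=(1000, 2))
y_samples = w_samples @ x            # a distribution of predictions

print(y_point)                       # a single number
print(y_samples.mean(), y_samples.std())  # prediction + uncertainty
```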

## Cheat Sheet: PyMC3

PyMC3 is a probabilistic programming library in Python. It provides:

• statistical distributions
• sampling algorithms
• model specification syntax

## Predict Forest Cover Type

### Problem Overview

• UCI ML Repository: Covertype Dataset
• Input: 66 cartographic variables
• Output: one of 7 forest cover types

### Network Architecture

                            
```python
import theano
import theano.tensor as tt  # pymc devs are discussing new backends
import pymc3 as pm

n_hidden = 20

# Assumed defined earlier in the talk: ann_input, ann_output (the
# training data) and init_1, init_2, init_out (initial weight values).

with pm.Model() as nn_model:
    # Input -> Layer 1
    weights_1 = pm.Normal('w_1', mu=0, sd=1,
                          shape=(ann_input.shape[1], n_hidden),
                          testval=init_1)
    acts_1 = pm.Deterministic('activations_1',
                              tt.tanh(tt.dot(ann_input, weights_1)))

    # Layer 1 -> Layer 2
    weights_2 = pm.Normal('w_2', mu=0, sd=1,
                          shape=(n_hidden, n_hidden),
                          testval=init_2)
    acts_2 = pm.Deterministic('activations_2',
                              tt.tanh(tt.dot(acts_1, weights_2)))

    # Layer 2 -> Output Layer
    weights_out = pm.Normal('w_out', mu=0, sd=1,
                            shape=(n_hidden, ann_output.shape[1]),
                            testval=init_out)
    acts_out = pm.Deterministic('activations_out',
                                tt.nnet.softmax(tt.dot(acts_2, weights_out)))

    # Define likelihood
    out = pm.Multinomial('likelihood', n=1, p=acts_out,
                         observed=ann_output)

with nn_model:
    inference = pm.ADVI()  # variational inference; NUTS is also possible
    approx = pm.fit(100000, method=inference)
    trace = approx.sample(5000)
```



"point estimate"

### Class Probabilities

"probabilistic estimate"

### Class Uncertainties

"with uncertainties!"

## Take-Home Point 1

Deep Learning is nothing more than compositions of functions on matrices.

## Take-Home Point 2

Bayesian deep learning is grounded in learning a probability distribution for each parameter.

## Teachers

• David Duvenaud
• Michelle Fullwood
• Thomas Wiecki

• David MacKay
• Yarin Gal