Eric J. Ma

ericmjl

*PyData NYC 2017*

The Hype of Deep Learning:

— Ferenc Huszár (@fhuszar) November 23, 2017

1. Write a post with ML, AI or GAN in the title.

2. Post appears at the top of Hacker News (despite your best efforts)

3. HN drives tens of thousands of clicks

4. "what's with all the maths? show me pretty pics"

5. <=1% stay for longer than a minute

**I am out to solve Point 4.**

- Demystify Deep Learning
- Demystify Bayesian Deep Learning

Basically, *explain the intuition clearly with minimal jargon*.

Deep Learning is nothing more than **compositions of functions on matrices**.

Bayesian deep learning is grounded on **learning a probability distribution for each parameter**.
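To make the first claim concrete, here is a minimal plain-NumPy sketch (my own illustration, with made-up shapes) of a two-hidden-layer network as nothing but composed functions on matrices:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Made-up shapes: 5 samples, 66 input features, 20 hidden units, 7 classes.
X = rng.normal(size=(5, 66))
W1 = rng.normal(size=(66, 20))
W2 = rng.normal(size=(20, 20))
W_out = rng.normal(size=(20, 7))

# The whole network is just composed matrix functions:
probs = softmax(np.tanh(np.tanh(X @ W1) @ W2) @ W_out)

assert probs.shape == (5, 7)
assert np.allclose(probs.sum(axis=1), 1.0)  # each row is a probability vector
```

Everything a deep net does reduces to this pattern: matrix multiply, elementwise nonlinearity, repeat.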

- Linear Regression 3 Ways
- Logistic Regression 3 Ways
- Deep Nets 3 Ways
- Going Bayesian
- Example Neural Network with PyMC3

Key Idea: Learn probability density over parameter space.
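As a toy illustration of that key idea (my own sketch, not from the talk): for a single parameter with a Gaussian prior and Gaussian-noise observations, the posterior is again a Gaussian whose density narrows as data accumulate — we learn a distribution over the parameter, not a point:

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown) parameter and 50 noisy observations of it.
w_true, noise_sd = 2.0, 1.0
data = w_true + rng.normal(0, noise_sd, size=50)

# Conjugate Normal-Normal update: prior N(0, 10^2) on w.
prior_mu, prior_sd = 0.0, 10.0
n = len(data)
post_var = 1.0 / (1.0 / prior_sd**2 + n / noise_sd**2)
post_mu = post_var * (prior_mu / prior_sd**2 + data.sum() / noise_sd**2)
post_sd = np.sqrt(post_var)

# The result is a full density over the parameter, not a single number.
print(f"posterior: N({post_mu:.2f}, {post_sd:.2f}^2)")
assert post_sd < prior_sd                 # data shrink our uncertainty
assert abs(post_mu - data.mean()) < 0.1   # flat-ish prior => mean ~ sample mean
```

A Bayesian neural network applies this same idea to every weight at once; since the exact posterior is intractable there, PyMC3 approximates it instead.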

PyMC3: probabilistic programming in Python. It provides:

- statistical distributions
- sampling algorithms
- model specification syntax

- UCI ML Repository: Covertype Dataset

**Input**: 66 cartographic variables
**Output**: one of 7 forest cover types

```python
import theano
import theano.tensor as tt  # pymc devs are discussing new backends
import pymc3 as pm

n_hidden = 20

with pm.Model() as nn_model:
    # Input -> Layer 1
    weights_1 = pm.Normal('w_1', mu=0, sd=1,
                          shape=(ann_input.shape[1], n_hidden),
                          testval=init_1)
    acts_1 = pm.Deterministic('activations_1',
                              tt.tanh(tt.dot(ann_input, weights_1)))

    # Layer 1 -> Layer 2
    weights_2 = pm.Normal('w_2', mu=0, sd=1,
                          shape=(n_hidden, n_hidden),
                          testval=init_2)
    acts_2 = pm.Deterministic('activations_2',
                              tt.tanh(tt.dot(acts_1, weights_2)))

    # Layer 2 -> Output Layer
    weights_out = pm.Normal('w_out', mu=0, sd=1,
                            shape=(n_hidden, ann_output.shape[1]),
                            testval=init_out)
    acts_out = pm.Deterministic('activations_out',
                                tt.nnet.softmax(tt.dot(acts_2, weights_out)))

    # Define likelihood
    out = pm.Multinomial('likelihood', n=1, p=acts_out,
                         observed=ann_output)

with nn_model:
    s = theano.shared(pm.floatX(1.1))
    inference = pm.ADVI(cost_part_grad_scale=s)  # approximate inference done using ADVI
    approx = pm.fit(100000, method=inference)
    trace = approx.sample(5000)
```

*"point estimate"*

*"probabilistic estimate"*

*"with uncertainties!"*
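The captions above refer to plots from the talk; the underlying idea can be sketched in plain NumPy (a hypothetical 1-D regression, not the talk's actual figures). Instead of one fitted weight, we carry many posterior samples of it, so every prediction comes with a spread:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these are 5000 posterior samples of a slope,
# analogous to what approx.sample(5000) returns for each weight.
w_samples = rng.normal(loc=2.0, scale=0.3, size=5000)

x_new = 3.0
preds = w_samples * x_new          # one prediction per posterior sample

point_estimate = preds.mean()      # "point estimate"
uncertainty = preds.std()          # "with uncertainties!"

print(f"{point_estimate:.2f} +/- {uncertainty:.2f}")
assert abs(point_estimate - 6.0) < 0.1   # mean slope 2.0 * x_new 3.0
assert 0.8 < uncertainty < 1.0           # spread ~ 0.3 * 3.0
```

The same recipe applied to the network above yields class probabilities with error bars rather than bare point predictions.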

Deep Learning is nothing more than **compositions of functions on matrices**.

Bayesian deep learning is grounded on **learning a probability distribution for each parameter**.

- David Duvenaud
- Michelle Fullwood
- Thomas Wiecki

- David MacKay
- Yarin Gal

Source: ericmjl/bayesian-deep-learning-demystified