# Normalizing Flows

#### Basic concept

Normalizing flows exploit the change-of-variables rule for probability densities. A normalizing flow begins with a simple base distribution and applies a sequence of $K$ invertible transforms to produce a new distribution.

For example, a Gaussian can be ‘deformed’ to fit a complex data distribution, and each transform can be modeled as a small invertible neural network.
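As a minimal sketch of the change-of-variables rule, consider a single affine transform $x = az + b$ applied to a standard Normal. The parameters `a`, `b` below are arbitrary illustrative values, not from any particular model:

```python
import numpy as np

# Hypothetical affine transform x = a*z + b with z ~ N(0, 1).
# Change of variables: p_x(x) = p_z((x - b) / a) / |a|.
a, b = 2.0, 1.0

def base_log_prob(z):
    # log density of the standard Normal base distribution
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def flow_log_prob(x):
    z = (x - b) / a                            # invert the transform
    return base_log_prob(z) - np.log(abs(a))   # + log|det Jacobian of inverse|

# Sanity check: the result matches the analytic N(b, a^2) log density.
x = 3.0
analytic = -0.5 * ((x - b) / a) ** 2 - np.log(a) - 0.5 * np.log(2 * np.pi)
```

The `- np.log(abs(a))` term is the log absolute Jacobian determinant; stacking $K$ such transforms just sums $K$ of these correction terms.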

#### Autoregressive Models are NFs

Autoregressive models learn complex joint densities by decomposing the joint density into a product of one-dimensional conditional densities, where each $x_i$ depends only on the previous $i-1$ values:

$p(x)=\prod_{i} p\left(x_{i} | x_{1: i-1}\right)$

The conditional densities usually have learnable parameters. One example is an autoregressive density $p\left(x_{1: D}\right)$ in which each conditional is a univariate Gaussian whose mean and log standard deviation are computed by neural networks that take the previous values $x_{1:i-1}$ as input:

$$\begin{aligned} p\left(x_{i} \mid x_{1: i-1}\right) &=\mathcal{N}\left(x_{i} \mid \mu_{i},\left(\exp \alpha_{i}\right)^{2}\right) \\ \mu_{i} &=f_{\mu_{i}}\left(x_{1: i-1}\right) \\ \alpha_{i} &=f_{\alpha_{i}}\left(x_{1: i-1}\right) \end{aligned}$$
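A toy sketch of evaluating this density: `f_mu` and `f_alpha` below are hypothetical stand-ins for the neural networks, chosen only so the example runs end to end:

```python
import numpy as np

# Hypothetical stand-ins for the conditioner networks f_mu_i and f_alpha_i;
# each maps the previous values x_{1:i-1} to a scalar.
def f_mu(prev):
    return 0.5 * prev.sum() if prev.size else 0.0

def f_alpha(prev):
    return 0.1 * prev.sum() if prev.size else 0.0

def log_prob(x):
    """Sum of univariate Gaussian log densities p(x_i | x_{1:i-1})."""
    total = 0.0
    for i in range(len(x)):
        mu, alpha = f_mu(x[:i]), f_alpha(x[:i])
        # log N(x_i | mu, exp(alpha)^2); note -alpha is -log sigma
        total += -0.5 * ((x[i] - mu) / np.exp(alpha)) ** 2 - alpha \
                 - 0.5 * np.log(2 * np.pi)
    return total
```

Parameterizing the standard deviation as $\exp \alpha_i$ keeps it positive without constraining the network output.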

In this case, we’re assuming a fixed ordering in which earlier variables don’t depend on later ones. Natural data (e.g. the pixels of an image) has no such canonical ordering, so this assumption is somewhat arbitrary.

To sample from the distribution, $D$ “noise variates” $u_i$ are drawn from the standard Normal $\mathcal{N}(0,1)$, and the recursion below is applied to obtain $x_{1:D}$:

$x_{i}=u_{i} \exp \alpha_{i}+\mu_{i}, \quad u_{i} \sim \mathcal{N}(0,1)$
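The sampling recursion can be sketched as follows; the toy conditioners `f_mu` and `f_alpha` are hypothetical placeholders for the networks:

```python
import numpy as np

# Hypothetical toy conditioners standing in for the networks f_mu_i, f_alpha_i.
def f_mu(prev):
    return 0.5 * prev.sum() if prev.size else 0.0

def f_alpha(prev):
    return 0.1 * prev.sum() if prev.size else 0.0

def sample(D, rng):
    u = rng.standard_normal(D)       # noise variates u_i ~ N(0, 1)
    x = np.zeros(D)
    for i in range(D):               # each x_i depends on x_{1:i-1}
        mu, alpha = f_mu(x[:i]), f_alpha(x[:i])
        x[i] = u[i] * np.exp(alpha) + mu
    return x

x = sample(4, np.random.default_rng(0))
```

Note that sampling is inherently sequential (each $x_i$ needs the previous values), whereas density evaluation of a given $x$ can be parallelized.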

This autoregressive sampling is essentially a deterministic transformation of the “noise variates” drawn from a Normal distribution into a new distribution. We can then stack these deterministic transformations into a normalizing flow, permuting the ordering of the variables $x_1, \ldots, x_D$ between bijectors, so that if one layer cannot model a distribution well, a subsequent layer might be able to do it.
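A minimal sketch of the stacking idea, assuming simple elementwise affine layers (toy fixed parameters, not trained networks) with a reversal permutation in between:

```python
import numpy as np

# Toy affine layer with fixed illustrative parameters.
def affine_layer(u, scale, shift):
    return u * scale + shift

def permute(x):
    # Reverse the variable ordering so the next layer conditions on
    # a different ordering of x_1, ..., x_D.
    return x[::-1]

def flow_forward(u):
    x = affine_layer(u, scale=2.0, shift=1.0)
    x = permute(x)
    x = affine_layer(x, scale=0.5, shift=-1.0)
    return x

out = flow_forward(np.zeros(3))
```

Each step is invertible (divide, shift back, reverse again), so the composed flow is also invertible, which is what lets us compute exact densities through it.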

Coming Soon