# Normalizing Flows

#### Basic concept

Normalizing flows exploit the change-of-variables rule: they begin with a simple initial distribution and apply a sequence of $K$ invertible transforms to produce a new, more expressive distribution.
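
Concretely, if $z \sim p_z$ and $x = f(z)$ for an invertible $f$, the change-of-variables rule gives

$$p_x(x) = p_z\!\left(f^{-1}(x)\right)\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|$$

and for a composition of $K$ transforms $f = f_K \circ \cdots \circ f_1$ with intermediate values $z_k = f_k(z_{k-1})$,

$$\log p_x(x) = \log p_z(z_0) - \sum_{k=1}^{K} \log \left|\det \frac{\partial f_k(z_{k-1})}{\partial z_{k-1}}\right|.$$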

For example, a Gaussian can be ‘deformed’ to fit a complex data distribution, and each transform can be modeled as a small invertible neural network.
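
As a minimal sketch (NumPy, with a single affine bijector standing in for the invertible net; `a` and `b` are illustrative parameters), here is one change-of-variables step deforming a standard Gaussian:

```python
import numpy as np

def affine_forward(z, a, b):
    """Invertible affine transform x = a * z + b (requires a != 0)."""
    return a * z + b

def base_log_prob(z):
    """Log-density of the standard Normal N(0, 1)."""
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def flow_log_prob(x, a, b):
    """Density of x = a*z + b via the change-of-variables rule:
    log p_x(x) = log p_z(f^{-1}(x)) - log |df/dz|."""
    z = (x - b) / a                      # inverse transform
    return base_log_prob(z) - np.log(np.abs(a))

# Sample from the flow: push base noise through the transform.
z = np.random.randn(5)
x = affine_forward(z, a=2.0, b=1.0)
print(x, flow_log_prob(x, a=2.0, b=1.0))
```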

#### Autoregressive Models are NFs

An autoregressive model learns a complex joint density by decomposing it into a product of one-dimensional conditional densities, where each $x_i$ depends only on the previous $i-1$ values:

$$p(x_{1:D}) = \prod_{i=1}^{D} p(x_i \mid x_{1:i-1})$$

The conditional densities usually have learnable parameters. One example is an autoregressive density $p\left(x_{1:D}\right)$ whose conditionals are univariate Gaussians, with means and standard deviations computed by neural networks that take the previous values $x_{1:i-1}$ as input.
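
Written out (in the parameterization used by masked autoregressive flows, with $\alpha_i$ the log standard deviation), each conditional is

$$p(x_i \mid x_{1:i-1}) = N\!\left(x_i \mid \mu_i, (\exp \alpha_i)^2\right), \qquad \mu_i = f_{\mu_i}(x_{1:i-1}), \quad \alpha_i = f_{\alpha_i}(x_{1:i-1}).$$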

In this case, we’re assuming that the earlier variables don’t depend on later variables. Natural data has no such privileged ordering, so any single fixed ordering may model it poorly.

To sample from the distribution, $D$ “noise variates” $u_i$ are drawn from the standard Normal $N(0,1)$, and the following recursion is applied to obtain $x_{1:D}$:

$$x_i = u_i \exp(\alpha_i) + \mu_i, \qquad \mu_i = f_{\mu_i}(x_{1:i-1}), \quad \alpha_i = f_{\alpha_i}(x_{1:i-1})$$
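
A minimal sketch of this sampling loop in NumPy, assuming hypothetical callables `f_mu` and `f_alpha` that map the prefix $x_{1:i-1}$ to the conditional mean and log standard deviation (the toy stand-ins below are not learned networks):

```python
import numpy as np

def sample_autoregressive(f_mu, f_alpha, D, rng=None):
    """Transform D standard-Normal noise variates u into a sample x_{1:D}
    via the recursion x_i = u_i * exp(alpha_i) + mu_i."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.standard_normal(D)           # noise variates from N(0, 1)
    x = np.zeros(D)
    for i in range(D):
        mu = f_mu(x[:i])                 # mean from the previous values x_{1:i-1}
        alpha = f_alpha(x[:i])           # log standard deviation
        x[i] = u[i] * np.exp(alpha) + mu
    return x

# Toy stand-ins for the neural networks (hypothetical, for illustration only).
f_mu = lambda prefix: 0.5 * prefix.sum()
f_alpha = lambda prefix: -0.1 * len(prefix)
print(sample_autoregressive(f_mu, f_alpha, D=5))
```

Note that each $x_i$ needs the previous values, so sampling is inherently sequential in $D$.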

This autoregressive sampling simply transforms the “noise variates” drawn from a Normal distribution into a sample from a new distribution. We can stack these deterministic transformations into a normalizing flow, which lets us change the ordering of the variables $x_1, \ldots, x_D$ for each bijector in the flow: if one layer cannot model the distribution well under its ordering, a subsequent layer might, as sketched below.
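A sketch of stacking such bijectors, reversing the variable ordering between layers; the layers here reuse the same hypothetical `f_mu`/`f_alpha` interface as above, with toy stand-in parameters:

```python
import numpy as np

def flow_layer(u, f_mu, f_alpha):
    """One autoregressive bijector: noise in, sample out (as above)."""
    x = np.zeros_like(u)
    for i in range(len(u)):
        x[i] = u[i] * np.exp(f_alpha(x[:i])) + f_mu(x[:i])
    return x

def stacked_flow(u, layers):
    """Compose several bijectors, reversing the variable order in between
    so each layer models the data under a different ordering."""
    x = u
    for f_mu, f_alpha in layers:
        x = flow_layer(x, f_mu, f_alpha)
        x = x[::-1]                      # permute ordering for the next layer
    return x

# Three toy layers (stand-ins for learned networks).
layers = [(lambda p: 0.3 * p.sum(), lambda p: -0.05 * len(p)) for _ in range(3)]
print(stacked_flow(np.random.randn(5), layers))
```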

Coming Soon