# Normalizing Flows

#### Basic concept

Normalizing flows exploit the change-of-variables rule: they begin with a simple initial distribution and apply a sequence of $K$ invertible transforms to produce a new, more expressive distribution.
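
Concretely, if $z \sim p_z$ and $x = f(z)$ for an invertible $f$, the change-of-variables rule gives

$$p_x(x) = p_z\!\left(f^{-1}(x)\right)\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|$$

and for a composition of $K$ transforms $f = f_K \circ \cdots \circ f_1$ with intermediate values $z_k = f_k(z_{k-1})$,

$$\log p_x(x) = \log p_z(z_0) - \sum_{k=1}^{K} \log \left|\det \frac{\partial f_k(z_{k-1})}{\partial z_{k-1}}\right|.$$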

For example, a Gaussian can be ‘deformed’ to fit a complex data distribution, and each transform can be modeled as a small invertible neural network.
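
As a minimal sketch (NumPy, with a single affine bijector standing in for the invertible net; `a` and `b` are illustrative parameters), here is one change-of-variables step deforming a standard Gaussian:

```python
import numpy as np

def affine_forward(z, a, b):
    """Invertible affine transform x = a * z + b (requires a != 0)."""
    return a * z + b

def base_log_prob(z):
    """Log-density of the standard Normal N(0, 1)."""
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def flow_log_prob(x, a, b):
    """Density of x = a*z + b via the change-of-variables rule:
    log p_x(x) = log p_z(f^{-1}(x)) - log |df/dz|."""
    z = (x - b) / a                      # inverse transform
    return base_log_prob(z) - np.log(np.abs(a))

# Sample from the flow: push base noise through the transform.
z = np.random.randn(5)
x = affine_forward(z, a=2.0, b=1.0)
print(x, flow_log_prob(x, a=2.0, b=1.0))
```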

#### Autoregressive Models are NFs

An autoregressive model learns a complex joint density by decomposing it into a product of one-dimensional conditional densities, where each $x_i$ depends only on the previous $i-1$ values:

$$p(x_{1:D}) = \prod_{i=1}^{D} p(x_i \mid x_{1:i-1})$$

The conditional densities usually have learnable parameters. One example is an autoregressive density $p\left(x_{1:D}\right)$ whose conditionals are univariate Gaussians, with means and standard deviations computed by neural networks that take the previous values $x_{1:i-1}$ as input.
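
Written out (in the parameterization used by masked autoregressive flows, with $\alpha_i$ the log standard deviation), each conditional is

$$p(x_i \mid x_{1:i-1}) = N\!\left(x_i \mid \mu_i, (\exp \alpha_i)^2\right), \qquad \mu_i = f_{\mu_i}(x_{1:i-1}), \quad \alpha_i = f_{\alpha_i}(x_{1:i-1}).$$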

In this case, we’re assuming that the earlier variables don’t depend on later variables. Natural data has no such privileged ordering, so any single fixed ordering may model it poorly.

To sample from the distribution, $D$ “noise variates” $u_i$ are drawn from the standard Normal $N(0,1)$, and the following recursion is applied to obtain $x_{1:D}$:

$$x_i = u_i \exp(\alpha_i) + \mu_i, \qquad \mu_i = f_{\mu_i}(x_{1:i-1}), \quad \alpha_i = f_{\alpha_i}(x_{1:i-1})$$
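
A minimal sketch of this sampling loop in NumPy, assuming hypothetical callables `f_mu` and `f_alpha` that map the prefix $x_{1:i-1}$ to the conditional mean and log standard deviation (the toy stand-ins below are not learned networks):

```python
import numpy as np

def sample_autoregressive(f_mu, f_alpha, D, rng=None):
    """Transform D standard-Normal noise variates u into a sample x_{1:D}
    via the recursion x_i = u_i * exp(alpha_i) + mu_i."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.standard_normal(D)           # noise variates from N(0, 1)
    x = np.zeros(D)
    for i in range(D):
        mu = f_mu(x[:i])                 # mean from the previous values x_{1:i-1}
        alpha = f_alpha(x[:i])           # log standard deviation
        x[i] = u[i] * np.exp(alpha) + mu
    return x

# Toy stand-ins for the neural networks (hypothetical, for illustration only).
f_mu = lambda prefix: 0.5 * prefix.sum()
f_alpha = lambda prefix: -0.1 * len(prefix)
print(sample_autoregressive(f_mu, f_alpha, D=5))
```

Note that each $x_i$ needs the previous values, so sampling is inherently sequential in $D$.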

This autoregressive sampling simply transforms the “noise variates” drawn from a Normal distribution into a sample from a new distribution. We can stack these deterministic transformations into a normalizing flow, which lets us change the ordering of the variables $x_1, \ldots, x_D$ for each bijector in the flow: if one layer cannot model the distribution well under its ordering, a subsequent layer might, as sketched below.
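A sketch of stacking such bijectors, reversing the variable ordering between layers; the layers here reuse the same hypothetical `f_mu`/`f_alpha` interface as above, with toy stand-in parameters:

```python
import numpy as np

def flow_layer(u, f_mu, f_alpha):
    """One autoregressive bijector: noise in, sample out (as above)."""
    x = np.zeros_like(u)
    for i in range(len(u)):
        x[i] = u[i] * np.exp(f_alpha(x[:i])) + f_mu(x[:i])
    return x

def stacked_flow(u, layers):
    """Compose several bijectors, reversing the variable order in between
    so each layer models the data under a different ordering."""
    x = u
    for f_mu, f_alpha in layers:
        x = flow_layer(x, f_mu, f_alpha)
        x = x[::-1]                      # permute ordering for the next layer
    return x

# Three toy layers (stand-ins for learned networks).
layers = [(lambda p: 0.3 * p.sum(), lambda p: -0.05 * len(p)) for _ in range(3)]
print(stacked_flow(np.random.randn(5), layers))
```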

Coming Soon