Normalizing flows exploit the rule for change of variables: they begin with a simple initial distribution and apply a sequence of K invertible transforms to produce a new distribution.
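Concretely, for a single invertible transform $\mathbf{x} = f(\mathbf{z})$, the change-of-variables rule gives the density of $\mathbf{x}$; with K stacked transforms, the log-determinant terms simply add:

$$
p_x(\mathbf{x}) = p_z\big(f^{-1}(\mathbf{x})\big)\left|\det \frac{\partial f^{-1}(\mathbf{x})}{\partial \mathbf{x}}\right|,
\qquad
\log p_x(\mathbf{x}) = \log p_z(\mathbf{z}_0) - \sum_{k=1}^{K} \log\left|\det \frac{\partial f_k}{\partial \mathbf{z}_{k-1}}\right|
$$

where $\mathbf{z}_k = f_k(\mathbf{z}_{k-1})$ and $\mathbf{x} = \mathbf{z}_K$.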
For example, a Gaussian can be "deformed" to fit a complex data distribution, with each transform modeled as a small invertible neural net.
An autoregressive model learns a complex joint density by decomposing it into a product of one-dimensional conditional densities, each depending only on the previous variables:

$$
p(\mathbf{x}) = \prod_{i=1}^{D} p(x_i \mid x_{1:i-1})
$$
The conditional densities usually have learnable parameters. One example is an autoregressive density whose conditionals are univariate Gaussians, with means and standard deviations computed by neural networks that depend on the previous variables $x_{1:i-1}$:

$$
p(x_i \mid x_{1:i-1}) = \mathcal{N}\!\big(x_i \mid \mu_i, \sigma_i^2\big), \qquad \mu_i = f_{\mu_i}(x_{1:i-1}), \quad \sigma_i = f_{\sigma_i}(x_{1:i-1})
$$
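As a minimal sketch of evaluating such a density, the conditioning networks below are stand-ins: strictly lower-triangular linear maps (so $\mu_i$ and $\sigma_i$ depend only on $x_{1:i-1}$), where a real model would use neural nets such as MADE.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
# Hypothetical "networks": strictly lower-triangular linear maps,
# so row i only sees x_{1:i-1}.
W_mu = np.tril(rng.normal(size=(D, D)), k=-1)
W_s = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1

def log_prob(x):
    """Sum of the 1-D conditional Gaussian log densities."""
    mu = W_mu @ x            # mean of each conditional
    log_sigma = W_s @ x      # log std of each conditional
    sigma = np.exp(log_sigma)
    return np.sum(-0.5 * ((x - mu) / sigma) ** 2
                  - log_sigma - 0.5 * np.log(2 * np.pi))

x = rng.normal(size=D)
print(log_prob(x))
```

The triangular structure is what makes the full Jacobian determinant cheap to compute in practice: it is just the product of the diagonal terms.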
In this case, we're assuming an ordering in which earlier variables don't depend on later ones. This isn't true for natural data.
To sample from the distribution, "noise variates" $z_i$ are drawn from the standard Normal $\mathcal{N}(0, 1)$, then the recursion is applied to get the $x_i$:

$$
x_i = \mu_i(x_{1:i-1}) + \sigma_i(x_{1:i-1}) \, z_i, \qquad z_i \sim \mathcal{N}(0, 1)
$$
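The recursion can be sketched as follows, again with strictly lower-triangular linear maps standing in for the conditioning networks (an assumption for illustration, not the model the notes describe):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
W_mu = np.tril(rng.normal(size=(D, D)), k=-1)   # stand-in for f_mu
W_s = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1  # stand-in for f_sigma

def sample():
    z = rng.normal(size=D)   # noise variates from the standard Normal
    x = np.zeros(D)
    for i in range(D):       # sequential recursion: x_i from x_{1:i-1}
        mu_i = W_mu[i] @ x
        sigma_i = np.exp(W_s[i] @ x)
        x[i] = mu_i + sigma_i * z[i]
    return x

x = sample()
print(x)
```

Note that sampling is inherently sequential (each $x_i$ waits on $x_{1:i-1}$), even though density evaluation can be done in parallel.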
This autoregressive sampling transforms the "noise variates" we sampled from a Normal distribution into a new distribution. We can then stack these deterministic transformations into a normalizing flow. Stacking lets us change the ordering of variables for each bijector in the flow, so that if one layer cannot model a distribution well, a subsequent layer might be able to do it.
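Stacking with a permutation in between might look like the following sketch, which reverses the variable ordering between two affine autoregressive layers (the reversal is one simple choice of permutation, assumed here for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4

def make_layer():
    """One affine autoregressive bijector with stand-in linear conditioners."""
    W_mu = np.tril(rng.normal(size=(D, D)), k=-1)
    W_s = np.tril(rng.normal(size=(D, D)), k=-1) * 0.1
    def forward(z):
        x = np.zeros(D)
        for i in range(D):   # x_i conditions only on x_{1:i-1}
            x[i] = W_mu[i] @ x + np.exp(W_s[i] @ x) * z[i]
        return x
    return forward

layer1, layer2 = make_layer(), make_layer()

z = rng.normal(size=D)
h = layer1(z)
h = h[::-1]          # permute (here: reverse) the ordering between bijectors
x = layer2(h)
print(x)
```

Because the reversal is itself invertible with a trivial Jacobian, the stack remains a valid normalizing flow, and the second layer conditions in the opposite direction from the first.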