Generative Adversarial Networks
GANs (mainly in image synthesis)

Survey Papers / Repos

Resources

Models

Loss functions

Regularization

Architecture

Conditional GANs

Others

Tricks

Metrics (my implementation: lzhbrian/metrics)

  • Inception Score [1606.03498] [1801.01973]
    • Assumption
      • MEANINGFUL: A generated image should be clear, so a classifier network's output probability for it should be heavily skewed toward one class, e.g. [0.9, 0.05, ...]. That is, $p(y|\mathbf{x})$ has low entropy.
      • DIVERSITY: Generated images should be spread evenly across classes (e.g. with 10 classes, each class gets a roughly equal share), so the marginal distribution $p(y) = \frac{1}{N} \sum_{i=1}^{N} p(y|\mathbf{x}^{(i)})$ has high entropy.
      • Better models: the KL divergence between $p(y|\mathbf{x})$ and $p(y)$ should be high.
    • Formulation
      • $\text{IS} = \exp \left( \mathbb{E}_{\mathbf{x} \sim p_g} D_{KL} \left[ p(y|\mathbf{x}) \,||\, p(y) \right] \right)$
      • where
        • $\mathbf{x}$ is sampled from the generated data
        • $p(y|\mathbf{x})$ is the output probability of Inception v3 when the input is $\mathbf{x}$
        • $p(y) = \frac{1}{N} \sum_{i=1}^{N} p(y|\mathbf{x}^{(i)})$ is the average output probability over all generated data (from Inception v3, a 1000-dim vector)
        • $D_{KL} (\mathbf{p}||\mathbf{q}) = \sum_{j} p_{j} \log \frac{p_j}{q_j}$, where $j$ indexes the dimensions of the output probability.
    • Reference
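The Inception Score formulation above can be sketched in a few lines of NumPy. This is a minimal sketch, not the reference implementation: it assumes the class probabilities $p(y|\mathbf{x}^{(i)})$ have already been computed (for example by running Inception v3 on the generated images) and are passed in as an `(N, C)` array. The function name `inception_score` and the `eps` smoothing constant are illustrative choices, not something defined in these notes.

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """Inception Score from precomputed class probabilities.

    probs: array of shape (N, C); row i is p(y | x_i) and sums to 1.
    eps:   small constant (an assumption of this sketch) to avoid log(0).
    """
    probs = np.asarray(probs, dtype=np.float64)

    # Marginal p(y): average of the per-sample distributions, shape (1, C).
    p_y = probs.mean(axis=0, keepdims=True)

    # Per-sample KL divergence D_KL(p(y|x_i) || p(y)), shape (N,).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)

    # IS = exp of the average KL divergence over the generated samples.
    return float(np.exp(kl.mean()))


if __name__ == "__main__":
    # Sharp and diverse: one confident prediction per class.
    print(inception_score(np.eye(10)))
    # Uninformative: every prediction is uniform over the classes.
    print(inception_score(np.full((5, 10), 0.1)))
```

The two calls at the bottom illustrate the assumptions: one-hot predictions spread evenly over 10 classes give a score close to 10 (low-entropy $p(y|\mathbf{x})$, high-entropy $p(y)$), while uniform predictions give a score of 1, the minimum.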
  • FID Score [1706.08500]
    • Formulation
      • $\text{FID} = ||\mu_r - \mu_g||^2 + \mathrm{Tr}(\Sigma_{r} + \Sigma_{g} - 2(\Sigma_r \Sigma_g)^{1/2})$
      • where
        • $\mathrm{Tr}$ is the trace of a matrix (Wikipedia)
        • $X_r \sim \mathcal{N}(\mu_r, \Sigma_r)$ and $X_g \sim \mathcal{N}(\mu_g, \Sigma_g)$ are the 2048-dim activations of the Inception v3 pool3 layer for real and generated images, respectively
        • $\mu_r$ is the mean of the real images' features
        • $\mu_g$ is the mean of the generated images' features
        • $\Sigma_r$ is the covariance matrix of the real images' features
        • $\Sigma_g$ is the covariance matrix of the generated images' features
    • Reference
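The FID formulation above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the features (e.g. the 2048-dim Inception v3 pool3 activations) are assumed to be extracted already and passed in as `(N, D)` arrays with D > 1, and the names `fid_from_features` and `_sqrt_psd` are hypothetical helpers introduced here. Many implementations compute $(\Sigma_r \Sigma_g)^{1/2}$ with `scipy.linalg.sqrtm`; this sketch instead uses the identity $\mathrm{Tr}((\Sigma_r \Sigma_g)^{1/2}) = \mathrm{Tr}((\Sigma_r^{1/2} \Sigma_g \Sigma_r^{1/2})^{1/2})$ so that only symmetric eigendecompositions are needed.

```python
import numpy as np


def _sqrt_psd(mat):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative values from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def fid_from_features(feat_r, feat_g):
    """FID between two feature sets of shape (N_r, D) and (N_g, D), D > 1."""
    feat_r = np.asarray(feat_r, dtype=np.float64)
    feat_g = np.asarray(feat_g, dtype=np.float64)

    # Means and covariance matrices of real and generated features.
    mu_r, mu_g = feat_r.mean(axis=0), feat_g.mean(axis=0)
    sigma_r = np.cov(feat_r, rowvar=False)
    sigma_g = np.cov(feat_g, rowvar=False)

    # Tr((S_r S_g)^{1/2}) computed as the sum of square roots of the
    # eigenvalues of the symmetric matrix S_r^{1/2} S_g S_r^{1/2}.
    root_r = _sqrt_psd(sigma_r)
    inner = root_r @ sigma_g @ root_r
    inner = (inner + inner.T) / 2.0  # re-symmetrize against round-off
    eigvals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    tr_sqrt = np.sqrt(eigvals).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 4))       # stand-in for real features
    fake = rng.normal(size=(500, 4)) + 0.5  # stand-in for generated features
    print(fid_from_features(real, fake))
```

When both feature sets are identical the score is zero up to floating-point error, and shifting every feature by a constant changes only the $||\mu_r - \mu_g||^2$ term, since the covariances are unaffected.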