Andrew Ng's DeepLearning.ai Generative Adversarial Networks (GANs) Specialization

Generative models
- Variational Autoencoders (VAEs)
  - Encoder -> Latent Space -> Decoder
- Generative Adversarial Networks (GANs)
  - Generator: learns to produce realistic examples
  - Discriminator: learns to distinguish between fake and real
Discriminator
Basically, a neural network classifier.

Compare the prediction \hat{Y} with the label Y.
It models P(Y|X), the probability of the class given the features X.
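
A minimal sketch of such a classifier in PyTorch (the layer sizes and the 784-dim flattened-image input are my assumptions, not necessarily the course's architecture):

```python
from torch import nn

# A discriminator is just a binary classifier: features X in, "real or fake" out.
class Discriminator(nn.Module):
    def __init__(self, im_dim=784, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(im_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, 1),   # one raw logit: real vs. fake
        )

    def forward(self, x):
        # Apply sigmoid (or use BCEWithLogitsLoss) to read this as P(Y|X).
        return self.net(x)
```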

Generator

Put noise into the generator to get \hat{X}, then put \hat{X} into the discriminator to get \hat{Y}_d.
Use the difference between \hat{Y}_d and the "real" label to update the parameters of the generator.

Save the parameters of the generator once it performs well.

The generator models P(X|Y), the probability of the features given class Y.
Since we only care about one specific class at a time, P(X|Y) = P(X) here; we can ignore the Y.
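
A matching generator sketch, again with assumed sizes (z_dim = 64 noise dimensions, 784-dim output), not the course's exact architecture:

```python
from torch import nn

# The generator maps a noise vector z to a fake example X_hat.
class Generator(nn.Module):
    def __init__(self, z_dim=64, hidden_dim=128, im_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, im_dim),
            nn.Sigmoid(),               # pixel values in [0, 1]
        )

    def forward(self, z):
        return self.net(z)

# Noise in, fake example out, then judged by the discriminator:
# z = torch.randn(batch_size, 64); x_hat = gen(z); y_hat_d = disc(x_hat)
```
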
BCE cost function
BCE stands for Binary Cross-Entropy. The loss, taken apart term by term:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h(x^{(i)}, \theta) + (1 - y^{(i)}) \log(1 - h(x^{(i)}, \theta)) \right]

The summation from i = 1 to m averages over the entire batch, where
h is the prediction,
y is the label,
theta are the parameters,
x are the features.

First term: y^{(i)} \log h(x^{(i)}, \theta)

If the true y is fake, the value of y is 0.
Then, no matter what the prediction is, the first term is 0.
If the true y is real (y = 1) and the prediction gives a high probability of real, say 0.99, the first term is close to 0.
However, if the prediction is close to 0, the first term goes to negative infinity.
Hence, negative infinity here indicates a bad result:
if the prediction is good, the term stays near 0;
if the prediction is bad, it goes to -\infty.

Second term: (1 - y^{(i)}) \log(1 - h(x^{(i)}, \theta))

This term is 0 whenever the true y is real (1 - y = 0). When the true y is fake, a good prediction (close to 0) keeps the term near 0, but if the prediction is really bad (close to 1), the value goes to negative infinity.
Similarly, negative infinity indicates a bad prediction.
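
A small numeric check of the whole formula (a sketch; the prediction values h are made up):

```python
import torch

def bce(y, h):
    # -1/m * sum( y*log(h) + (1-y)*log(1-h) )
    return -(y * torch.log(h) + (1 - y) * torch.log(1 - h)).mean()

y      = torch.tensor([1.0, 1.0, 0.0, 0.0])       # labels: real, real, fake, fake
h_good = torch.tensor([0.99, 0.95, 0.01, 0.05])   # confident, correct predictions
h_bad  = torch.tensor([0.01, 0.05, 0.99, 0.95])   # confident, wrong predictions

print(bce(y, h_good))   # ~0.03: good predictions give a small loss
print(bce(y, h_bad))    # ~3.8:  bad predictions blow the loss up toward infinity
```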



For the discriminator: pass the fake \hat{X} and the real X into the discriminator, then compute the BCE loss
and update \theta_d (the parameters of the discriminator).
The discriminator wants to tell the difference between fake and real;
the generator wants the fake examples to look as real as possible.
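
A sketch of one alternating training step, using the Generator/Discriminator sketches above (the Adam optimizers, learning rate, and BCEWithLogitsLoss on raw logits are my choices, not necessarily the course's):

```python
import torch
from torch import nn

criterion = nn.BCEWithLogitsLoss()
gen, disc = Generator(), Discriminator()        # as sketched above
gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)

def train_step(real, z_dim=64):
    batch_size = real.size(0)

    # Discriminator: real examples should score 1, fakes should score 0.
    z = torch.randn(batch_size, z_dim)
    fake = gen(z).detach()                      # don't backprop into the generator here
    disc_loss = (criterion(disc(real), torch.ones(batch_size, 1)) +
                 criterion(disc(fake), torch.zeros(batch_size, 1))) / 2
    disc_opt.zero_grad(); disc_loss.backward(); disc_opt.step()

    # Generator: wants its fakes to be labeled real (1) by the discriminator.
    z = torch.randn(batch_size, z_dim)
    gen_loss = criterion(disc(gen(z)), torch.ones(batch_size, 1))
    gen_opt.zero_grad(); gen_loss.backward(); gen_opt.step()

    return disc_loss.item(), gen_loss.item()
```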

Activations
Must be non-linear and differentiable.


ReLU -- dying ReLU: when the input is negative, the output is always 0, so information is lost.
Leaky ReLU solves the problem:
max(az, z), with a = 0.1 for example,
so the negative part is not clamped to 0 but multiplied by a small value a (a quick check is sketched below).
Sigmoid/Tanh -- vanishing gradient and saturation problems.
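
A quick check of max(az, z) against PyTorch's built-in Leaky ReLU, with a = 0.1 as above:

```python
import torch
from torch import nn

z = torch.tensor([-2.0, -0.5, 0.0, 1.0])
a = 0.1
print(torch.maximum(a * z, z))                 # tensor([-0.2000, -0.0500, 0.0000, 1.0000])
print(nn.LeakyReLU(negative_slope=a)(z))       # same values: negatives are scaled, not zeroed
```
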
Batch normalization reduces the (internal) covariate shift,
making the network easier to train and speeding up the training process.
During training, normalization uses the statistics of the current batch;
at test time, it uses fixed (running) statistics.
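
A small sketch of the train vs. test behavior in PyTorch (the feature size and input values are arbitrary):

```python
import torch
from torch import nn

bn = nn.BatchNorm1d(num_features=3)
x = torch.randn(8, 3) * 5 + 10     # a batch with its own mean/std

bn.train()
_ = bn(x)                          # training: normalize with THIS batch's statistics
                                   # (and update the running mean/var)
bn.eval()
_ = bn(x)                          # test: normalize with the FIXED running statistics
print(bn.running_mean, bn.running_var)
```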



Mode collapse: handwritten digits have 10 modes (one per number),
but the generator can converge to just 1 mode. That's the problem.
Vanishing gradients: as the discriminator gets better, the feedback it gives the generator flattens out, so the generator stops learning.
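
One way to see the vanishing-gradient issue in a few lines (a sketch using the original minimax generator term log(1 - D(G(z))), not necessarily the course's exact derivation; the logit value is made up):

```python
import torch

# When the discriminator is very confident a sample is fake (very negative logit),
# almost no gradient flows back to the generator from log(1 - D(G(z))).
logit = torch.tensor([-10.0], requires_grad=True)   # discriminator's raw output for a fake
d_out = torch.sigmoid(logit)                         # D(G(z)) is close to 0
loss = torch.log(1 - d_out)                          # minimax generator term
loss.backward()
print(d_out.item(), logit.grad.item())               # gradient ~ -4.5e-05, i.e. vanishing
```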

Controllable generation: control some of the features of the generated examples after training, by tweaking the noise vector ...

Take advantage of a pre-trained classifier: use its gradients with respect to the noise vector to move the noise in the direction that changes the desired feature.
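
A rough sketch of that idea: keep the generator and the pre-trained classifier fixed, and do gradient ascent on the noise vector so the generated image scores higher on the desired feature. The function name, step count, and learning rate here are assumptions, not the course's code:

```python
import torch

def control_feature(gen, classifier, z, steps=10, lr=0.1):
    """Move noise z so the generated image scores higher on one classifier output."""
    z = z.clone().detach().requires_grad_(True)
    for _ in range(steps):
        score = classifier(gen(z)).mean()                    # how strongly the feature is present
        grad = torch.autograd.grad(score, z)[0]              # gradient of the score w.r.t. the noise
        z = (z + lr * grad).detach().requires_grad_(True)    # gradient ascent on the feature
    return z
```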

