Project 5: Fun With Diffusion Models!

Introduction

In this project I will implement and deploy diffusion models for image generation.

Part 0

I used the given sample code to produce images with different numbers of inference steps, using seed 777. I use the same seed throughout Part A.

10 steps

50 steps

1.1 Forward process

I used the equation below to add noise to the Campanile (the given test image); results are below. \[ x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \quad \text{where} \quad \epsilon \sim \mathcal{N}(0, 1) \]
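The forward process above can be sketched in a few lines (a minimal NumPy sketch; `alphas_cumprod` is an assumed 1-D array standing in for the scheduler's precomputed \( \bar{\alpha} \) schedule):

```python
import numpy as np

def forward(x0, t, alphas_cumprod, rng=np.random.default_rng(0)):
    """Noise a clean image x0 to timestep t.

    alphas_cumprod is assumed to hold the cumulative schedule
    (bar alpha_t), indexed by integer timestep t.
    """
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)                     # eps ~ N(0, 1)
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return xt, eps
```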

1.2 Classical Denoising

Here we try to denoise the noisy images with a Gaussian blur filter.
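As a sketch, classical denoising is just low-pass filtering (using SciPy's `gaussian_filter` here; the blur width is a free knob):

```python
from scipy.ndimage import gaussian_filter

def classical_denoise(noisy, sigma=1.0):
    # A Gaussian blur averages out high-frequency noise, but it
    # blurs away real image detail along with it, which is why
    # this baseline cannot compete with the learned denoiser.
    return gaussian_filter(noisy, sigma=sigma)
```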

1.3 One-Step Denoising

Here we use the pretrained diffusion model to estimate the noise and remove it in a single step.

1.4 Implementing Iterative Denoising

To get a much better result, we denoise iteratively, stepping from a noisy timestep \( t \) to a less noisy timestep \( t' \) until the image is clean. \[ x_{t'} = \frac{\sqrt{\bar{\alpha}_{t'}}\, \beta_t}{1 - \bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t'})}{1 - \bar{\alpha}_t}\, x_t + v_\sigma \]

Where \( t' < t \) is the next, less noisy timestep; \( \alpha_t = \bar{\alpha}_t / \bar{\alpha}_{t'} \); \( \beta_t = 1 - \alpha_t \); \( x_0 \) is the current estimate of the clean image; and \( v_\sigma \) is random noise added back in.
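One step of this update can be sketched as follows (NumPy sketch; the added-noise term \( v_\sigma \) is omitted for clarity, and `alphas_cumprod` is again an assumed precomputed \( \bar{\alpha} \) array):

```python
import numpy as np

def denoise_step(xt, x0_est, t, t_next, alphas_cumprod):
    """Move from the noisy image at timestep t to the less noisy
    timestep t_next (< t), given the clean-image estimate x0_est."""
    a_bar_t = alphas_cumprod[t]
    a_bar_next = alphas_cumprod[t_next]
    alpha = a_bar_t / a_bar_next        # alpha_t
    beta = 1.0 - alpha                  # beta_t
    return (np.sqrt(a_bar_next) * beta / (1.0 - a_bar_t)) * x0_est \
         + (np.sqrt(alpha) * (1.0 - a_bar_next) / (1.0 - a_bar_t)) * xt
```

A quick sanity check of the formula: if \( x_t \) is a noiselessly scaled clean image \( \sqrt{\bar{\alpha}_t}\, x_0 \), the step returns \( \sqrt{\bar{\alpha}_{t'}}\, x_0 \), i.e. the same image scaled for the next timestep.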

Results are below

1.5 Diffusion Model Sampling

Here I did the same thing as in 1.4, but starting from pure random noise; results below.

1.6 Classifier-Free Guidance (CFG)

I computed both the conditional and unconditional noise estimates and combined them as \[ \epsilon = \epsilon_u + \gamma (\epsilon_c - \epsilon_u) \] with \( \gamma > 1 \) to get better results:
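The CFG combination itself is a one-liner (sketch; the default \( \gamma = 7 \) is a common choice, treat it as an assumption):

```python
import numpy as np

def cfg(eps_uncond, eps_cond, gamma=7.0):
    # gamma > 1 extrapolates past the conditional estimate,
    # strengthening the prompt's effect; gamma = 1 recovers
    # the plain conditional estimate.
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```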

1.7 Image-to-Image Translation

We take an image, add noise to it, and then denoise it to get a slightly different image :).
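The procedure can be sketched like this (the iterative denoising loop is passed in as a stand-in argument, and `timesteps`/`alphas_cumprod` are assumptions standing in for the scheduler's values):

```python
import numpy as np

def image_to_image(x0, i_start, timesteps, alphas_cumprod, denoise_from,
                   rng=np.random.default_rng(0)):
    """Noise the input image to timesteps[i_start], then hand it to the
    iterative denoising loop (denoise_from), starting at that index.
    More noise means the result strays further from the original."""
    t = timesteps[i_start]
    a_bar = alphas_cumprod[t]
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * rng.standard_normal(x0.shape)
    return denoise_from(xt, i_start)
```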

Hand Drawn and web images:

Let's start with the custom images: the original first, then the results at each noise level.

first doodle

second doodle

1.7.2 Inpainting

In this part we use a mask so that the diffusion process is applied only to a specific region of the image; everywhere else, the original pixels are kept.
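Concretely, after every denoising step the pixels outside the mask are replaced with the original image re-noised to the current timestep. A sketch (mask is 1 where we generate and 0 where we keep the original; `alphas_cumprod` is again an assumed schedule array):

```python
import numpy as np

def inpaint_step(xt, x_orig, mask, t, alphas_cumprod,
                 rng=np.random.default_rng(0)):
    # Re-noise the original image to timestep t, then keep it
    # everywhere the mask is 0; generation happens only where mask is 1.
    a_bar = alphas_cumprod[t]
    eps = rng.standard_normal(x_orig.shape)
    x_orig_t = np.sqrt(a_bar) * x_orig + np.sqrt(1.0 - a_bar) * eps
    return mask * xt + (1.0 - mask) * x_orig_t
```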

result:

result:

result:

1.7.3 Text-Conditional Image-to-Image Translation

Here we use a text prompt to guide the translation of an image. The first one is the rocket prompt on the Campanile.

This one is the rocket prompt on a panda.

The last one is the rocket prompt on the sunflower from Plants vs Zombies.

1.8 Visual Anagrams

The main idea in this part is making optical illusions. The first kind is an image that looks like one thing when viewed normally and like a completely different thing when flipped. To do this, at every denoising step we estimate the noise with two prompts (the second applied to the upside-down image), flip the second estimate back, and average the two. To make more sense, here are the equations: \[\epsilon_1 = \text{UNet}(x_t, t, p_1) \] \[\epsilon_2 = \text{flip}(\text{UNet}(\text{flip}(x_t), t, p_2)) \] \[ \epsilon = (\epsilon_1 + \epsilon_2) / 2 \] The first visual illusion is an oil painting of an old man x an oil painting of people around a campfire.
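A sketch of the combined noise estimate (the `unet` argument and its signature are stand-ins for the real text-conditioned model; `flip` is an upside-down flip along the image rows):

```python
import numpy as np

def anagram_eps(unet, xt, t, p1, p2):
    """unet(x, t, prompt) -> noise estimate (a stand-in signature)."""
    flip = lambda img: img[::-1]            # upside down
    eps1 = unet(xt, t, p1)                  # estimate on the normal image
    eps2 = flip(unet(flip(xt), t, p2))      # estimate on the flipped image,
                                            # flipped back afterwards
    return 0.5 * (eps1 + eps2)
```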

The second one is pen x rocket, which I initially thought was a good mix; after a few runs, I realized it was not a good choice.

The third one is a man x dog mix:

1.10 Hybrid Images

The last illusion in my assignment changes with viewing distance: the first one is supposed to read as a lithograph of a waterfall at one distance and as a lithograph of a skull at the other. It's done like this: \[ \epsilon_1 = \text{UNet}(x_t, t, p_1) \] \[ \epsilon_2 = \text{UNet}(x_t, t, p_2) \] \[ \epsilon = f_\text{lowpass}(\epsilon_1) + f_\text{highpass}(\epsilon_2) \] Below are three of my tries on the prompts: "waterfall" + "skull", "pen" + "skull", "Amalfi coast" + "campfire"
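A sketch of the frequency split, using a Gaussian blur as the low-pass filter (the blur width `sigma` is an assumed knob):

```python
from scipy.ndimage import gaussian_filter

def hybrid_eps(eps1, eps2, sigma=2.0):
    # Low frequencies of eps1 dominate from far away; high
    # frequencies of eps2 (the residual after blurring) dominate
    # up close.
    low = gaussian_filter(eps1, sigma=sigma)
    high = eps2 - gaussian_filter(eps2, sigma=sigma)
    return low + high
```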

Part B

In this part we implement a UNet, use it to train a denoiser, and then add time and class conditioning. Some info on the UNet below:

This is what the unconditional UNet looks like. First, though, we implement the noising process. Here's what it looks like with different sigma values:
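The Part B noising process is simpler than Part A's: plain additive Gaussian noise (sketch):

```python
import numpy as np

def noise_image(x, sigma, rng=np.random.default_rng(0)):
    # z = x + sigma * eps, eps ~ N(0, 1); larger sigma = noisier image
    return x + sigma * rng.standard_normal(x.shape)
```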

Then we train our unconditional UNet on sigma = 0.5; here are the results after the first and the fifth (last) epoch:

My training loss curve below:

I also tried my trained model on sigma values it wasn't trained on, and this is what I got:

Part 2: Training a Diffusion Model.

Here I added time conditioning to the UNet; here's the diagram for more understanding.

Here are some results after epoch 5

epoch 20

training loss curve:

Trying out class conditioning

This was implemented so that a specific digit can be generated. For this implementation, I added additional blocks and a new conditioning value c that is passed through the network (as given).
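A sketch of how the class condition c can be prepared: one-hot vectors, occasionally dropped to zero so the model also learns an unconditional estimate for CFG (the 10% dropout rate is the usual choice; treat it as an assumption):

```python
import numpy as np

def make_class_cond(labels, num_classes=10, p_uncond=0.1,
                    rng=np.random.default_rng(0)):
    # One-hot encode the digit labels...
    c = np.eye(num_classes)[labels]
    # ...then zero out a fraction so the model sometimes trains
    # without class information (needed for classifier-free guidance).
    drop = rng.random(len(labels)) < p_uncond
    c[drop] = 0.0
    return c
```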

Here are some results after epoch 1

epoch 5

epoch 10

epoch 15

epoch 20

training loss curve:

I just want to say it looks like I messed up the sampling, even though the loss is low and my model seems to work. Hope this makes sense and doesn't hurt my score much. I tried my best, thank you for your time and consideration! <3