Make-A-Monet: Image Style Transfer With Cycle GANs

DataRes at UCLA
Jul 1, 2021

By: Colin Curtis, Adhvaith Vijay

Try out the web application for the project here: https://make-a-monet.herokuapp.com/

Introduction

Here at Research@DataRes we always try to push the limits of deep learning and keep up to date with developments in the field. While the advancements on the academic side of recent ML research are undoubtedly impressive, these models see little real-world use unless they can be deployed in a production environment. That is why, for our latest project, we made sure to integrate our final model into a deployable application that anyone can use.

Computer Vision (CV) has without a doubt experienced a renaissance during the past decade, and the Generative Adversarial Network (GAN) is perhaps one of the most captivating examples of modern deep learning. The idea is quite intuitive: if we want to generate images that match our dataset, we can have a generator network that takes random noise and produces an image, while a discriminator network attempts to distinguish generated images from real ones. The magic of backpropagation ensures that, in theory, the generator produces ever-better images until the generated and real images are indistinguishable.
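To make this concrete, here is a minimal, hypothetical sketch of one adversarial training step in PyTorch; the networks, optimizers, and a sigmoid-output discriminator are all assumed:

```python
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, real_images, noise):
    # Hypothetical single training step; assumes the discriminator ends in a
    # sigmoid, so its output is a probability in [0, 1] of shape (N, 1).
    fake_images = generator(noise)
    ones = torch.ones(real_images.size(0), 1)
    zeros = torch.zeros(real_images.size(0), 1)

    # Discriminator: score real images as 1 and generated images as 0.
    d_loss = F.binary_cross_entropy(discriminator(real_images), ones) + \
             F.binary_cross_entropy(discriminator(fake_images.detach()), zeros)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator: try to make the discriminator score its fakes as real.
    g_loss = F.binary_cross_entropy(discriminator(fake_images), ones)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```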

For our project we decided to focus on an interesting area of GAN research called image style transfer, which examines how we can take a photograph, for example, and re-render it in the style of a particular artist. Specifically, we wanted a GAN that, given a photograph, could embellish that photo with the style of a Monet painting. Luckily, the Cycle GAN can do just this: it translates between two image domains (Monet paintings and photos in our case) using two generators and two discriminators. Most importantly, since these models are each separate networks, we can save the photo-to-painting generator at the end of training and use it in our web application.

The Cycle GAN

Since our goal was to produce Monet paintings, we needed lots of paintings. We found them in the monet2photo dataset, which contains a few thousand examples of Monet paintings and photographs. Because the Cycle GAN trains on unpaired data, all we need is a collection of photographs and a collection of paintings, which means there are very few constraints on the data.
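Because the data is unpaired, loading it is simple: each training example is just a painting and a randomly chosen photo. A minimal sketch, assuming the standard monet2photo folder layout (trainA for paintings, trainB for photos):

```python
import os
import random
from PIL import Image
from torch.utils.data import Dataset

class UnpairedDataset(Dataset):
    """Yields (painting, photo) pairs with no correspondence between them."""

    def __init__(self, root, transform):
        self.paintings = [os.path.join(root, "trainA", f)
                          for f in os.listdir(os.path.join(root, "trainA"))]
        self.photos = [os.path.join(root, "trainB", f)
                       for f in os.listdir(os.path.join(root, "trainB"))]
        self.transform = transform

    def __len__(self):
        return max(len(self.paintings), len(self.photos))

    def __getitem__(self, idx):
        # Wrap around the shorter list; pair each painting with a random photo.
        painting = Image.open(self.paintings[idx % len(self.paintings)]).convert("RGB")
        photo = Image.open(random.choice(self.photos)).convert("RGB")
        return self.transform(painting), self.transform(photo)
```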

In terms of model architecture there is nothing particularly exotic going on. The generators take a 512x512 image and downsample it with two convolutional layers, followed by five residual layers that help the network preserve the features of the input image. The discriminators take either a real or a fake image and return the probability that the image is real. The picture below illustrates the model at a high level, where X and Y represent the image domains, G and F are the generators, and D represents the discriminators for their respective image domains.

Source: https://arxiv.org/pdf/1703.10593.pdf
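For reference, here is a rough PyTorch sketch of the generator just described. Beyond the two downsampling convolutions and five residual blocks stated above, the channel counts, instance normalization, and upsampling layers are our assumptions:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection preserves input features

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        layers = [
            nn.Conv2d(3, 64, 7, padding=3),
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            # Two downsampling convolutions (512x512 -> 128x128).
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),
            nn.InstanceNorm2d(256), nn.ReLU(inplace=True),
        ]
        layers += [ResidualBlock(256) for _ in range(5)]  # five residual blocks
        layers += [
            # Upsample back to the input resolution.
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),
        ]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)
```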

Perhaps the most interesting thing about the Cycle GAN is the series of generator losses that help ensure we can generate photorealistic images. These are the cycle, identity, and GAN losses, each described below; a sketch of how they combine in code follows the list.

  1. The cycle loss is simply the L1 norm between a real image and the same image passed through both generators. In theory we want the cycled image to be very close to the original image, so the L1 norm captures this error in the generators well.
  2. The identity loss, on the other hand, is the L1 norm between a real image and that same image passed through its respective generator. If we give the painting generator an image that already looks like a painting, it should return it essentially unchanged; this pushes the generator to preserve the features and colors of its input.
  3. The GAN loss is the one most familiar from traditional GANs. It is simply the binary cross entropy between the discriminator's output on the generated image and a tensor of ones. We want the generator to fool the discriminator into believing the fake images are real with probability 1 (hence the tensor of ones).
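Below is a hedged sketch of how these three losses might combine for the photo-to-painting generator. The names G, F, and D_painting follow the figure above, and the lambda weights are hypothetical values:

```python
import torch
from torch.nn.functional import l1_loss, binary_cross_entropy

def generator_loss(G, F, D_painting, photo, painting,
                   lambda_cycle=10.0, lambda_identity=5.0):
    # G: photo -> painting, F: painting -> photo (as in the figure above).
    fake_painting = G(photo)

    # 1. Cycle loss: photo -> painting -> photo should recover the original.
    cycle_loss = l1_loss(F(fake_painting), photo)

    # 2. Identity loss: a painting fed to G should come back unchanged.
    identity_loss = l1_loss(G(painting), painting)

    # 3. GAN loss: push the discriminator's score on fakes toward 1.
    d_out = D_painting(fake_painting)
    gan_loss = binary_cross_entropy(d_out, torch.ones_like(d_out))

    return gan_loss + lambda_cycle * cycle_loss + lambda_identity * identity_loss
```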

We decided to train our network on Google Cloud GPU instances, since GANs are notorious for their heavy compute requirements. The training run that produced the models used in the web application took about a day, but sadly the TensorBoard logs got corrupted, so we are unable to display them here. Interestingly, the losses did not decrease monotonically the way one would like to see when training a neural network, but for GANs the consensus seems to be that the loss curves only matter up to a point, since the main goal is simply good style transfer.

Results

Once the model was trained, we wanted to actually generate some images using our photo-to-painting generator. The results were nice considering we only trained the model for a handful of epochs (even though that still took a long time). A raw image and its transformed counterpart are shown below.

Left: raw image. Right: generated Monet-stylized image

Just from this example we can see that the model did in fact learn what we wanted it to, since the right image definitely looks a bit like a Monet painting. The model does a good job of blurring the details in the raw image and accentuating the colors, especially hues of orange and purple.

One interesting thing we tried to do was pass this generated image through the generator again, just to see what would happen. We were rather surprised at the results, since the image looks even more Monet-esque.

Left: raw image. Right: raw image passed through the generator three times

What was most intriguing about this is that it gives insight into how the generator adds style to the images. During training, the generator likely learned to blur and adjust the color of its inputs because those are the qualities most strongly present in Monet paintings, and therefore the ones that best fool the discriminator.
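Reproducing this compounding effect is just a loop. A sketch, assuming a trained generator and a preprocessed image tensor:

```python
import torch

def stylize(generator, image_tensor, passes=3):
    # Feed the output back into the generator to compound the style.
    generator.eval()
    out = image_tensor
    with torch.no_grad():  # inference only, no gradients needed
        for _ in range(passes):
            out = generator(out)
    return out
```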

It would be an understatement to say that building and training this Cycle GAN took a lot of work, but we were only halfway there. We still had to build a web app to deploy the model, and deployment is arguably the most complicated part of any ML project.

Model Deployment

Since we built our models in PyTorch, at the end of training we can save the model weights in a .pt file. These weights can then be loaded in the web server, recreating the model exactly as it existed at the end of training, except now we use it for inference.
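A sketch of that round trip, with hypothetical file names, assuming a trained generator and a Generator class like the one sketched earlier:

```python
import torch

# At the end of training: persist only the generator's weights.
torch.save(generator.state_dict(), "monet_generator.pt")

# In the web server: rebuild the same architecture and load the weights.
model = Generator()
model.load_state_dict(torch.load("monet_generator.pt", map_location="cpu"))
model.eval()  # switch norm layers and dropout to inference behavior
```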

When deploying an ML application, the difficulty no longer lies in the research code that builds the model, but in creating an environment where users can upload data to be passed through it. In our case, we needed to let users make a POST request to upload an image, and then decode the image bytes on the web server to recover the uploaded image.
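A minimal sketch of such a route, assuming a Flask backend and a hypothetical run_pipeline helper that wraps preprocessing, the generator, and postprocessing:

```python
import io
from flask import Flask, request, send_file
from PIL import Image

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    # Decode the raw bytes of the uploaded file into a PIL image.
    image = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    result = run_pipeline(image)  # hypothetical: forward pass through the generator
    buffer = io.BytesIO()
    result.save(buffer, format="PNG")
    buffer.seek(0)
    return send_file(buffer, mimetype="image/png")
```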

Model Pipeline

Without a doubt the most important part of any deployed ML application is the model pipeline. This describes the complete flow of data through the servers and the model itself, and a bad pipeline can render an otherwise good model useless. A flow chart of the model pipeline is shown below.

Flowchart of the model pipeline

The flowchart explains the general pipeline fairly well, but there are still some ideas worth emphasizing, especially for first-time model deployment. One important point is that performance is the name of the game in inference pipelines. We addressed this by first loading the model with TorchScript JIT compilation, which speeds up inference by freeing us from the slow Python interpreter.
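A sketch of this workflow, assuming the generator traces cleanly; the file names are hypothetical:

```python
import torch

# At export time: compile the trained generator to a TorchScript graph.
example = torch.randn(1, 3, 512, 512)
scripted = torch.jit.trace(model, example)
scripted.save("monet_generator_jit.pt")

# In the web server: load and run it without the Python model class.
model = torch.jit.load("monet_generator_jit.pt", map_location="cpu")
model.eval()
```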

Additionally, we quantized the model from float32 during training to int8 during inference. This transforms the model weights from their original floating-point representation to a more compact integer one, speeding up the model and reducing its RAM footprint. Together, these two changes cut around half a second off the inference time.
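We don't show our exact quantization code here, but a simplified sketch of PyTorch's eager-mode post-training static quantization looks like this; a real model also needs QuantStub/DeQuantStub boundaries, and the calibration loader is hypothetical:

```python
import torch

model.eval()
# Choose the quantization backend (fbgemm targets x86 servers).
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)

# Calibrate: run representative images so observers record activation ranges.
with torch.no_grad():
    for batch in calibration_loader:
        model(batch)

# Convert float32 weights and activations to int8.
torch.quantization.convert(model, inplace=True)
```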

Deploying as a Web Application

We decided to host our work as a web application for the sake of easy user interaction. Using Dash, a Python data-dashboarding framework, we constructed the front end of our application with the help of various Dash Bootstrap Components. A Flask backend server then handled the routes for executing the model pipeline and letting users upload and download images.

Styled with the MINTY theme, courtesy of dbc.themes, our application consists of two pages. Page 1 is the homepage, where users can upload, process, and eventually download their transformed images. Page 2 illustrates our model's architecture from start to finish.
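A skeleton of this setup, with hypothetical component IDs; the callback wiring and the pipeline routes are omitted:

```python
import dash
from dash import dcc, html
import dash_bootstrap_components as dbc
from flask import Flask

server = Flask(__name__)  # backend server that also hosts the pipeline routes
app = dash.Dash(__name__, server=server,
                external_stylesheets=[dbc.themes.MINTY])

app.layout = dbc.Container([
    dbc.NavbarSimple(brand="Make-A-Monet"),
    dcc.Upload(id="upload-image", children=dbc.Button("Upload a photo")),
    html.Div(id="output-image"),  # populated by a callback after inference
])

if __name__ == "__main__":
    app.run_server(debug=True)
```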

Lastly, to make our web application publicly accessible, we used Heroku, a cloud platform well suited to small applications.

Challenges

The most challenging part of deployment was without a doubt the compute the model requires. Since we deployed the Docker image on a free Heroku instance, we had access to neither a GPU nor much RAM for processing the large 512x512 images. During local testing the webpage works fine and generates an image in around three seconds, but if the container ever exceeds its allocated resources, Heroku terminates the process and the webpage breaks.

Even when the forward pass through the generator succeeds, it still takes around eleven seconds to finish generating the image. This inference bottleneck could certainly be fixed by running the model on a GPU inference server, but that would add substantial cost.

Conclusion

Overall, our work on Cycle GANs over the past few months provided a great opportunity to dive into the complex and intriguing area of deep learning deployment. The challenges of going from research to production exposed us to a more industry-oriented approach to machine learning while also letting us explore the cutting edge of GANs.

On the research side, further work could test different numbers of residual layers in the generator and their effect on model performance; on the deployment side, code optimization with something like NVIDIA TensorRT or PyTorch's ONNX export could produce interesting benchmarks. We are very proud of this project and what we learned from it, and we look forward to exploring the cutting edge of deep learning at Research@DataRes in the fall.

Code for this project is freely available at https://github.com/colinpcurtis/datares_GANs
