Data Scientist Utilizing Software Engineering for Powerful Data Analysis and Machine Learning!
Word Count: 192
Hello there, my name is Ethan! As of writing this, I am 20 years old and started my journey at Full Sail University in 2021.
My original degree path was the Game Development Computer Science program, but wanting to explore more opportunities in computer science led me to the Artificial Intelligence concentration.
I have always been involved with technology from a very young age. I first heard about Full Sail University in a game magazine when I was 10 years old. I ripped out the flyer and kept it as a reminder of what I wanted to do!
After going through the entire game development course and now the artificial intelligence course, I'm finally ready to finish off my bachelor's studies.
The purpose of these blogs is to document my progress during my four-month-long capstone project that I will be working on with a team.
My goal is to have an end product that I am proud of and that has some innovation built in. I don't want to have a generic project as my last one, but something exciting and new that will highlight my work here at Full Sail University.
Word Count: 389
My team and I are currently wrapping up week two of our first month. Last week, we decided on a project idea from our three pitches. Will's pitch was an artificial-intelligence antivirus that detects malicious threats at the firewall. Temitayo's pitch was a machine learning GAN model that generates realistic faces from prompts using NLP. My pitch was an Android app that uses machine learning to recommend recipes based on user feedback and the ingredients in their kitchen. We all agreed that Temitayo's realistic-face GAN model would be a perfect project to finish off our degree and showcase our abilities.
This week, the team and I got together to create the first draft for our project's design document and style tile. The style tile allows us to get the look and feel of our application without committing to it. The style tile includes the colors, typefaces, icons, user interface, and photography of how we plan on designing our wireframe for next week.
The design document went over our project's purpose, goals, requirements, development environments, use cases, and production plans. As of now, the model is set to be developed in PyTorch. Only Temitayo has experience in PyTorch, but Will and I wanted to be exposed to it as we only have experience in TensorFlow. The back-end will be developed with Django. The front-end will be developed with React. I have no experience with any of these tools, but a handyman shouldn't only know how to use a hammer but all the tools in his toolbox. With that being said, I told the team I'd like to take on the role of front-end with React.
In the days after creating the first draft of the design document, I explored LinkedIn Learning to get a feel for React. I have completed two courses so far to get ready for the first month of development next month. The first course covered the basics, like creating new components, setting up development tools in Chrome, using built-in hooks, and setting up test cases. The second course had me creating a React app from the ground up, using built-in hooks, and setting up more test cases. For the next two weeks of month one, I plan on continuing my research with React to further my front-end abilities.
Word Count: 406
My team and I have one more week of planning until development starts. This week the team got together to complete the final draft of our design document, create the first draft for our front-end wireframe, and lastly, create our Jira board for project management.
The changes made in our design document include a new logo and name, where the back-end and front-end will be hosted, and prompt rules. We decided that the front-end will no longer use the React framework. We found out that HuggingFace allows static website hosting and can easily access hosted models, so the front-end will instead be plain HTML, which is a lot easier to deploy on HuggingFace.
The first draft for the wireframe was created using Figma. I created it following the user flow the team had created in the design document. It will serve as a reference for when I start developing the front-end in month two. Below this post, you can see how it looks!
The Jira board has a total of 33 to-dos across different categories: Dataset, NLP, GANs, Front-End, and Back-End. The team and I went through and added a description to each to-do as a measurable assessment that answers, "How do we know this task is complete?" The front-end and back-end to-dos have a lot of features under them, but I feel very confident in getting them knocked out within the first month. I believe Temitayo is developing the NLP model from a pre-trained model, and Will is developing the GAN model from scratch. I'm mostly worried about the GAN model, as I have no experience working with one, so I will try to help Will out when needed.
Aside from team development this week, I did more research on what I'll be doing. I messed around with CSS to get a better understanding of how styling HTML elements works. My practice was actually this website: I completely overhauled all the CSS on it, taking it from a basic-looking HTML website to a professional design. Since I know I'll be doing the front-end on HuggingFace, I created an organization for the team and made a private interface space. The project's interface just has our new logo and a greeting for the team until I start the front-end development. Lastly, I started a LinkedIn Learning course on GAN models. I haven't finished it yet, but it utilizes Google Colab for model development.
One more week to go!
Word Count: 466
This was the last week for project planning! Tomorrow starts my one-week spring break, then it's onto the first month of development.
Not much was done this week, but some productive tasks were completed. First, I refreshed myself on sprint planning and Scrum. During the production months, the team and I will be completing one-week sprints in agile development. The final wireframe was due, but we already added the final touches last week, so nothing needed to change. At the start of the week, we met with Chad Gibson, our instructor, to go over our Jira board and the game plan for our project. I didn't have anything else to do for the week, so I thought I'd learn more about HuggingFace, because that's what I'll be working on right away. My goal was to get a basic model hosted on HuggingFace and then make a basic HTML front-end for it on HuggingFace as well.
I found a free dataset on honey production in the USA from 1998 to 2012. I created some graphs for exploratory data analysis to see how clean the data is and whether there are any correlations I could use. My model is a basic KNeighborsRegressor from scikit-learn that predicts the honey's production value. It predicts this from these inputs: state, number of colonies, yield per colony, total production, stocks held by producers, and the year. The model, without any tweaking, scores 82%, which is fine for the purpose it will serve.
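For anyone curious, here's roughly what that baseline looks like. This is a minimal sketch, not my exact notebook: the CSV filename and column names are assumptions based on the public honey production dataset, and I'm treating the 82% as the default R² returned by scikit-learn's .score().

```python
# Minimal sketch of the honey production baseline (file and column names are assumptions).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("honeyproduction.csv")  # 1998-2012 USA honey dataset

# Encode the state column so KNN can treat it as a numeric feature.
df["state"] = LabelEncoder().fit_transform(df["state"])

features = ["state", "numcol", "yieldpercol", "totalprod", "stocks", "year"]
X = df[features]
y = df["prodvalue"]  # production value target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = KNeighborsRegressor()       # defaults, no tweaking
model.fit(X_train, y_train)
# score() returns R^2 for regressors; assuming that's how the ~82% above was measured.
print(model.score(X_test, y_test))
```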
Next for HuggingFace: they have three ways to upload work to their platform. Models is like a git repo for AI models, Datasets is where you can share datasets you have gathered, and Spaces is where you can create interactive spaces for your projects. My first idea was to upload my model under their Models section, then create a Space as the front-end and connect the two. After creating a basic HTML front-end with JavaScript to fetch and send data to the model, I ran into some issues. HuggingFace's API endpoint for models typically works for transformers, but I have a custom-made model, so I would need to create a custom API wrapper.
A workaround I found for the problem is to have the model and front-end all under one Space. The Space uses Gradio, an interactive front-end Python library that can easily load the model from local files and call predictions. The application is up and running, and I can easily make predictions with my model in the cloud, which was the goal. Sadly, I didn't get this to work with HTML. Over spring break, I plan on researching more about custom wrappers for an HTML front-end!
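To give an idea of how simple the Gradio side is, here's a hedged sketch of a Space that loads a saved scikit-learn model and serves predictions. The model.pkl filename, the input labels, and the feature order are placeholders rather than my actual app.py.

```python
# Sketch of a Gradio space that loads a local scikit-learn model and serves predictions.
import joblib
import gradio as gr

model = joblib.load("model.pkl")  # the KNeighborsRegressor saved in the space's repo

def predict(state, num_colonies, yield_per_colony, total_prod, stocks, year):
    # The model expects the same feature order it was trained on.
    features = [[state, num_colonies, yield_per_colony, total_prod, stocks, year]]
    return float(model.predict(features)[0])

demo = gr.Interface(
    fn=predict,
    inputs=[gr.Number(label=label) for label in
            ["State (encoded)", "Colonies", "Yield per colony",
             "Total production", "Stocks", "Year"]],
    outputs=gr.Number(label="Predicted production value"),
)

demo.launch()  # HuggingFace Spaces runs this automatically from app.py
```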
Below you can see the correlation plot for the dataset and the basic interactive web-app hosted on HuggingFace.
Word Count: 489
Spring break went fast, and now my team and I are already through the first week of development.
For our first sprint, I was tasked with preprocessing the image dataset and creating a starter front-end display on HuggingFace that has the prompt box, image display, and random generation function.
I was also tasked with creating the discriminator for the GAN model and uploading models to HuggingFace.
Our Jira board for sprints is set up to have five categories: to-do, in progress, team testing, ready for review, and verified complete.
To-do is where tasks start the sprint, with each task having a team member assigned to it and an estimated completion time. In progress holds tasks that team members are currently working on.
Team testing holds tasks that have been completed but need verification from other team members. Ready for review holds tasks verified by team members that still need our instructors to confirm they are complete. Verified complete holds tasks that have been checked off as complete by the instructors.
This week, preprocessing of the image dataset and front-end display on HuggingFace are in the ready for review section. An image of the HuggingFace display is below. My task for creating the GAN discriminator is in the in-progress section. Uploading the models to Hugging Face was left in the to-do category because I didn't have any trained models to upload this week.
I created the discriminator by following a blog post Will shared with me for building a StyleGAN2 model from scratch.
Building StyleGAN2 from scratch for image generation of women's clothing worked, so I was able to switch it to using the image dataset that I preprocessed. It all compiled and ran, but I was only able to train it for one epoch due to time constraints. An image it generated from training on realistic faces is below. Since I haven't pushed the code to a branch yet or integrated it with the code that Will developed, I didn't mark it for team testing. It's really interesting to learn how the discriminator works. It first transforms the input image into feature maps at the same resolution, then runs them through a series of blocks with residual connections. At each block, the resolution is down-sampled by a factor of two while the number of features doubles. The discriminator blocks use a 3x3 equalized convolution with a LeakyReLU activation function, followed by down-sampling with AvgPool2d.
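To make the "down-sample by two, double the features" idea concrete, here's a simplified PyTorch sketch of one discriminator block. It uses plain Conv2d layers instead of the equalized convolutions from the blog post, so treat it as an approximation of the idea rather than our actual code.

```python
import torch
import torch.nn as nn

class DiscriminatorBlock(nn.Module):
    """Simplified StyleGAN2-style discriminator block: 3x3 convs + LeakyReLU,
    a residual path, and 2x down-sampling with AvgPool2d."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # Residual path: down-sample, then a 1x1 conv to match channels.
        self.residual = nn.Sequential(
            nn.AvgPool2d(kernel_size=2),
            nn.Conv2d(in_features, out_features, kernel_size=1),
        )
        # Main path: two 3x3 convs with LeakyReLU activations.
        self.block = nn.Sequential(
            nn.Conv2d(in_features, in_features, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(in_features, out_features, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.down_sample = nn.AvgPool2d(kernel_size=2)
        self.scale = 1 / (2 ** 0.5)  # keep the variance of the summed paths in check

    def forward(self, x):
        residual = self.residual(x)
        x = self.down_sample(self.block(x))
        return (x + residual) * self.scale

# Each block halves the resolution while doubling the features, e.g.:
block = DiscriminatorBlock(64, 128)
print(block(torch.randn(1, 64, 128, 128)).shape)  # torch.Size([1, 128, 64, 64])
```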
Hopefully, next week, Will and I will be able to integrate the model and get it ready for training with checkpoints. I'd also like to run EDA on the images if that's possible to ensure the images in the dataset are what we are looking for and if any images could be considered poison for our training.
Word Count: 607
Another week goes by, and the project is slowly coming together. This week I was tasked with ensuring the prompt box only allows words, an option to rerun the last used prompt, performing EDA on the accuracy of generated images, and performing EDA on the generated captions for each image. I was also tasked with getting the GAN model uploaded and generating images on HuggingFace, but Will and I ran into some issues with the exported model files, so that will be pushed to next week.
Getting the prompt box to only allow words was fairly simple; it already checks that there are words, so another if check was added alongside that. I ran into some issues with Gradio, the Python interface library I use for HuggingFace, when creating the redo-prompt button. I can't seem to add a button element to the interface at all. Even Gradio's documentation doesn't load its button example, so it might be an issue on Gradio's side of things. My workaround was similar to last week's for the random seed option: a checkbox. If the redo-last-prompt checkbox is checked, it ignores whatever is in the prompt box and seed option and runs the last used prompt instead.
Now onto the fun stuff from this week, EDA.
I asked our instructor, Philip, what kind of EDA I could perform for images and computer vision in general. Images aren't as straightforward to explore as a dataset of quantitative or categorical data; they're just made up of pixels. Philip recommended I look into algorithms that compare a generated image to the training images to measure how close the generation is to looking real. I did this by taking the generated image and sampling 1/4 of the training images, converting the images to PyTorch tensors, and looping through each training image tensor to calculate the perceptual distance and SSIM value between it and the generated image tensor. After storing the calculations, I compute the min, max, mean, and median of all the results to get a better look at how the generated image compares to the sampled training images. The perceptual distance score reflects the visual difference perceived by human eyes, while the SSIM score shows how similar or different an image's structure is based on its texture, brightness, and contrast.
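Here's roughly what the comparison boils down to, as a sketch rather than the real script: it assumes the lpips package and a recent scikit-image (for the channel_axis argument), and it takes a small list of training images instead of sampling a quarter of the dataset.

```python
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity as ssim

loss_fn = lpips.LPIPS(net="alex")  # perceptual distance model

def to_tensor(img):
    """HWC uint8 image -> 1xCxHxW float tensor scaled to [-1, 1], which LPIPS expects."""
    t = torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0
    return t.unsqueeze(0)

def compare(generated, training_images):
    """Score one generated image against a list of training images."""
    lpips_scores, ssim_scores = [], []
    gen_t = to_tensor(generated)
    for real in training_images:
        lpips_scores.append(loss_fn(gen_t, to_tensor(real)).item())
        ssim_scores.append(ssim(generated, real, channel_axis=-1))
    stats = lambda s: dict(min=np.min(s), max=np.max(s), mean=np.mean(s), median=np.median(s))
    return {"lpips": stats(lpips_scores), "ssim": stats(ssim_scores)}
```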
Temitayo generated captions for each training image last week. That gave me an idea to explore the most commonly used words in the captions and how long each caption is, to get a better picture of the NLP data. The captions are saved to a text file, which I read as a CSV using pandas so that I could split it into two columns: image and caption. Using Python's Counter, I split each caption into words and count the totals. Using this count, I was able to create three graphs: a word cloud and a frequency bar graph that display the most commonly used words in the captions, and a histogram that plots the length of each caption.
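The caption EDA boils down to something like this. It's a sketch: the captions.txt filename and its two-column image/caption layout are assumptions, and the plot styling is stripped down.

```python
from collections import Counter

import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the caption file as a two-column CSV: image, caption.
df = pd.read_csv("captions.txt", names=["image", "caption"])

# Count every word across all captions.
words = Counter(w.lower() for cap in df["caption"] for w in cap.split())

# Word cloud and bar graph of the most common words.
WordCloud(width=800, height=400).generate_from_frequencies(words).to_file("wordcloud.png")
common = words.most_common(20)
plt.bar([w for w, _ in common], [c for _, c in common])
plt.xticks(rotation=45)
plt.savefig("word_frequency.png")

# Histogram of caption lengths (in words).
plt.figure()
plt.hist(df["caption"].str.split().str.len(), bins=30)
plt.savefig("caption_lengths.png")
```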
The graphs turned out great and can be seen below, as well as a screenshot of the image comparison EDA result. Next week, I hope to have a working model file that I can hook up to start generating images on HuggingFace, as well as prompt suggestions on the interface. I know that in the future I'll have to switch away from Gradio due to the constraints I'm running into, but for the first month it'll be fine.
Word Count: 977
Ending this week, I'm feeling really good about how this project is going. I am finally able to design the Gradio interface in a way that I feel is better, and I got the test StyleGAN2 model up and running on HuggingFace!
One issue I ran into this week was getting the CUDA toolkit to work in any of my development environments. None of my Torch libraries recognize my CUDA toolkit install, forcing all my development to run on the CPU instead of the GPU. The only time I don't have this issue is when I run the codebase in Docker. This is a hassle because the HuggingFace space is its own Docker container and repo, so having two copies might add some confusion down the line. I'd like to look into using the HuggingFace space's Docker container locally next week.
Also, last night on my laptop, I decided to make the switch from Windows to Linux for development. I'm using the Pop OS distribution, which seems to be Ubuntu-based. I'm hoping Linux development will be better for me in the long run and be great to learn.
My tasks for this week consisted of getting the CLIP NLP model running on HuggingFace, creating a header and footer for the interface, adding suggested prompts, augmenting images to see how accurate the image comparison script I made is, and getting the StyleGAN2 model up and running on HuggingFace.
The image comparison script I created last week has some new features. Before, it could only take an image and compare it to 1/4 of the training images, producing a score from LPIPS and SSIM. Our instructor, Philip, recommended that I take a real image and compare it to augmented versions of that image to see how well these scores perform. I did this by taking a real image from the training set and looping nine times to get nine different augmentations to compare against. Using Keras's ImageDataGenerator, I applied a rotation, width and height shift, and zoom. These augmented images are then compared to the original image for scoring. I found that the LPIPS score holds up well; it tracks how heavily the image has been augmented. The SSIM score, however, comes out low; either this scoring method is a poor fit, or the augmentation changes the image's structure so much that it can't find a resemblance. Below is the plot of the image's augmentations and the scores.
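The augmentation side of that test looks roughly like this, using Keras's ImageDataGenerator; the exact ranges I used may have differed, so treat the numbers as placeholders.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotation, width/height shift, and zoom, like the test described above.
augmenter = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.3,
)

def make_augmentations(real_image, n=9):
    """Return n randomly augmented copies of a single HWC image array."""
    return [augmenter.random_transform(real_image.astype(np.float32)) for _ in range(n)]

# Each augmented copy then goes through the same LPIPS/SSIM comparison
# against the original image to see how the scores respond.
```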
Now on to creating the header and footer for the interface. Initially, I was using Gradio's Interface class to create the front-end, but this was causing so many headaches; I couldn't even get its button class to work with it, and these tasks involved creating HTML and custom CSS. After not being able to get the header HTML to appear above the interface, I found that Gradio has a lower-level API called Blocks, which is what the Interface class is built on. With Blocks, I am able to break each interface component into its own section, giving me more freedom to design the interface. I was then able to add the header and footer and even switch to using buttons instead of checkboxes. Blocks also made the prompt suggestions super simple to set up: all it does is grab two prompts from a hardcoded list of 50 prompts. Below is an image of how the interface now looks. Next week, I'd like to polish the layout and sizes so they look more organized.
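With Blocks, the layout roughly looks like the sketch below. It's trimmed way down from the real app: the two suggested prompts stand in for the hardcoded list of 50, the generate function is a stub, and FaceCraft is just used as the header text.

```python
import random
import gradio as gr

SUGGESTED_PROMPTS = ["a smiling woman with red hair", "an older man with a beard"]  # 50 in the real app

def suggest():
    return random.sample(SUGGESTED_PROMPTS, 2)

def generate(prompt, seed):
    return None  # stub: the real app sends the prompt through CLIP and the GAN here

with gr.Blocks() as demo:
    gr.HTML("<h1>FaceCraft</h1>")                      # header section
    prompt = gr.Textbox(label="Prompt")
    suggestion_1 = gr.Textbox(label="Suggestion 1", interactive=False)
    suggestion_2 = gr.Textbox(label="Suggestion 2", interactive=False)
    seed = gr.Number(label="Seed", value=0)
    output = gr.Image(label="Generated face")
    with gr.Row():
        suggest_btn = gr.Button("Suggest prompts")
        generate_btn = gr.Button("Generate")
    gr.HTML("<footer>Capstone project - Full Sail University</footer>")  # footer section

    suggest_btn.click(suggest, outputs=[suggestion_1, suggestion_2])
    generate_btn.click(generate, inputs=[prompt, seed], outputs=output)

demo.launch()
```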
Getting the CLIP NLP model onto HuggingFace was pretty straightforward. The HuggingFace space runs on a CPU by default, so I had to comment out all the GAN code in order for Temitayo's ClipGanIntegration script to run on the HuggingFace container. When a prompt is entered, it goes through his CLIP code and is then sent back to the front-end to be displayed on the interface, so you can see what the prompt looks like before it gets sent to the GAN model.
Remember me saying I had issues getting the CUDA toolkit to work on my machines? Well, that means I can't test the GAN integration locally. I initially tried to test it on HuggingFace but ran into issues because HuggingFace is using a CPU and can only handle full float tensors. Before the team spends any money on upgrading the HuggingFace space to a GPU, I wanted to see it actually work locally. The workaround for this was putting the HuggingFace code inside our Docker container so that it could use the container's CUDA toolkit. Testing it this way worked! Will chipped in $40 to upgrade the space to a GPU. It runs perfectly on HuggingFace's cheapest GPU for 40 cents an hour. The space allows pausing, so we only have it running when we want to test a feature or showcase it during a build review, so we don't spend money on something that isn't needed. Below is an image of it running on HuggingFace using a suggested prompt, outputting the prompt CLIP used and the images that were generated.
Next week is the last week of the first month of development. I'd like to get Will's StyleGAN2 model running on HuggingFace instead of the test StyleGAN2 model that Temitayo uses, get the interface looking sharper, and act on any more recommendations from Philip for the image comparison. We still have two more months of development, so the team is discussing routes we can take with this software. I had ideas for making realistic faces for billboard advertisements, image-to-image generation (taking an image and turning its characteristics into a realistic person), and background generation for the faces. I have no idea where the project could go in the following months, but I know it'll be awesome!
Word Count: 516
The last week of the first month is now over. The project has two more months to go until it is all complete. This month I worked on getting machine learning applications running on HuggingFace, creating a front-end application with Gradio, creating the initial discriminator for the GAN model, performing EDA on image comparison and NLP captions, and collaborating with my team to get their work integrated into one system. Next month, I'd like to do more machine learning work. What the team wanted for the project is almost done, so there is plenty of room for expansion and new capabilities.
This week, my tasks involved redesigning the layout of the Gradio front-end, updating the prompt to handle multiple languages from Temitayo's functions, adjusting the image comparison script for Will, adding an image upload as an input option, adding a new output for person description generation, generating images from Will's pickle file, and displaying image accuracy scores.
I updated the image comparison script for Will to integrate into his scripts. I made functions that can compare PNG to PNG, PNG to numpy array, and numpy array to numpy array. I'm hoping to use this for the image input to compare a real image to what his GAN model generated.
Getting an image upload as an input for the HuggingFace space was simple, but I did run into issues. By default, the uploaded image comes in as a numpy array. This causes issues when I try to check whether an image was uploaded, because truth-testing a numpy array causes many headaches. After a couple of hours and many Stack Overflow tabs, I switched to using a Pillow image as the default. Then, when the image is uploaded, I can easily convert it to a numpy array when needed.
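The gotcha, for anyone who hits it, is that a multi-element numpy array can't be used directly in a truth check, while a PIL image (or None) can. The sketch below shows the idea with Gradio's image component; it assumes a recent Gradio version.

```python
import numpy as np
import gradio as gr

arr = np.zeros((256, 256, 3))
# if arr:                 # raises "The truth value of an array ... is ambiguous"
print(arr is None)        # works, but easy to get wrong alongside other checks

# Asking Gradio for a PIL image instead of the default numpy array:
image_input = gr.Image(type="pil", label="Upload an image")

def handle(img):
    if img is None:                   # unambiguous check for "nothing uploaded"
        return "No image uploaded"
    array = np.array(img)             # convert to numpy only when needed
    return f"Got an image with shape {array.shape}"
```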
Temitayo plans to generate descriptions of the faces that are generated. I created a new output for that description to be displayed once that system is implemented. He also updated his CLIP code to handle multiple languages, so I was able to easily update the text prompt to use them. Below is a picture of using a Spanish prompt.
Formatting the HuggingFace display was very needed, so I spent a lot of time trying to make it clear and organized. Gradio has an element called Accordion that I put the settings inside of because it allows the user to collapse the tab whenever they'd like. Gradio also has an element called Tab that creates a tabbing system like a web browser. Using the tab element, I separated the text prompt input and image input to create better organization. Below is what the interface now looks like before generating an image.
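The Accordion and Tab arrangement is basically the sketch below, stripped of the real callbacks.

```python
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tab("Text prompt"):                 # tab 1: text-to-face input
        prompt = gr.Textbox(label="Prompt")
    with gr.Tab("Image input"):                 # tab 2: image-based input
        image = gr.Image(type="pil", label="Image")

    with gr.Accordion("Settings", open=False):  # collapsible settings section
        seed = gr.Number(label="Seed", value=0)
        redo = gr.Checkbox(label="Re-run last prompt")

    output = gr.Image(label="Generated face")
    gr.Button("Generate")                       # wired to the generation function in the real app
```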
Word Count: 333
This was the first week of the second month of development. After this month, there is only one more to go! With the front-end running well on HuggingFace, I'm thinking of new ideas to bring to the table.
My tasks this week included adding Temitayo's description model to the back-end, adding filters and effects for the generations, and generating an image from Will's pickle file again.
I still don't have Will's pickle file implemented. It seems that each week he is evolving it somehow, so there is never a version that I can use. I'd really like to get it implemented this month, but I have no clue how things are on his side of the development. The biggest issue is getting his pickle file integrated with Temitayo's pre-trained CLIP model.
Getting the description model hooked up to the back-end went well. I already set up HuggingFace previously for easy integration, so no issues occurred.
The majority of my week involved getting filters and effects set up using the CV2 library. Temitayo suggested I create a cartoon effect, which I had no idea was possible at first, but my instructor Philip recommended I check out a guide on getting it set up. After battling with issues, I was able to get the first effect, cartoonify, set up and running. With the lessons from the first one in mind, I was able to get more done quickly. Effects I added include pencil sketch, oil painting, watercolor, and x-ray. Filters I added include black and white, blue tone, and sepia.
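For reference, the cartoon effect is mostly OpenCV smoothing plus edge detection, and the filters are simple color transforms. Here's a sketch of two of them; my actual parameter values may have differed.

```python
import cv2
import numpy as np

def cartoonify(img_bgr):
    """Smooth the colors with a bilateral filter and overlay bold edges."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, 9, 9)
    color = cv2.bilateralFilter(img_bgr, 9, 250, 250)
    return cv2.bitwise_and(color, color, mask=edges)

def sepia(img_bgr):
    """Classic sepia filter: multiply the color channels by a fixed kernel (BGR ordering)."""
    kernel = np.array([[0.131, 0.534, 0.272],
                       [0.168, 0.686, 0.349],
                       [0.189, 0.769, 0.393]])
    toned = cv2.transform(img_bgr.astype(np.float64), kernel)
    return np.clip(toned, 0, 255).astype(np.uint8)
```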
Next week, I'd like to look into using an AI model in my work. My idea is to use a pre-trained model or one from scratch to remove the background from the generated image. This can allow the user to better use the generated face for marketing, or we can implement our own custom backgrounds.
Below is a generated description on HuggingFace, as well as each filter and effect applied to the same generation.
Word Count: 365
After this week, I feel like the front-end and image processing are in a great place for the project. Next week, I'd love to dive into the StyleGAN-T model with Will to help him engineer it. Today, he gave me a rundown of the current issues he is facing. I'll have a lot of reading ahead to catch up on understanding the system, but it'll be a great learning experience.
This week, I was tasked with getting the background removed from a generated face. I thought about doing this so we can apply our own backgrounds in the future or even augment the head onto a live webcam; there are plenty of possibilities for this feature. It was done using a GitHub repo that my instructor Philip recommended: I added wiktorlazarski's head segmentation to the HuggingFace requirements. Once installed, it can access its model files and segment a passed-in image. The pipeline outputs a mask with values in the 0-1 range; scaling it to a 0-255 alpha channel makes the background transparent. To remove any extra space, I find the min and max coordinates of the head and crop around them. The result is below.
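The transparency and cropping part, everything after the segmentation model returns its 0-1 mask, looks roughly like the sketch below. I'm treating the mask as already computed, since I'm not reproducing the head-segmentation package's exact API here.

```python
import numpy as np
from PIL import Image

def remove_background(rgb_image, mask_01):
    """rgb_image: HxWx3 uint8, mask_01: HxW floats in [0, 1] from the segmentation pipeline."""
    alpha = (mask_01 * 255).astype(np.uint8)    # 0-1 mask -> 0-255 alpha channel
    rgba = np.dstack([rgb_image, alpha])        # background pixels become transparent

    # Crop to the head: find the min/max coordinates where the mask is "on".
    ys, xs = np.where(mask_01 > 0.5)
    cropped = rgba[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return Image.fromarray(cropped, mode="RGBA")
```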
I finished the transparent task earlier than expected, so I pulled the watermark task out of the backlog. All I did for this task was load our logo into a Pillow image, resize it to 1/8 of the generated image's size, then draw the logo image onto the generated image in the bottom right. The result is below. I originally had the watermark on the transparent background image, but it reverts the background back to black, so I need to look more into that.
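The watermark step is just Pillow compositing. A rough sketch, with the logo path as a placeholder; pasting the logo with itself as the mask is what keeps its transparency.

```python
from PIL import Image

def add_watermark(generated: Image.Image, logo_path="logo.png") -> Image.Image:
    logo = Image.open(logo_path).convert("RGBA")

    # Resize the logo to 1/8 of the generated image's width, keeping its aspect ratio.
    width = generated.width // 8
    height = int(logo.height * width / logo.width)
    logo = logo.resize((width, height))

    # Paste into the bottom-right corner; using the logo as its own mask keeps its alpha.
    watermarked = generated.convert("RGBA")
    position = (watermarked.width - width, watermarked.height - height)
    watermarked.paste(logo, position, mask=logo)
    return watermarked
```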
With half a week remaining and out of ideas for what to do, I decided to do some LinkedIn Learning. I was still pretty lost with Docker, so I found a learning path that the Docker team put together on LinkedIn Learning. Completing the learning path earned me a Docker Foundations Professional Certificate. I feel a lot more comfortable using Docker and have a much better understanding of containerization, Docker images, and even debugging Docker containers. Below is the certification I earned.
Word Count: 604
The weeks are going faster and faster; sooner or later, this project will be complete! After last week, I ran out of ideas for what I could contribute. Will is working on the StyleGAN-T model, so the group decided to help him.
I was tasked with updating HuggingFace to include Temitayo's new description script. Since I was going to start helping Will, I tasked myself with researching the StyleGAN-T architecture. At the start of the week, Will was running into errors while starting the training of the model, so he and I tasked ourselves with troubleshooting to get training started.
Updating the description script was simple. All I had to do was replace the script with the new one and replace a function call in the Gradio app script. Below is how the output now looks.
Researching StyleGAN-T went okay. If I had to give a score on my understanding of how it all works, I'd say 40%; reading papers isn't a skill I have, sadly. I tried to understand it all by reading the paper a couple of times, but there is so much going on in it that I can't wrap my head around it all. I'm more of a hands-on, learn-by-doing person. I did find a nice video on YouTube that spent an hour going over just the figures, which helped me a lot.
By the time I was ready to start working with Will, he had figured out the issue and was able to start training. That means I really only did two days' worth of work and research. After a whole day of training, Will provided me with the pickle file for the model. He didn't have anything set up to actually generate images from the file, so I figured I'd give myself that task. After a couple of days, I figured it out!
I load the model from the pickle file using the Python dill package, encode the prompt through the CLIP model to get the text features, and sample z based on the number of images to generate, the model's z_dim, and the torch device. Then I generate the image with the loaded model using the encoded prompt and z. After the image is generated, I convert it to a Pillow image for better display.
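In rough code, the generation path looks like the sketch below. A few assumptions: the generator.pkl filename is a placeholder, the generator exposes z_dim, and it accepts (z, text_features) the way StyleGAN-T conditions on CLIP embeddings; the real call signature in Will's code may differ.

```python
import dill
import torch
import clip
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the trained generator, then the pretrained CLIP text encoder.
with open("generator.pkl", "rb") as f:
    G = dill.load(f).to(device).eval()
clip_model, _ = clip.load("ViT-B/32", device=device)

def generate(prompt: str, num_images: int = 1):
    with torch.no_grad():
        tokens = clip.tokenize([prompt]).to(device)
        text_features = clip_model.encode_text(tokens).float()   # prompt -> CLIP embedding

        z = torch.randn(num_images, G.z_dim, device=device)      # random latent per image
        images = G(z, text_features)                              # assumed (z, condition) signature

        # [-1, 1] float tensors -> uint8 Pillow images for display.
        images = (images.clamp(-1, 1) + 1) * 127.5
        return [Image.fromarray(img.permute(1, 2, 0).byte().cpu().numpy()) for img in images]
```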
Great, so I have the model loading from the pickle file locally, so why not try and get it working on HuggingFace? Well, that's where I ran into some issues. Locally, I am running on our own Docker images with all the files Will used for training. On HuggingFace, it uses its own Docker image, and I don't know what files the model will need when it is loaded. I created a basic Gradio interface that works with the model locally. Putting all the generation code in a new HuggingFace space with the new Gradio app runs into errors. I went through over 20 of them. Most were missing modules and issues with file paths, but now I'm at one I haven't been able to figure out. Below is the generation using the Gradio app locally, as well as the error I get on HuggingFace.
Next week I am moving from Florida to Tennessee, so I don't know how my internet situation will be. I talked to Will about what he thinks I can work on that is small but helpful for him. He said it'd be great to have a Gradio interface for his training. Right now, he has to type out a long command for starting the training and wants an interface to change the many different options we can do for it.
Word Count: 448
This week was very hard to work on the project due to moving. Monday and Tuesday, I was finishing getting everything packed up. On Wednesday, I spent the whole day loading up the Uhaul truck. On Thursday, I drove for over 11 hours from Florida to Tennessee. Friday and Saturday, I spent unloading and getting everything organized. I wasn't able to actually get work done until late Saturday evening and today.
I was tasked with creating an interface for the training process to help Will. I've been feeling really good about my ability to create Gradio interfaces. I knew I'd need to create some kind of toggle on/off for each training option. Each training option also has an input, which could be a string, integer, float, bool, or even multiple booleans. There are also some options that are required, so they can't be turned off. With all that in mind, I went through and started making the interface.
To keep the interface organized with 25 different options, I used Gradio's Accordion block to separate the toggles from the inputs. I used Textbox blocks for strings, Number blocks for integers and floats, Checkbox blocks for bools, and a Radio block for multiple booleans.
Now how does the command get created? I pass all 50 interface blocks to a build-command function: 25 for the toggles and 25 for the inputs. I hardcoded the command to start with "python /app/Python/data/SG2/train.py", as it needs Python to run the train.py script. Then it goes through each of the 25 toggle checkboxes to see which ones are checked. If one is toggled on, it appends that option's flag and input to the end of the command string. After checking each toggle, it returns the command to the interface.
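The command builder itself is simple string assembly. Here's a condensed sketch with three made-up options standing in for the 25 real ones; the flag names are placeholders, not the script's actual arguments.

```python
BASE_COMMAND = "python /app/Python/data/SG2/train.py"

def build_command(toggles, values, flags):
    """toggles/values are the checkbox + input pairs from the interface;
    flags maps each option to its command-line flag (names here are placeholders)."""
    command = BASE_COMMAND
    for flag, enabled, value in zip(flags, toggles, values):
        if enabled:                                # only append options toggled on
            command += f" --{flag}={value}"
    return command

# Example: two of the three (placeholder) options toggled on.
print(build_command(
    toggles=[True, False, True],
    values=["/data/images.zip", 8, 0.002],
    flags=["data", "batch", "lr"],
))
# -> python /app/Python/data/SG2/train.py --data=/data/images.zip --lr=0.002
```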
I originally planned for it to actually start training using the subprocess module, but it had issues trying to launch the training script. Instead, it provides the command in an output textbox that we can paste into the terminal ourselves. Below is the result after running it with the required toggles, and how the input and toggle sections look.
Today is the last day of month three. Next month, hopefully, the model will be trained long enough for good results. The group discussed taking the model's output and running more image augmentations on it. I also had an idea where we could deep-fake our live webcam onto the image to make it seem like the generated face is the one talking. We have one or two more months to go, so hopefully our end product looks clean and functions well!
Word Count: 712
With two months left to go, the team is narrowing down what is left to be done. Will has been working on training the StyleGAN-T model; Temitayo has been working on setting up image augmentation; and I have been dedicating my time to helping Will. Will has been having all sorts of issues with his laptop, which he uses to train the model. Last week, he ran into a big issue that wouldn't let him train past 20 epochs. The model is set up to save a pickle file and a generation image from the current state of the generator every 5 epochs. For some reason, every time it got to the 20th epoch, it would crash for him.
This week, I was tasked with helping Will figure out this issue. During my group's meeting, I mentioned that it may be a memory issue: if it's crashing at 20 epochs every time, then maybe it's keeping the other saved models in memory and running out of room at the 20th epoch. With this in mind, I went digging in the training loop.
The training loop takes in a load of parameters, but the one I wanted to focus on was the network snapshot ticks. This parameter tells the loop how often to save network snapshots. We have it set to five, so it will save at 0, 5, 10, and 15, and then crash at 20. When it sees that the current tick needs a snapshot, it creates a dictionary that contains the generator, discriminator, model, and training arguments. It makes a deep copy of the model, which is very memory-intensive. Strangely enough, Will conserves memory in the loop by deleting the deep copy after storing it, so it can't be that. Later in the process, the script evaluates the training metrics. After outputting the metrics, it conserves memory by deleting the dictionary that stores the generator, discriminator, model, and training arguments. With both of these memory-heavy variables being deleted, I can't think of what would cause this issue.
Will always seems to run into issues with WSL. Our instructor, Phil, suggested during the meeting that the issue could be the Docker container or WSL not having enough allocated disk space or memory. Will had already tried to allocate more space for both on his Docker container, but no luck. While Will worked on figuring out how to allocate more space for WSL, I wanted to run the training on my machine to see if I hit the same crash at 20 epochs. Will's machine runs Docker on WSL2 using Ubuntu on Windows; my machine runs Docker on a Linux distribution, so I won't run into any issues with WSL. After fixing some pathing issues, I was able to start training the model using the labels and images. I left to run some errands and came back to the training on epoch 50! Since then, Will and I have discussed switching his laptop to a Linux distribution instead of running WSL on Windows. I have been running the training on my laptop for the past three or four days. Below is a GIF of all the image snapshots that were saved. It's very interesting to see how the images change over time, including the generator exploring different saturation levels, going from black all the way to fully white. As of right now, the model has been trained on 1,500 kimg. I'm going to continue training the model until Will gets his laptop ready for training again. I was also tasked with figuring out how to generate images from the pickle file on HuggingFace and fixing the error there, but I haven't been able to use my working laptop while it trains.
Next week, I want to generate valuable descriptions for the Flickr face dataset. The dataset the model is currently training on contains a variety of images that are not exclusive to faces. We can't use the Flickr dataset due to not having good enough labels for the images. If I can get some decent descriptions generated, we can narrow the training scope to better match FaceCraft's goals.
Word Count: 500
This week, my goal was to generate label data for the Flickr image dataset. In order to train the StyleGAN-T model for realistic faces with prompts, we need labels that hold enough information so the pretrained CLIP model can get plenty of insight into the different attributes of each image. The ultimate goal is for the model to understand almost every attribute, like hair color, eye color, age, and what a person is wearing. I wasn't able to get to all of that this week, but I feel like what I have is a start.
At the beginning of the week, I ran training on my end. Will got his machine back up and running on a Linux distribution, so I was able to pass it off to him to continue from my latest snapshot. Below is the last snapshot image I had, around 2,901 kimg.
Back to labels: I wanted to get the basics down for generating attributes. A person's race, gender, age, and emotion can provide a lot of information for the labels. I tried to find a pretrained model that had all those attributes and more, but wasn't able to. I did find one on GitHub, DeepFace, that is able to generate decent results. It took some time to get it working due to outdated documentation, but I got it up and running with a test image. Sometimes it wasn't able to recognize the face in an image; rather than leaving those images out, I put "n/a" in so there's no missing label information. It did this for 6,094 out of the 70,000 total images. For images it did work on, it seems very accurate. Below is an example, and here is the label that was generated: "asian 29 years old Woman, appears happy." I don't know how best to structure the sentence for the CLIP model, so I formatted it as "{race} {age} years old {gender}, appears {emotion}."
I asked Will if he thought that was good enough or if I should continue adding more attributes. He agreed that adding more attributes would be a lot better and mentioned glasses. I found a pretrained model on GitHub that does just that: glasses-detector can draw a bounding box around glasses or classify whether an image contains glasses of any type by returning true or false. Once I got it set up and working on test images, I added it to the label generation structure. When it checks for the DeepFace attributes, it also checks if the person is wearing glasses. Below is an example image with this label: "black 30 years old Man with glasses, appears sad." If the glasses classifier comes back true, I slap "with glasses" into the middle of the label. As you can see in the image, the person has glasses, but they are a woman and don't appear sad at all. Sometimes it also says they have glasses when they don't. This might poison the model, so further review may be required.
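Put together, the label generation loop looks something like the sketch below. DeepFace's analyze() is real (though its result keys vary a bit between versions), while the glasses check is shown as a generic has_glasses() placeholder since I'm not reproducing glasses-detector's exact API here.

```python
from deepface import DeepFace

def has_glasses(image_path: str) -> bool:
    """Placeholder for the glasses-detector check; returns True if glasses are found."""
    ...

def make_label(image_path: str) -> str:
    try:
        # DeepFace returns age, dominant gender, race, and emotion for the detected face.
        # Newer DeepFace versions return a list of results, hence the [0].
        result = DeepFace.analyze(img_path=image_path,
                                  actions=["age", "gender", "race", "emotion"])[0]
    except ValueError:
        return "n/a"  # no face detected: keep the image but mark the label as n/a

    glasses = " with glasses" if has_glasses(image_path) else ""
    return (f"{result['dominant_race']} {result['age']} years old "
            f"{result['dominant_gender']}{glasses}, appears {result['dominant_emotion']}")

# e.g. "asian 29 years old Woman, appears happy"
```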
Either next week or some other time, I'd like to see if I can add more to this. I'd like to get eye color, hair color, different hair styles, facial hair, and even more clothing.
Word Count: 567
I am writing this blog halfway through the week as I will be going on a vacation to Mexico tomorrow. This week I was tasked with debugging and fixing the errors for getting our pickle file generating on HuggingFace. I wanted to complete a task that wouldn't take me all week, but it would still be a great use of time to try and figure it out.
The error I started with was an issue with custom hooks missing from the HuggingFace codebase. I found where the code calls these custom hooks, but I had no idea how to actually add them in. I decided to start from scratch in a folder where it wouldn't be able to access any dependencies inside the original local repository; doing this, I can see exactly what the pickle file needs in order to generate. I was then able to work through all the module and hook dependencies by dropping the entire training module folder into the local space, which seemed to do the trick. Next was getting the pickle file to generate an image. I first tried to do this with a hardcoded prompt, but I ran into more errors: something inside the mapping network fails when processing the text encoding, because the x and w tensors don't line up for the matrix math. I know Temitayo has generations working on his end because he demonstrated it with his image augmentation during the meeting this week. Taking a look at his code, I saw he is generating a random image from the pickle file and not one from a prompt. Taking what I learned from that, I implemented a new script that only tries to generate an image with a random z. That worked, so I went on to getting it implemented on a new HuggingFace space.
When creating a new space, HuggingFace gives you a couple of options: Streamlit, Gradio, Docker, and Static. Normally, when I make a space, I go with Gradio because HuggingFace will run a Docker image that is ready to launch the Gradio application. This time, I went with the Docker option in order to run my own Docker image for the space. This ensures the space has all the dependencies needed for the generation and gives me more control over it. It took me a while to finally get the space working with Gradio, because I needed to give write permissions for the torch cache, set up user permissions, and make sure the Docker image starts the Gradio application. I got it working, and the random generation works! Below is a screenshot of it fully generating on HuggingFace using the cheapest GPU option at 40 cents per hour.
I wanted to see if Temitayo's code base has the same issue I have with generating with a text prompt. I have been running from Will's branch, which might be a couple versions ahead of Temitayo's version. After getting his folder and pickle file, I still wasn't able to get it to run. I got the same error as before. Below, you can see the shapes of x and w. I also included the error and where in the code this error is getting thrown. I tried to manipulate the shape of x and w for a long time, but no matter what I did, I wasn't able to “hack” it.
Word Count: 575
These weeks keep flying by faster and faster. After today, we will only have one month left for the project. With the current issues and our StyleGAN-T model not training as intended, I don't think it will be possible to train the model to a good enough standard for the final turn-in. Will has spent the majority of his time getting the model to where it is, so I would feel bad for all that work going to waste. I told him that if we can't get it fully trained by the time we need to present, we can keep training after we finish our degree. Then I'll go in and set it all up on HuggingFace and create a new readme for GitHub to reflect what we really wanted in the end.
This week, I tasked myself with getting the old StyleGAN2 model up and running again. Will had left the file in our repository, so it didn't take long for me to get things going again. The only long part was downloading the Flickr dataset again. The difference between the two models is that this one doesn't do text encoding, so all generations have to be random. If I can get this model trained for the remainder of the time we have, I'm sure we can get decent results for our final presentation.
I started training the model on the dataset at a resolution of 128x128 for about 54 epochs. With the images so small and a batch size of 20, the model started picking things up pretty quickly. I talked to Will and asked if I should bump up the resolution before I got too far into training at 128x128, and he agreed. I tried to train the model at 1024x1024 and 512x512, but I kept running out of allocated space on my laptop's GPU. Even lowering the batch size to something like 5 didn't help. I eventually landed on training the model at a resolution of 256x256 with a batch size of 15. The slightly smaller batch size and larger resolution will make training take a couple more days, but hopefully it pays off. Whenever I get the model up and running on HuggingFace, I can easily up-scale the images to 512x512. I was more worried about doing that from 128x128, but since 256x256 is half of 512x512, it shouldn't blur too badly from up-scaling. I currently have it trained up to epoch 25. Below are five example images that were saved for epoch 25.
With a month left, it's time to narrow down on things that must be completed. I will be training the old model until the team feels it is at a decent point to stop. I want to get this model working on HuggingFace and integrate all the old add-ons I had in the original HuggingFace space, like the image effects. I want to send Temitayo a version of my Pickle file so he can see if it will work with his image manipulation work. I want to get a set of instructions for training the model using a Docker file. I also want to get a set of instructions for running the Gradio application locally from a Docker file. If I get all this done within the next 2 to 3 weeks, I can spend more time getting my team's presentation ready. Time is ticking.
Word Count: 619
For the first week of my team's last month, I was tasked with getting the StyleGan2 model I am training to generate on HuggingFace. I was also tasked with creating detailed instructions for running our work using Docker and creating a new GitHub for the finished product, but I will save these tasks for next week. The first week of my last month is already over! It is actually insane how fast time flies by.
For HuggingFace, I first started by creating a new Docker space and cloning it to my local machine. I then looked into how generations are created in the training loop and mimicked it in its own function that I can call from a separate file. When the generate button is pressed on Gradio, the model generates an image with a random z, gets w by passing z into the mapping network, and then generates an image using w. The image is then passed back to Gradio. All of this was set up before in my old HuggingFace space, so the only tricky part was getting this model's generator to work. Once generation was working, I took my old image manipulation effects and implemented them one by one. They took some time because before, I performed the effects on Pillow images instead of PyTorch tensors. I also ran into some issues with dependencies and ended up commenting out the oil painting effect. I added back the watermark script I made, then added the head segmentation script so that we can remove the background again. I ran into issues with my Docker file around the head segmentation's model download; after messing with user permissions in the Docker file, I got it to finally work. With everything working locally, I worked on getting it to work on the HuggingFace space. Things are a little different from local to online for a HuggingFace repository, but I got things working after some time. Below is a picture of the current state of the space with a generation.
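The generate function I pulled out of the training loop boils down to the sketch below. I'm assuming the generator exposes separate mapping and synthesis networks the way StyleGAN2 implementations usually do; the exact attribute names in our model file may differ.

```python
import torch
from PIL import Image

@torch.no_grad()
def generate_face(G, device="cuda"):
    """Random z -> mapping network -> w -> synthesis network -> Pillow image."""
    z = torch.randn(1, G.z_dim, device=device)     # random latent
    w = G.mapping(z, None)                         # map z into w space (no class label)
    img = G.synthesis(w)                           # generate the image from w

    # [-1, 1] tensor -> uint8 HWC array -> Pillow image for Gradio.
    img = (img[0].clamp(-1, 1) + 1) * 127.5
    return Image.fromarray(img.permute(1, 2, 0).byte().cpu().numpy())
```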
I wanted to see if I could generate an image from an image without using CLIP. I believe someone on my team was tasked with performing image-to-image translation using CLIP, but I am not a fan of all the dependencies needed to run CLIP on HuggingFace. It took quite a few errors, but once I worked through each one, I was finally able to start producing results. It works by finding a latent representation in our model's space that corresponds to the target image. Under the hood, it looks similar to how a normal generation is made, but with a lot of extra steps. I implemented it on HuggingFace by adding an image upload component. When the image is uploaded, Gradio turns it into a numpy array, the numpy array is turned into a PyTorch tensor, the tensor is projected into the model's latent space, and then an image is generated like normal but with the new latent from the projection. Below is an example of it working on HuggingFace: the bottom image is the uploaded image, and the top is the generated image from the translation.
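The projection itself is an optimization loop: start from an average latent, render it, and nudge it until the render looks like the uploaded photo. Below is a heavily simplified sketch; the real projector uses extra tricks like noise regularization and a learning-rate schedule, and the G.mapping/G.synthesis attribute names are assumptions.

```python
import torch
import lpips

def project(G, target, steps=300, device="cuda"):
    """Find a w latent whose rendering matches the target image (1x3xHxW tensor in [-1, 1])."""
    percept = lpips.LPIPS(net="vgg").to(device)

    # Start from the average w so the optimization has a sensible starting face.
    with torch.no_grad():
        z = torch.randn(512, G.z_dim, device=device)
        w_avg = G.mapping(z, None).mean(dim=0, keepdim=True)
    w = w_avg.clone().requires_grad_(True)

    opt = torch.optim.Adam([w], lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        render = G.synthesis(w)
        loss = percept(render, target).mean()   # perceptual distance to the uploaded photo
        loss.backward()
        opt.step()

    return w  # feed back into G.synthesis(w) to get the "translated" face
```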
Also, throughout the week, I continued training the model. As of writing, it has finished epoch 92. I plan to keep running this until the end of week 3, so there is still plenty of time for our model to really hammer in training. The example generations look great. I feel like in another week it will start producing medium-quality generations. Below are the examples generated from Epoch 92.
Word Count: 680
I can't believe how fast time is flying. Week two of the last month is a wrap. I was tasked with creating a new GitHub repository to serve as the final product, creating instructions for training and generating images using Docker, and fixing the logo not showing up on the HuggingFace space.
This week, training has come to a stop. For some odd reason, the images end up as fully black pixels. This happens two epochs after resuming from a loaded checkpoint: for instance, if I start training from checkpoint 40, all images trained on epoch 42 and beyond are black, while epoch 41, for some reason, is normal. The current checkpoint is good, but it's not great to use in the final product. Will is trying to get it running on his PC, but he is also facing the black image issue. Either something got corrupted in the checkpoints, or something we haven't found is causing this.
I spent a whole day working on getting the logo to load on HuggingFace. I kept trying every possible file path, changing the Docker file the space runs on, and even messing with the read permissions for the files in the space. I just could not get it to work. The thing is, I had the logo loading on a different test space; there, the file path is "file/logo.png," so I thought that would work on this one. The only difference between the two spaces is that this one runs on a custom Docker file I created, which is what made me think it was a Docker issue. Well, it was not. The whole time, I was missing one line inside the Gradio app script: "gr.set_static_paths(['.'])", which adds all files as static paths. This is how the custom CSS is able to find the logo and get read access. I even had that line on the old test space! Below is a screenshot of the space.
Creating the new GitHub was pretty straightforward; it was creating the Docker files that gave me headaches. After about four to five days, I got two separate Docker files that a user can run depending on whether they want to train or generate. The instructions for the training Docker file have the user open Visual Studio Code and then run the training script. The instructions for generating are a lot simpler: all they have to do is open localhost in their browser on the Gradio application's port. I ran into a major issue with the CV2 pip install that the head segmentation needed. The containers build off NVIDIA's PyTorch container, which ships with a cv2 version that breaks a component of the head segmentation. You would think uninstalling CV2 and then reinstalling the correct version would work, but it doesn't; the uninstall doesn't remove all of the packages CV2 installs. Before reinstalling CV2, I had to set up the Docker file to remove the entire CV2 folder. This error took me two days, and it was the best feeling getting it to work today. Below is a screenshot of the instructions I made for the repository.
I have no idea what I will be working on next week. My guess is that I will be attempting to get Temitayo's work integrated into HuggingFace and the final GitHub repo. I'm nervous about adding CLIP to the codebase because it's the third week of the last month. Last time, there were so many dependencies and extras needed to get it working on HuggingFace, and I can't even imagine getting it working on every single custom Docker file I have made. I'm fine with just generating from a random z, and I do have image-to-image generation working. I would rather focus on fixing the training issue and working on our presentation. But if the team thinks adding it is best, then I will try my best.
Word Count: 483
The project is coming to an end. This week was the last week for any development. I can't believe how fast these past five months have gone. I was tasked with integrating Temitayo's CLIP prompting code into the GitHub and HuggingFace repositories. I was also tasked with adding an option to pick which trained checkpoint to generate from. I finished these tasks pretty fast, so I had plenty of time to integrate Temitayo's image manipulation slider into the codebase as well.
Temitayo provided me with the script for generating from a prompt with CLIP. I was really worried about integrating his code because last time it was a whole mess of dependencies and modules. This time, however, went really well. It was all under one script and only needed one pip install, so I didn't have to change much. I didn't want to change the generation architecture I currently had, so I dissected his script and took out whatever functions I needed in order to get it working. Below is an example of generating from the prompt, “An old Asian man.”
The checkpoint selection was pretty straightforward. My original idea was to have separate generate buttons for each checkpoint; depending on the button, it would use the corresponding checkpoint to generate the image. The problem with that is all the buttons. My workaround was to use my settings section: I added a drop-down menu where the user can select which checkpoint they want to run, with the default being our latest model. The app script loads all checkpoints when the application starts. When the user runs any of the generation options, it checks which checkpoint they have selected and uses the corresponding model. Below is a screenshot of generating from epoch 1.
I ended up having a lot more time on my hands than I originally expected. Temitayo sent over his script for the image manipulation slider. There is one slight issue with integrating it into a HuggingFace space. The script saves the last generated image's latent space and W. Then it takes those saved variables and manipulates the space using the slider. The issue is that the HuggingFace space is running as one instance for every user using it. Let's say user A generates an image. Right after user A's generation, user B does a generation. If user A moved the slider, it would use user B's latent space and W. Another issue is the multiple image generation. If the user generates four images, it will only manipulate the last generated image. I asked the team if they agreed to implement the slider even with those issues, and they said go for it. I mean, it works, but I would only manipulate the image's space if you were the only user running the HuggingFace space. Below is a generated image before and after moving the slider.
Word Count: 135
This is my final blog post for this project. It's actually crazy to type that out. I can't believe how fast time has flown by these past five months. I feel a lot better about computer vision now. At first, I was very scared of jumping into a big project like this, but having other people working on it with me helped a lot. I think my next computer vision project would be something in agriculture; I'd love to get into plant and crop analysis to help farmers utilize cheap machines for their work.
Here is a walkthrough video for the application running on Hugging Face: Video Link
Here is a presentation my team gave to finalize the project: Video Link
Onto the next part of my education journey: my master's in data science.
Ethan Stanks, signing off.