Lab · Generative AI

Monet style transfer

One-week solo lab in the Generative AI module. The brief was a real Kaggle competition: turn 7,028 photos into Monet-style paintings without paired training data. CycleGAN from scratch and Stable Diffusion + LoRA, run across three platforms because one wasn't enough.

Context

The Generative AI module at ESAIP runs weekly labs alongside its larger projects. TP4 — the Kaggle Monet Challenge — was the most demanding of them: solo, one week, on the live gan-getting-started Kaggle competition. The deliverable was a notebook with a working CycleGAN, the optional Stable Diffusion + LoRA exploration, and a Kaggle submission of 7,028 generated images.

The lab was deceptively heavy for a week of work. Image-to-image translation on unpaired data is a mode-collapse-prone setup that needs patient tuning, and the cohort generally felt it.

The problem

7,028 photos to convert into Monet-style paintings, with 300 actual Monet paintings as style reference. No paired data: the photos and the paintings have nothing in common except both being landscapes. The competition metric is MiFID (Memorization-informed Fréchet Inception Distance), a variant of FID that penalises generated images that sit too close to the training set, measured as cosine distance between Inception features rather than pixel by pixel. Lower is better. The brief flags under 100 as decent for a first try and 40–60 as the leading-edge band on this competition.
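To make the memorization penalty concrete, here is a minimal sketch of how I understand the MiFID correction to work. It is not Kaggle's scoring code: the FID value is taken as an input rather than computed, the feature vectors are plain lists standing in for Inception features, and `epsilon` is a hypothetical threshold parameter whose real value I am not asserting.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def memorization_distance(generated, training):
    """Average, over generated features, of the distance to the nearest training feature."""
    nearest = [min(cosine_distance(g, t) for t in training) for g in generated]
    return sum(nearest) / len(nearest)


def mifid(fid, generated, training, epsilon):
    """FID divided by the thresholded memorization distance.

    epsilon is an assumed parameter: below it the distance is used as-is
    (inflating the score), above it the penalty is switched off.
    """
    d = memorization_distance(generated, training)
    d_thr = d if d < epsilon else 1.0
    # Guard against division by zero for exact copies (my own safeguard).
    return fid / max(d_thr, 1e-15)
```

The point of the design is that a generator which copies training images gets a tiny distance and therefore a much larger (worse) score, while one that stays far enough from the training set is scored by plain FID.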

The approach

The mandatory approach was a CycleGAN built from scratch (Zhu et al., 2017): a ResNet generator with 9 residual blocks and instance normalisation, and a 70×70 PatchGAN discriminator. Three losses are summed during training: adversarial (LSGAN, MSE), cycle consistency (L1, λ=10) and identity (L1, λ=5). The learning rate decays linearly after epoch 20, and a 50-image replay buffer stabilises the discriminator. Both the replay buffer and the LR decay are listed by the brief under "going further" rather than as minimum requirements; integrating them by default was the reasonable thing to do given how unstable CycleGAN training is without them.
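The three training ingredients described above can be sketched in plain Python, without any framework. This is an illustration, not the notebook's code: the function and class names are my own, the losses are plain floats, and `total_epochs=40` is an assumed value, since only the decay start (epoch 20) is given here.

```python
import random

LAMBDA_CYCLE = 10.0     # cycle-consistency weight
LAMBDA_IDENTITY = 5.0   # identity weight


def generator_objective(adversarial, cycle, identity):
    """Weighted sum of the three generator loss terms."""
    return adversarial + LAMBDA_CYCLE * cycle + LAMBDA_IDENTITY * identity


def linear_decay_factor(epoch, decay_start=20, total_epochs=40):
    """LR multiplier: 1.0 until decay_start, then linear down to 0.0.

    total_epochs is an assumption made for this sketch.
    """
    if epoch < decay_start:
        return 1.0
    remaining = total_epochs - decay_start
    return max(0.0, 1.0 - (epoch - decay_start) / remaining)


class ReplayBuffer:
    """History of generated images shown to the discriminator."""

    def __init__(self, capacity=50, rng=None):
        self.capacity = capacity
        self.items = []
        self.rng = rng or random.Random()

    def push_and_pop(self, image):
        # While filling up: store the new image and return it unchanged.
        if len(self.items) < self.capacity:
            self.items.append(image)
            return image
        # Once full: half the time swap the new image in and return an older one.
        if self.rng.random() < 0.5:
            index = self.rng.randrange(self.capacity)
            old = self.items[index]
            self.items[index] = image
            return old
        # Otherwise pass the new image through and leave the buffer untouched.
        return image
```

In a PyTorch training loop, a multiplier like `linear_decay_factor` would typically be handed to `torch.optim.lr_scheduler.LambdaLR`, and the discriminator would be trained on `buffer.push_and_pop(fake)` instead of the freshly generated image, so that it keeps seeing older generator outputs as well as current ones.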

The optional second approach used Stable Diffusion v1.5 with a Monet-trained LoRA, fed through an img2img pipeline, where the strength parameter controls how much of the source photo's structure is kept versus replaced by the model's prior. The target is the same 7,028 photos, but the mechanism differs: instead of training a generator from scratch, take a pretrained image generator and shift its style with a small adapter (~10–100 MB of weights) trained on the 300 Monet paintings.
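What the strength parameter does is easier to see as arithmetic. The sketch below mirrors how I understand the Hugging Face diffusers img2img pipeline to choose its starting point: the photo is noised part of the way, and only the last fraction of the denoising schedule is run. The function name is hypothetical and the step count of 50 is just an example value.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (skipped, executed) denoising steps for a strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Higher strength means more noise added and more steps actually executed.
    executed = min(int(num_inference_steps * strength), num_inference_steps)
    skipped = num_inference_steps - executed
    return skipped, executed


# Sweep a few strengths for a 50-step schedule (example values only).
for strength in (0.3, 0.5, 0.8):
    skipped, executed = img2img_steps(50, strength)
    print(f"strength={strength}: skip {skipped} steps, run {executed}")
```

At low strength most of the schedule is skipped, so the photo's composition survives and the LoRA can only nudge texture and colour. At strength 1.0 every step runs and the source photo has almost no influence on the result.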

Across three platforms

Mid-week the Colab session started timing out before the CycleGAN converged, so I migrated the run to my own RTX 4090. The notebook ended up in three platform variants — Kaggle (free P100), Colab (T4), and local — with a competitive skeleton on top, so the same code is reproducible regardless of which GPU is available. The brief asks for one notebook; ensuring the same code runs reliably across three different GPU profiles took roughly half the work.
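One way to keep a single code path across the three environments is to detect the platform once and look up its settings. This is a sketch of that idea rather than the notebook's actual mechanism: the detection relies on the `KAGGLE_KERNEL_RUN_TYPE` environment variable and the `google.colab` module being present, and every value in `SETTINGS` other than the Kaggle input path is a hypothetical placeholder.

```python
import os
import sys


def detect_platform(environ=None, modules=None):
    """Return 'kaggle', 'colab' or 'local' based on the runtime environment."""
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules
    if "KAGGLE_KERNEL_RUN_TYPE" in environ:
        return "kaggle"
    if "google.colab" in modules:
        return "colab"
    return "local"


# Hypothetical per-platform settings; paths and batch sizes are illustrative.
SETTINGS = {
    "kaggle": {"data_dir": "/kaggle/input/gan-getting-started", "batch_size": 1},
    "colab": {"data_dir": "/content/data", "batch_size": 1},
    "local": {"data_dir": "./data", "batch_size": 4},
}

platform = detect_platform()
config = SETTINGS[platform]
print(f"Running on {platform} with data in {config['data_dir']}")
```

Passing `environ` and `modules` as arguments keeps the detection testable without actually being on Kaggle or Colab.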

About the score

Kaggle's MiFID submission scoring was broken across the cohort that week: submissions stuck in the queue, scores returning errors. Most of the class did not get a final number. The lab grade was based on the notebook itself rather than the leaderboard position, so the absence of a score doesn't change what was built. But it does mean this page lists no MiFID figure to compare against the brief's 40–60 leaderboard band. The model trains, the visual outputs match the Monet style as expected from CycleGAN, and the pipeline runs end to end. The score is just unmeasured.