StyleGAN experiments


Generative Art / StyleGAN2 / Audio-Reactive / Machine Learning



Audio-reactive generative art built by mapping music to StyleGAN2 latent spaces trained on faces, Mughal miniatures, and modern art.

Personal Project, 2020 - 2021

I treat machine learning models as instruments. StyleGAN2 learns to compress an entire visual world into a latent space, a continuous field where every coordinate is a valid image and every path between them is a smooth transformation. When I map music to that space, the model stops being a generator and becomes a performer. Sound shapes the trajectory. The image follows. These four pieces are the result: each trained on a different visual world, each driven by a different piece of music, each revealing something the dataset alone could never show.

This work was made in 2020 and 2021, before the ChatGPT era, when AI-generated art was still a raw, experimental frontier and every output required training your own models from scratch.

Role

Concept, ML Training, Creative Direction

Stack

StyleGAN2, VQ-VAE, Spleeter, U2Net, OpenCV, Processing

Music

gloomghost

Period

2020 - 2021

A

Cascade

Over 100,000 people had died from COVID in the US when this piece was made. A number that large stops meaning anything. I wanted to make it visceral again. This video contains roughly 100,000 AI-generated faces. Each one unique. None of them real. They cascade through the frame in a continuous stream, set to a dark ambient track by gloomghost, composed while he was fighting the virus himself.

The audio is separated into stems with Spleeter and quantized into discrete tokens with a VQ-VAE, and the token stream is mapped to latent vectors in a StyleGAN2 model trained on the FFHQ face dataset. Each face is isolated with U2Net, fragmented through Delaunay triangulation, and composited via OpenCV. The result is meditative and eerie: a wall of people who never existed, dissolving into each other to the rhythm of a song about survival.
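The core of this mapping can be sketched in a few lines. This is not the project code, just a minimal numpy illustration of the principle: a per-frame audio feature (standing in for the Spleeter/VQ-VAE analysis) pushes a smoothed point through a StyleGAN2-sized latent space, so loudness becomes motion and the trajectory stays continuous. The function name and parameters are illustrative.

```python
import numpy as np

def audio_to_latents(frame_energy, z_dim=512, smooth=0.8, seed=0):
    """Map a per-frame audio feature (e.g. stem energy) to a smooth
    trajectory of StyleGAN2-sized latent vectors.

    A fixed random direction turns the 1-D feature into a displacement
    in latent space; an exponential moving average keeps the path
    continuous, so consecutive frames decode to similar images.
    """
    rng = np.random.default_rng(seed)
    base = rng.standard_normal(z_dim)       # anchor point in latent space
    direction = rng.standard_normal(z_dim)  # axis the audio pushes along
    direction /= np.linalg.norm(direction)

    z = base.copy()
    out = []
    for e in frame_energy:
        target = base + e * direction       # louder frames push further
        z = smooth * z + (1.0 - smooth) * target
        out.append(z.copy())
    return np.stack(out)                    # shape: (n_frames, z_dim)

# Synthetic energy envelope standing in for a real audio analysis.
energy = np.abs(np.sin(np.linspace(0, 8 * np.pi, 240)))
latents = audio_to_latents(energy)          # one latent per video frame
```

Each row of `latents` would then be fed to the generator to render one frame; the smoothing constant trades responsiveness against visual jitter.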

[sound on] Cascade by gloomghost
B

Scorpioid

I grew up practicing Mughal and Indian Miniature painting, a form of art that is extraordinarily intricate and largely forgotten. I wanted to know if a machine could learn it. Not just the colors or the shapes, but the logic of the tradition: the ornate border frames, the layered floral motifs, the muted palettes that distinguish this work from everything else in the history of visual art.

After months of training StyleGAN2 on a curated dataset of these paintings, the model learned to generate compositions that feel authentic. The video maps gloomghost's audio directly to the model's latent space. As the music shifts, the machine moves through its understanding of a centuries-old visual language, producing forms that are simultaneously familiar and impossible.
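Moving smoothly between points in a trained latent space is what makes the video feel like one continuous gesture rather than a slideshow. A standard technique for this with Gaussian latents is spherical interpolation; the sketch below is a generic numpy implementation, not the project's own code.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors.

    High-dimensional Gaussian latents concentrate near a hypersphere,
    so interpolating along the sphere (rather than a straight line)
    keeps intermediate codes at a typical norm and avoids washed-out
    in-between frames.
    """
    z0n = z0 / np.linalg.norm(z0)
    z1n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    so = np.sin(omega)
    if so < 1e-8:                       # nearly parallel: fall back to lerp
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) / so) * z0 + (np.sin(t * omega) / so) * z1

rng = np.random.default_rng(1)
a, b = rng.standard_normal(512), rng.standard_normal(512)
path = np.stack([slerp(a, b, t) for t in np.linspace(0.0, 1.0, 30)])
```

Chaining such interpolations between audio-chosen waypoints yields the "moving through a visual language" effect described above.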

[sound on] Scorpioid by gloomghost
C

Hello from the latent space

Can a machine generate me? Not a generic face, but my face, my expressions, my movement. The answer was uncomfortably precise. StyleGAN2 produces a version of me that blinks, shifts, and reacts with unsettling naturalism.

Behind the face, a second model trained on abstract textures generates a reactive background. It responds to the expressions of the first model. Two generative systems watching each other: one producing a person, the other producing a world that moves with them.

Two generative models in dialogue: one produces a face, the other reacts to it.
D

Crushed. Exhausted. Lost.

A StyleGAN2 model trained on modern art, driven entirely by music. The audio tokens from gloomghost's track "Talisman" are decomposed and mapped to the latent space. Every beat, every shift in texture and intensity pulls the model along a different path through its learned visual vocabulary. The output is raw, hallucinatory abstraction that never repeats and never settles.
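The "different path per beat" behavior can be illustrated with a simple intensity-driven walk. This is a hedged sketch of the idea, not the piece's actual mapping: quiet passages drift along the current direction, and frames above a (hypothetical) onset threshold pick a fresh direction, so beats kick the imagery onto a new path.

```python
import numpy as np

def latent_walk(intensity, z_dim=512, beat_thresh=0.6, seed=3):
    """Walk through latent space at a speed set by audio intensity.

    Each frame steps along the current unit direction, scaled by the
    frame's intensity; frames above `beat_thresh` re-roll the direction,
    so the trajectory never repeats and never settles.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(z_dim)
    d = rng.standard_normal(z_dim)
    d /= np.linalg.norm(d)
    frames = []
    for a in intensity:
        if a > beat_thresh:                  # beat detected: new direction
            d = rng.standard_normal(z_dim)
            d /= np.linalg.norm(d)
        z = z + a * d                        # louder = bigger step
        frames.append(z.copy())
    return np.stack(frames)

intensity = np.linspace(0.1, 1.0, 50)        # stand-in for a track's envelope
walk = latent_walk(intensity)                # one latent per frame
```

Because the direction is unit-length, the size of each step equals the frame's intensity, which is exactly the "every beat pulls the model along" behavior described above.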

[sound on] Talisman by gloomghost