
Experiment

VQGAN+CLIP Keyword Modifier Comparison

We compared 125 keyword modifiers with the same prompt and initial image. These are the results.

This is a living document. The most recent keyword modifier was tested on Oct 13, 2021.
Experiment Base Image
Base image (upsized). 400 iterations of "Scary skeleton astronaut in space".

Method

The Experiment Explained

We started by running the prompt "Scary skeleton astronaut in space" for 400 iterations at thumb resolution (400x400px). That gave us this base image.

Then, we evolved that creation 125 times, each time adding a different keyword modifier, and running for an additional 400 iterations. "Evolving" a creation uses the previous creation's output as the start image for the next, so every experiment started from the base image (i.e. NOT from scratch).
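
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `generate` is a hypothetical callable standing in for whatever VQGAN+CLIP runner you use (it is not a NightCafe or library API), and joining the modifier to the prompt with a space is assumed from the example prompts on this page.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (prompt, iterations, start_image) -> image.
# A stand-in for your own VQGAN+CLIP runner, not a real API.
Generator = Callable[[str, int, object], object]

BASE_PROMPT = "Scary skeleton astronaut in space"
ITERATIONS = 400


def run_experiment(generate: Generator, modifiers: List[str]) -> Dict[str, object]:
    """Create the base image once, then evolve it once per modifier."""
    # Step 1: the base image is generated from scratch (no start image).
    base_image = generate(BASE_PROMPT, ITERATIONS, None)

    results: Dict[str, object] = {}
    for modifier in modifiers:
        # Step 2: every run starts from the same base image,
        # never from another modifier's output.
        prompt = f"{BASE_PROMPT} {modifier}"
        results[modifier] = generate(prompt, ITERATIONS, base_image)
    return results
```

Passing the generator in as an argument keeps the sketch independent of any particular tool, so the same loop works with a notebook function or a thin wrapper around a hosted service.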

This experiment was inspired by this fantastic album by Reddit user u/kingdomakrillic.


VQGAN+CLIP Keyword Modifiers
Clockwise from top left: "A dog on the beach", "A dog on the beach Thomas Kinkade", "A dog on the beach Unreal Engine", "A dog on the beach detailed painting".

How do VQGAN+CLIP Modifiers Work?

Taken from our VQGAN+CLIP tutorial on Medium.

Modifiers are keywords that have been found to strongly influence how the AI interprets your prompt. In most cases, using one or more modifiers in your prompt will dramatically improve the resulting image. Here’s an example using the text prompt “A dog on the beach”. The top-left image (without any modifiers) is noticeably worse than the others.

So why do modifiers have such a dramatic effect? It comes down to the data that the CLIP network was trained on: hundreds of millions of image and caption pairs from the internet. The images whose captions include the words “Thomas Kinkade” tend to be nicely textured paintings like the “Thomas Kinkade” example above. Likewise, the images paired with captions containing the words “Unreal Engine” tend to look like scenes from a video game (because Unreal Engine is a video game engine).

Thus, when you include modifiers like “Thomas Kinkade” or “Unreal Engine”, CLIP knows that the image should look a certain way. Note that in the examples above, the shapes are not much better with modifiers. The finer textures are what make the images look better.
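
In practice, a modifier is just extra text appended to the prompt. Here is a minimal sketch that builds the four example prompts used above. The helper names `with_modifiers` and `prompt_variants` are made up for illustration, and the space-separated format is assumed from the example captions.

```python
from typing import List, Sequence


def with_modifiers(prompt: str, modifiers: Sequence[str]) -> str:
    """Append each non-empty modifier to the prompt, separated by single spaces."""
    parts = [prompt.strip()]
    parts.extend(m.strip() for m in modifiers if m.strip())
    return " ".join(parts)


def prompt_variants(prompt: str, modifiers: Sequence[str]) -> List[str]:
    """Return the unmodified prompt followed by one variant per modifier."""
    return [with_modifiers(prompt, [])] + [with_modifiers(prompt, [m]) for m in modifiers]


for variant in prompt_variants(
    "A dog on the beach",
    ["Thomas Kinkade", "Unreal Engine", "detailed painting"],
):
    print(variant)
```

Because `with_modifiers` accepts a sequence, the same helper also covers prompts that combine several modifiers at once.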

A few of our favourites
Clockwise from top left: "Unreal Engine", "VRay", "SketchFab", "CryEngine".

Observations

A few key takeaways

3D rendering engines as modifiers

The modifiers that are 3D rendering engines (pictured: Unreal Engine, CryEngine, VRay, SketchFab) really shine here. Interestingly, the rendering engines targeted at games (Unreal Engine, CryEngine) both ended up with a spaceship interior in the background.

Some didn't affect the base image much

Some modifiers like "futuristic", "mystical", "dream" and a few others didn't end up deviating far from the base image. Perhaps this means that CLIP doesn't have a strong concept of what these keywords should look like? Or maybe these modifiers are a bit too broad and therefore hard for CLIP to steer towards any particular look? It would be interesting to do more experiments with these modifiers to get a better idea of what's going on.


Results

Scroll through the results and vote for your favourites by liking them.

  • matte painting
  • VRay
  • 35mm
  • pixiv
  • pixar
  • 8K 3D
  • bokeh
  • Flickr
  • HDR
  • IMAX