Lionel Cheng | Stable Diffusion

In April of this year (2022), OpenAI announced the much-anticipated DALL-E 2, an AI capable of generating digital images from natural language descriptions. The results were mind-blowing: the level of detail achieved, especially for faces, was on par with what humans could produce. One of the most famous pictures from DALL-E 2 is the star-gazing raccoon astronaut:

Image generated by DALL-E 2 using the following prompt: "a raccoon astronaut with the cosmos reflecting on the glass of his helmet dreaming of the stars".

A few weeks later, Google announced Imagen, a similar text-to-image model that was claimed at the time to be slightly better than DALL-E 2. OpenAI soon released APIs to the public to generate images from prompts, and if you were among the first beta testers you could generate images on the fly. However, those models were heavy and computationally expensive to run, even for inference, so generating images locally on your laptop was out of reach.

This all changed in August 2022 with the release of Stable Diffusion, an open-source, optimized text-to-image model that can run on a sufficiently powerful laptop with a GPU (such as an M1/M2 Mac). The idea behind this model is to train on and generate images in a latent space that is much smaller than the high-resolution pixel space of the actual images (a link to the paper here).
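To get a feel for how much smaller the latent space is, here is a rough back-of-the-envelope calculation. The numbers are assumptions that are not stated in this post: a 512x512 RGB output, an autoencoder that shrinks each side by a factor of 8, and 4 latent channels, which is my understanding of the Stable Diffusion v1 setup. The function names are made up for the illustration.

```python
# Compare how many numbers describe an image in pixel space vs. latent space.
# Assumed figures (not from the post): 512x512 RGB image, autoencoder
# downsampling factor of 8 per side, 4 latent channels.

def tensor_size(height: int, width: int, channels: int) -> int:
    """Number of values in a (height, width, channels) tensor."""
    return height * width * channels


def latent_shape(height: int, width: int, downsample: int = 8,
                 latent_channels: int = 4) -> tuple[int, int, int]:
    """Shape of the latent tensor for an image of the given size."""
    return height // downsample, width // downsample, latent_channels


pixel_values = tensor_size(512, 512, 3)
latent_values = tensor_size(*latent_shape(512, 512))
compression = pixel_values / latent_values

print(f"pixel space : {pixel_values} values")
print(f"latent space: {latent_values} values")
print(f"the diffusion model works on {compression:.0f}x fewer values")
```

Under these assumptions the denoising network handles a few tens of thousands of values per image instead of several hundred thousand, which is a big part of why the model fits on a laptop GPU.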

I was very enthusiastic about the release of Stable Diffusion and wanted to try things out myself, so here is the beginning of my journey to become a prompt master! All the images here were created with the Stable Diffusion v1.5 weights, using the InvokeAI fork of the original repository on GitHub.

I think the most amazing thing about Stable Diffusion is that it allows a lot of people without any drawing/painting talent 🧑‍🎨 (that includes me) to create complex and beautiful images. It still takes time to become a prompt master, as a lot of additional keywords are necessary to create something close to what you have in mind. Several pages have helped me refine my skills in that regard, among them "Best 100+ Stable Diffusion Prompts" and "Stable Diffusion: Prompt Guide and Examples".

If you only prompt from text to image, you are likely to produce astonishing images, but you lose some control over the exact shape and position of things in your image. This is fun at the beginning, but my intent with Stable Diffusion was to reproduce images that I had in my head. A very exciting feature of Stable Diffusion is that you can give it a sketch of your own as a first guess and it will “enhance” your image according to your prompt. The first example is below: I used Seashore (an equivalent of Paint on Mac) to draw a rather childish sketch of a mountain with a river, which I turned into an oil painting thanks to Stable Diffusion:

Original image created using Seashore (left). Image generated using Stable Diffusion v1.5 and the following prompt: "oil painting of flowing river, mountains with cloud, hyper realistic grass, sunset, turner, 4k, detailed, nature photograph, national geographic" -f 0.85 (right)
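The `-f` value in the captions is, as far as I understand the InvokeAI command line, the image-to-image strength: how much noise is added to the sketch before the model denoises it again. The sketch below is only an illustration of that idea, assuming the behaviour of the reference image-to-image script, where the starting image is noised for roughly `strength * steps` steps. The function name and the total of 50 sampling steps are my own assumptions, and InvokeAI may differ in the details.

```python
# Illustration of what the img2img "strength" controls.
# Assumptions (not from the post): 50 sampling steps in total, and the sketch
# is noised for int(strength * steps) steps before being denoised again.

def noised_steps(strength: float, total_steps: int = 50) -> int:
    """Hypothetical helper: number of steps over which the sketch is noised."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return int(strength * total_steps)


for strength in (0.5, 0.85, 1.0):
    steps = noised_steps(strength)
    print(f"-f {strength}: sketch noised and re-denoised over {steps} of 50 steps")
```

A low strength keeps the result close to the sketch, while a value near 1 leaves the model almost free to ignore it. Values like 0.85 or 0.9 keep the overall composition but let the prompt decide nearly everything else.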

The result is quite impressive, as the childish sketch has been transformed into something much prettier (to me at least). I had quite some trouble turning the original sketch directly into a realistic picture. I found that it is easier for the model to transform a sketch into a painting, as it blurs things out. So in a second step I started from this oil painting to generate a photorealistic view of my initial sketch (left image of the figure below). I was not quite happy with the mountain itself, so I used a feature of the model to replace the mountain with another one, so that the final image (right) was closer to what I had in mind.

"oil painting of flowing river, mountains with cloud, hyper realistic grass, sunset, turner, 4k, detailed, nature photograph, national geographic " -f 0.9 (left). "snowy mountain, hyper realistic, national geographic" (right)

Gallery

Here are some other images that I have created using the same workflow: a sketch in Seashore, then a painting, and then a photorealistic image (if applicable). The prompts can be found in the captions 😊.

Sketch of a city at night (left). "oil painting of a city at night, josef thomas, dim light" (middle). "city at night, winter, dim light, photograph, highly detailed, ultra realistic, 4k" (right)

Sketch of two boats at sunset (left). Sketch of two boats under a full moon (right)

"two boats sailing in the open ocean at sunset, some clouds, caspar david friedrich, detailed, 4k" -f 0.9 (left). "two boats sailing under a full moon at night, sky full of stars, white sailes, oil painting, turner, 4k, detailed" -f 0.8 (right)