DALL-E 2

All images on this page were generated with AI. That's right. The wait is finally over. I got access to DALL-E 2 from Open AI. This mind boggling image generation is state of the art, and runs completely on Azure. Getting access to it has been both incredibly exciting, but also a little bit terrifying.

What is it?

DALL-E is a combination of a powerful Natural Language Processing AI, meshed with an even more powerful image generation AI. The combination creates a system that can take text based descriptions and turn them into incredible, original images. It can even understand the context of the setting, scene, and style of Image. I'll give you a few examples.

How does it work?

Well, that might be the secret sauce. Much like Open AI's GPT-3, DALL-E 2 has been trained on a huge corpus of information. It is able to generate context about a huge number of words, and convert them into images which (mostly) make sense based on our descriptions of them. The more words, and the better descriptions we can give, the more accurate the images can be.

Shut up and show me

I wanted to pick a few topics that made sense to me, to demonstrate the originality and ingenuity of DALL-E 2.

A young beekeeper in a veil surrounded with bees - in the style of Grant Wood

This incredible portrait of a beekeeper is genuinely very creative - and looks like it could be hung in a gallery. Using specific examples of art styles is really powerful in DALL-E 2.

We can also ask for photo realism, here's the same prompt again with a change to the style:

A photograph of a young beekeeper in a veil surrounded with bees

At first glance this looks incredibly real. It could be a young woman beekeeping. It's not until we look closely that we see issues. Some to note are details of the eyes and teeth, lack of honeycomb structure on the frame, and the rather manic and poorly defined appearance of the bees. Although it certainly nailed the dirty suits!

Is it really unique?

DALL-E 2 thrives on it's uniqueness. In my opinion the more abstract the input, the better it does at generation. To test this let's head to a random idea generator and generate something that surely has never been seen before.

I started by heading to https://randommer.io/random-things-to-draw and picking 'the president shaking hands with an alien', but let's go further. Where are they? Surely not the oval office. What colour is the alien? Using other suitable random toolings I came up with:

'The president shaking hands with a purple alien at a carnival'. We'll add the tag 'digital art' for this one.

Four options for: 'The president shaking hands with a purple alien at a carnival, digital art'

It's interesting that 2/4 of the 'presidents' are older white males, but two are aliens. We never actually said what planet or country the 'president' came from. For the human versions we can see there is some level of bias there. Both with appearance, but also the presence of the US flag in the second image. This is likely due to a human association of the English word 'president' with the USA, and then subsequently older white males, being ingrained into the corpus of data.

Variations

DALL-E 2 can also start with an image and look at variations of the image. Let's start with the first human president example.

In this way we can quickly generate mew versions of our preferred image. In this case the tent remains similar, we keep a (mostly) human president, and get a few styles of poses and aliens.

The president shaking hands with a purple alien at a carnival, digital art

In this case though, I think the original selection was best, but that's just boring human opinions.

Safety First

Open AI has enforced a strict usage policy and guidelines that will automatically flag prompts that are deemed inappropriate, and stops image generation. At the moment this seems to stop us using direct likenesses of real people, so we can't generate some meme worthy celebrity moments that never existed. Read more about the content policy.

A few more, for the sake of it.

The Pokemon Pikachu eating spaghetti bolognaise - oil painting

Note the subtleties, like not depicting Pikachu on a chair, and understanding its relative size to a human plate of pasta.

I like this one, because it's perfectly valid to think that the etching itself should be placed on a hill, rather than the etching being of a car on a hill. Being specific with your wording is key, but that's true for humans too!