This post was first published in my newsletter, Delightful.
When I was a kid in short pants, I used to drive my ‘79 Oldsmobile to Barnes & Noble on Broad Street in Richmond, VA and read Photoshop tutorials in magazines.
I learned about things like layers and curves and blending and masks. I learned enough to be competent, but not enough to be talented. I could never figure out exactly how to translate an idea in my head to an image on a screen—which is what we’re all trying to do, right? From the caves of Lascaux to the laptops in our home offices, the history of creative endeavor is the history of us trying to make thing in brain exist outside brain.
That sounds simple enough! Yet the basic creative act of using Photoshop requires that you speak several languages:
In Richmond, Va., everybody spoke English, and they could apply English to every waking task.
Fewer people spoke design, or knew much about fundamental concepts like color theory, shape theory, ratios, and composition. The ones who did could apply those concepts to a wide variety of creative tasks, such as using Photoshop.
Fewer people still spoke Photoshop, with its own complex menus and terminology. That’s what manipulating an interface is. It’s speaking another language. And if you spoke Photoshop well enough in the year of our lord 2001, then you could create wonders of artistic profundity, like, I dunno, what was on the splash screen for Photoshop 6? A floating eye, you say? And a starfish? Truly we were giants.
But now we have Midjourney.1 You don’t need a GUI. You don’t need to understand design. All you need to know is your native language. All you gotta do is write a prompt, hit enter, and et voilà, you too can create an image of the supreme Roman pontiff in full drip.
Your native language is the new language of design.2
Which means that David Sedaris had it right: me do, in fact, talk pretty one day.
It takes some practice.
Getting what you want out of Midjourney isn’t exactly like just, y’know, writing “imagine the Pope in a Balenciaga puffer jacket”.
It takes practice to generate the image that you want. Midjourney likes to be spoken to in a specific manner.
There are plenty of people out there on Twitter and YouTube who are kindly explaining this fact, day in, day out, ad nauseam.
Each of them will tell you something like: you don’t just write “pope in Balenciaga”. Rather, you write something like “Street style fashion photo capturing a serious Pope Francis walking confidently at the Vatican. He’s wearing a high fashion, full-length white puffer jacket, style Balenciaga, hypebeast. Lit by off-camera flashes, paparazzi style”.
And by writing that you get something close to what you envisioned.
But you could be more precise still. You could stipulate the camera angle, or the film type, or the material of the jacket. Or you could describe the ambiance, or the atmosphere. And so you adjust, and you upscale, and you make variations, and you re-roll.
This is not to say Midjourney is difficult. It’s to say that using Midjourney is easy, and it’s made orders of magnitude easier by two things:
The more you know about e.g. photography styles or illustration or art history or haute couture or whatever, the less image creation is about guess work and more about filling in a mad lib that might look something like this:
[Shot Type] photo of [Subject] wearing [Designer] [color] [material], [atmosphere], [ambiance], [Position], [Lighting Style], shot on [Film Type]
For each of those brackets in that Mad Lib, there are underlying concepts.
The more of those concepts you know, the more precise your results.
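To make the mad-lib idea concrete, here’s a minimal sketch of that template as code. The slot names come straight from the bracketed template above, but the filler function and every example value are my own illustration, not Midjourney syntax.

```python
# Hypothetical sketch: fill each bracketed slot of the "mad lib"
# prompt template with a concrete concept. The slot names mirror the
# template above; the values are illustrative, not Midjourney syntax.

TEMPLATE = (
    "{shot_type} photo of {subject} wearing {designer} {color} "
    "{material}, {atmosphere}, {ambiance}, {position}, "
    "{lighting_style}, shot on {film_type}"
)

def build_prompt(**slots: str) -> str:
    """Fill the template; every slot must be supplied."""
    return TEMPLATE.format(**slots)

prompt = build_prompt(
    shot_type="Street style",
    subject="Pope Francis",
    designer="Balenciaga",
    color="white",
    material="puffer jacket",
    atmosphere="hypebeast",
    ambiance="paparazzi energy",
    position="walking confidently at the Vatican",
    lighting_style="lit by off-camera flashes",
    film_type="35mm film",
)
print(prompt)
```

The point of the exercise: the more concepts you know, the more slots you can fill with something deliberate instead of leaving them to chance.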
Which means the language layer has shifted.
Knowing how to speak Photoshop meant that you didn’t have to know how to articulate your design concepts. You could lean on Photoshop’s interface to fill in your language gaps. You could play around visually, even if you couldn’t explain what you were doing using your words.
But now that you no longer need to speak Photoshop, you need to be that much more articulate in your language and the languages of art and design.
So what if you’re not articulate in the languages of art and design?
What if you can see the photo of the Pope wearing a fashionable puffer jacket in your head, but you don’t know anything about fashion brands, or photography styles, or film types, and so you can’t describe it?
Well then it would be a lot easier to just pick those things from a GUI, wouldn’t it? Maybe from something like this3:
You’d begin, of course, with a text prompt.
The copy in the prompt would then toggle the GUI controls.
So if your prompt included the Pope, maybe the GUI sets the image type as Person and the aspect ratio as Instagram-friendly.
You’d want to be able to browse and set the type of image style and, if applicable, stipulate what kind of shot type, film format, camera angle, and lighting style the image requires.
And since this image would be of a person, you’d want to set those characteristics—along with the exact prompt weight to tell Midjourney the relative importance of each characteristic. (I really like these knobs; they remind me of a sound mixing board.)
Finally, you’d want to select the type of action the person in the image is doing, along with setting negative prompts for concepts you don’t want in the image.
Of course, all these changes within the GUI would update the text prompt as well, which would remain editable.
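The two-way sync described above—prompt text toggling the GUI, GUI changes rewriting the prompt—could be sketched roughly like this. Everything here is hypothetical: the class, the keyword table, and the aspect-ratio defaults are my own stand-ins for whatever a real implementation would do.

```python
# Hypothetical sketch of two-way sync between a text prompt and GUI
# controls: keywords in the prompt set control state, and changing a
# control rewrites the (still editable) prompt. Keyword table and
# defaults are illustrative only.

KNOWN_SUBJECT_TYPES = {"pope": "Person", "starfish": "Animal"}

class PromptState:
    def __init__(self) -> None:
        self.image_type = "Unknown"
        self.aspect_ratio = "1:1"
        self.extras: list[str] = []
        self.base_prompt = ""

    def from_prompt(self, text: str) -> None:
        """Prompt -> GUI: infer control settings from keywords."""
        self.base_prompt = text
        for word, kind in KNOWN_SUBJECT_TYPES.items():
            if word in text.lower():
                self.image_type = kind
                # e.g. a Person defaults to an Instagram-friendly ratio
                self.aspect_ratio = "4:5" if kind == "Person" else "1:1"

    def set_control(self, extra: str) -> str:
        """GUI -> prompt: a control change appends to the prompt text."""
        self.extras.append(extra)
        return self.render()

    def render(self) -> str:
        return ", ".join([self.base_prompt, *self.extras])

state = PromptState()
state.from_prompt("the Pope in a Balenciaga puffer jacket")
print(state.image_type, state.aspect_ratio)
print(state.set_control("lit by off-camera flashes"))
```

Because `render()` just concatenates the base prompt with whatever the controls added, the text stays the single source of truth—you can always hand-edit it, exactly as the wireframe suggests.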
Is this the way Midjourney’s going?
No idea.
But it was certainly fun to wireframe it out as a thought exercise about how a GUI might look and function.
Me personally, I’ve been having a lot of fun learning how to describe my prompts with language. It’s a great excuse to get precise about certain concepts.
Then again, I’m lazy, and it’d be chill to be able to use writing and visuals to create something net new.
Language and Art: the original Reese’s Peanut Butter Cup™ of creativity.