
FLUX: New Generative AI Models

There’s a new sheriff in town for those of us who are excited about the eruption of mind-blowing (if ethically dubious) tools made possible by advances in generative AI image creation. The FLUX family of models from Black Forest Labs represents another leap forward in what is available to us, both in terms of the overall quality of the creative tools and the specificity of what can be asked of them.

FLUX AI vs Midjourney

As a long-time Midjourney user, I can’t help but point out that the platform hasn’t completely lost its mojo with the introduction of the FLUX suite.

When it comes to generating work that looks genuinely artistic and impactful–even tasteful–I still consider Midjourney to be the reigning champion for certain things, so long as you’re willing to do most of the work with text alone.

That being said, Midjourney is a pay-to-play tool, and it’s much (MUCH!) less flexible than an advanced diffusion model running inside a work environment like ComfyUI. With some extra elbow grease, FLUX is a superior tool.

A futuristic landscape with remnants of a bygone era, dominated by a large sphere in the sky. In the foreground, AI creations roam alongside natural wildlife like deer and bees.

The FLUX Suite

There are a bunch of different variations within the FLUX toolset, even beyond the Big Three that Black Forest Labs officially offers.

There’s the Pro model, which you can only use through a (pretty reasonable) online paywall at hosted services like replicate.com or freepik.com.

Then there’s the Dev model and the Schnell (fast) model, which you can download from Hugging Face or CivitAI. The Dev model comes in at a whopping 24 gigabytes and takes time to load and generate images; it’s a monster, both in its size and its capabilities.

The Schnell model is a lightweight, faster model, but it still delivers impressive results (and you can run it on lighter hardware). If your GPU has less than 24 GB of VRAM, you’re going to struggle with the Dev model.

Beyond that, there are a bunch of quantized models that run at various levels of speed and capability (these are referred to as Q8, Q6, Q4, and so on). Quantization stores the model’s weights at reduced numerical precision, a bit like saving an image at a lower quality setting. I didn’t test every single one of them, but the online chatter is that they deliver pretty satisfactory results with a much smaller computational footprint.
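
If you’d rather script these models than wire them up in ComfyUI, here’s a minimal text-to-image sketch using Hugging Face’s diffusers library, assuming a recent release with FLUX support and a CUDA GPU; the model ID and settings below are the commonly published defaults, not gospel:

```python
import torch
from diffusers import FluxPipeline

# Load the lightweight Schnell model (bfloat16 keeps memory roughly in check).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
# If your GPU is short on VRAM, shuttle weights to system RAM between stages.
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="a kingfisher bursting out of a still pond, droplets frozen mid-air",
    num_inference_steps=4,  # Schnell is distilled to work in very few steps
    guidance_scale=0.0,     # the published setting for Schnell
    height=1024,
    width=1024,
).images[0]
image.save("kingfisher.png")
```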

What Makes FLUX Special?

So let’s look at some of the things that make FLUX special. First of all, check out some realistic photographic imagery. These don’t incorporate any LoRAs or other inputs (ControlNets, etc.)–just the model talking to the KSampler.

Top left: Two people in work attire stand by a red truck. Top right: A kingfisher bird splashes out of water. Bottom left: Woman in a dress on an elephant. Bottom right: Street musicians perform at sunset.

These images are basically the greatest hits from prompts that required several tries, additions, re-writes, and misfires. If you have a VERY specific image in mind, it’s going to take some time to get there, and your prompt will get longer and longer as the process continues.

That being said, the overall quality of photographic imagery is unsurpassed. When it comes to rendering people, faces, and poses, FLUX AI generates images that are stunningly realistic. Ditto for landscapes, nature images, machinery, and interiors. This is a well-trained animal, indeed.

FLUX AIn’t Perfect.

But let’s nitpick a bit. FLUX still struggles with that familiar “A.I. Look”–vaguely shiny, oversaturated, or just overly pristine. The elephant in the phony Vogue shoot looks especially… weird. And fabrics and skin tones still seem oddly airbrushed, almost as if they’re occupying a space between photography and illustration.

In an ornate room with large windows, a woman in a black dress and tiara sits gracefully atop an elephant, looking as if she commands the space with ease.


Working on its own, FLUX still seems to behave a little like it was trained exclusively on stock photography. Even if you ask it to give you “candid” or “shot on an iPhone,” the people tend to look like they’re straight out of central casting, with a make-up and hair crew just outside the frame.

So if you want the people in your images to look like something other than fashion models or cultural cliches, you’ll need to say so specifically. (As an aside, the FLUX Pro model purports to tackle this issue directly).

It’s also inexplicably difficult to achieve certain things… For example, in the father-daughter photo in front of the pickup truck, I requested multiple times in the prompt that they be “muddy.” It simply wouldn’t do it. I’m sure I could have gotten there eventually with the right word combination, but simple things can be unexpectedly time-consuming, given that so much of what these systems do seems utterly impossible.


Artwork with FLUX Models

Artwork with the FLUX models is impressive, but without a LoRA (see below!), it doesn’t quite stand up to the creativity of Midjourney.

I generally find the models traffic in cliches, and apparently they don’t know the names of any artists. This may be a conscious choice on the part of Black Forest Labs… naming a particular artist in a prompt is a potential copyright infringement, so maybe we should thank them for leaving it out. Midjourney, by contrast, seems to advertise the um… theft(?) of artistic techniques as its raison d’être.

You can make some headway by naming a particular trend or style (e.g. watercolor), but the models tend to have set ideas about the meaning of particular keywords. Longer prompts yield increasingly better results, so don’t give up at first… you can communicate with FLUX in plain English, and it does a pretty good job of composing layouts and moods as instructed.

A painting featuring an abstract raven, an orange fish with splashes, a futuristic building, and a dual-faced portrait.

Using FLUX to Render Text

Another leap forward is FLUX’s capacity to handle text. You can hand it a few descriptors and a line of text, and it will generate one viable option after another, only occasionally mangling or misspelling the words (a pitfall of older models, and one area where Midjourney seemed to stand alone).

Check out these logos, all generated with one-sentence prompts:

Four logos: Canopy Vineyard (tree icon), Busy Bee Baked Goods (bee icon), Gen AV 7 (modern font), and Acultura Design (ink splatter effect).
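
Scripted, there’s nothing special about a text-bearing prompt: you simply quote the exact words you want drawn. A quick sketch along the lines of the earlier Schnell example (the prompt itself is a hypothetical):

```python
import torch
from diffusers import FluxPipeline

# Same setup as the earlier Schnell sketch.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Quote the literal text you want rendered; FLUX draws the words itself.
logo = pipe(
    prompt='A minimal flat-vector logo for "Busy Bee Baked Goods" '
           "with a small bee icon on a cream background",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
logo.save("busy_bee_logo.png")
```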

Inpainting & Outpainting Tools

One last example of the versatility of the FLUX models is that they work with inpainting and outpainting tools, so you can rework particular areas of an image, or enlarge the canvas and fill in the areas that haven’t yet been rendered.

This tool has been available in Photoshop for a few months, so obviously this isn’t a revolutionary development, but you can do it in ComfyUI for free with a model that is exceptionally accurate and creative.

Check out these modifications to an Abe Lincoln portrait. (The grey areas are painted over and re-rendered in successive frames with the FLUX-Fill model.)

A series of four portraits of a man resembling Abraham Lincoln with varying sunglasses and gestures, including a hand raised in a peace sign.
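
Outside of ComfyUI, the same model is scriptable through a fill pipeline. A hedged sketch, assuming diffusers’ FluxFillPipeline wrapper; the file names are placeholders, and the mask convention is white-repaints, black-keeps:

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

source = load_image("lincoln.png")     # hypothetical source portrait
mask = load_image("lincoln_mask.png")  # white = repaint, black = keep

result = pipe(
    prompt="a stern 19th-century statesman wearing mirrored aviator sunglasses",
    image=source,
    mask_image=mask,
    num_inference_steps=30,
    guidance_scale=30.0,  # Fill-dev wants much higher guidance than base FLUX
).images[0]
result.save("lincoln_sunglasses.png")
```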

Using FLUX with ControlNets & LoRAs

Unlocking the full potential of FLUX (and putting the tools on solid footing to compete with Midjourney) requires a deeper dive into more advanced workflows where ControlNets and LoRAs become essential.

ControlNets

ControlNets are guidance tools: they take a reference image, extract its general contours, and hand those to the main model, so your generation starts from a particular composition. Black Forest Labs released two (Depth and Canny), but there are a host of others from third-party developers.

Canny offers the model what essentially amounts to an edge filter, finding the main contours and rendering them as hard white lines.

A group of three people in urban outfits appear in various artistic styles across four panels labeled “Source,” “Canny,” and “Output.”


The Depth ControlNet provides an approximate depth map of the reference image.

Image showing a transformation of a shark: original photo, depth map, and two artistic outputs—one realistic drawing and one colorful abstract painting with human faces.
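
In scripted form, the Canny workflow is: extract the edges yourself, then hand them to the control model. A sketch assuming diffusers’ FluxControlPipeline and the official FLUX.1-Canny-dev weights (file names are placeholders; the Depth model works the same way, with a depth map in place of the edge map):

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

# Build the control image: hard white edges on black, as the model expects.
source = load_image("street_musicians.png")  # hypothetical source image
edges = cv2.Canny(np.array(source), 100, 200)
control = Image.fromarray(edges).convert("RGB")

out = pipe(
    prompt="street musicians at sunset, painted in loose watercolor",
    control_image=control,
    num_inference_steps=30,
    guidance_scale=30.0,
).images[0]
out.save("musicians_watercolor.png")
```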

LoRAs

A LoRA–short for “Low-Rank Adaptation”–is the ingredient that vastly expands the capacity of the FLUX toolkit and brings the versatility of the (aptly-named) base models into full bloom.

A LoRA is a small add-on–a set of low-rank weight adjustments layered on top of the base model–that steers its output toward a peculiar specialty. There are graffiti LoRAs and oil-paint LoRAs. There are LoRAs that are experts in watercolor, nature photography, impressionism, Salvador Dali, The Rock, Ariana Grande, comic book art, tarot art, and storybook illustration. And if you enjoy making titillating images of fantasy warrior princesses or anime teenagers wielding uzis, well! You have an all-you-can-eat buffet of options. Head over to CivitAI or Hugging Face to check them out.

Basically, the base model does all the heavy lifting, getting the image close to what you’re after. The LoRA steps in to provide the finishing touches, and those touches can make a HUGE difference, especially if you’re targeting a particular style.
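
In code, a LoRA is literally a small weights file loaded on top of the base pipeline. A sketch along the lines of the earlier Schnell setup; the repository and weight names here are hypothetical stand-ins for whatever you grab from CivitAI or Hugging Face:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Stack a style LoRA on top of the base model (hypothetical repo/weight names).
pipe.load_lora_weights(
    "some-author/flux-watercolor-lora",
    weight_name="watercolor.safetensors",
)
pipe.fuse_lora(lora_scale=0.8)  # how strongly the "finishing touches" apply

image = pipe(
    prompt="an abstract raven in loose watercolor washes",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("raven_watercolor.png")
```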

Wrap Up

The dizzying array of offerings for AI creators points to something true of any creative undertaking: multiple tools will be required to achieve particular ends.

ComfyUI is probably the most versatile software suite to bring the various digital gadgets into one place, but I would advise anyone looking at this medium not to lose sight of the big picture.

You could use Midjourney to create a certain artistic style, bring those images into FluxGym (running inside of Pinokio) to train your own FLUX LoRA, and then use a ComfyUI network to run your LoRA with a FLUX base model (and other tools) for your final output.

In short, there is no need to discard your favorite workflows and resources just because there’s a new one in the mix. The FLUX suite of models is an incredible contender on speed and quality across the full spectrum of AI image workflows, but it need not replace them. The FLUX models can simply be leveraged to improve them all.