
FLUXGYM to ComfyUI: Building and Using Custom LoRAs

The single most powerful tool for sharpening the output of any AI image-generation workflow is a LoRA (aka Low-Rank Adaptation). It’s “low-rank” because the adaptation itself is just a small set of lightweight weight updates; the diffusion model it’s working with (in this case, one of the FLUX models) is still doing most of the heavy lifting, along with the billions of data points that it has devoured and digested somewhere in Silicon Valley. (For a little background on all of this, take a look at my last blog post about the FLUX toolset and ComfyUI.) In this post, I’m going to outline how to train a FLUX LoRA using FLUXGYM and integrate this custom LoRA into the simplest of ComfyUI workflows.

For my example LoRA, I took the silly hobbyist approach and trained a “character LoRA” on my own likeness. If you want to try this yourself, this is a great way to get your feet wet and maybe generate some laughs on your social media feed. You can use whatever dataset you like, of course. But this is a good place to start. These are the nine images I used to train my LoRA:

A sequence of eight portraits of the same person with different facial expressions ranging from serious to playful, with various angles and direct eye contact.

What’s a LoRA again?

To review, a LoRA is a refiner… it takes the work of the central diffusion model and modifies it according to its training. Think of it as a plug-in or filter… Where the LoRA enters the diffusion process and the extent to which it varies the work of the diffusion model is a matter of changing a couple of parameters in your workflow. So you might have an oil paint LoRA or an anime LoRA, each of which has been trained on a set of images. As such, a LoRA has a particular look… it could be trained only on the work of a single artist or even a certain specific era of that artist’s work. Or–as is often the case–it is trained on imagery of a single person, so that when it is invoked, that person’s likeness and features will be woven into the final output.

A person in a wizard costume holds a glowing orb and a staff, standing in a forest with glowing mushrooms.

LoRAs in the Wild

If you’re using a LoRA for a creative purpose–say, training one on a bunch of your own nature photographs so that you can generate imagery inspired by your own artistic vision–then its utility is obvious. But LoRA-training in the wild is aimed at targets that are categorically less spiritual. Think, marketing! Think, social media! You might want to have a particular person as your brand ambassador or influencer, so you train a LoRA on her likeness and now, you don’t need to fly her to India for a photoshoot in front of the Taj Mahal wearing your brand’s new hoodie; you just generate that image on your laptop. Coming up with new places and poses for your influencer was never so easy.

A woman in a white hoodie stands in front of the Taj Mahal.

Or you might train a LoRA on your specific product, so that the product itself can be rendered accurately. Let’s say you have a line of makeup that has particular containers and branding… you could describe it in excruciating detail to your main diffusion model and hope that it gets it right (spoiler: it won’t), or you can photograph the products and train an expansive LoRA to know what you mean when you say the brand’s name and point to a specific product. (And yes, you can train a LoRA to respond to specific keywords for specific members of its data-set. LoRAs aren’t stuck doing only one type of modification.)

A werewolf with glowing eyes and sharp teeth stands aggressively in a moonlit forest, with trees silhouetted against a full moon.

How to Train a FLUX LoRA in FLUXGYM

Training a LoRA with FLUXGYM is remarkably simple. There are multiple ways to do this, but for the purposes of this post I’m going to focus on doing it locally, which means you need a graphics card. (For what it’s worth, I’m running this on Windows 11. I don’t think it will run on a Mac, but I can’t find a definitive answer to that.)

Download and install pinokio

Pinokio is sort of a portfolio manager for all your AI image software (think Steam, but for AI tools) and it’s pretty handy. One of the things that I like about it is that it stays confined to one folder… when it installs additional system components, those components live inside your pinokio folder, so you don’t run the risk of messing up your software environment.

Download and install FLUXGYM from within pinokio

Once you get pinokio installed, you’ll see that it has its own browser for additional software tools. FLUXGYM is likely to be at the very top, but if you don’t see it, type it into the search field. When you go to install it, it will announce a raft of software components that it needs to run properly, including the models it depends on, which amounts to dozens of gigabytes. If you’re wondering why you need to download models you already have, I share your ire, but I haven’t discovered a way to point it at existing copies from within the software.

A person in a sci-fi uniform stands on an alien landscape with two moons, spiky rock formations, and blue crystalline structures under a purple sky with lightning.

You may need to take a few passes at the FLUXGYM install. It may appear that the install has failed, but if you find the FLUXGYM download link and initiate the install process again, you’ll see that a bunch of the necessary components are already there; just repeat the install until it sticks. It downloads all of the system components it needs first and installs FLUXGYM itself last, so your first click on the FLUXGYM download button may be the first of many. The process takes quite a while…

There are a bunch of click-throughs that I won’t describe… no need to customize anything. Just be aware that it’s going to occupy A TON of space on your drive, and the install takes a long time, depending on your internet speed. (My pinokio folder is nearly 100 GB with only an install of FLUXGYM.)

FLUXGYM should automatically run when the install is complete. If not, kick it open from your list of apps.

A man with a thoughtful expression, wearing a dark jacket and white shirt, poses in front of a textured, colorful background. His hand is touching his chin.

LoRA Parameters

Most of these are pretty easy and self-explanatory. Name your LoRA and come up with a trigger word that will activate it (something that you wouldn’t normally use in a prompt. I chose “ScottMann1”. Not creative, but I don’t write that very often). Whenever I write ScottMann1 in a ComfyUI prompt while my LoRA is loaded, the LoRA will activate. Then choose which model you want to use. The dev model is bigger and a little better, but slower and greedier for processing power. Even if you have a powerful machine, you might go with the Schnell model if speed is important to you.

VRAM is important. Set it to match the amount of VRAM you have on your graphics card.

“Repeat trains per image” is the number of times it will analyze each image. Ten is the default and that works fine. I think six is the minimum. Higher values will take longer and be more accurate.

“Max train epochs” at sixteen is also fine (it caps how many epochs the training runs). An epoch is one pass through the entire dataset. So if you have ten images with ten repeat trains per image, and you set “Max train epochs” to sixteen (which is the default), you should expect about 1,600 training steps (10 × 10 × 16). Again, the defaults here are fine. You can lower them if things are running slowly.
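
If you like to sanity-check that math before committing to a long run, it’s just multiplication. A minimal sketch (this assumes a batch size of one, which is the usual default):

```python
# Rough estimate of total training steps for a FLUXGYM run (assumes batch size 1).
num_images = 10          # images in your dataset
repeats_per_image = 10   # "Repeat trains per image"
max_epochs = 16          # "Max train epochs"

total_steps = num_images * repeats_per_image * max_epochs
print(total_steps)       # 1600
```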

Sampling

You can sample your LoRA as it’s being created (meaning, have the model generate a test image with the partially trained LoRA). By entering a sample prompt (say, “ScottMann1 a closeup image of a knight standing in the forest”), the model will generate an image for you so you can see how the training is going. This will slow down your training, but it can be helpful. Sample an image about every 100–200 steps. Leave “Sample image every N steps” blank (or perhaps at zero?) if you don’t want to bother.

Assemble Your Dataset Imagery

As you prepare your images for the training, one piece of advice: crop them or shoot them square. (You can resize them all to 512×512 before passing them to the trainer, but it will do this for you, which is probably quicker.) I tried to use portrait-orientation images on one pass, and the LoRA it generated for me produced images that were oriented sideways. Also, if you’re really just going for a face-swap, it’s best to crop the images to include only your face. If you are wearing the same piece of clothing in each image (as I was, when I whipped out my phone and took a bunch of selfies), you might end up seeing it in your output images.
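
If you’d rather square the images up yourself before dragging them in, a quick Pillow script does the trick. This is only a sketch; the raw_photos and dataset folder names are placeholders for wherever you keep your files:

```python
# Center-crop every JPEG in a folder to a square and resize it to 512x512.
from pathlib import Path
from PIL import Image

src = Path("raw_photos")   # originals (placeholder folder name)
dst = Path("dataset")      # squared-up copies for training
dst.mkdir(exist_ok=True)

for img_path in src.glob("*.jpg"):
    img = Image.open(img_path)
    side = min(img.size)                    # length of the shortest edge
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    square = img.crop((left, top, left + side, top + side))
    square.resize((512, 512), Image.LANCZOS).save(dst / img_path.name)
```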

Best practices dictate that you use a bunch of different images, in different light, in different environments. Again, for the quick-and-dirty approach, don’t worry about it. But if you’re doing something professional, more is more, especially if you’re going after a particular artistic style. Once you’ve prepped all your images, select them all and drag them into the “Dataset” window where it says “upload your images.”

Caricature of a politician speaking at a podium with a seal, gesturing with raised hands, against a curtain backdrop.

Captions

At the bottom of the Dataset window is where each image in your dataset gets a caption, and the caption fields should appear as soon as you load your images into the interface. If you’re doing a character LoRA, you don’t really need detailed captions, though it helps to include things like “smile” or “frown” or “grimace.” Each image will get your trigger word (in my case, “ScottMann1”), but you can add additional info… so your prompts might look like “ScottMann1 smile happy” or “ScottMann1 grimace.” If you’re doing something more advanced, like training a style LoRA, you should definitely create more detailed captions, because the trainer will use that data to identify image specifics (like day, night, flower, ocean, city, etc.). The more information you give about each image, the “smarter” your LoRA will be.
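
FLUXGYM lets you type captions straight into the interface, but if you’d rather prep them ahead of time, trainers in this family generally read plain .txt sidecar files that sit next to each image. Here’s a rough sketch of that idea; the filenames and per-image notes are just examples:

```python
# Write one caption .txt per image, each starting with the trigger word.
# Assumes the common sidecar convention (selfie_01.jpg -> selfie_01.txt).
from pathlib import Path

trigger = "ScottMann1"
captions = {                      # hypothetical per-image notes
    "selfie_01.jpg": "smile happy",
    "selfie_02.jpg": "grimace",
}

dataset = Path("dataset")
dataset.mkdir(exist_ok=True)
for filename, detail in captions.items():
    (dataset / filename).with_suffix(".txt").write_text(f"{trigger} {detail}".strip())
```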

Florence

There’s an option to caption your images using Florence-2. Florence-2 is an AI vision model that will look at your images and write the captions for you, but its accuracy is a little dubious. You might give it a shot (just click on the chunk of text where it says “add AI captions using Florence-2”). Just make sure you review the results carefully; this tool is still in the early stages of development.
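
For the curious, here’s roughly what Florence-2 captioning looks like if you run it yourself outside FLUXGYM. This is a sketch based on the model’s published Hugging Face usage (model ID, task tag, and post-processing call come from that example), and it wants a decent GPU:

```python
# Caption one image with Florence-2, then review/edit before training.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("dataset/selfie_01.jpg").convert("RGB")
task = "<DETAILED_CAPTION>"
inputs = processor(text=task, images=image, return_tensors="pt").to("cuda", torch.float16)

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
    num_beams=3,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed = processor.post_process_generation(raw, task=task, image_size=(image.width, image.height))
print(parsed[task])   # read it carefully, then prepend your trigger word
```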

Screenshot of a machine learning platform showing steps to train a model. Left: LoRA info input. Center: Dataset with images and captions. Right: Training configuration and "Start training" button.

Start Training

Let it rip. In the upper right-hand corner is the “start training” button. Note that FLUXGYM doesn’t download the models you requested until you do your first training, so it essentially finishes the installation at this stage. The models are huge, so it will take time. Grab lunch or go to bed. You can check on the process by clicking the terminal option on the left side of the screen, which shows you what’s going on in the background (downloads, etc.). This is very helpful, but don’t accidentally click STOP.

Once the models have downloaded, the training itself will take anywhere from thirty minutes to many hours depending on the speed of your hardware, the number of images you’ve given it, and the level of analysis you’ve chosen. When the LoRA is complete, you’ll find it in pinokio/api/fluxgym.git/outputs.

One thing you may notice is that there will be a few different versions of your LoRA safetensors file in the output folder. This is because it saves a LoRA checkpoint after a certain amount of training. If your LoRA is overcooked (meaning its results are heavy-handed in some way), you can try one of the earlier LoRAs that received less training. The safetensors file named exactly what you called it (i.e. scottmann1.safetensors) is your final file; the intermediate ones have a bunch of numbers in the filename.
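
If you want a quick inventory of what landed in that folder, something like this works; the path is the default pinokio location mentioned above, so adjust it if your install lives elsewhere:

```python
# List the LoRA files FLUXGYM produced, oldest first, with their sizes.
from pathlib import Path

outputs = Path("pinokio/api/fluxgym.git/outputs")   # default location
for f in sorted(outputs.rglob("*.safetensors"), key=lambda p: p.stat().st_mtime):
    print(f"{f.name:45s} {f.stat().st_size / 1e6:8.1f} MB")
```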

A man with a long beard holds a glowing crystal ball, seated in a decorated room with candles and a large ornate disc in the background.

Using Your New FLUX LoRA

Time to give it a shot. I’ve included this link to a (very) basic ComfyUI workflow for you to test your LoRA. When you open it, you’ll get a couple of errors; the most obvious is that you don’t have my custom LoRA. Take your LoRA and place it in your LoRAs folder at a path like ComfyUI\models\loras\Flux\scottmann1.safetensors. Then go to the LoRA loader and load yours. Note that I’ve put a Flux folder inside my LoRAs folder. I HIGHLY recommend that you do this, because LoRAs will only play with the models they were designed for. If you have all your LoRAs mixed together, you’ll have a heck of a time figuring out which one will work with which model.

Any other errors you get will be due to not having a node or model. You can use the Manager to help you get the nodes you need into your custom nodes folder. And you’ll need to have the FLUX model you want to use downloaded (separate from the one FLUXGYM downloaded. Annoying, yes.) So grab either the FLUX dev model or the Schnell model and place it in ComfyUI/models/diffusion_models. NOT your usual checkpoints folder. You will also need to grab the ae.safetensors file available at both links and put it in ComfyUI/models/vae. There are also two files that you need to put into your clip folder (the path is ComfyUI_windows_portable/ComfyUI/models/clip): one is t5xxl_fp16.safetensors and the other is clip_l.safetensors. These are the text encoders that FLUX models need to work properly.
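
Because it’s easy to drop one of these files into the wrong folder, here’s a small sanity check. The folder names and text-encoder filenames come from the list above; the diffusion-model filename, the LoRA filename, and the portable-install root are assumptions you should change to match your setup:

```python
# Check that the files the FLUX workflow expects are where ComfyUI looks for them.
from pathlib import Path

root = Path("ComfyUI_windows_portable/ComfyUI/models")    # adjust to your install
expected = [
    root / "diffusion_models" / "flux1-dev.safetensors",  # or the Schnell model file
    root / "vae" / "ae.safetensors",
    root / "clip" / "t5xxl_fp16.safetensors",
    root / "clip" / "clip_l.safetensors",
    root / "loras" / "Flux" / "scottmann1.safetensors",   # your LoRA's filename here
]

for path in expected:
    status = "OK" if path.exists() else "MISSING"
    print(f"{status:8s} {path}")
```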

The strength setting on your LoRA should be between 0 and 1. One is generally the best setting and will make your LoRA work at its maximum potency. You can try going up to 1.5 or more, but I’ve found that this generally makes things get weird. If you’re mixing LoRAs, it’s definitely worth turning some of them down. Start at 0.5 if you just want a smidgen of the LoRA to activate.

One of the oddities of this workflow is that I’ve plugged the output of the LoRA into the negative prompt, which is completely counterintuitive. I did this because the output needs to terminate somewhere, and FLUX models don’t use the negative prompt. The LoRA (also) sits on the model path, which is where it’s really doing its work. I’m offering this network because it has the smallest possible number of nodes that are new or unusual. (One exception is the Power Lora Loader from RGThree. You want this node).

A Few Tips!

I had more accurate results when I rendered my outputs at 1024×1024, but you can try any resolution you like. 512×512 gave me less accurate results with respect to the rendering of my face.

Use ChatGPT to help you write detailed prompts, and specify that it’s for a FLUX model.

I used additional LoRAs to generate most of my images, downloaded from CivitAI. Search for FLUX LoRAs, download them, put them in your LoRAs folder, and add them to the Power LoRA Loader. It makes for much more interesting results. You need to right-click on the name of the LoRA (once it’s loaded into the Power LoRA Loader) and choose “show info” to get access to the keywords that activate it. Put some of those terms into your prompt, or the LoRA won’t activate. If you don’t see any data in there, you can opt to download it from CivitAI.

Man standing on a beach with hands in pockets, wearing a button-up shirt and jeans, with waves in the background. Black and white sketch.
Fighting LoRAs. One frustration you may run into is that LoRAs will come into conflict–one LoRA interferes with the other. This image engages a charcoal drawing LoRA that I’ve had incredible results with, but for some reason when it’s working in tandem with my LoRA, it doesn’t look as much like a charcoal drawing… more like a Photoshop filter on a photograph. Plus, most of the images don’t really look like me (this is the closest). If you have insight on this, please message me on Instagram (@scott.p.mann).

Remember that a ComfyUI network is embedded in your output images. If you need to get back to a certain workflow, drag the output image right into the ComfyUI window and you’ll have it back.
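
And if you’d rather pull that embedded workflow out programmatically, ComfyUI writes it into the PNG’s metadata. A minimal sketch (the filename is just whatever your Save Image node produced):

```python
# Read the workflow JSON that ComfyUI embeds in its output PNGs.
import json
from PIL import Image

img = Image.open("ComfyUI_00001_.png")                     # any ComfyUI output image
workflow_text = img.info.get("workflow") or img.info.get("prompt")
if workflow_text:
    workflow = json.loads(workflow_text)
    print(json.dumps(workflow, indent=2)[:500])            # peek at the start of the graph
else:
    print("No embedded workflow found in this image.")
```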