Getting Consistency in AI Image Generation

November 24, 2025

I spent three hours trying to get AI to generate 60+ consistent images for a painting configurator. It didn’t work. Here’s what happened and why.

What I Was Trying to Build

I’m building a configurator for Hoist Painting - think Tesla or Porsche car configurator, but for painting your house. Users select different options (roof slope, trim style, siding type, etc.) and see visual representations of how those choices look.

The key requirement: consistency. When a user switches from “flat roof” to “gentle slope roof,” they should see the same house, from the exact same angle, with the same lighting - only the roof should change. Everything else needs to be pixel-perfect identical.

The goal was for the whole thing to feel interactive: flip an option, and the same scene updates in place.

I needed 60+ images total, with 4-5 variations per category. All in a clean, minimalist matte white 3D render style.
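
To make the scope concrete, here’s roughly what the option matrix looks like (a sketch - the category and option names are illustrative, not the actual Hoist Painting configuration):

```python
# Illustrative option matrix for the configurator. Category and option
# names are made up; the real configurator has its own set. Each option
# needs one image, all rendered from the same viewpoint.
OPTIONS = {
    "roof_slope": ["flat", "gentle", "moderate", "steep"],
    "trim_style": ["none", "minimal", "classic", "craftsman", "modern"],
    "siding_type": ["lap", "board_and_batten", "shingle", "stucco"],
    # ... roughly a dozen more categories, 4-5 options each
}

# With only the three categories above this prints 13; at around a
# dozen categories of this size, the total clears 60 images.
print(sum(len(opts) for opts in OPTIONS.values()))
```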

What I Tried

I tested three different AI image generation models:

Google Gemini Flash Image

This was the best of the bunch. It could generate nice-looking matte white 3D renders. The style was right. But when I asked it to generate variations (like different roof slopes), I hit one of two problems:

  1. The images were nearly identical - the model would copy the reference image instead of making distinct changes

  2. Or the images were completely different - different house structures, different camera angles, different everything

I tried 9 different approaches:

  • Extremely detailed prompts (20,000+ words)

  • Explicit “DO NOT CHANGE” instructions (this actually made things worse)

  • Reference image chains (use the first image as a reference for subsequent ones - see the sketch after this list)

  • Multi-turn conversations (establish context, then request edits)

  • And more…
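
For example, here’s roughly what the reference-image chain looked like (a minimal sketch using the google-genai Python SDK; the model id and prompts are illustrative, and error handling is omitted):

```python
# Sketch of the reference-image-chain approach with the google-genai
# Python SDK. Model id and prompts are illustrative.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment

def first_image(response):
    # Pull the first inline image out of the response parts.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return Image.open(BytesIO(part.inline_data.data))
    raise ValueError("no image in response")

# Step 1: generate the base house once.
base = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="Minimalist matte white 3D render of a single-story house, "
             "three-quarter view, soft even lighting, flat roof.",
)
base_image = first_image(base)

# Step 2: feed the base image back in and ask for exactly one change.
variation = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[
        base_image,
        "Same house, same camera angle, same lighting. "
        "Change ONLY the roof to a gentle slope.",
    ],
)
first_image(variation).save("roof_gentle_slope.png")
```

In practice, step 2 is where it fell apart: the output either mirrored the reference almost exactly or reinvented the house.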

Nothing worked. The model couldn’t reliably edit just one element while keeping everything else identical.

OpenAI GPT-Image-1

This one technically worked - the API accepted my edit requests. But the output quality was terrible. Different house structures, inconsistent perspectives, mismatched details. One image would have 4 windows, the next would have 6. One would have a chimney, the next wouldn’t. It was unusable.
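
The call itself is simple enough (a sketch against the official openai Python SDK; file names and prompt are placeholders):

```python
# Sketch of an edit request to gpt-image-1 via the official openai
# Python SDK. File names and prompt are placeholders.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.edit(
    model="gpt-image-1",
    image=open("base_house.png", "rb"),
    prompt="Same matte white 3D render, same camera angle, same lighting. "
           "Change only the roof to a gentle slope.",
)

# gpt-image-1 returns base64-encoded image data.
with open("roof_gentle_slope.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```

The request succeeds; the problem is what comes back.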

Google Imagen 4.0

This model is optimized for photo-realistic images. Even with explicit prompts saying “This MUST be a 3D render, NOT a photograph,” it generated realistic house photos. The model’s bias toward photo-realism was too strong to overcome.
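
Even a maximally explicit prompt didn’t help (a sketch via the google-genai SDK; the Imagen model id here is an assumption and may differ):

```python
# Sketch of an Imagen call via the google-genai SDK. The model id is
# an assumption and may differ in your environment.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_images(
    model="imagen-4.0-generate-001",
    prompt="Minimalist matte white 3D render of a house. "
           "This MUST be a 3D render, NOT a photograph.",
    config=types.GenerateImagesConfig(number_of_images=1),
)
with open("house.png", "wb") as f:
    f.write(response.generated_images[0].image.image_bytes)
```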

Why It Didn’t Work

The fundamental problem: Generative AI is probabilistic, not deterministic.

These models are designed to create variations and diversity. They’re trained on millions of images and learn patterns, but they can’t guarantee consistency. When you ask them to “edit” an image, they don’t actually edit it - they generate a brand-new image conditioned on your prompt and the reference, which reintroduces randomness every time.

Think of it like asking someone to redraw a picture but change only the roof. Even with detailed instructions, they’ll introduce small variations - the windows might be slightly different, the camera angle might shift, the lighting might change. That’s exactly what happened with AI.

The matte white 3D render style wasn’t the problem - AI can do that. The problem was the consistency requirement. AI models excel at creating diverse, creative outputs. They’re terrible at creating pixel-perfect consistent variations.
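
If you want to hold an image pipeline to that standard, the requirement is at least easy to test. Here’s a sketch with Pillow and NumPy (file names and the noise threshold are arbitrary, and it assumes both renders are the same resolution):

```python
# Sketch of a consistency check: what fraction of pixels changed between
# two variations? For a true "only the roof changed" edit, the diff
# should be confined to the roof region. File names and the threshold
# are arbitrary; both images must be the same resolution.
import numpy as np
from PIL import Image

a = np.asarray(Image.open("roof_flat.png").convert("RGB"), dtype=np.int16)
b = np.asarray(Image.open("roof_gentle_slope.png").convert("RGB"), dtype=np.int16)

# Per-pixel change mask, tolerating small compression noise.
changed = np.abs(a - b).max(axis=-1) > 8
print(f"{changed.mean():.1%} of pixels changed")
```

In my tests, the diff was never confined to the target region: windows, shadows, and framing all drifted between generations.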

What I Learned

After three hours of experimentation, I learned that:

  1. AI can generate good individual images - The style was achievable

  2. AI cannot guarantee consistency across variations - This is a fundamental limitation

  3. Prompt engineering has limits - No amount of detail can overcome the probabilistic nature of these models

  4. “Editing” doesn’t work as expected - Even when APIs claim to support editing, models regenerate rather than modify