
Z-Image Character Consistency: Multi-Turn Guide | Z Image Tutorial

Learn how to maintain character consistency across multiple images using Z-Image's multi-turn conversation format: define a character once, then make precise edits while preserving its details.

Z-Image Team · Reddit · 9 min read

One of the biggest challenges in AI image generation is maintaining consistency across multiple images. Z-Image Turbo addresses this with its unique multi-turn conversation format.

This guide explains how to define characters once and make precise modifications while preserving their core identity.

Understanding Z-Image's Chat Format

The Qwen3-4B Foundation

Z-Image Turbo uses Qwen3-4B as its text encoder. This model was trained on conversations with a specific structure:

<|im_start|>system
Instructions for the model<|im_end|>
<|im_start|>user
The user's request<|im_end|>
<|im_start|>assistant
<think>
Model's reasoning process
</think>
Response to the user<|im_end|>

When you use Z-Image through its custom ComfyUI nodes, you can access this full structure.
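
As a rough illustration of the layout above, the following Python sketch renders one exchange as a single string. The function name `format_turn` and its parameters are hypothetical, chosen only for this example; the exact string the ComfyUI nodes build internally is not shown in this guide and may differ, for instance in whitespace.

```python
def format_turn(system: str, user: str, thinking: str = "", assistant: str = "") -> str:
    """Render one system/user/assistant exchange in the chat layout shown above.

    Hypothetical helper for illustration; not part of any Z-Image package.
    """
    parts = [
        f"<|im_start|>system\n{system}<|im_end|>",
        f"<|im_start|>user\n{user}<|im_end|>",
    ]
    reply = "<|im_start|>assistant\n"
    if thinking:
        # The reasoning goes inside a <think> block, before the visible reply.
        reply += f"<think>\n{thinking}\n</think>\n"
    reply += f"{assistant}<|im_end|>"
    parts.append(reply)
    return "\n".join(parts)


print(format_turn(
    system="Instructions for the model",
    user="The user's request",
    thinking="Model's reasoning process",
    assistant="Response to the user",
))
```

Printing the result should reproduce the nine-line structure shown above, which makes it easy to check a prompt by eye before wiring it into a workflow.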

Why This Matters for Consistency

The model was trained to:

  1. Follow system instructions throughout the conversation
  2. Remember context from previous turns
  3. Use <think> blocks for reasoning about changes
  4. Maintain consistency while making requested modifications

By structuring your prompts as multi-turn conversations, you give Z-Image explicit context about what should stay the same.


The Multi-Turn Workflow

Step 1: Define Your Character (First Turn)

Create a comprehensive character profile with every detail you want preserved:

# Character Profile: Sarah Chen

## Core Identity
- Name: Sarah Chen
- Age: 28
- Ethnicity: Chinese-American
- Build: Slim, 5'6"

## Face & Features
- Face Shape: Oval with high cheekbones
- Skin: Fair with warm undertones, light freckles across nose
- Eyes: Dark brown, almond-shaped, slight upturn at corners
- Eyebrows: Natural, slightly thick, well-groomed
- Nose: Small, slightly upturned
- Lips: Full, natural rose color
- Expression: Default confident half-smile

## Hair
- Color: Black with subtle warm brown highlights
- Length: Mid-back
- Style: Usually worn down, slight natural wave
- Texture: Thick, healthy, glossy

## Distinguishing Features
- Small beauty mark below right eye
- Delicate gold hoop earrings (always wears)
- Silver necklace with crescent moon pendant

## Default Attire
- Style: Modern professional, minimalist
- Colors: Often black, white, navy, burgundy
- Preference: Clean lines, quality fabrics
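
If you maintain several characters, it can help to keep each profile as structured data and render it to the text format above on demand. The sketch below is one possible way to do that in Python; the `CharacterSheet` class and its `to_markdown` method are assumptions made for this example, not part of Z-Image or its ComfyUI nodes.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterSheet:
    """Hypothetical container for a character profile (illustration only)."""
    name: str
    # Section title -> bullet lines, kept in insertion order.
    sections: dict[str, list[str]] = field(default_factory=dict)

    def to_markdown(self) -> str:
        """Render the profile in the heading-and-bullets layout used above."""
        lines = [f"# Character Profile: {self.name}"]
        for title, items in self.sections.items():
            lines.append("")
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


sarah = CharacterSheet(
    name="Sarah Chen",
    sections={
        "Core Identity": ["Name: Sarah Chen", "Age: 28"],
        "Distinguishing Features": [
            "Small beauty mark below right eye",
            "Delicate gold hoop earrings (always wears)",
        ],
    },
)
print(sarah.to_markdown())
```

Keeping the profile in one object means every turn that needs the sheet gets an identical copy, which removes one source of accidental drift.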

Step 2: First Image Generation

Use the full character sheet in your first prompt:

[System Prompt]
Generate a photorealistic portrait following the character sheet exactly.
Maintain all specified features and distinguishing marks.

[User Prompt]
# Character Profile: Sarah Chen
[Full character sheet from above]

Current scene: Professional headshot, studio setting, soft lighting

[Think Block]
Subject is Sarah Chen as defined in the character sheet. Key features
to ensure: high cheekbones, beauty mark below right eye, gold hoop
earrings, crescent moon pendant. Setting is studio, keep background
simple grey gradient. Lighting should be soft and flattering.

[Assistant Response]
Here's Sarah Chen's professional headshot as specified.

Step 3: Make Modifications (Second Turn)

Now request a specific change while referencing the established character:

[User Prompt]
Change the setting to a Tokyo street at night. Keep Sarah exactly
the same - her face, hair, earrings, necklace - but put her in
a casual outfit: black turtleneck and jeans.

[Think Block]
Preserve: Face shape, skin tone, freckles, eye shape, beauty mark,
earrings, necklace. Change: Setting to Tokyo night street, clothing
to casual black turtleneck and jeans. Add environmental lighting
from neon signs while maintaining face recognizability.

[Assistant Response]
Here's Sarah on a Tokyo street at night in casual wear.
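
Conceptually, each new turn is appended to the conversation so far, and earlier turns are carried forward unchanged. The Python sketch below models that idea with plain lists of messages. The names `start_conversation` and `add_turn` and the message fields are hypothetical; this is an analogy for the workflow, not the implementation used by the ComfyUI nodes.

```python
from typing import TypedDict


class Message(TypedDict):
    role: str      # "system", "user", or "assistant"
    content: str   # visible text of the message
    thinking: str  # think-block text (only used on assistant messages)


def start_conversation(system: str, user: str, thinking: str = "",
                       assistant: str = "") -> list[Message]:
    """First turn: system prompt, character definition, and the reply."""
    return [
        {"role": "system", "content": system, "thinking": ""},
        {"role": "user", "content": user, "thinking": ""},
        {"role": "assistant", "content": assistant, "thinking": thinking},
    ]


def add_turn(previous: list[Message], user: str, thinking: str = "",
             assistant: str = "") -> list[Message]:
    """Return a new conversation with one more turn; `previous` is not modified."""
    return previous + [
        {"role": "user", "content": user, "thinking": ""},
        {"role": "assistant", "content": assistant, "thinking": thinking},
    ]


turn1 = start_conversation(
    system="Generate a photorealistic portrait following the character sheet exactly.",
    user="# Character Profile: Sarah Chen\nCurrent scene: Professional headshot, studio setting, soft lighting",
    thinking="Key features to ensure: high cheekbones, beauty mark below right eye.",
    assistant="Here's Sarah Chen's professional headshot as specified.",
)
turn2 = add_turn(
    turn1,
    user="Change the setting to a Tokyo street at night. Keep Sarah exactly the same.",
    thinking="Preserve: face, hair, earrings, necklace. Change: setting, clothing.",
    assistant="Here's Sarah on a Tokyo street at night in casual wear.",
)
print(len(turn1), len(turn2))
```

Because `add_turn` returns a new list, you can branch several different second turns from the same first turn without them interfering with each other.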

ComfyUI Implementation

Required Nodes

ZImageTextEncoder

  • Main node for first turn
  • Inputs: system_prompt, user_prompt, thinking_content, assistant_content
  • Outputs: conditioning, formatted_prompt, conversation

ZImageTurnBuilder

  • Adds subsequent turns to conversation
  • Inputs: previous (conversation), user_prompt, thinking_content
  • Outputs: conditioning (when clip connected), conversation

Basic Workflow

CLIPLoader (Lumina 2)
       ↓
ZImageTextEncoder ──────────────────────→ KSampler
├── system_prompt: "Generate photorealistic..."
├── user_prompt: [Character sheet]
├── thinking_content: [Reasoning about features]
└── assistant_content: [Brief response]

Multi-Turn Workflow

ZImageTextEncoder (Turn 1: Character Definition)
       ↓ conversation
ZImageTurnBuilder (Turn 2: First Modification)
       ↓ conversation
ZImageTurnBuilder (Turn 3: Second Modification)
       ↓ conditioning
KSampler

Settings

| Parameter | Recommended Value |
| --- | --- |
| add_think_block | true |
| is_final (TurnBuilder) | true (on last turn) |
| template_preset | photorealistic or custom |

Complete Example: Walter Finch

Let's walk through creating a consistent character across multiple images.

Turn 1: Define Walter

System Prompt:

Generate an image in classic American comic book style.
Bold outlines, flat colors with halftone shading.

User Prompt:

# Character Profile: Walter Finch (Wally)

## Core Identity
- Name: Walter Finch (Nickname: Wally)
- Gender: Male
- Age: 72
- Ethnicity: Caucasian (British descent)
- Skin Tone: Fair, warm sun-kissed glow on cheeks and nose

## Head & Face
- Face Shape: Oval, prominent jawline, receding hairline
- Eye Color: Ice-blue with subtle gold flecks around the iris
- Hair: Pure white, side-parted, full beard and mustache
- Glasses: Thin, gold-rimmed reading glasses
- Expression: Warm, gentle smile with a twinkle in his eye

## Attire
- Light gray and blue checkered button-down shirt
- Dark grey wool trousers
- Brown leather loafers

## Props
- Pale lavender ceramic mug of coffee
- Gold pocket watch chain visible in breast pocket

Think Block:

Creating Walter "Wally" Finch: 72 years old, British descent,
warm grandfather type. Key features: ice-blue eyes with gold
flecks (distinctive), gold-rimmed glasses, full white beard,
side-parted hair. Must include: lavender coffee mug and gold
pocket watch chain. Style: American comic book with halftone.

Assistant Response:

Here's Walter Finch, the kindly British gentleman.

Turn 2: Modify Beard Color

User Prompt:

Let's change his beard to have some red mixed in with the white.
Keep everything else exactly the same.

Think Block:

Modifying beard only: change from pure white to red and white
mixed. Preserve: ice-blue eyes with gold flecks, gold-rimmed
glasses, facial structure, lavender mug, pocket watch chain,
checkered shirt, warm expression.

Turn 3: Add New Element

User Prompt:

Let's put a cute baby flying sloth hovering above his head too.

Think Block:

Adding element: baby flying sloth above Walter's head. Preserve
all of Walter's features including the red-white beard from
previous turn. The sloth should be small and cute, floating
or hovering position.

Result

The final image contains Walter with all his defined features, the red-white beard modification from turn 2, and the flying sloth addition from turn 3.


Why Using Qwen3 for Prompt Generation Helps

Z-Image's encoder is Qwen3-4B. All Qwen3 models share the same tokenizer.

When you use a Qwen3 model to generate your character descriptions:

  • Same vocabulary means same token IDs
  • Semantic nuances transfer directly
  • Think block reasoning primes the encoder

For best results, consider using a larger Qwen3 model (for example Qwen3-32B or Qwen3-235B-A22B; the Qwen3 family has no 72B variant) to generate detailed character sheets, then feed them directly to Z-Image.

Example Qwen3 System Prompt

You are a visual prompt engineer for Z-Image Turbo.

Generate detailed, visually-specific character descriptions.
Focus on concrete visual details - colors, textures, specific
features. Avoid abstract concepts.

Structure your output as:
1. <think> block with visual planning
2. Hierarchical character profile with sections

The output will be used directly as a Z-Image prompt.
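
A system prompt like this yields a `<think>` block followed by the profile, so the two parts usually need to be separated before they go into different inputs. The Python sketch below shows one way to split them with a regular expression. `split_think` is a hypothetical helper, and it assumes the output contains at most one `<think>` block.

```python
import re

# Non-greedy match so only the first <think>...</think> pair is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(output: str) -> tuple[str, str]:
    """Return (thinking, remainder) from a model reply.

    Hypothetical helper: if no <think> block is present, thinking is "".
    """
    match = THINK_RE.search(output)
    if match is None:
        return "", output.strip()
    thinking = match.group(1).strip()
    remainder = (output[:match.start()] + output[match.end():]).strip()
    return thinking, remainder


sample = (
    "<think>\nPlan the face first.\n</think>\n"
    "# Character Profile: Sarah Chen\n- Age: 28"
)
thinking, profile = split_think(sample)
print(thinking)
print(profile)
```

The first value can then be used as the think-block text and the second as the user prompt.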

Tips for Better Consistency

Be Exhaustively Specific

Don't leave anything to chance. If you want a specific eye color, state it exactly: "blue eyes" is vague, while "ice-blue with subtle gold flecks around the iris" is specific.

Use the Think Block

The think block lets you explicitly state what to preserve:

<think>
Changing: outfit to summer dress
Preserving: face shape, eye color (hazel with amber ring),
beauty mark on left cheek, ear piercings (two in left ear),
nose shape, lip fullness
</think>
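
To keep think blocks uniform across turns, you could generate them from a change description and a list of features. The Python sketch below does this and puts the change first, then the preserved features. The function `think_block` is hypothetical, and it assumes you want the `<think>` tags included in the text; if your node adds the tags itself, use only the two inner lines.

```python
def think_block(changing: str, preserving: list[str]) -> str:
    """Build a think block that names the change first, then what must stay.

    Hypothetical helper for illustration; refuses an empty preserve list
    because an unlisted feature is the one most likely to drift.
    """
    if not preserving:
        raise ValueError("list at least one feature to preserve")
    return (
        "<think>\n"
        f"Changing: {changing}\n"
        f"Preserving: {', '.join(preserving)}\n"
        "</think>"
    )


block = think_block(
    changing="outfit to summer dress",
    preserving=["face shape", "eye color (hazel with amber ring)",
                "beauty mark on left cheek"],
)
print(block)
```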

One Change Per Turn

Don't overload modifications. Make one targeted change per turn:

Good:

  • Turn 2: Change outfit
  • Turn 3: Change background
  • Turn 4: Add accessory

Risky:

  • Turn 2: Change outfit AND background AND add accessory AND alter lighting

Reference Previous Content

In later turns, briefly reference what should stay:

Keep Sarah's face, hair, and jewelry exactly as before.
Only change her outfit to a red evening gown.
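
The two tips above (one change per turn, and restating what stays) can be combined mechanically. The following Python sketch turns a list of changes into one user prompt per turn, each prefixed with the same keep-clause. `plan_turns` and its wording are assumptions made for this example; adjust the phrasing to your own prompts.

```python
def plan_turns(name: str, keep: list[str], changes: list[str]) -> list[str]:
    """Return one user prompt per change, each restating what must stay.

    Hypothetical helper: emits exactly one change per turn.
    """
    keep_clause = f"Keep {name}'s {', '.join(keep)} exactly as before."
    return [f"{keep_clause}\nOnly change: {change}." for change in changes]


prompts = plan_turns(
    name="Sarah",
    keep=["face", "hair", "jewelry"],
    changes=["outfit to a red evening gown", "background to a rooftop at dusk"],
)
for number, prompt in enumerate(prompts, start=2):
    print(f"Turn {number}:\n{prompt}\n")
```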

Consistent Style Markers

Keep style keywords consistent across all turns:

[Every turn ends with]
photorealistic, 8k, professional photography, shot on Canon EOS R5
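
If you assemble prompts in a script, a small helper can guarantee that the style line is present exactly once on every turn. This Python sketch is a minimal version of that idea; `with_style` is a hypothetical name, and the default style string is simply the example from above.

```python
STYLE = "photorealistic, 8k, professional photography, shot on Canon EOS R5"


def with_style(prompt: str, style: str = STYLE) -> str:
    """Append the style line unless the prompt already ends with it."""
    text = prompt.rstrip()
    if text.endswith(style):
        return text  # already styled, avoid duplicating the keywords
    return f"{text}\n{style}"


print(with_style("Only change her outfit to a red evening gown."))
```

Calling it twice on the same prompt leaves the prompt unchanged the second time, so it is safe to apply at the last step of any pipeline.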

Limitations & Expectations

What Works Well

  • Preserving distinctive features (scars, beauty marks, eye color)
  • Maintaining clothing style across scenes
  • Keeping accessories consistent
  • Changing backgrounds while preserving subject

What's Challenging

  • Exact face reproduction (this isn't face-swap)
  • Perfect consistency across wildly different poses
  • Maintaining consistency across different art styles

Realistic Expectations

Multi-turn conversation improves consistency but doesn't guarantee perfection. Expect:

  • 80-90% feature preservation with good prompting
  • Occasional need for regeneration
  • Better results with distinctive characters

Troubleshooting

Features Drifting

Problem: Character looks slightly different each generation.

Solution: Add more specific distinguishing features. Instead of "brown hair," use "chocolate brown hair with copper highlights, falling to mid-back, slight wave, side-parted."
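
A crude way to spot under-specified fields before generating is to flag profile entries whose description is only a word or two long. The Python sketch below does that for `- Key: value` lines. The function `vague_fields` and the three-word threshold are assumptions made for this example, and the check is only a heuristic: short values can be perfectly precise, and long ones can still be vague.

```python
def vague_fields(sheet: str, min_words: int = 3) -> list[str]:
    """Return the keys of '- Key: value' lines whose value is very short.

    Hypothetical lint; word count is only a rough proxy for specificity.
    """
    flagged = []
    for line in sheet.splitlines():
        line = line.strip()
        if not line.startswith("- ") or ":" not in line:
            continue  # skip headings, blank lines, and free-form bullets
        key, value = line[2:].split(":", 1)
        if len(value.split()) < min_words:
            flagged.append(key.strip())
    return flagged


sheet = (
    "## Head & Face\n"
    "- Hair: brown hair\n"
    "- Eyes: Ice-blue with subtle gold flecks around the iris\n"
)
print(vague_fields(sheet))
```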

Modifications Not Applying

Problem: Requested changes don't appear.

Solution: Be explicit in the think block about what changes. State the change first, then list what stays the same.

Style Inconsistency

Problem: Art style changes between turns.

Solution: Include style keywords in the system prompt and repeat them in each turn's assistant response.


Template Files

Z-Image includes 140+ templates in ComfyUI. For character work, try:

| Template | Best For |
| --- | --- |
| photorealistic | Realistic characters |
| character_design | Reference sheets |
| comic_american | Comic book style |
| anime_ghibli | Ghibli-style characters |
| portrait_studio | Studio portraits |

Access via template_preset in ZImageTextEncoder.


Get Started

  1. Download ComfyUI nodes: Comfy-Org/z_image_turbo
  2. Create a character sheet using the template above
  3. Start with Turn 1 - define everything
  4. Iterate with Turn 2+ - make targeted changes

Or practice basic prompting at z-image.vip.

