Dramatically Improve AI Image Generation! Complete Guide to JSON Structured Prompts

~ Prevent "Concept Bleeding" and Generate Images Exactly as You Envision ~

💡 A sequel to our "3 Major Schools" article!
Today we dive deep into JSON Structured Prompts — the pinnacle of the "Language-Controlled" approach

🎬 Chapter 1: Ete Wants to Know More

Ete (Muhai Eiten)

Ran! Remember when you told me about "Language-Controlled" and "Visual-Intuitive" types?

Ran (Yoneno Ran)

Yes, we discussed how Nano Banana Pro is "Language-Controlled" while SeeDream and Midjourney are "Visual-Intuitive."

Ete

But Nano Banana Pro can also read images, right? How is that different from Visual-Intuitive types? They seem the same to me...

Ran

Great question! Even though both "read images," the way they interpret those images is fundamentally different. Today, I'll explain that difference and dive deep into "JSON Structured Prompts" — the ultimate technique for Language-Controlled types.

🔍 Chapter 2: How Language-Controlled and Visual-Intuitive Types "Understand" Images Differently

Ran

Let me explain how Language-Controlled (Nano Banana Pro) and Visual-Intuitive (Midjourney, etc.) types read images differently.

Comparison	Language-Controlled (Nano Banana Pro)	Visual-Intuitive (Midjourney, etc.)
How images are used	Supporting material for "verbalization"	"Direct style transfer"
Processing priority	Text instructions > Image	Image ≥ Text instructions
Analogy	An architect reviewing references to draw blueprints	A painter looking at a sample to replicate it
Strengths	Logical composition, precise placement	Atmosphere reproduction, style mimicry

Ete

I see... So even though both "look at images," Language-Controlled types "think about it themselves using the reference," while Visual-Intuitive types "copy what they see"?

Ran

Exactly! Nano Banana Pro looks at an image and converts it into language first — "this is a blue sky with white clouds and..." — before processing. That's why it can logically combine text instructions with image information.

Ran

On the other hand, Visual-Intuitive types transfer the "visual atmosphere" of an image directly to the new image. That's why they excel at "make it look like this painting."

Ete

Got it. So it's not that one is better than the other — they just have different strengths.

Ran

Yes! And the technique that maximizes Language-Controlled strengths is today's main topic: "JSON Structured Prompts."

📝 Chapter 3: What is JSON Anyway?

Ete

Hold on! What even is "JSON"? It sounds like some programmer spell that's going to curse me...

Ran

Don't worry! JSON is simply "a set of rules for organizing information neatly." Let me explain with a familiar example...

🍱 JSON is Like a Bento Box with Dividers!

Regular sentences (natural language) are like throwing everything into a container without dividers. If you put curry, salad, and cake all together... the flavors mix and you end up with curry-flavored cake!

JSON is like the plastic dividers in a bento box (a compartmentalized Japanese lunch box). By physically separating the rice section, main dish section, and dessert section, no matter how much you shake it, the flavors won't mix!

Ete

Oh! That makes so much sense! So you're telling the AI "from here to here is about the background" and "from here is about the character" — creating clear boundaries!

Ran

Exactly! Let's look at actual JSON. You wrap everything in curly braces { } and write each entry in the format "item name": "value".

{
  "background": "cyberpunk city",
  "character": {
    "clothing": "traditional kimono",
    "hair_color": "black"
  }
}

※JSON stands for "JavaScript Object Notation" — a format for writing structured data. While commonly used in programming, it can also be used for AI instructions.

Ete

Cool! "Background" and "character" are in separate rooms. This way "cyberpunk" won't bleed into the "kimono"!
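
※For readers who are comfortable with a little code: the sketch below uses Python's standard json module to read the example above and pull out individual "compartments." It only illustrates how the structure keeps values apart. It says nothing about how Nano Banana Pro processes JSON internally, and the variable names are made up for this illustration.

```python
import json

# The example JSON from this chapter, stored as plain text.
prompt_text = """
{
  "background": "cyberpunk city",
  "character": {
    "clothing": "traditional kimono",
    "hair_color": "black"
  }
}
"""

# json.loads turns the text into nested Python dictionaries.
prompt = json.loads(prompt_text)

# Each compartment is reached through its own key, so the values stay separate.
print(prompt["background"])
print(prompt["character"]["clothing"])
```

Because "clothing" lives inside "character," there is no ambiguity about which element it describes.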

⚠️ Chapter 4: Why Do We Need JSON? The "Concept Bleeding" Problem

Ete

Oh, that reminds me! Once I wrote "a girl in kimono in a cyberpunk city" and the kimono started glowing like neon lights! What was that about!?

Ran

That's called "Concept Bleeding." The AI looks at the entire text and sometimes applies the "cyberpunk" concept to the "kimono" as well.

※"Concept Bleeding" (also known as "Attribute Leakage") is a phenomenon where attributes (colors, styles, atmospheres, etc.) of one element in a prompt unintentionally affect other elements.

❌ Common Problems with Natural Language

Prompt: "Cyberpunk city, girl in kimono, neon signs"

Result: LED-lit kimono, metallic skin, traditional patterns turned into holograms... 😱

Ete

That's exactly what happened! What I wanted was "a girl in a normal kimono standing in a cyberpunk city"...

Ran

With natural language, the AI processes the entire text as "one chunk," making it ambiguous which adjective applies to which noun. With JSON, you can clearly separate "background is cyberpunk" and "character's clothing is traditional."

✍️ Chapter 5: How to Write JSON Prompts and Key Considerations

Ete

Alright! Teach me how to write JSON! I'm going to master this!

Ran

Let me explain the basic rules. JSON has several rules you need to follow.

📋 Basic JSON Rules

① Wrap everything in curly braces { }
These are the "entrance" and "exit" of JSON. Without them, it won't be recognized as JSON.

② Wrap item names in "double quotes"
Like "background", item names must always be in " ".

③ Wrap text values in "double quotes" as well
Text (strings) must be wrapped like "cyberpunk". Numbers, true/false, and null are written without quotes.

④ Separate items and values with :
Format: "item_name": "value".

⑤ Separate multiple items with ,
Don't put a , after the last item (a common mistake!)
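
※If you want to check your JSON before pasting it into a prompt, a strict parser will catch rule violations. The sketch below uses Python's standard json module. The helper name check_json and the sample strings are assumptions made for this illustration.

```python
import json

def check_json(text: str) -> str:
    """Return "OK" if text is valid JSON, otherwise a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"Error at line {err.lineno}, column {err.colno}: {err.msg}"
    return "OK"

good = '{"background": "cyberpunk city", "weather": "rain"}'
# Breaks rule 5: a comma after the last item.
trailing_comma = '{"background": "cyberpunk city", "weather": "rain",}'
# Breaks rule 2: single quotes instead of double quotes.
single_quotes = "{'background': 'cyberpunk city'}"

for sample in (good, trailing_comma, single_quotes):
    print(check_json(sample))
```

The first sample passes, and the other two are reported with the position where the parser gave up.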

Ete

Ugh... So many rules... I'm definitely going to make mistakes...

Ran

Don't worry! Nano Banana Pro is smart enough that it can often fill in the gaps when you make minor mistakes. It doesn't have to be perfect.

⚡ Common Tips to Keep in Mind

・Don't overuse nested structures
Too many levels of nesting can cause confusion. 2-3 levels is usually the sweet spot.

・You can use any language
Nano Banana Pro understands multiple languages, so you can use English, Japanese, or whatever you're comfortable with.

・Include a brief introduction
Adding "Please generate an image based on the following JSON format:" before your JSON makes it more reliable.
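
※The tips above can be sketched in Python as follows. This is one possible way to apply them, not an official tool: the function names, the choice to count only { } levels (arrays are not counted), and the default limit of 3 are all assumptions made for this illustration.

```python
import json

INTRO = "Please generate an image based on the following JSON format:"

def nesting_depth(value) -> int:
    """Count how many levels of { } are nested inside each other."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        # Arrays are passed through without adding a level (an assumption).
        return max((nesting_depth(v) for v in value), default=0)
    return 0

def build_prompt(data: dict, max_depth: int = 3) -> str:
    """Put the introduction line in front of the JSON, refusing deep nesting."""
    depth = nesting_depth(data)
    if depth > max_depth:
        raise ValueError(f"nesting depth {depth} exceeds {max_depth}")
    # ensure_ascii=False keeps Japanese or other non-English text readable.
    return INTRO + "\n" + json.dumps(data, indent=2, ensure_ascii=False)

example = {
    "background": "cyberpunk city",
    "character": {"clothing": "traditional kimono", "hair_color": "black"},
}
print(nesting_depth(example))
print(build_prompt(example))
```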

🎯 Chapter 6: Hands-On! Before/After Comparison

Ran

Now let's compare the same scene using "natural language" versus "JSON."

❌ Before: Natural Language Prompt

Cyberpunk city background, neon lights, rain, a beautiful girl wearing a traditional floral kimono holding an old Japanese umbrella standing in the center, black hair, melancholic expression, high contrast, cinematic lighting

Common problems: Kimono turns neon colors, umbrella becomes a transparent vinyl umbrella, neon colors reflect too much on skin...

✅ After: JSON Structured Prompt

Please generate an image based on the following JSON format:

{
  "meta_instruction": "Prevent attribute mixing and process each element independently",
  "canvas_settings": {
    "aspect_ratio": "16:9",
    "quality": "8K, cinematic",
    "overall_mood": "melancholic, fantastical"
  },
  "lighting": {
    "global_lighting": "dark and moody",
    "light_sources": [
      {"type": "neon signs", "color": "pink and blue", "target": "background only"},
      {"type": "spotlight", "color": "warm white", "target": "character's face"}
    ]
  },
  "subjects": [
    {
      "name": "girl",
      "appearance": {
        "clothing": "traditional floral kimono (red and white)",
        "clothing_constraint": "no sci-fi elements, maintain pure traditional Japanese style",
        "hair": "long black hair",
        "expression": "melancholic"
      },
      "props": "traditional Japanese paper umbrella (bangasa)",
      "position_in_frame": "center"
    },
    {
      "name": "background",
      "style": "cyberpunk, neon signs, high-rise buildings",
      "weather": "rain",
      "position_in_frame": "background (blurred)"
    }
  ],
  "color_constraints": {
    "character_colors": "natural skin tone, red and white kimono",
    "background_colors": "neon blue, purple, black"
  }
}

Ete

Wow! "clothing_constraint: no sci-fi elements" and "target: background only" — it's so detailed!

Ran

The key is making "subjects" an array (list) and describing "girl" and "background" as completely separate entries. This way, neon light is far less likely to bleed into the kimono.

Ete

I see... Using "meta_instruction" to say "don't mix things" at the very beginning is pretty clever too.

Ran

Exactly. With JSON, you can achieve fine-grained control that's difficult with natural language — like "apply this attribute only to this element."
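
※If you generate prompts from a script, the "one entry per subject" idea can be sketched like this in Python. The helper make_subject is hypothetical and exists only for this illustration. The field names and values are taken from the JSON example in this chapter.

```python
import json

def make_subject(name: str, position: str, **details) -> dict:
    """Bundle everything about one element into its own object."""
    return {"name": name, **details, "position_in_frame": position}

girl = make_subject(
    "girl",
    "center",
    appearance={
        "clothing": "traditional floral kimono (red and white)",
        "clothing_constraint": "no sci-fi elements, maintain pure traditional Japanese style",
        "hair": "long black hair",
        "expression": "melancholic",
    },
    props="traditional Japanese paper umbrella (bangasa)",
)
background = make_subject(
    "background",
    "background (blurred)",
    style="cyberpunk, neon signs, high-rise buildings",
    weather="rain",
)

prompt = {
    "meta_instruction": "Prevent attribute mixing and process each element independently",
    # Each subject is a separate entry in the array.
    "subjects": [girl, background],
}

print("Please generate an image based on the following JSON format:")
print(json.dumps(prompt, indent=2, ensure_ascii=False))
```

Since each subject is built on its own, the word "cyberpunk" only ever appears inside the background entry.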

📝 Copy-Paste Ready Basic Template

{
  "canvas_settings": {
    "aspect_ratio": "[16:9 / 1:1 / 9:16 etc.]",
    "quality": "[high quality, 8K etc.]",
    "overall_mood": "[bright / dark / fantastical etc.]"
  },
  "subjects": [
    {
      "name": "[main character name]",
      "appearance": {
        "clothing": "[clothing details]",
        "hair": "[hairstyle, hair color]",
        "expression": "[expression]"
      },
      "position_in_frame": "[center / left / right]"
    },
    {
      "name": "background",
      "style": "[background details]",
      "position_in_frame": "background"
    }
  ]
}
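
※When filling in the template, it is easy to leave a [placeholder] behind. The Python sketch below lists fields that still look unfilled. The function name unfilled_fields and the rule "a value that starts with [ and ends with ] is a placeholder" are assumptions made for this illustration, and the draft is a shortened version of the template.

```python
def unfilled_fields(value, path=""):
    """Return the paths of string values that still look like [placeholders]."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            found += unfilled_fields(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found += unfilled_fields(child, f"{path}[{index}]")
    elif isinstance(value, str) and value.startswith("[") and value.endswith("]"):
        found.append(path)
    return found

# A partly filled-in draft based on the template.
draft = {
    "canvas_settings": {"aspect_ratio": "16:9", "quality": "[high quality, 8K etc.]"},
    "subjects": [
        {"name": "girl", "position_in_frame": "[center / left / right]"},
        {"name": "background", "style": "rainy cyberpunk city", "position_in_frame": "background"},
    ],
}

for field in unfilled_fields(draft):
    print("Still needs a value:", field)
```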

✨ Summary

Ete

JSON is incredible! Now I can say goodbye to the "glowing kimono problem"!

Ran

Let me summarize today's key points.

📌 Today's Key Takeaways

1. Language-Controlled vs Visual-Intuitive Differences
Language-Controlled types "verbalize" images before processing; Visual-Intuitive types directly transfer "visual appearance"

2. JSON is Like "Bento Box Dividers"
By clearly separating information, you can prevent concept bleeding

3. Basic JSON Rules
Wrap in { }, enclose item names in " ", separate with commas

4. Practical Tips
Separate subjects using arrays, specify constraints clearly, convey intent with meta instructions

Ete

Nice work, Ran! Now I can see the path to becoming a JSON master!

Ran

Use the template as a base and gradually customize it to your preferences.

Ete

Alright! I'm going to try this right away! See you next time!

Ran

Until next time! Take care!