Complete Guide to "3 Schools" of AI Image Generation
Language Control, Visual Intuition, Self-Learning — Find Your Perfect AI Partner
Raaaaaan! Help me! I'm so confused!
*sigh* ...Ete-senpai, it's so early in the morning. What's wrong?
I tried to use AI image generation tools! But there are TOO MANY! Midjourney, SeeDream, Nano Banana Pro, Ideogram, Flux, DALL-E... I can't tell them apart!
I see, you're lost in tool selection. Actually, these tools can be categorized into 3 major schools.
3 schools? That sounds interesting! Tell me now!
Sure. Let me give you the overview first.
Type A: Language Control (The Architect)
→ Logically instruct with words
Type B: Visual Intuition (The Painter)
→ Guide intuitively by showing reference images
Type C: Self-Learning (The Trainer)
→ Train the AI to become your specialized tool
Hmm... So it's like cooking — someone who follows recipes exactly, someone who copies by looking, and someone who creates original dishes?
Wow, that's a surprisingly good analogy, Ete-senpai! It's exactly like that. Let's dive into each one.
First, Type A "Language Control." This approach treats AI as "an architect who follows logical word-based instructions."
Word instructions... I'm not good at writing prompts. I once wrote "blue sky, white clouds, and a boy with a red hat," and got red clouds instead!
That's called "Attribute Leakage." Natural language is ambiguous, so AI sometimes gets confused about which adjective goes with which noun.
*Attribute Leakage refers to the phenomenon where adjectives in prompts are applied to unintended objects.
Yes, exactly! So what's the solution?
Many modern Type A tools work well with "JSON Structured Prompts." Think of it like a multiple-choice answer sheet: each piece of information goes into a clearly defined box.
*JSON (JavaScript Object Notation) is a format for structuring data, commonly used in programming.
{
  "scene": {
    "location": "park",
    "weather": "sunny"
  },
  "subjects": [
    {
      "type": "boy",
      "attributes": {
        "accessory": "red_hat"
      }
    },
    {
      "type": "cloud",
      "attributes": {
        "color": "white"
      }
    }
  ]
}
I see! So "red_hat" is inside the "boy" box, so it can NEVER mix with "cloud"!
Pretty much! Leakage can still happen, but it becomes far less likely. You're quick to understand, Ete-senpai.
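*As a rough illustration of the "boxes" idea, here is a minimal Python sketch that holds the structured prompt above as a dict and lists each subject together with only its own attributes. The helper name `describe_subjects` is hypothetical and not part of any tool's API. Whether a given tool accepts JSON input at all is something to check in its documentation.

```python
import json

# The structured prompt from above, as a Python dict.
prompt = {
    "scene": {"location": "park", "weather": "sunny"},
    "subjects": [
        {"type": "boy", "attributes": {"accessory": "red_hat"}},
        {"type": "cloud", "attributes": {"color": "white"}},
    ],
}


def describe_subjects(structured):
    """Hypothetical helper: list each subject with only its own attributes."""
    lines = []
    for subject in structured["subjects"]:
        attrs = ", ".join(
            f"{key}={value}" for key, value in subject["attributes"].items()
        )
        lines.append(f"{subject['type']}: {attrs}")
    return lines


# Serialize to JSON text, e.g. to paste into a tool that accepts it.
prompt_json = json.dumps(prompt, indent=2)

for line in describe_subjects(prompt):
    print(line)
```

Because "red_hat" lives only inside the "boy" entry, nothing in the data itself ties it to "cloud." How strictly a model honors that separation still depends on the tool.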
🛠️ Representative Type A Tools
① Nano Banana Pro (Google DeepMind)
A fusion of the Gemini 3 Pro language model and an image generation engine. Its logical understanding is extremely strong: it can interpret abstract instructions like "create a futuristic sneaker ad reflecting current youth trends."
② Ideogram v2 (Typography Specialist)
Unmatched in text-design fusion. Excellent at long text placement and font style reproduction — perfect for T-shirt prints and logo creation.
③ Recraft (Vector Wizard)
Can generate vector images (SVG). Vectors can be scaled to any size without quality loss, from business cards to building-sized advertisements.
So Type A is all about "accuracy" and "logic." Perfect for people who want precise blueprint-style results!
But Ran! I'm bad at explaining with words! Sometimes I just want to say "like THIS" or "THAT vibe"!
Then Type B "Visual Intuition" is perfect for you! This approach believes "one image speaks louder than a hundred words."
The core of Type B is the "Reference" feature. You show reference images to the AI, and it carries their "style" or "composition" over into new images. Modern tools can recognize "composition," "style," and "character identity" separately.
🛠️ Representative Type B Tools
① SeeDream 4.5 (ByteDance)
Developed by TikTok's parent company. "Smart Canvas" treats images as "concept extraction sources" rather than just sketches. Change only the art style while keeping the face — all with sliders!
② Midjourney (--sref / --cref)
Unparalleled aesthetic sense. "--sref" (Style Reference) transfers color palette, texture, and lighting. "--cref" (Character Reference) maintains character consistency across different poses.
③ Krea AI (Real-time Generation)
Real-time conversion — draw a line, and it instantly becomes a high-quality image. No waiting!
Type B is perfect for intuitive people like me! I love collecting images on Pinterest!
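*For the Midjourney parameters mentioned above, a minimal Python sketch of how a prompt string with "--sref" and "--cref" might be assembled. The helper `build_midjourney_prompt` and the example.com URLs are placeholders invented for illustration. Which parameters are accepted can depend on the Midjourney model version, so treat this as a sketch rather than a reference.

```python
def build_midjourney_prompt(text, style_ref=None, character_ref=None):
    """Hypothetical helper: append --sref / --cref to a prompt string."""
    parts = [text]
    if style_ref:
        # Style Reference: borrow palette, texture, and lighting.
        parts.append(f"--sref {style_ref}")
    if character_ref:
        # Character Reference: keep the same character across images.
        parts.append(f"--cref {character_ref}")
    return " ".join(parts)


prompt = build_midjourney_prompt(
    "girl reading in a cafe",
    style_ref="https://example.com/style.png",      # placeholder URL
    character_ref="https://example.com/hanako.png",  # placeholder URL
)
print(prompt)
```

The text part stays short because the reference images carry the "like THIS" information.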
But Ran, I have original characters! It's annoying to describe their features every time, and sometimes their faces change slightly!
That's where Type C "Self-Learning" comes in. Unlike Types A and B, this approach "modifies the AI model itself."
The core technology is "LoRA (Low-Rank Adaptation)" — think of it as "sticky notes you insert into an encyclopedia."
*LoRA is a lightweight technique for additional training on existing AI models. Without modifying the original model (several GB), you can teach new concepts with just small differential data (tens to hundreds of MB).
Amazing! It's like a summoning spell! "By my name, I command thee, Hanako, appear!"
...Well, that's close enough.
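*To see why the "sticky notes" are so small: LoRA freezes a weight matrix W of size d × k and trains two thin matrices, B (d × r) and A (r × k), whose product is added to W. Below is a rough sketch of the parameter arithmetic. The sizes d = k = 4096 and the rank r = 16 are illustrative assumptions, not values from any specific model.

```python
def lora_param_counts(d, k, r):
    """Return (full, lora) trainable-parameter counts for one d x k matrix."""
    full = d * k        # updating every entry of W directly
    lora = r * (d + k)  # B is d x r, A is r x k; W itself stays frozen
    return full, lora


# Illustrative sizes only (assumed, not taken from a real model).
full, lora = lora_param_counts(d=4096, k=4096, r=16)
print(full, lora, f"{lora / full:.2%}")
```

With these assumed sizes, the LoRA factors hold under 1% of the parameters of the full matrix, which is why the resulting files are comparatively small.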
🛠️ Representative Type C Tools
① Flux.1 (Black Forest Labs)
Created by former Stability AI engineers. An open-weight model that you can fine-tune yourself, with quality often compared to Midjourney's. License terms differ by variant, so check them before commercial use.
② Civitai & Tensor.art (Platforms)
Web services where you can train LoRA using just a browser — upload images, follow the wizard, done! No high-spec PC required.
I see... Type C is about creating your "personal craftsman AI." Ultimate for manga artists or VTubers who draw the same character repeatedly!
| Feature | Type A: Language Control | Type B: Visual Intuition | Type C: Self-Learning |
|---|---|---|---|
| Starting Point | Words & Logic | Images & Senses | Datasets |
| Best For | Precise layouts; text & UI design | Mood creation; style imitation | Character consistency; mass production |
| Difficulty | Medium | Low | High |
| Reproducibility | Very High | Medium-High | Highest |
Actually, you don't need to stick to just one school. In 2026, these approaches are merging. For example: create a character LoRA (Type C) → apply a style reference in Midjourney (Type B) → finish the text and layout in Ideogram (Type A).
Like pairing wine with food, choosing tools for each step is the 2026 best practice.
🍷 Basic Terms
Prompt → "Order / Recipe"
Your order to the AI chef. Type A specifies "5g salt, medium-rare," while Type B says "like that photo."
Seed Value → "Parallel Universe Coordinates / Dice Roll"
AI rolls dice for each generation. Fixing the seed lets you reproduce the same result, as long as the prompt, model, and settings also stay the same.
Latent Space → "Infinite Library"
AI's mental universe containing all images and concepts. Find "cat in spacesuit" between the "cat" and "spacesuit" shelves.
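*The "Seed Value" entry above can be illustrated with Python's standard random module. This is only an analogy: `fake_generate` is a made-up stand-in that returns a few pseudo-random numbers instead of a real image, but the "same seed, same dice rolls" behavior is the same idea.

```python
import random


def fake_generate(seed, steps=4):
    """Stand-in for an image generator: returns pseudo-random 'pixels'."""
    rng = random.Random(seed)  # a private dice roller fixed by the seed
    return [rng.randint(0, 255) for _ in range(steps)]


first = fake_generate(seed=42)
second = fake_generate(seed=42)

# Same seed and same settings give the same sequence of rolls.
print(first == second)
```

Changing the seed, or the `steps` setting, gives a different sequence, which mirrors why a fixed seed alone is not enough if the prompt or settings change.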
🧪 Type C Terms
Checkpoint (Model) → "Painter's Brain (Foundation)"
Base talent — realist painter, anime artist, etc. Flux.1 and Stable Diffusion are checkpoints.
LoRA → "Intensive Course / Special Move Training"
Teaching a painter only "how to draw a specific character." Used with the checkpoint.
Trigger Word → "Password / Summoning Spell"
A keyword to invoke LoRA-trained content. "When you hear this password, draw that character."
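*A minimal sketch of how a checkpoint prompt, a LoRA, and a trigger word might come together in one prompt string. The `<lora:name:weight>` tag follows the prompt syntax of the AUTOMATIC1111 Stable Diffusion web UI, and other tools load LoRAs differently. The helper `build_lora_prompt`, the file name "hanako_v1," and the trigger word "hanako_chan" are all hypothetical.

```python
def build_lora_prompt(lora_name, trigger_word, description, weight=0.8):
    """Hypothetical helper: combine a LoRA tag, trigger word, and description."""
    lora_tag = f"<lora:{lora_name}:{weight}>"  # activates the LoRA file
    return f"{lora_tag} {trigger_word}, {description}"


# All names below are made up for illustration.
print(build_lora_prompt("hanako_v1", "hanako_chan", "waving, school uniform"))
```

The tag loads the "intensive course," and the trigger word is the password that calls up the trained character.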
Ran, you really helped me today! Now I totally understand the 3 schools! You're amazing!
AI is no longer a competitor to fear. It's an excellent orchestra awaiting your direction.
By using different tools for different stages, your creativity can reach unprecedented heights.
Alright! I'll become the conductor! I'm going to master AI image generation!
See you next time!
Until next time! Take care!