AI Image Models — Full Overview
21 companies · 28 models in our quiz · 91+ versions documented
Same Prompt. 28 Models.
Choose a prompt and see how every model responds to the exact same input. Models from the same company appear side by side.
A cat curled up asleep on an open book, next to it a steaming cup of tea, cozy living room atmosphere.
⭐ Big Players
GPT Image 1.5
OpenAI's next-generation image model, succeeding DALL-E 3. Dramatically improved text rendering within images, more reliable instruction following, and stronger photorealism. Deeply integrated into ChatGPT and the OpenAI API.
FLUX.2 [max]
The premium tier of Black Forest Labs' FLUX.2 series. Maximum quality variant with improved photorealism, superior facial detail, accurate anatomy, and enhanced text rendering — pushing the FLUX architecture to its current ceiling.
Seedream 4.5
ByteDance's image generation model, marketed under the BytePlus enterprise brand. Builds on ByteDance's deep video and media processing expertise to produce cinematic, high-quality imagery with excellent composition and color accuracy.
Nano Banana Pro
Google's Gemini 3 Pro Image (codename: Nano Banana Pro) is a 2025 image generation and editing model built on the Gemini 3 architecture. A step up from Gemini 2.5 Flash Image (codename: Nano Banana), it is designed for professional design workflows — combining high-fidelity image synthesis with precise instruction-following for complex editing tasks.
Model Cards
DALL-E 3
OpenAI's widely adopted image model, integrated into ChatGPT and Microsoft products. Excels at following complex text instructions and generating photorealistic scenes. Its images often have a slightly 'polished' look — almost too perfect.
Imagen 2
Google DeepMind's second-generation text-to-image model, widely integrated across Google products including Google Slides, Workspace, and Search Generative Experience. Substantially improved photorealism and text rendering over Imagen 1, with a focus on diverse, high-quality imagery for productivity applications.
Imagen 4
Google DeepMind's fourth-generation text-to-image model. Substantially improved photorealism, fine-grained detail, and text accuracy over previous versions. Particularly strong in landscape photography, architecture, and product visualization.
Stable Diffusion 3.5 Large Turbo
A fast, distilled variant of Stability AI's SD 3.5 architecture. Generates high-quality images in fewer sampling steps without sacrificing significant quality. Improved typography, compositional understanding, and human anatomy over SD 2.x.
Stable Diffusion 1.5
The open-source model that democratized AI image generation. Because it's fully open, hundreds of fine-tuned versions exist. Quality varies widely — some versions rival top proprietary models, others show clear artifacts especially in hands and faces.
FLUX.1.1 [pro]
Black Forest Labs' enhanced commercial API model, succeeding FLUX.1 Pro with 6× faster generation and improved image quality. Stronger prompt adherence, better detail rendering, and improved color accuracy — accessible via the BFL API and partnered platforms.
FLUX.1 [dev]
Black Forest Labs' open-weights guidance-distilled model for non-commercial use. Derived from FLUX.1 Pro via guidance distillation, it produces high-quality outputs comparable to the Pro variant but optimized for local deployment and research.
FLUX.1 [schnell]
Black Forest Labs' open-source (Apache 2.0) fast generation model, designed for local development and personal use. Produces results in 1–4 steps at the cost of some fine detail and photorealism compared to Dev or Pro variants.
Kling 3.0 Omni
Kuaishou's multimodal image-and-video model with a unified architecture. Exceptional at consistent character generation, style-faithful rendering, and maintaining visual identity across multiple outputs.
Kolors
Open-source image generation model from Kuaishou's AI lab. Based on an enhanced SDXL architecture, recognized for vibrant color reproduction and strong aesthetic composition — particularly excelling in portrait photography with diverse skin tone rendering.
Qwen Image
Alibaba's image generation model from the Qwen AI family. Produces high-quality imagery with particular strengths in East Asian aesthetics, product photography, and diverse cultural representations. Part of Alibaba's broader multimodal AI platform.
Hunyuan Image 3
Tencent's third-generation image model with significantly improved photorealism and detail rendering. Strong in cinematic compositions, portrait photography, and lifestyle imagery. Part of Tencent's Hunyuan multimodal AI platform.
Microsoft Designer
Microsoft Designer uses OpenAI's GPT-Image and DALL-E models with Microsoft's own safety filters and style guidance on top. Available across Windows, Edge, and Microsoft 365 — bringing AI image generation to a massive mainstream audience.
Ideogram V3
Ideogram's third-generation model from the New York startup founded by former Google Brain researchers. Recognized as one of the strongest models for rendering legible, stylistically integrated text within images — the go-to for graphic design, posters, and typographic artwork.
HiDream I1
HiDream.ai's flagship image generation model, recognized for exceptional detail in fashion, beauty, and lifestyle photography. Strong color accuracy, style consistency, and skin tone rendering make it popular among professional creative applications.
Juggernaut Flux Pro
RunDiffusion's photorealistic fine-tune of the FLUX architecture. A community favorite for portrait photography and commercial imagery, known for natural skin textures, accurate studio-quality lighting, and high-detail rendering.
OpenArt Photorealistic
OpenArt's flagship model trained specifically for maximum photorealism in portrait and product photography. Optimized for studio-quality lighting simulations, natural skin rendering, and commercial-grade output.
Flux Realism
XLabs AI's FLUX fine-tune optimized specifically for photorealistic output. One of the most downloaded FLUX fine-tunes in the open-source community, with improved natural detail rendering, accurate color grading, and enhanced realism across portrait and landscape subjects.
Lucid Realism
A popular Stable Diffusion fine-tune from the Civitai community focused on hyperrealistic portrait photography. Combines multiple checkpoint merges to achieve natural skin tones, accurate facial anatomy, and cinematic lighting without the characteristic 'plastic' look of base SD models.
FLUX 2 LoRA Realism
A community realism LoRA (Low-Rank Adaptation) applied on top of the FLUX.2 architecture, sourced from the LoRA Gallery collection. Fine-tunes the base model toward hyperrealistic photography — enhancing skin detail, natural lighting, and scene authenticity without full retraining.
Deliberate
A community fine-tune of Stable Diffusion 1.5, created by XpucT on Civitai. One of the most downloaded checkpoints in the open-source SD ecosystem. Deliberate (v1–v3) is optimized for photorealism, cinematic compositions, and anatomical accuracy — achieving high-quality output with minimal prompt engineering.
Reve
Reve AI's image generation model, known for strong prompt adherence and richly detailed output. Produces high-quality images with a distinctive visual style that blends photorealism with artistic polish — versatile across photography, illustration, and conceptual art.
Z-Image
Z-Image (造相) is an open-source image generation model family by Alibaba's Tongyi Lab. It uses a novel S3-DiT (Scalable Single-Stream Diffusion Transformer) architecture that processes text and image in a unified single stream — enabling efficient, high-quality generation with strong compositional control and instruction following.
Titan Image Generator v1
Amazon's first-generation text-to-image model, part of the Amazon Bedrock foundation model platform. Designed for enterprise use cases including product visualization, marketing assets, and content generation at scale. Features built-in safety controls and watermarking capabilities.
All Companies & Version History
21 companies · 91+ model versions · release dates and benchmark links
OpenAI
DALL-E
Pioneering text-to-image model integrated into ChatGPT and Microsoft
First version — introduced the concept of text-to-image generation to the public. 12B parameter transformer model.
Major quality leap. Introduced inpainting and outpainting. 4× higher resolution than DALL-E 1. Used CLIP image embeddings.
FID score study (arXiv)Dramatically improved prompt adherence. Deeply integrated into ChatGPT. Strong text rendering in images.
T2I-CompBench evaluationGPT Image
Next-gen image generation built into the GPT-4o architecture
First native image generation model integrated into GPT-4o. Best-in-class text rendering, strong instruction following.
Improved consistency, photorealism, and editing capabilities over GPT Image 1.
Google DeepMind
Imagen
Google's flagship diffusion model — consistently top-ranked photorealism
Introduced by Google Brain. Outperformed DALL-E 2 on COCO FID at launch.
Imagen paper (arXiv)Substantially improved photorealism, text rendering, and multilingual support. Launched in Google Bard and Vertex AI.
Highest quality Imagen to date at launch. Improved detail, lighting, and artifact reduction.
Imagen 3 paper (arXiv)Next-gen text-to-image with best-in-class landscape, architecture, and product photography. Integrated into Gemini.
Gemini Image (Nano Banana)
Image generation built natively into the Gemini 3 architecture
Fast image generation + editing model based on Gemini 2.5 Flash. Integrated into Gemini apps.
Professional-grade image generation and editing. Precise instruction following, near-seamless inpainting.
Stability AI
Stable Diffusion
The open-source model that democratized AI image generation
First public release. Latent diffusion model — runs on consumer GPUs. Sparked the open-source AI art movement.
LDM paper (arXiv)Improved training. Still the most widely used SD checkpoint — thousands of community fine-tunes built on it.
Higher resolution (768px default), new text encoder (OpenCLIP), new depth model.
Improved NSFW filtering approach. Less over-filtering than 2.0, better prompt adherence.
Major architecture jump — 3.5B parameters. Native 1024px output. Two-stage pipeline with refiner model.
SDXL paper (arXiv)Multimodal Diffusion Transformer (MMDiT) architecture. Major improvement in text rendering and composition.
SD3 paper (arXiv)2B parameter variant of SD 3. Lighter weight, runs on consumer hardware. Same MMDiT architecture as SD 3 but optimized for accessibility.
2.5B parameter model. Balanced between quality and speed — ideal for local use on consumer GPUs with less than 8 GB VRAM.
8B parameter model. Best quality in the SD 3.x family.
Distilled 4-step variant. Near SD 3.5 Large quality at a fraction of inference time.
SD Community Fine-tunes
The most widely used community checkpoints built on Stable Diffusion
One of the most downloaded SD 1.5 fine-tunes on Civitai. Optimized for photorealism, cinematic compositions, and anatomical accuracy — achieves high quality with minimal prompt engineering.
Deliberate on CivitAICommunity checkpoint merge focused on hyperrealistic portrait photography. Combines multiple SD models to achieve natural skin tones and accurate facial anatomy without the plastic look of base SD.
Lucid Realism on CivitAIMidjourney
Midjourney
Artistic aesthetic, Discord-native — the model that went viral
Initial alpha release via Discord.
Improved coherence and style.
Better quality, more artistic control.
New proprietary model architecture. Major quality jump. More coherent scenes.
Photorealistic quality leap. Detailed hands and faces. Introduced --ar and --style flags.
Improved default aesthetics, more accurate with simple prompts.
Sharpened images, improved aesthetics, introduced zoom-out feature.
Much more literal prompt following, improved text in images, better photorealism.
Refinement of v6 — improved image quality, better human anatomy, sharper details.
Major architecture change. New personalization system, significantly faster generation, improved realism.
Black Forest Labs
FLUX
State-of-the-art open model from former Stability AI researchers
Apache 2.0 open-source, 4-step distilled model. Fast generation, consumer-ready.
Guidance-distilled, 12B parameter Flow Matching model. Better quality than schnell. Non-commercial license.
FLUX.1 paper (arXiv)API-only commercial model. Highest quality in FLUX.1 family.
Improved version of FLUX.1 pro — 6× faster, better prompt adherence and quality.
32B parameter model — generation, editing, and multi-reference image combining in one. Open weights for research and non-commercial use. 226k+ downloads on Hugging Face.
FLUX.2 [dev] on Hugging FaceCommercial API tier. Production-grade generation and editing with 4MP output and multi-reference control.
Flexible tier between dev and pro — balances quality and cost for scalable production workflows.
Premium tier. Superior facial detail, enhanced text rendering, maximum photorealism.
Smallest and fastest FLUX models to date — sub-second inference on capable hardware. Available in 4B and 9B parameter versions. Open weights.
FLUX.2 [klein] on Hugging FaceFLUX Community Fine-tunes
The most important open-source fine-tunes built on the FLUX architecture
One of the first and most downloaded FLUX fine-tunes. Optimized for photorealistic output — improved natural color grading, skin detail, and landscape rendering. Fully open source.
XLabs AI on Hugging FaceConsistency distillation of FLUX.1 [dev] down to 8 sampling steps with near-identical quality. Makes local FLUX generation significantly faster on consumer hardware.
Hyper-SD on Hugging FaceFace identity preservation for FLUX. Generate consistent portraits of a specific person without fine-tuning — just provide a reference photo. Popular for character consistency workflows.
PuLID on GitHubCommunity-favorite portrait fine-tune. Natural skin textures, studio-quality lighting, and sharp hair detail. Widely regarded as the best FLUX fine-tune for commercial portrait work.
Juggernaut on CivitAIControlNet implementation for FLUX — enables structural guidance via depth maps, canny edges, and pose skeletons. Brings the precise layout control from SD/SDXL to the FLUX architecture.
XLabs-AI/x-flux on GitHubAdversarial distillation of FLUX.1 [dev] to just 8 steps with minimal quality loss. Developed by Alibaba's Alimama team — one of the fastest high-quality FLUX variants.
FLUX-Turbo on Hugging FaceRealism LoRA for FLUX.2 from the community LoRA Gallery collection. Enhances skin detail, natural lighting, and photographic authenticity on top of the FLUX.2 base model.
Adobe
Adobe Firefly
Commercially safe, trained on licensed content — built into Creative Cloud
First commercially licensed AI image generator. Trained on Adobe Stock + Creative Commons. Integrated into Photoshop.
Improved photorealism and detail. Photo settings for lighting and depth of field.
Major quality improvement. Introduced Structure Reference and Style Reference features.
Advanced multi-entity generation, improved realism, better Creative Cloud integration.
Meta AI
Emu
Meta's image generation model — available inside Instagram and WhatsApp
Meta's first image generation model. Fine-tuned on curated high-quality data. Integrated into Instagram.
Emu paper (arXiv)Instruction-based image editing model. Precise local and global edits via text commands.
Emu Edit paper (arXiv)Largest generative multimodal model from Meta. 37B parameters, strong few-shot visual generation.
Emu 2 paper (arXiv)ByteDance / BytePlus
Seedream
Cinematic, enterprise-grade image generation from the makers of TikTok
Strong composition and color accuracy. Designed for professional and enterprise workflows.
Improved cinematic quality, better text support, enhanced style fidelity.
Kuaishou
Kolors
Open-source SDXL fine-tune with vivid colors and strong portrait quality
Open-source release. Built on enhanced SDXL. Recognized for vibrant colors and diverse skin tone rendering.
Kolors paper (arXiv)Kling Omni
Unified image + video generation model
Image generation component of Kling. Consistent character generation and style fidelity.
Unified image and video architecture. Exceptional character consistency across outputs.
Alibaba (Tongyi Lab)
Wanx (通义万象)
Alibaba's flagship image model with strong cultural diversity
High-quality generation with East Asian aesthetic strengths. Available via Alibaba Cloud API.
Integrated into the Qwen multimodal model family. Strong product photography and cultural representations.
Z-Image (造相)
Open-source S3-DiT architecture — unified text and image stream
Novel S3-DiT (Scalable Single-Stream Diffusion Transformer). Efficient unified text+image processing.
Tencent
Hunyuan Image
Cinematic image generation from Tencent's Hunyuan platform
First public release. Strong in portrait and lifestyle imagery.
Improved realism and cinematic composition.
Best-in-class portrait photorealism from Tencent. Warm color grading, studio-quality lighting.
Ideogram
Ideogram
Best-in-class text rendering — go-to for graphic design and typography
Launch. Immediately recognized for best text rendering of any image model at the time.
Significantly improved photorealism while maintaining text advantages. New Style and Color Palette features.
ELO benchmark (Ideogram blog)Further photorealism improvements. Strongest text-in-image model available. New canvas editing features.
Microsoft
Microsoft Designer / Copilot Image
DALL-E and GPT Image powered — built into Windows, Edge, and Microsoft 365
Microsoft Designer launched with DALL-E 3 backend. Integrated into Bing Image Creator and Edge.
Upgraded to GPT Image 1 backend. Integrated across Microsoft 365, Windows Copilot, and Designer.
HiDream.ai
HiDream I-Series
Fashion and beauty specialist — open-source with commercial options
Fast variant. 17B parameter model, 4-step distillation.
Development/research variant. Non-commercial license.
Full quality model. Exceptional skin tone accuracy, fashion and beauty photography.
HiDream paper (arXiv)xAI
Aurora / Grok Image
Elon Musk's AI lab — extremely realistic output with minimal safety filtering
xAI's first public image model, integrated into Grok on X (formerly Twitter). FLUX-based architecture. Notably less restrictive safety filters produce a hyper-realistic look, especially for people and social scenes.
Next-generation model. Continues the Aurora lineage with improved photorealism and native Grok-3 multimodal integration.
Recraft
Recraft
The first model to master graphic design — text, logos, and vector art
Early release focused on vector-style and design-oriented generation.
Improved style consistency, introduced SVG output support.
#1 on Hugging Face text-to-image leaderboard at launch. Industry-leading text rendering, logo generation, and vector graphic output. The go-to model for professional graphic designers.
Hugging Face T2I LeaderboardLeonardo AI (Canva)
Leonardo Phoenix / Kino
Cinematic Hollywood-style imagery — one of the world's largest AI art platforms
Cinematic-style SDXL fine-tune. Strong in dramatic lighting, movie-like scenes, and character consistency.
Leonardo's first proprietary foundation model. Improved text rendering, prompt adherence, and photorealism. Millions of daily generations across the platform.
Refined flagship model post-Canva acquisition. Cinematic quality with better anatomy and enhanced style controls.
Apple
Apple Intelligence Image
Deeply integrated into iOS/macOS — the AI model most people encounter daily
First public Apple image generation feature. Clean, illustrative style — Animation, Illustration, and Sketch modes. Runs fully on-device. Available on iPhone 15 Pro and later.
Sketch-to-image feature in Apple Notes. Turns rough hand-drawn sketches into polished illustrations. On-device, privacy-first.
AI-generated custom emoji from text descriptions. Integrated into keyboard and Messages.
Expanded style options and improved quality expected with iOS 19. Broader device support.
OpenArt
OpenArt Photorealistic
Studio-quality portrait and product photography model
OpenArt's flagship model trained for maximum photorealism in portrait and product photography. Optimized for studio lighting simulation, natural skin rendering, and commercial-grade output.
Amazon
Titan Image Generator
Enterprise image generation on AWS Bedrock with built-in safety
Amazon's first image generation model on Bedrock. Designed for enterprise workflows — product visualization, marketing assets, content generation at scale. Features built-in watermarking.
Improved quality, new image conditioning features including background removal and outpainting. Better instruction following.
Reve AI
Reve
Strong prompt adherence with distinctive artistic polish
Launch model. Recognized for strong depth-of-field rendering and rich shadow detail.
Now put it to the test
Can you spot the AI image when it counts?