Dimensions
7 quality dimensions for evaluating AI-generated images.
Evalytic scores images across up to 7 semantic dimensions, each rated on a 1–5 scale by a vision-language model (VLM) judge. Dimensions are grouped by pipeline type: text-to-image (text2img) and image-to-image (img2img).
All Dimensions
| Dimension | Pipeline | Images Evaluated | Description |
|---|---|---|---|
| visual_quality | Both | Output only | Overall visual quality: composition, lighting, color, sharpness. |
| prompt_adherence | text2img | Output only | How well the image matches the text prompt. |
| text_rendering | text2img | Output only | Quality of rendered text in the image (legibility, accuracy). |
| input_fidelity | img2img | Input + Output | How well the output preserves key elements from the input. |
| transformation_quality | img2img | Input + Output | Quality of the transformation applied (style transfer, enhancement). |
| artifact_detection | img2img | Input + Output | Presence of artifacts, glitches, or unwanted modifications. |
| identity_preservation | img2img | Input + Output | Face and identity preservation in transformations. Opt-in only. |
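For scripts that need to validate dimension names or list the dimensions that apply to a pipeline, the table can be mirrored as a small lookup structure. This is only an illustrative sketch: `DIMENSIONS` and `dimensions_for` are hypothetical names and are not part of Evalytic's API.

```python
# Hypothetical lookup mirroring the dimensions table; not Evalytic's own API.
DIMENSIONS = {
    "visual_quality": {"pipelines": {"text2img", "img2img"}, "opt_in": False},
    "prompt_adherence": {"pipelines": {"text2img"}, "opt_in": False},
    "text_rendering": {"pipelines": {"text2img"}, "opt_in": False},
    "input_fidelity": {"pipelines": {"img2img"}, "opt_in": False},
    "transformation_quality": {"pipelines": {"img2img"}, "opt_in": False},
    "artifact_detection": {"pipelines": {"img2img"}, "opt_in": False},
    "identity_preservation": {"pipelines": {"img2img"}, "opt_in": True},
}


def dimensions_for(pipeline, include_opt_in=False):
    """Return the dimension names that apply to a pipeline, in table order."""
    if pipeline not in ("text2img", "img2img"):
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    return [
        name
        for name, info in DIMENSIONS.items()
        if pipeline in info["pipelines"] and (include_opt_in or not info["opt_in"])
    ]


print(dimensions_for("text2img"))
# ['visual_quality', 'prompt_adherence', 'text_rendering']
print(dimensions_for("img2img", include_opt_in=True))
```

Keeping `opt_in` as a separate flag means identity_preservation is left out unless the caller asks for it, which matches the "Opt-in only" note in the table.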
Scoring Scale
All dimensions use a 1–5 scale:
| Score | Meaning |
|---|---|
| 1 | Poor — Major issues, unusable |
| 2 | Below Average — Noticeable issues |
| 3 | Average — Acceptable with some issues |
| 4 | Good — High quality, minor issues |
| 5 | Excellent — Production-ready, no issues |
Text-to-Image Dimensions
visual_quality
Evaluates overall visual quality independent of the prompt. Considers:
- Composition and framing
- Lighting and exposure
- Color accuracy and harmony
- Sharpness and detail
- Absence of visual artifacts
visual_quality applies to both pipelines and is always included in auto-detection, regardless of context.
prompt_adherence
Measures how accurately the generated image reflects the text prompt. Evaluates:
- Subject presence and accuracy
- Scene composition matching prompt description
- Attribute accuracy (colors, sizes, quantities)
- Spatial relationships
- Style and mood alignment
Requires a prompt. With evaly eval, it is auto-selected when --prompt is provided; with evaly bench, it is always included.
text_rendering
Evaluates the quality of text rendered within the image:
- Character accuracy (correct spelling)
- Legibility and readability
- Font consistency
- Text placement and integration
- Absence of garbled characters
Auto-selected when the prompt contains text-related keywords: "text", "word", "letter", "write", "say", "font", "type", "sign".
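The auto-selection rules described for the three text2img dimensions can be approximated in a few lines. This is a rough sketch, not Evalytic's implementation: `auto_select_text2img` is a hypothetical name, and whole-word, case-insensitive matching is an assumption, since the source does not say how keywords are matched.

```python
import re

# Keywords the documentation lists as triggers for text_rendering.
TEXT_KEYWORDS = {"text", "word", "letter", "write", "say", "font", "type", "sign"}


def auto_select_text2img(prompt=None):
    """Hypothetical approximation of text2img dimension auto-detection."""
    dims = ["visual_quality"]  # always included
    if prompt:
        dims.append("prompt_adherence")  # needs a prompt to compare against
        # Assumption: case-insensitive whole-word match; the real matcher may differ.
        words = set(re.findall(r"[a-z]+", prompt.lower()))
        if words & TEXT_KEYWORDS:
            dims.append("text_rendering")
    return dims


print(auto_select_text2img("A cat"))
# ['visual_quality', 'prompt_adherence']
print(auto_select_text2img("Sign: HELLO"))
# ['visual_quality', 'prompt_adherence', 'text_rendering']
```

Under this assumption, inflected forms such as "signs" or "written" would not trigger text_rendering. If a prompt relies on them, passing -d text_rendering explicitly removes the dependence on keyword matching.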
Image-to-Image Dimensions
input_fidelity
Measures how well the output preserves important elements from the input image:
- Subject identity preservation
- Key feature retention
- Color palette consistency
- Structural integrity
transformation_quality
Evaluates the quality of the applied transformation:
- Smoothness and consistency of the transformation
- Appropriate level of modification
- Natural-looking results
- Effective application of the intended effect
artifact_detection
Checks for unwanted artifacts introduced during transformation:
- Edge artifacts and halos
- Color bleeding or banding
- Unnatural distortions
- Missing or duplicated elements
- Noise or grain introduction
For artifact_detection, a higher score means fewer artifacts: a score of 5 means no artifacts were detected.
identity_preservation
Evaluates face and identity preservation when transforming images containing people:
- Facial feature accuracy (eyes, nose, mouth, jawline)
- Skin tone and complexion consistency
- Body proportions and posture
- Expression preservation
This dimension is opt-in only: enable it explicitly with -d identity_preservation.
When no human faces are detected in the input image, identity_preservation automatically scores 5 (not applicable) so that it does not penalize the overall score.
Pair with --metrics face for a deterministic ArcFace embedding comparison alongside the VLM assessment.
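The source does not say how the embedding comparison is computed. A common way to compare two face embeddings is cosine similarity, and the sketch below assumes that approach. The function name and the toy vectors are illustrative only; real ArcFace embeddings have far more dimensions and come from a face-recognition model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the input-face and output-face embeddings.
input_face = [0.12, 0.80, 0.35, 0.41]
output_face = [0.10, 0.78, 0.40, 0.38]

# Values closer to 1 indicate more similar embeddings.
print(round(cosine_similarity(input_face, output_face), 3))
```

Unlike the VLM judgment, a computation like this returns the same value for the same pair of embeddings on every run, which is what makes a metric of this kind deterministic.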
Selecting Dimensions
You can select specific dimensions or rely on auto-detection:
# Auto-detect (recommended)
evaly bench -m flux-schnell -p "A cat"
# Explicit selection
evaly bench -m flux-schnell -p "A cat" -d visual_quality -d prompt_adherence
# All text2img dimensions
evaly bench -m flux-pro -p "Sign: HELLO" \
-d visual_quality -d prompt_adherence -d text_rendering
Set default dimensions in evalytic.toml:
# evalytic.toml
[bench]
dimensions = ["visual_quality", "prompt_adherence", "text_rendering"]