Dimensions

7 quality dimensions for evaluating AI-generated images.

Evalytic scores images on up to 7 semantic dimensions. A vision-language model (VLM) judge rates each dimension on a 1–5 scale. Dimensions are grouped by pipeline type: text-to-image (text2img) and image-to-image (img2img).

All Dimensions

Dimension	Pipeline	Images Evaluated	Description
`visual_quality`	Both	Output only	Overall visual quality: composition, lighting, color, sharpness.
`prompt_adherence`	text2img	Output only	How well the image matches the text prompt.
`text_rendering`	text2img	Output only	Quality of rendered text in the image (legibility, accuracy).
`input_fidelity`	img2img	Input + Output	How well the output preserves key elements from the input.
`transformation_quality`	img2img	Input + Output	Quality of the transformation applied (style transfer, enhancement).
`artifact_detection`	img2img	Input + Output	Presence of artifacts, glitches, or unwanted modifications.
`identity_preservation`	img2img	Input + Output	Face and identity preservation in transformations. Opt-in only.
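
As a reading aid, the table above can be modeled as a small lookup structure. The sketch below is not Evalytic's internal representation: the `Dimension` class, the `DIMENSIONS` list, and the `dimensions_for` helper are hypothetical names introduced here. Only the dimension names, pipelines, and the opt-in flag come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    pipeline: str          # "text2img", "img2img", or "both"
    needs_input: bool      # True when the judge sees input + output
    opt_in: bool = False   # True when excluded from auto-detection


# Rows copied from the table above; the structure itself is hypothetical.
DIMENSIONS = [
    Dimension("visual_quality", "both", False),
    Dimension("prompt_adherence", "text2img", False),
    Dimension("text_rendering", "text2img", False),
    Dimension("input_fidelity", "img2img", True),
    Dimension("transformation_quality", "img2img", True),
    Dimension("artifact_detection", "img2img", True),
    Dimension("identity_preservation", "img2img", True, opt_in=True),
]


def dimensions_for(pipeline, include_opt_in=False):
    """Return the names of the dimensions that apply to a pipeline."""
    return [
        d.name
        for d in DIMENSIONS
        if d.pipeline in (pipeline, "both") and (include_opt_in or not d.opt_in)
    ]


print(dimensions_for("text2img"))
print(dimensions_for("img2img", include_opt_in=True))
```

Filtering on `opt_in` mirrors the table's note that `identity_preservation` must be requested explicitly.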

Scoring Scale

All dimensions use a 1–5 scale:

Score	Meaning
1	Poor — Major issues, unusable
2	Below Average — Noticeable issues
3	Average — Acceptable with some issues
4	Good — High quality, minor issues
5	Excellent — Production-ready, no issues
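
If you post-process scores yourself, a lookup like the following can turn a numeric score into its label from the table above. This is a minimal sketch: `SCORE_LABELS` and `label_for` are hypothetical names, and the check assumes the judge returns whole-number scores, which the scale implies but this page does not state outright.

```python
# Labels copied from the scoring table; the helper itself is hypothetical.
SCORE_LABELS = {
    1: "Poor",
    2: "Below Average",
    3: "Average",
    4: "Good",
    5: "Excellent",
}


def label_for(score):
    """Map a 1-5 integer score to its label, rejecting anything else."""
    is_int = isinstance(score, int) and not isinstance(score, bool)
    if not is_int or score not in SCORE_LABELS:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return SCORE_LABELS[score]


print(label_for(4))
```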

Text-to-Image Dimensions

visual_quality

Evaluates overall visual quality independent of the prompt. Applies to both text2img and img2img pipelines. Considers:

Composition and framing
Lighting and exposure
Color accuracy and harmony
Sharpness and detail
Absence of visual artifacts

visual_quality is always included in auto-detection regardless of context.

prompt_adherence

Measures how accurately the generated image reflects the text prompt. Evaluates:

Subject presence and accuracy
Scene composition matching prompt description
Attribute accuracy (colors, sizes, quantities)
Spatial relationships
Style and mood alignment

Requires a prompt. With evaly eval, it is auto-selected when --prompt is provided; with evaly bench, it is always included.

text_rendering

Evaluates the quality of text rendered within the image:

Character accuracy (correct spelling)
Legibility and readability
Font consistency
Text placement and integration
Absence of garbled characters

Auto-selected when the prompt contains text-related keywords: "text", "word", "letter", "write", "say", "font", "type", "sign".
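
The auto-selection rules for the three text-to-image dimensions can be summarized in a few lines. The sketch below is an illustration only, not Evalytic's implementation: `mentions_text` and `auto_select_text2img` are hypothetical names, and the case-insensitive substring match is an assumption, since this page lists the keywords but not how they are matched.

```python
from typing import List, Optional

# Keyword list copied from the text_rendering rule above.
TEXT_KEYWORDS = ("text", "word", "letter", "write", "say", "font", "type", "sign")


def mentions_text(prompt: str) -> bool:
    """Assumed rule: case-insensitive substring match against the keywords."""
    lowered = prompt.lower()
    return any(keyword in lowered for keyword in TEXT_KEYWORDS)


def auto_select_text2img(prompt: Optional[str], command: str = "eval") -> List[str]:
    """Sketch of the documented rules for eval and bench."""
    selected = ["visual_quality"]            # always included
    if command == "bench" or prompt:
        selected.append("prompt_adherence")  # needs a prompt; always on for bench
    if prompt and mentions_text(prompt):
        selected.append("text_rendering")    # prompt contains a text keyword
    return selected


print(auto_select_text2img("A cat", command="bench"))
print(auto_select_text2img("Sign: HELLO", command="bench"))
```

Under the substring assumption, a prompt such as "An essay on a desk" would also trigger text_rendering because "essay" contains "say". If the real matcher works on whole words, the result would differ, so pass -d explicitly when the selection matters.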

Image-to-Image Dimensions

input_fidelity

Measures how well the output preserves important elements from the input image:

Subject identity preservation
Key feature retention
Color palette consistency
Structural integrity

transformation_quality

Evaluates the quality of the applied transformation:

Smoothness and consistency of the transformation
Appropriate level of modification
Natural-looking results
Effective application of the intended effect

artifact_detection

Checks for unwanted artifacts introduced during transformation:

Edge artifacts and halos
Color bleeding or banding
Unnatural distortions
Missing or duplicated elements
Noise or grain introduction

For artifact_detection, a higher score means fewer artifacts: a score of 5 means no artifacts were detected.

identity_preservation

Evaluates face and identity preservation when transforming images containing people:

Facial feature accuracy (eyes, nose, mouth, jawline)
Skin tone and complexion consistency
Body proportions and posture
Expression preservation

Opt-in only. This dimension is not included in auto-detection; add it explicitly with -d identity_preservation. When no human faces are detected in the input image, it is automatically scored 5 (not applicable) so that it does not lower the overall score.

Pair with --metrics face for a deterministic ArcFace embedding comparison alongside the VLM assessment.
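
The no-face fallback described above can be pictured as follows. This is a sketch under stated assumptions: `identity_score`, `overall`, and the `faces_in_input` parameter are hypothetical, and the plain average is only a stand-in, since this page does not say how Evalytic detects faces or aggregates dimension scores.

```python
from statistics import mean

NOT_APPLICABLE_SCORE = 5  # documented fallback when the input has no faces


def identity_score(faces_in_input: int, judge_score: int) -> int:
    """Use the judge's score only when the input contains at least one face."""
    if faces_in_input == 0:
        return NOT_APPLICABLE_SCORE
    return judge_score


def overall(scores: dict) -> float:
    # Assumption: a plain average; the real aggregation may differ.
    return mean(scores.values())


scores = {"visual_quality": 4, "input_fidelity": 4}
baseline = overall(scores)

# A landscape input with no faces: the judge's score is ignored.
scores["identity_preservation"] = identity_score(faces_in_input=0, judge_score=1)
print(baseline, overall(scores))
```

Because 5 is the top of the scale, adding it can never pull an average below what it was without the dimension, which is the sense in which the fallback does not penalize the result.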

Selecting Dimensions

You can select specific dimensions or rely on auto-detection:

# Auto-detect (recommended)
evaly bench -m flux-schnell -p "A cat"

# Explicit selection
evaly bench -m flux-schnell -p "A cat" -d visual_quality -d prompt_adherence

# All text2img dimensions
evaly bench -m flux-pro -p "Sign: HELLO" \
    -d visual_quality -d prompt_adherence -d text_rendering

Set default dimensions in evalytic.toml:

# evalytic.toml
[bench]
dimensions = ["visual_quality", "prompt_adherence", "text_rendering"]
