Dimensions

7 quality dimensions for evaluating AI-generated images.

Evalytic scores images across up to 7 semantic dimensions, each rated on a 1–5 scale by a VLM (vision-language model) judge. Dimensions are grouped by pipeline type: text-to-image (text2img) and image-to-image (img2img).

All Dimensions

| Dimension | Pipeline | Images evaluated | Description |
|---|---|---|---|
| visual_quality | Both | Output only | Overall visual quality: composition, lighting, color, sharpness. |
| prompt_adherence | text2img | Output only | How well the image matches the text prompt. |
| text_rendering | text2img | Output only | Quality of rendered text in the image (legibility, accuracy). |
| input_fidelity | img2img | Input + Output | How well the output preserves key elements from the input. |
| transformation_quality | img2img | Input + Output | Quality of the transformation applied (style transfer, enhancement). |
| artifact_detection | img2img | Input + Output | Presence of artifacts, glitches, or unwanted modifications. |
| identity_preservation | img2img | Input + Output | Face and identity preservation in transformations. Opt-in only. |
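The table maps naturally onto a small lookup structure, for example to check which dimensions apply to a pipeline before passing them with -d. The Dimension class, DIMENSIONS list, and dimensions_for function below are hypothetical illustrations of the table, not part of Evalytic's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    pipelines: frozenset[str]  # pipeline types the dimension applies to
    uses_input: bool           # True if the judge also sees the input image
    opt_in: bool = False       # True if the dimension is never auto-selected


# Hypothetical in-code mirror of the "All Dimensions" table.
DIMENSIONS = [
    Dimension("visual_quality", frozenset({"text2img", "img2img"}), uses_input=False),
    Dimension("prompt_adherence", frozenset({"text2img"}), uses_input=False),
    Dimension("text_rendering", frozenset({"text2img"}), uses_input=False),
    Dimension("input_fidelity", frozenset({"img2img"}), uses_input=True),
    Dimension("transformation_quality", frozenset({"img2img"}), uses_input=True),
    Dimension("artifact_detection", frozenset({"img2img"}), uses_input=True),
    Dimension("identity_preservation", frozenset({"img2img"}), uses_input=True, opt_in=True),
]


def dimensions_for(pipeline: str) -> list[str]:
    """Return the names of dimensions applicable to a pipeline, in table order."""
    return [d.name for d in DIMENSIONS if pipeline in d.pipelines]
```

With this structure, dimensions_for("text2img") yields the three text-to-image dimensions, and visual_quality shows up for both pipelines because its pipelines set contains both types.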

Scoring Scale

All dimensions use a 1–5 scale:

| Score | Rating | Meaning |
|---|---|---|
| 1 | Poor | Major issues, unusable |
| 2 | Below Average | Noticeable issues |
| 3 | Average | Acceptable with some issues |
| 4 | Good | High quality, minor issues |
| 5 | Excellent | Production-ready, no issues |
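When post-processing judge output, it can help to validate each score against this scale and attach its label. The sketch below assumes the judge returns integer scores from 1 to 5; SCORE_LABELS and score_label are hypothetical names, and the labels are copied from the table.

```python
# Hypothetical lookup mirroring the 1-5 scoring scale table.
SCORE_LABELS = {
    1: "Poor",
    2: "Below Average",
    3: "Average",
    4: "Good",
    5: "Excellent",
}


def score_label(score: int) -> str:
    """Map a 1-5 judge score to its rating label, rejecting out-of-range values."""
    if score not in SCORE_LABELS:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return SCORE_LABELS[score]
```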

Text-to-Image Dimensions

visual_quality

Evaluates overall visual quality independently of the prompt. Applies to both text2img and img2img pipelines. Considers:

  • Composition and framing
  • Lighting and exposure
  • Color accuracy and harmony
  • Sharpness and detail
  • Absence of visual artifacts

visual_quality is always included in auto-detection, regardless of context.

prompt_adherence

Measures how accurately the generated image reflects the text prompt. Evaluates:

  • Subject presence and accuracy
  • Scene composition matching prompt description
  • Attribute accuracy (colors, sizes, quantities)
  • Spatial relationships
  • Style and mood alignment

Requires a prompt. Auto-selected when --prompt is provided to evaly eval, and always included for evaly bench.

text_rendering

Evaluates the quality of text rendered within the image:

  • Character accuracy (correct spelling)
  • Legibility and readability
  • Font consistency
  • Text placement and integration
  • Absence of garbled characters

Auto-selected when the prompt contains text-related keywords: "text", "word", "letter", "write", "say", "font", "type", "sign".
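Taken together, the auto-selection rules for the three text2img dimensions reduce to a few conditionals. The sketch below is hypothetical (auto_text2img_dimensions and TEXT_PATTERN are illustrative names) and assumes case-insensitive whole-word keyword matching; this page lists the keywords but not how Evalytic matches them.

```python
import re
from typing import Optional

# Keywords listed on this page. Case-insensitive whole-word matching is an
# assumption; Evalytic's exact matching rule is not documented here.
TEXT_KEYWORDS = ("text", "word", "letter", "write", "say", "font", "type", "sign")
TEXT_PATTERN = re.compile(r"\b(?:" + "|".join(TEXT_KEYWORDS) + r")\b", re.IGNORECASE)


def auto_text2img_dimensions(prompt: Optional[str]) -> list[str]:
    """Hypothetical sketch of text2img dimension auto-detection."""
    dims = ["visual_quality"]  # always included
    if prompt:
        dims.append("prompt_adherence")  # only meaningful when a prompt exists
        if TEXT_PATTERN.search(prompt):
            dims.append("text_rendering")  # prompt appears to ask for rendered text
    return dims
```

Under the whole-word assumption, a prompt such as "A modern design" does not select text_rendering, whereas a plain substring check would match "sign" inside "design".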

Image-to-Image Dimensions

input_fidelity

Measures how well the output preserves important elements from the input image:

  • Subject identity preservation
  • Key feature retention
  • Color palette consistency
  • Structural integrity

transformation_quality

Evaluates the quality of the applied transformation:

  • Smoothness and consistency of the transformation
  • Appropriate level of modification
  • Natural-looking results
  • Effective application of the intended effect

artifact_detection

Checks for unwanted artifacts introduced during transformation:

  • Edge artifacts and halos
  • Color bleeding or banding
  • Unnatural distortions
  • Missing or duplicated elements
  • Noise or grain introduction

For artifact_detection, a higher score means fewer artifacts. Score 5 = no artifacts detected.

identity_preservation

Evaluates face and identity preservation when transforming images containing people:

  • Facial feature accuracy (eyes, nose, mouth, jawline)
  • Skin tone and complexion consistency
  • Body proportions and posture
  • Expression preservation

Opt-in only. This dimension is not included in auto-detection; add it explicitly with -d identity_preservation. When no human faces are detected in the input image, it automatically scores 5 (not applicable), so it doesn't penalize the overall score.

Pair with --metrics face for a deterministic ArcFace embedding comparison alongside the VLM assessment.
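ArcFace embeddings are conventionally compared with cosine similarity. The sketch below shows that comparison for two embeddings already extracted by an ArcFace model (extraction is out of scope here). This page does not describe Evalytic's exact face metric, its embedding model, or any threshold, so face_similarity is a hypothetical helper, not the --metrics face implementation.

```python
import math
from typing import Sequence


def face_similarity(emb_input: Sequence[float], emb_output: Sequence[float]) -> float:
    """Cosine similarity between two face embeddings, from -1.0 to 1.0.

    Hypothetical helper: assumes both vectors come from the same ArcFace
    model. This is not Evalytic's --metrics face code.
    """
    if len(emb_input) != len(emb_output):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(emb_input, emb_output))
    norm_in = math.sqrt(sum(a * a for a in emb_input))
    norm_out = math.sqrt(sum(b * b for b in emb_output))
    if norm_in == 0 or norm_out == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_in * norm_out)
```

Values close to 1.0 mean the input and output faces map to nearly the same direction in embedding space, which is what an identity-preserving transformation should produce. Because the computation is plain arithmetic on fixed vectors, it returns the same result on every run, unlike a VLM judgment.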

Selecting Dimensions

You can select specific dimensions or rely on auto-detection:

# Auto-detect (recommended)
evaly bench -m flux-schnell -p "A cat"

# Explicit selection
evaly bench -m flux-schnell -p "A cat" -d visual_quality -d prompt_adherence

# All text2img dimensions
evaly bench -m flux-pro -p "Sign: HELLO" \
    -d visual_quality -d prompt_adherence -d text_rendering

Set default dimensions in evalytic.toml:

# evalytic.toml
[bench]
dimensions = ["visual_quality", "prompt_adherence", "text_rendering"]