evaly eval

Score a single image without generation.

evaly eval --image <PATH_OR_URL> [OPTIONS]

The eval command scores an existing image without running generation. This is useful for evaluating images you generated earlier, or images from any other source. Scoring dimensions are auto-detected from the context you provide.

Basic Usage

# Score a local image (visual_quality auto-selected)
evaly eval --image output.png

# Score with prompt context (adds prompt_adherence)
evaly eval --image output.png --prompt "A sunset over mountains"

# Score img2img transformation (adds input_fidelity, etc.)
evaly eval --image output.png --input-image input.png

# Score a URL
evaly eval --image https://example.com/generated.png

# Write results to JSON
evaly eval --image output.png -o scores.json

Options

Flag              Type             Default           Description
--image           TEXT             (required)        Path or URL to the image to evaluate.
--dimensions, -d  TEXT (multiple)  auto              Dimensions to score. Auto-detected if omitted.
--prompt          TEXT             -                 Generation prompt. Enables prompt_adherence scoring.
--input-image     TEXT             -                 Input image path/URL. Enables img2img dimensions.
--judge, -j       TEXT             gemini-2.5-flash  VLM judge (e.g., openai/gpt-5.2, ollama/qwen2.5-vl:7b).
--judge-url       TEXT             -                 Custom judge API base URL.
--output, -o      TEXT             -                 Write results to JSON file.

Auto-Detection Logic

When no --dimensions are specified, Evalytic automatically selects the right dimensions based on context:

Context                              Dimensions Selected
Image only                           visual_quality
Image + --prompt                     visual_quality, prompt_adherence
Image + --prompt with text keywords  visual_quality, prompt_adherence, text_rendering
Image + --input-image                visual_quality, input_fidelity, transformation_quality, artifact_detection
Text keywords: text_rendering is auto-added when your prompt contains words like "text", "word", "letter", "write", "say", "font", "type", or "sign".
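The selection logic above can be sketched in Python. This is a hypothetical re-implementation for illustration, not Evalytic's actual code; in particular, the case-insensitive whole-word matching is an assumption:

```python
# Hypothetical sketch of the dimension auto-detection described above;
# not Evalytic's actual implementation.
TEXT_KEYWORDS = {"text", "word", "letter", "write", "say", "font", "type", "sign"}

def select_dimensions(prompt=None, input_image=None):
    """Pick scoring dimensions from the available context."""
    dims = ["visual_quality"]  # always scored
    if input_image is not None:
        # img2img context adds the transformation-oriented dimensions
        dims += ["input_fidelity", "transformation_quality", "artifact_detection"]
    if prompt is not None:
        dims.append("prompt_adherence")
        # Assumed matching rule: case-insensitive, whole-word keyword check
        if any(k in prompt.lower().split() for k in TEXT_KEYWORDS):
            dims.append("text_rendering")
    return dims
```

For example, `select_dimensions(prompt="a neon sign at night")` would add text_rendering because "sign" is a text keyword.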

Output Format

Terminal output

The terminal table includes a Conf (confidence) column alongside each score, showing how confident the judge is in its assessment:

$ evaly eval --image output.png --prompt "A sunset over mountains"

  ┌────────────────────┬───────┬──────┬─────────────────────────────────┐
  │ Dimension          │ Score │ Conf │ Explanation                     │
  ├────────────────────┼───────┼──────┼─────────────────────────────────┤
  │ visual_quality     │ 4.2   │ 100% │ Well-composed image with...     │
  │ prompt_adherence   │ 3.8   │ 72%  │ Captures the main subject...    │
  └────────────────────┴───────┴──────┴─────────────────────────────────┘

Confidence colors: green (≥80%), yellow (≥50%), red (<50%).
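The color thresholds can be expressed as a small function (a sketch of the rule stated above, assuming confidence is a 0-1 float as in the JSON output):

```python
def confidence_color(conf):
    """Map a 0-1 confidence value to the terminal color described above."""
    if conf >= 0.80:
        return "green"
    if conf >= 0.50:
        return "yellow"
    return "red"
```

So the 100% and 72% scores in the table above render green and yellow respectively.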

JSON output

The JSON output (-o scores.json) includes confidence for each dimension:

{
  "image": "output.png",
  "judge": "gemini-2.5-flash",
  "dimensions": [
    {
      "dimension": "visual_quality",
      "score": 4.2,
      "confidence": 1.0,
      "explanation": "Well-composed image with...",
      "evidence": ["good lighting", "sharp details"]
    },
    {
      "dimension": "prompt_adherence",
      "score": 3.8,
      "confidence": 0.72,
      "explanation": "Captures the main subject...",
      "evidence": ["sunset present", "mountains visible"]
    }
  ]
}
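Because the JSON shape is stable, it is easy to post-process with standard tooling. A minimal sketch, using an inline copy of the structure above (the aggregation logic and the 80% cutoff are this example's choices, not part of the CLI):

```python
import json

# Inline sample matching the scores.json shape shown above
results = json.loads("""
{
  "image": "output.png",
  "judge": "gemini-2.5-flash",
  "dimensions": [
    {"dimension": "visual_quality", "score": 4.2, "confidence": 1.0},
    {"dimension": "prompt_adherence", "score": 3.8, "confidence": 0.72}
  ]
}
""")

# Average score across dimensions, plus any the judge was unsure about (<80%)
scores = [d["score"] for d in results["dimensions"]]
average = sum(scores) / len(scores)
uncertain = [d["dimension"] for d in results["dimensions"] if d["confidence"] < 0.80]
print(f"{results['image']}: avg {average:.2f}, low-confidence: {uncertain}")
```

The same pattern works on a real file via `json.load(open("scores.json"))`.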

Examples

Evaluate with specific dimensions

evaly eval \
    --image photo.png \
    -d visual_quality \
    -d artifact_detection

Evaluate a background removal result

evaly eval \
    --image removed_bg.png \
    --input-image original.png \
    -o bg_removal_scores.json