evaly eval

Score a single image without generation.

evaly eval --image <PATH_OR_URL> [OPTIONS]

The eval command scores an existing image without running generation. This is useful for evaluating images you generated earlier, or images from any other source. Scoring dimensions are auto-detected from the context you provide.

Basic Usage

# Score a local image (visual_quality auto-selected)
evaly eval --image output.png

# Score with prompt context (adds prompt_adherence)
evaly eval --image output.png --prompt "A sunset over mountains"

# Score img2img transformation (adds input_fidelity, etc.)
evaly eval --image output.png --input-image input.png

# Score a URL
evaly eval --image https://example.com/generated.png

# Write results to JSON
evaly eval --image output.png -o scores.json

Options

Flag              Type             Default           Description
--image           TEXT             (required)        Path or URL to the image to evaluate.
--dimensions, -d  TEXT (multiple)  auto              Dimensions to score. Auto-detected if omitted.
--prompt          TEXT             -                 Generation prompt. Enables prompt_adherence scoring.
--input-image     TEXT             -                 Input image path/URL. Enables img2img dimensions.
--judge, -j       TEXT             gemini-2.5-flash  VLM judge (e.g., openai/gpt-5.2, ollama/qwen2.5-vl:7b).
--judge-url       TEXT             -                 Custom judge API base URL.
--output, -o      TEXT             -                 Write results to JSON file.

Auto-Detection Logic

When no --dimensions are specified, Evalytic automatically selects the right dimensions based on context:

Context                              Dimensions Selected
Image only                           visual_quality
Image + --prompt                     visual_quality, prompt_adherence
Image + --prompt with text keywords  visual_quality, prompt_adherence, text_rendering
Image + --input-image                visual_quality, input_fidelity, transformation_quality, artifact_detection
Text keywords: text_rendering is auto-added when your prompt contains words like "text", "word", "letter", "write", "say", "font", "type", or "sign".
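The selection logic above can be sketched in Python. This is a hypothetical re-implementation for illustration, not Evalytic's actual code; in particular, the case-insensitive whole-word matching is an assumption:

```python
# Hypothetical sketch of the dimension auto-detection described above;
# not Evalytic's actual implementation.
TEXT_KEYWORDS = {"text", "word", "letter", "write", "say", "font", "type", "sign"}

def select_dimensions(prompt=None, input_image=None):
    """Pick scoring dimensions from the available context."""
    dims = ["visual_quality"]  # always scored
    if input_image is not None:
        # img2img context adds the transformation-oriented dimensions
        dims += ["input_fidelity", "transformation_quality", "artifact_detection"]
    if prompt is not None:
        dims.append("prompt_adherence")
        # Assumed matching rule: case-insensitive, whole-word keyword check
        if any(k in prompt.lower().split() for k in TEXT_KEYWORDS):
            dims.append("text_rendering")
    return dims
```

For example, `select_dimensions(prompt="a neon sign at night")` would add text_rendering because "sign" is a text keyword.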

Output Format

Terminal output

The terminal table includes a Conf (confidence) column alongside each score, showing how confident the judge is in its assessment:

$ evaly eval --image output.png --prompt "A sunset over mountains"

  ┌────────────────────┬───────┬──────┬─────────────────────────────────┐
  │ Dimension          │ Score │ Conf │ Explanation                     │
  ├────────────────────┼───────┼──────┼─────────────────────────────────┤
  │ visual_quality     │ 4.2   │ 100% │ Well-composed image with...     │
  │ prompt_adherence   │ 3.8   │ 72%  │ Captures the main subject...    │
  └────────────────────┴───────┴──────┴─────────────────────────────────┘

Confidence colors: green (≥80%), yellow (≥50%), red (<50%).
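The color thresholds can be expressed as a small function (a sketch of the rule stated above, assuming confidence is a 0-1 float as in the JSON output):

```python
def confidence_color(conf):
    """Map a 0-1 confidence value to the terminal color described above."""
    if conf >= 0.80:
        return "green"
    if conf >= 0.50:
        return "yellow"
    return "red"
```

So the 100% and 72% scores in the table above render green and yellow respectively.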

JSON output

The JSON output (-o scores.json) includes confidence for each dimension:

{
  "image": "output.png",
  "judge": "gemini-2.5-flash",
  "dimensions": [
    {
      "dimension": "visual_quality",
      "score": 4.2,
      "confidence": 1.0,
      "explanation": "Well-composed image with...",
      "evidence": ["good lighting", "sharp details"]
    },
    {
      "dimension": "prompt_adherence",
      "score": 3.8,
      "confidence": 0.72,
      "explanation": "Captures the main subject...",
      "evidence": ["sunset present", "mountains visible"]
    }
  ]
}
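Because the JSON shape is stable, it is easy to post-process with standard tooling. A minimal sketch, using an inline copy of the structure above (the aggregation logic and the 80% cutoff are this example's choices, not part of the CLI):

```python
import json

# Inline sample matching the scores.json shape shown above
results = json.loads("""
{
  "image": "output.png",
  "judge": "gemini-2.5-flash",
  "dimensions": [
    {"dimension": "visual_quality", "score": 4.2, "confidence": 1.0},
    {"dimension": "prompt_adherence", "score": 3.8, "confidence": 0.72}
  ]
}
""")

# Average score across dimensions, plus any the judge was unsure about (<80%)
scores = [d["score"] for d in results["dimensions"]]
average = sum(scores) / len(scores)
uncertain = [d["dimension"] for d in results["dimensions"] if d["confidence"] < 0.80]
print(f"{results['image']}: avg {average:.2f}, low-confidence: {uncertain}")
```

The same pattern works on a real file via `json.load(open("scores.json"))`.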

Examples

Evaluate with specific dimensions

evaly eval \
    --image photo.png \
    -d visual_quality \
    -d artifact_detection

Evaluate a background removal result

evaly eval \
    --image removed_bg.png \
    --input-image original.png \
    -o bg_removal_scores.json