evaly eval
Score a single image without generation.
evaly eval --image <PATH_OR_URL> [OPTIONS]
The eval command scores an existing image rather than generating a new one.
This is useful for evaluating images you have already generated, or images from any other source.
Dimensions are auto-detected based on context.
Basic Usage
# Score a local image (visual_quality auto-selected)
evaly eval --image output.png
# Score with prompt context (adds prompt_adherence)
evaly eval --image output.png --prompt "A sunset over mountains"
# Score img2img transformation (adds input_fidelity, etc.)
evaly eval --image output.png --input-image input.png
# Score a URL
evaly eval --image https://example.com/generated.png
# Write results to JSON
evaly eval --image output.png -o scores.json
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| --image | TEXT | — | Required. Path or URL to the image to evaluate. |
| --dimensions, -d | TEXT (multiple) | auto | Dimensions to score. Auto-detected if omitted. |
| --prompt | TEXT | — | Generation prompt. Enables prompt_adherence scoring. |
| --input-image | TEXT | — | Input image path/URL. Enables img2img dimensions. |
| --judge, -j | TEXT | gemini-2.5-flash | VLM judge (e.g., openai/gpt-5.2, ollama/qwen2.5-vl:7b). |
| --judge-url | TEXT | — | Custom judge API base URL. |
| --output, -o | TEXT | — | Write results to JSON file. |
Auto-Detection Logic
When no --dimensions are specified, Evalytic automatically selects the right dimensions based on context:
| Context | Dimensions Selected |
|---|---|
| Image only | visual_quality |
| Image + --prompt | visual_quality, prompt_adherence |
| Image + --prompt with text keywords | visual_quality, prompt_adherence, text_rendering |
| Image + --input-image | visual_quality, input_fidelity, transformation_quality, artifact_detection |
Text keywords:
text_rendering is auto-added when your prompt contains words like "text", "word", "letter", "write", "say", "font", "type", or "sign".
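The selection rules above can be sketched roughly as follows. This is an illustrative approximation, not Evalytic's actual implementation; the real matcher may differ (for example around casing, word boundaries, or substring matches such as "says" containing "say"):

```python
import re

# Keywords that trigger text_rendering, per the list above.
TEXT_KEYWORDS = {"text", "word", "letter", "write", "say", "font", "type", "sign"}

def select_dimensions(prompt=None, input_image=None):
    """Approximate the documented auto-detection logic."""
    if input_image is not None:
        # img2img context: fidelity/transformation dimensions take over.
        return ["visual_quality", "input_fidelity",
                "transformation_quality", "artifact_detection"]
    dims = ["visual_quality"]
    if prompt is not None:
        dims.append("prompt_adherence")
        # Exact whole-word match against the keyword list.
        words = set(re.findall(r"[a-z]+", prompt.lower()))
        if words & TEXT_KEYWORDS:
            dims.append("text_rendering")
    return dims
```

For example, `select_dimensions(prompt="A neon sign over a diner")` adds text_rendering because the prompt contains "sign".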
Output Format
Terminal output
The terminal table includes a Conf (confidence) column alongside each score, showing how confident the judge is in its assessment:
$ evaly eval --image output.png --prompt "A sunset over mountains"
┌────────────────────┬───────┬──────┬─────────────────────────────────┐
│ Dimension          │ Score │ Conf │ Explanation                     │
├────────────────────┼───────┼──────┼─────────────────────────────────┤
│ visual_quality     │ 4.2   │ 100% │ Well-composed image with...     │
│ prompt_adherence   │ 3.8   │ 72%  │ Captures the main subject...    │
└────────────────────┴───────┴──────┴─────────────────────────────────┘
Confidence colors: green (≥80%), yellow (≥50%), red (<50%).
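The thresholds map to colors as in this small illustrative helper (not part of the CLI; confidence is the 0.0-1.0 value from the JSON output, shown as a percentage in the table):

```python
def confidence_color(conf):
    """Bucket a 0.0-1.0 confidence into the documented terminal colors."""
    if conf >= 0.80:
        return "green"   # judge is confident
    if conf >= 0.50:
        return "yellow"  # moderate confidence
    return "red"         # low confidence; treat the score with caution
```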
JSON output
The JSON output (-o scores.json) includes confidence for each dimension:
{
"image": "output.png",
"judge": "gemini-2.5-flash",
"dimensions": [
{
"dimension": "visual_quality",
"score": 4.2,
"confidence": 1.0,
"explanation": "Well-composed image with...",
"evidence": ["good lighting", "sharp details"]
},
{
"dimension": "prompt_adherence",
"score": 3.8,
"confidence": 0.72,
"explanation": "Captures the main subject...",
"evidence": ["sunset present", "mountains visible"]
}
]
}
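A downstream script can consume this JSON directly with the standard library. A minimal sketch, using the field names documented above (the inline string stands in for reading a real scores.json file):

```python
import json

# Stand-in for: results = json.load(open("scores.json"))
results = json.loads("""
{
  "image": "output.png",
  "judge": "gemini-2.5-flash",
  "dimensions": [
    {"dimension": "visual_quality", "score": 4.2, "confidence": 1.0},
    {"dimension": "prompt_adherence", "score": 3.8, "confidence": 0.72}
  ]
}
""")

# Flag dimensions the judge was not confident about (below the green 80% band).
low_conf = [d["dimension"] for d in results["dimensions"] if d["confidence"] < 0.8]

# A simple aggregate across all scored dimensions.
mean_score = sum(d["score"] for d in results["dimensions"]) / len(results["dimensions"])

print(low_conf, mean_score)  # ['prompt_adherence'] 4.0
```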
Examples
Evaluate with specific dimensions
evaly eval \
--image photo.png \
-d visual_quality \
-d artifact_detection
Evaluate a background removal result
evaly eval \
--image removed_bg.png \
--input-image original.png \
-o bg_removal_scores.json
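To evaluate a whole directory, you can invoke evaly once per image. A rough sketch that builds the command lines (it assumes evaly is on your PATH; uncomment the subprocess loop to actually run them):

```python
from pathlib import Path
import subprocess

def batch_eval_commands(image_dir, out_dir):
    """Build one `evaly eval` command per PNG, writing a JSON result per image."""
    cmds = []
    for img in sorted(Path(image_dir).glob("*.png")):
        out = Path(out_dir) / f"{img.stem}_scores.json"
        cmds.append(["evaly", "eval", "--image", str(img), "-o", str(out)])
    return cmds

# To actually run the evaluations:
# for cmd in batch_eval_commands("outputs", "scores"):
#     subprocess.run(cmd, check=True)
```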