Quickstart

Pick the path closest to what you want to evaluate.

Want to see results first? Run evaly demo to browse real visual benchmark reports in your browser; no API keys are needed.
One SDK, three starting paths. Visual workflows are the most mature public path today, but the same CLI now supports RAG, text, and agent evaluation too.

Choose Your Path

Shared Install

pip install evalytic

The core install includes the CLI, judge provider support, JSON/HTML reports, and the command groups for visual, RAG, text, and agent evaluation. Add extras only when your workflow needs them:

| Use case                     | Install                            | Why                                                                  |
|------------------------------|------------------------------------|----------------------------------------------------------------------|
| Visual benchmarks via fal.ai | evalytic[generation]               | Adds fal-client for fal.ai image generation.                         |
| Visual benchmarks via Parel  | evalytic                           | Parel builtins use the core httpx dependency. Set PAREL_API_KEY.     |
| Visual local metrics         | evalytic[metrics] / evalytic[ocr]  | Add CLIP, LPIPS, NIMA, ArcFace, and OCR scoring.                     |
| RAG / semantic text metrics  | evalytic[embeddings]               | Add local embeddings for answer_relevancy and semantic_similarity.   |
| Everything                   | evalytic[all]                      | Install generation, metrics, OCR, and embeddings in one go.          |

Visual

Use this path when you want to benchmark image generation models or score an existing image without generation.

pip install "evalytic[generation]"
export FAL_KEY=your_fal_key
evaly bench -y

That single command generates an image, scores it, and prints a terminal report. If you already have an output image, use the visual-only scoring path instead:

export GEMINI_API_KEY=your_gemini_key
evaly eval --image output.png --prompt "A product photo of sneakers"

You can also run Parel builtin models by setting PAREL_API_KEY and using parel/ model names:

export PAREL_API_KEY=your_parel_key
evaly bench -m parel/flux-schnell -p "A product photo on marble" --yes

Next: evaly bench for generation benchmarks and evaly eval for existing image scoring.

RAG

Use this path when you already have a user query, a model response, and one or more retrieved context chunks.

pip install "evalytic[embeddings]"
export GEMINI_API_KEY=your_gemini_key
evaly rag eval \
    --query "What does Evalytic evaluate?" \
    --response "Evalytic evaluates images, text, RAG, and agents." \
    --context "Evalytic is an evaluation SDK for AI outputs." \
    --context "It supports visual, text, RAG, and agent workflows." \
    -o rag.json
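When your contexts come from a retriever, it can be easier to build this command in code than by hand. The sketch below uses only Python's standard library and only the flags shown above; build_rag_eval_args is a hypothetical helper, and it assumes the evaly CLI is on your PATH when you choose to execute it. Passing an argument list (rather than one shell string) avoids shell quoting and expansion problems inside long context chunks.

```python
import shlex
import subprocess


def build_rag_eval_args(query: str, response: str, contexts: list[str], out_path: str) -> list[str]:
    """Hypothetical helper: build an `evaly rag eval` argument list."""
    args = ["evaly", "rag", "eval", "--query", query, "--response", response]
    for chunk in contexts:
        # One --context flag per retrieved chunk, as in the command above.
        args += ["--context", chunk]
    args += ["-o", out_path]
    return args


args = build_rag_eval_args(
    query="What does Evalytic evaluate?",
    response="Evalytic evaluates images, text, RAG, and agents.",
    contexts=[
        "Evalytic is an evaluation SDK for AI outputs.",
        "It supports visual, text, RAG, and agent workflows.",
    ],
    out_path="rag.json",
)

# Shell-safe rendering of the same command, useful for logging.
print(shlex.join(args))

# To execute (requires evaly on PATH and GEMINI_API_KEY set):
# subprocess.run(args, check=True)
```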

Use per-metric gates for RAG reports:

evaly gate --report rag.json \
    --metric-threshold faithfulness:0.8 \
    --metric-threshold hallucination:0.9 \
    --metric-threshold contextual_relevancy:0.75 \
    --metric-threshold answer_relevancy:0.7
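Conceptually, each --metric-threshold pairs a metric name with a minimum score. The sketch below illustrates that check on an in-memory dict rather than a real report; parse_threshold and gate are hypothetical helpers, the scores are example values for illustration only, and it assumes every metric is "higher is better" with a pass meaning score >= threshold. Check the evaly gate reference for the actual semantics (for example, how hallucination is oriented) and the report schema.

```python
def parse_threshold(spec: str) -> tuple[str, float]:
    """Split a 'metric:value' spec such as 'faithfulness:0.8' into (name, value)."""
    name, sep, value = spec.rpartition(":")
    if not sep or not name:
        raise ValueError(f"expected metric:value, got {spec!r}")
    return name, float(value)


def gate(scores: dict[str, float], specs: list[str]) -> list[str]:
    """Return failure messages; an empty list means every threshold passed.

    Assumption: higher is better, and a metric passes when score >= threshold.
    """
    failures = []
    for spec in specs:
        name, minimum = parse_threshold(spec)
        if name not in scores:
            failures.append(f"{name}: missing from report")
        elif scores[name] < minimum:
            failures.append(f"{name}: {scores[name]:.2f} < {minimum:.2f}")
    return failures


# Example scores for illustration only, not output from a real run.
scores = {
    "faithfulness": 0.86,
    "hallucination": 0.93,
    "contextual_relevancy": 0.71,
    "answer_relevancy": 0.82,
}
specs = [
    "faithfulness:0.8",
    "hallucination:0.9",
    "contextual_relevancy:0.75",
    "answer_relevancy:0.7",
]
for message in gate(scores, specs):
    print("FAIL", message)  # FAIL contextual_relevancy: 0.71 < 0.75
```

A CI wrapper built this way could exit non-zero whenever the returned list is non-empty.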

Or assert the same thresholds directly inside pytest with evalytic.testing.assert_test.

Next: evaly rag for the full command reference and evaly gate for report-type-aware gating.

Text / Agent

Use this path when you want to evaluate plain text outputs, rubric-based responses, or tool-using agent runs.

pip install "evalytic[embeddings]"
export GEMINI_API_KEY=your_gemini_key

Text Output

evaly text eval \
    --input "Summarize the incident in one sentence." \
    --output-text "The service was unavailable for 12 minutes." \
    --expected "A brief outage lasted 12 minutes." \
    -o text.json
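Passing --expected gives the evaluator a reference answer to compare against. Embedding-based metrics such as semantic_similarity (enabled by the embeddings extra) are commonly computed as the cosine similarity between two embedding vectors. The sketch below shows only that general technique in plain Python; the toy 3-dimensional vectors stand in for real embeddings, and Evalytic's actual model, normalization, or rescaling may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy stand-ins for the embeddings of the expected and actual outputs.
expected_vec = [0.2, 0.8, 0.1]
output_vec = [0.25, 0.75, 0.05]
print(round(cosine_similarity(expected_vec, output_vec), 3))
```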

Agent Run

evaly agent eval \
    --input "Find pricing and summarize it." \
    --final-output 'The Pro plan costs $99 per month.' \
    --tool-call web.search \
    --expected-tool web.search \
    -o agent.json
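The --tool-call and --expected-tool flags let the evaluator check tool use against what you expected. One simple way to score that idea is recall over expected tool names. The sketch below is a hypothetical illustration (tool_recall and the web.fetch tool name are made up), not Evalytic's metric, which may also weigh call order, arguments, or extra calls.

```python
def tool_recall(called: list[str], expected: list[str]) -> float:
    """Fraction of expected tool names that appear among the called tools.

    Returns 1.0 when no tools are expected. Order and arguments are ignored.
    """
    if not expected:
        return 1.0
    called_set = set(called)
    hits = sum(1 for name in expected if name in called_set)
    return hits / len(expected)


print(tool_recall(["web.search"], ["web.search"]))               # 1.0
print(tool_recall(["web.search"], ["web.search", "web.fetch"]))  # 0.5
```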

Compare two runs of the same report type with a single command:

evaly compare \
    --baseline run-a.json \
    --candidate run-b.json
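At its core, comparing two runs means computing per-metric deltas between a baseline and a candidate. The sketch below assumes each report has already been reduced to a flat {metric: score} dict, since the report JSON layout is not described on this page; metric_deltas is a hypothetical helper and the values are examples for illustration only.

```python
def metric_deltas(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Return candidate minus baseline for every metric present in both runs, sorted by name."""
    shared = baseline.keys() & candidate.keys()
    return {name: candidate[name] - baseline[name] for name in sorted(shared)}


# Example values for illustration only.
run_a = {"faithfulness": 0.82, "answer_relevancy": 0.74}
run_b = {"faithfulness": 0.88, "answer_relevancy": 0.71}
for name, delta in metric_deltas(run_a, run_b).items():
    print(f"{name}: {delta:+.2f}")
# answer_relevancy: -0.03
# faithfulness: +0.06
```

Restricting the comparison to shared metrics keeps the deltas meaningful, which is also why both runs should be of the same report type.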

Next: evaly text, evaly agent, and evaly compare.

Need every extra, config option, or install variant? See Installation and the CLI command pages linked above.