Installation

Install Evalytic with pip. Choose the extras you need.

Requirements

Python 3.10+ (tested on 3.10, 3.11, 3.12, 3.13)
pip (or any PEP 517 compatible installer)
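
If you are unsure which interpreter `pip` will install into, a quick check like the following can confirm the version requirement before you start. This is a minimal sketch: the helper name `python_is_supported` is made up for illustration and is not part of Evalytic.

```python
import sys

# Minimum version taken from the requirements listed above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_is_supported() else "older than 3.10"
    print(f"Python {current}: {status}")
```

Run it with the same interpreter you plan to use for the install, for example `python3 check_python.py`.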

Install Variants

Core

pip install evalytic

Includes the CLI, shared judge/provider support for LLM and VLM judges, Rich terminal output, JSON/HTML reports, and the core commands for visual, RAG, text, and agent evaluation. ~5 MB download.

With Generation

pip install "evalytic[generation]"

Adds fal-client for fal.ai image generation. Parel's built-in generation uses Evalytic's core httpx dependency, so Parel-only benchmark runs do not require this extra.

With Metrics

pip install "evalytic[metrics]"

Adds CLIP Score, LPIPS, ArcFace, and NIMA local metrics for visual workflows. Once installed, CLIP (text2img), LPIPS (img2img), and NIMA are auto-enabled in evaly bench. ~2 GB download.

Large download: The [metrics] extra installs PyTorch and model weights. Use --no-metrics to disable local metrics after install.

With OCR

pip install "evalytic[ocr]"

Adds OCR accuracy scoring via pytesseract for text-in-image workflows. Requires Tesseract on the system (brew install tesseract on macOS).
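
Because Tesseract is a system dependency rather than a Python package, `pip` cannot tell you whether it is present. The sketch below looks for the `tesseract` binary on `PATH` using only the standard library. It is a heuristic, not Evalytic's own check: a Tesseract install outside `PATH` would be reported as missing. The helper name `find_tesseract` is hypothetical.

```python
import shutil


def find_tesseract(which=shutil.which):
    """Return the path to the tesseract binary, or None if it is not on PATH."""
    return which("tesseract")


path = find_tesseract()
if path:
    print(f"Tesseract found at {path}")
else:
    print("Tesseract not found on PATH; install it before using the [ocr] extra")
```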

With Embeddings

pip install "evalytic[embeddings]"

Adds sentence-transformers for local embeddings. Recommended for answer_relevancy and semantic_similarity so those metrics work locally without a separate embeddings API.

Everything

pip install "evalytic[all]"

Installs generation, metrics, OCR, and embeddings extras in one go.
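
To see which extras are already present in an environment, you can probe for one importable module per extra. The mapping below is an assumption inferred from the descriptions above (for example, `torch` standing in for the `[metrics]` extra); Evalytic may pull in additional packages, so treat the result as a rough indicator only. The names `EXTRA_MODULES` and `installed_extras` are illustrative.

```python
from importlib.util import find_spec

# Assumed mapping: extra name -> one top-level module that extra is expected to install.
EXTRA_MODULES = {
    "generation": "fal_client",
    "metrics": "torch",
    "ocr": "pytesseract",
    "embeddings": "sentence_transformers",
}


def installed_extras(modules=EXTRA_MODULES, finder=find_spec):
    """Map each extra to True if its probe module can be located, without importing it."""
    return {extra: finder(module) is not None for extra, module in modules.items()}


for extra, present in installed_extras().items():
    print(f"[{extra}]: {'installed' if present else 'missing'}")
```

Using `find_spec` avoids actually importing heavy packages such as PyTorch just to check that they exist.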

Use-Case Matrix

Workflow	Recommended Install	Notes
fal.ai visual benchmark with `evaly bench`	`evalytic[generation]`	Adds `fal-client` for fal.ai model APIs.
Parel visual benchmark with `evaly bench`	`evalytic`	Core install is enough. Set `PAREL_API_KEY`.
Visual benchmark + local metrics	`evalytic[all]`	Includes generation, CLIP/LPIPS/NIMA/ArcFace, OCR, and embeddings.
RAG evaluation with local answer relevancy	`evalytic[embeddings]`	Recommended so `answer_relevancy` works locally.
Text evaluation with semantic similarity	`evalytic[embeddings]`	`semantic_similarity` uses embeddings; deterministic metrics stay in core.
Agent evaluation	`evalytic`	Core install is enough. Embeddings can improve `goal_accuracy` when an expected output is provided.

Virtual Environment

We recommend using a virtual environment:

python3 -m venv .venv
source .venv/bin/activate  # macOS/Linux
pip install evalytic
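
If you want to confirm that the environment is actually active before installing, comparing `sys.prefix` with `sys.base_prefix` is the usual standard-library check. The sketch below assumes an environment created with `venv` as shown above; other tools, such as conda, manage environments differently and may not be detected. The function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter runs inside a venv-style environment."""
    return prefix != base_prefix


if in_virtualenv():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment detected; packages would install globally")
```

On Windows, the activation script created by `venv` lives at `.venv\Scripts\activate` rather than `.venv/bin/activate`.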

Verify Installation

evaly --help
evaly bench --help
evaly rag eval --help
evaly text eval --help
evaly agent eval --help
evaly compare --help

CLI Aliases

Two entry points are available:

evaly — primary command name
evalytic — full-name alias (same functionality)
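
As a scripted alternative to running the `--help` commands by hand, the following sketch reports the installed package version and whether both entry points resolve on `PATH`. It assumes the distribution is named `evalytic`, as in the `pip install` commands above; the function name `evalytic_status` is hypothetical.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version


def evalytic_status(which=shutil.which):
    """Collect the installed version and the resolved paths of both CLI entry points."""
    try:
        installed = version("evalytic")
    except PackageNotFoundError:
        installed = None  # package is not installed in this environment
    return {
        "version": installed,
        "evaly": which("evaly"),
        "evalytic": which("evalytic"),
    }


for key, value in evalytic_status().items():
    print(f"{key}: {value or 'not found'}")
```

If the version is reported but neither command is found, the environment's script directory is probably missing from `PATH`, which often means the virtual environment is not activated.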

Development Install

For contributing to Evalytic or running tests:

git clone https://github.com/evalytic/evalytic.git
cd evalytic
pip install -e ".[dev]"
pytest -v
