Installation
Install Evalytic with pip. Choose the extras you need.
Requirements
- Python 3.10+ (tested on 3.10, 3.11, 3.12, 3.13)
- pip (or any PEP 517 compatible installer)
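To check the Python requirement from a script before installing, something like the following sketch may help. The helper name python_is_supported is hypothetical and not part of Evalytic; the minimum version is taken from the list above.

```python
import sys

# Minimum interpreter version, taken from the Requirements list above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


current = ".".join(str(part) for part in sys.version_info[:3])
if python_is_supported():
    print(f"Python {current} meets the 3.10+ requirement")
else:
    print(f"Python {current} is too old; Evalytic needs 3.10 or newer")
```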
Install Variants
Core
pip install evalytic
Includes the CLI, shared judge/provider support for LLM and VLM judges, Rich terminal output, JSON/HTML reports, and the core commands for visual, RAG, text, and agent evaluation. ~5 MB download.
With Generation
pip install "evalytic[generation]"
Adds fal-client for fal.ai image generation. Parel's built-in generation uses Evalytic's core
httpx dependency, so Parel-only benchmark runs do not require this extra.
With Metrics
pip install "evalytic[metrics]"
Adds CLIP Score, LPIPS, ArcFace, and NIMA local metrics for visual workflows. Once installed, CLIP (text2img),
LPIPS (img2img), and NIMA are auto-enabled in evaly bench. ~2 GB download.
The [metrics] extra installs PyTorch and model weights.
Use --no-metrics to disable local metrics after install.
With OCR
pip install "evalytic[ocr]"
Adds OCR accuracy scoring via pytesseract for text-in-image workflows. Requires Tesseract on the system
(brew install tesseract on macOS).
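Because the OCR extra depends on a system binary, it can be useful to confirm that Tesseract is discoverable before running an OCR workflow. The sketch below assumes the binary is named tesseract and is found through PATH; find_tesseract is a hypothetical helper, not an Evalytic function.

```python
import shutil


def find_tesseract(which=shutil.which):
    """Return the path of the tesseract binary, or None if it is not on PATH."""
    return which("tesseract")


path = find_tesseract()
if path:
    print(f"Tesseract found at {path}")
else:
    print("Tesseract not found on PATH; install it before using evalytic[ocr]")
```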
With Embeddings
pip install "evalytic[embeddings]"
Adds sentence-transformers for local embeddings. Recommended for answer_relevancy and
semantic_similarity so those metrics work locally without a separate embeddings API.
Everything
pip install "evalytic[all]"
Installs generation, metrics, OCR, and embeddings extras in one go.
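To see which extras appear to be present in the current environment, one option is to probe for a representative module from each. This is only a sketch: the import names in EXTRA_MODULES are assumptions about what each extra pulls in (for example torch for metrics, since that extra installs PyTorch), and installed_extras is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed import names for one package per extra; adjust if the real
# dependency set differs.
EXTRA_MODULES = {
    "generation": "fal_client",
    "metrics": "torch",
    "ocr": "pytesseract",
    "embeddings": "sentence_transformers",
}


def installed_extras(modules=EXTRA_MODULES, finder=find_spec):
    """Map each extra name to True if its probe module can be located."""
    return {extra: finder(module) is not None for extra, module in modules.items()}


for extra, present in installed_extras().items():
    print(f"{extra:<12} {'available' if present else 'missing'}")
```

A module being importable does not prove it was installed through the extra, so treat the result as a hint rather than a guarantee.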
Use-Case Matrix
| Workflow | Recommended Install | Notes |
|---|---|---|
| fal.ai visual benchmark with evaly bench | evalytic[generation] | Adds fal-client for fal.ai model APIs. |
| Parel visual benchmark with evaly bench | evalytic | Core install is enough. Set PAREL_API_KEY. |
| Visual benchmark + local metrics | evalytic[all] | Includes generation, CLIP/LPIPS/NIMA/ArcFace, OCR, and embeddings. |
| RAG evaluation with local answer relevancy | evalytic[embeddings] | Recommended so answer_relevancy works locally. |
| Text evaluation with semantic similarity | evalytic[embeddings] | semantic_similarity uses embeddings; deterministic metrics stay in core. |
| Agent evaluation | evalytic | Core install is enough. Embeddings can improve goal_accuracy when an expected output is provided. |
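When a project spans more than one row of the matrix, extras can be combined in a single requirement using pip's standard comma-separated syntax. The helper below is a hypothetical sketch that builds such a requirement string; install_spec is not part of Evalytic.

```python
def install_spec(extras=(), package="evalytic"):
    """Build a pip requirement string such as 'evalytic[embeddings,generation]'."""
    unique = sorted(set(extras))  # de-duplicate and keep the order stable
    if not unique:
        return package
    return f"{package}[{','.join(unique)}]"


# Example: fal.ai benchmarking plus local answer relevancy.
print(f'pip install "{install_spec(["generation", "embeddings"])}"')
```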
Virtual Environment
We recommend using a virtual environment:
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
pip install evalytic
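To confirm that the environment is actually active before installing, a script can compare the interpreter's prefix with its base prefix. This is a general Python technique rather than anything Evalytic-specific, and in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv-style environment."""
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base installation.
    return prefix != base_prefix


print("virtual environment active" if in_virtualenv() else "using the base interpreter")
```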
Verify Installation
evaly --help
evaly bench --help
evaly rag eval --help
evaly text eval --help
evaly agent eval --help
evaly compare --help
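The same check can be made from Python by asking the packaging metadata for the installed version. The sketch below assumes the distribution is registered under the name evalytic, matching the pip commands above; installed_version is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="evalytic"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"evalytic {found}" if found else "evalytic is not installed in this environment")
```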
CLI Aliases
Two entry points are available:
- evaly: primary command name
- evalytic: full-name alias (same functionality)
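If neither command is found after installation, the environment's script directory is probably not on PATH. A small sketch for checking both entry points follows; cli_paths is a hypothetical helper, and the command names are the two listed above.

```python
import shutil

# The two entry points described above.
CLI_NAMES = ("evaly", "evalytic")


def cli_paths(names=CLI_NAMES, which=shutil.which):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


for name, path in cli_paths().items():
    print(f"{name}: {path or 'not found on PATH'}")
```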
Development Install
For contributing to Evalytic or running tests:
git clone https://github.com/evalytic/evalytic.git
cd evalytic
pip install -e ".[dev]"
pytest -v