
Own your LLM evals: capture DeepEval runs in TestRelic

· 4 min read
TestRelic Team
TestRelic Maintainers

If you evaluate LLM output with DeepEval, your eval results are some of the most valuable test data you have — and until now they lived in a separate tool from the rest of your tests. testrelic-deepeval changes that: it captures your DeepEval runs and uploads them to your TestRelic org, so evals sit next to your browser, API, and mobile tests with the same history, AI Insights, and dashboards.

The problem with eval data in a silo

DeepEval is great at running evaluations. But by default its results go to Confident AI — a different platform from where your QA and engineering teams already track quality. That split means two dashboards, two notions of "is this passing," and no shared history between your functional tests and your eval suite.

We kept hearing the same ask: just put the evals where the rest of our tests are. So we built a bridge.

How it works

testrelic-deepeval reads DeepEval's in-memory TestRun and uploads it to TestRelic's native evals API. No monkey-patching, no HTTP redirect — it just reads the state DeepEval already produced. There are two ways to use it.

As a pytest plugin (zero code changes):

Terminal
pip install testrelic-deepeval
testrelic login
deepeval test run tests/

The plugin captures the run at session finish and uploads it. If no credentials are present it quietly no-ops, and it never fails your tests — upload errors are swallowed.
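
The behavior described above (upload at session finish, no-op without credentials, never fail the tests) can be sketched as a small pytest hook. This is an illustration of the pattern, not the package's actual source: `capture_run`, `upload_run`, `safe_upload`, and the `EXAMPLE_UPLOAD_TOKEN` variable are hypothetical names, and only `pytest_sessionfinish` is a real pytest hook.

```python
import logging
import os

log = logging.getLogger("eval_upload_sketch")


def capture_run():
    """Hypothetical stand-in: return the finished eval run as a plain dict."""
    return {"cases": []}


def upload_run(run, token):
    """Hypothetical stand-in: here it always fails, to exercise the error path."""
    raise ConnectionError("network unavailable")


def safe_upload():
    """Upload if credentials exist; never raise, whatever happens."""
    token = os.environ.get("EXAMPLE_UPLOAD_TOKEN")  # assumed variable name
    if not token:
        return "skipped"  # no credentials: quietly do nothing
    try:
        upload_run(capture_run(), token)
        return "uploaded"
    except Exception as exc:  # reporting must not change the test outcome
        log.warning("eval upload failed: %s", exc)
        return "failed-quietly"


def pytest_sessionfinish(session, exitstatus):
    """pytest calls this hook once, after the whole session has finished."""
    safe_upload()
```

Catching every exception is normally a smell, but it is a deliberate choice in a reporting hook like this one: the test session's exit status should reflect the tests, not the uploader.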

As a programmatic wrapper:

evals.py
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric
from testrelic import evaluate

results = evaluate(
    test_cases=[LLMTestCase(input="Hi", actual_output="Hello")],
    metrics=[AnswerRelevancyMetric(threshold=0.7)],
)
# results is whatever deepeval.evaluate() returns; upload is automatic

The evaluate() wrapper is a drop-in for deepeval.evaluate() — run your eval the way you already do, and the upload happens for you. Full command reference is on the CLI Reference page.
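
To make "drop-in" concrete, here is a rough sketch of the wrapping pattern: call the original function, hand its result to an uploader, and return the result unchanged. It assumes nothing about how testrelic-deepeval is written internally; `with_upload` and `fake_evaluate` are hypothetical names used only for illustration.

```python
import functools


def with_upload(run_fn, upload_fn):
    """Wrap run_fn so its result is passed to upload_fn, then returned unchanged."""

    @functools.wraps(run_fn)
    def wrapper(*args, **kwargs):
        result = run_fn(*args, **kwargs)
        try:
            upload_fn(result)
        except Exception:
            pass  # an upload problem must not alter the eval's outcome
        return result

    return wrapper


# Stand-in for illustration only (not a deepeval or testrelic function).
def fake_evaluate(test_cases, metrics):
    return {"passed": len(test_cases), "metrics": list(metrics)}


uploaded = []
evaluate = with_upload(fake_evaluate, uploaded.append)
results = evaluate(test_cases=["case-1"], metrics=["relevancy"])
```

Because the wrapper forwards every argument and returns the original result, callers see the same signature and return value as the function it wraps.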

The eval workspace

Once a run uploads, it appears in the new LLM Evaluations workspace. Eval runs render through the same Run Detail and Session Workspace as your functional tests — there's no bespoke eval-only UI to learn. Each repository gets an Evaluations tab next to Test Cases and Test Runs, and you can opt eval rows into the global Test Runs feed.

The metric we think you'll watch most is Eval Stability — how consistently your eval cases pass across runs over time. A metric that flips between pass and fail run-to-run is telling you something about either your prompt or your judge, and stability surfaces it the same way flakiness surfaces an unreliable test.
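
This post doesn't give the exact formula behind Eval Stability, so the following is only one plausible way to quantify the idea: the share of consecutive run pairs in which a case's pass/fail outcome did not change. The `stability` function, the case names, and the histories are all made up for illustration.

```python
def stability(outcomes):
    """Share of consecutive run pairs with the same outcome (assumed definition)."""
    if len(outcomes) < 2:
        return 1.0  # nothing to compare yet
    flips = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return 1 - flips / (len(outcomes) - 1)


# Hypothetical pass/fail history per eval case, oldest run first.
history = {
    "greeting-relevancy": [True, True, True, True, True],
    "refund-policy-faithfulness": [True, False, True, False, True],
}
scores = {name: stability(runs) for name, runs in history.items()}
```

Under this definition the first case scores 1.0 and the second 0.0, even though the second passes three runs out of five. That gap is the point: a pass rate alone would hide the flipping.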

Evals that never block CI

LLM evals call out to model providers, and provider hiccups shouldn't fail your pipeline's reporting. testrelic-deepeval writes failed uploads to a local offline queue (~/.testrelic/queue/) and replays them on demand:

Terminal
testrelic drain
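
For readers curious how a queue-and-replay scheme like this can work, here is a minimal sketch. The one-JSON-file-per-payload layout and the `enqueue` and `drain` helpers are assumptions for illustration, not the package's real on-disk format, and the demo uses a temporary directory rather than ~/.testrelic/queue/.

```python
import json
import tempfile
import uuid
from pathlib import Path


def enqueue(queue_dir, payload):
    """Persist a payload that could not be uploaded, one JSON file each."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    path = queue_dir / f"{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(payload))
    return path


def drain(queue_dir, send):
    """Retry every queued payload; delete a file only after send succeeds."""
    sent = 0
    for path in sorted(queue_dir.glob("*.json")):
        try:
            send(json.loads(path.read_text()))
        except Exception:
            continue  # leave it queued for the next drain
        path.unlink()
        sent += 1
    return sent


# Demo: queue two payloads, then replay them into a list.
queue = Path(tempfile.mkdtemp()) / "queue"
enqueue(queue, {"run": "a"})
enqueue(queue, {"run": "b"})
delivered = []
drained = drain(queue, delivered.append)
```

Deleting a file only after a successful send means an interrupted drain may resend a payload but won't drop one.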

It also auto-detects CI context — branch, commit, and CI URL — so every eval run is anchored to the change that produced it.
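
CI detection of this kind usually comes down to reading environment variables the CI provider sets. The sketch below covers GitHub Actions only, using its documented default variables. Which providers testrelic-deepeval actually supports isn't stated here, and `detect_ci_context` is a hypothetical helper, not part of the package.

```python
import os


def detect_ci_context(env=None):
    """Return branch, commit, and run URL on GitHub Actions, else None."""
    env = os.environ if env is None else env
    if env.get("GITHUB_ACTIONS") != "true":
        return None  # not running under GitHub Actions
    run_url = "{}/{}/actions/runs/{}".format(
        env.get("GITHUB_SERVER_URL", "https://github.com"),
        env.get("GITHUB_REPOSITORY", ""),
        env.get("GITHUB_RUN_ID", ""),
    )
    return {
        "branch": env.get("GITHUB_REF_NAME"),  # branch or tag name
        "commit": env.get("GITHUB_SHA"),
        "ci_url": run_url,
    }


# Example with a hand-built environment instead of the real one.
sample_env = {
    "GITHUB_ACTIONS": "true",
    "GITHUB_SERVER_URL": "https://github.com",
    "GITHUB_REPOSITORY": "acme/chatbot",
    "GITHUB_RUN_ID": "12345",
    "GITHUB_REF_NAME": "main",
    "GITHUB_SHA": "abc123",
}
context = detect_ci_context(sample_env)
```

Passing the environment in as a parameter keeps the function easy to test outside CI.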

Migrating from Confident AI

Already sending evals to Confident AI? The package ships a helper that prints the migration steps:

Terminal
testrelic migrate-from-confident

In most cases it's just installing the package, running testrelic login, and pointing your existing deepeval test run at TestRelic — your eval code doesn't change.

Get started

Install testrelic-deepeval, run testrelic login, and run your existing eval suite. The results show up in your LLM Evaluations workspace as soon as the run uploads. This pairs with the broader Python SDK launch: evals are one of four Python packages we shipped today.

FAQ: Do I have to stop using Confident AI?

No — testrelic-deepeval reads DeepEval's in-memory results, so it works alongside whatever else you have configured. Most teams point evals at TestRelic to unify with their other tests.

FAQ: What Python version do I need?

Python 3.9 or newer (tested on 3.9–3.12). DeepEval itself is installed through the optional extra: pip install "testrelic-deepeval[deepeval]".