Architecture

testrelic-deepeval is designed to sit beside DeepEval without altering it. It reads DeepEval's in-memory evaluation result and uploads it to TestRelic. It does not monkey-patch DeepEval, and it does not redirect any of DeepEval's own HTTP traffic.

How it fits together

                 pytest session
  ┌───────────────────────────────────────────┐
  │  DeepEval plugin        TestRelic         │
  │  (runs metrics,         plugin            │
  │   builds the run)  ──▶  (reads the run    │
  │                          at session end)  │
  └──────────────────────────────┬────────────┘
                                 │ HTTPS
                                 ▼
                    TestRelic native evals API

DeepEval assembles a complete evaluation result in memory before it uploads anything. The TestRelic plugin runs when the pytest session finishes, reads that completed result, serializes it, and sends it over HTTPS to TestRelic's native /api/v1/evals/* API. Because it consumes the finished result instead of hooking DeepEval's network layer, it stays decoupled from DeepEval's internals.
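The serialize-and-send step can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the plugin's actual code: the `build_upload_request` and `send` helpers, the bearer-token `Authorization` header, and the shape of the `run` dict are all hypothetical, and the concrete endpoint under /api/v1/evals/ is passed in by the caller rather than guessed.

```python
import json
import urllib.request
from typing import Any


def build_upload_request(
    run: dict[str, Any], endpoint_url: str, api_key: str
) -> urllib.request.Request:
    """Serialize a finished evaluation run into a JSON POST request.

    Hypothetical helper: the header names and payload shape are assumptions,
    not TestRelic's documented wire format.
    """
    body = json.dumps(run, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check the TestRelic API docs for the real one.
            "Authorization": f"Bearer {api_key}",
        },
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it keeps the payload easy to inspect and test. A session-finish hook would call `build_upload_request` on the completed run and hand the result to `send`; nothing in this path touches DeepEval's own HTTP client.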

Why a pytest plugin

The plugin path captures the way most teams already run DeepEval (deepeval test run and assert_test) with no code changes. For programmatic evaluations that don't use pytest, the evaluate() wrapper provides the same upload behavior. Both paths read the finished evaluation result; neither changes how DeepEval computes scores.
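The idea that the wrapper observes the finished result without changing it can be illustrated with a plain higher-order function. This is an illustrative sketch only: `with_upload` and the `upload` callable are hypothetical names, and this is not the signature of DeepEval's `evaluate()` or of TestRelic's wrapper.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def with_upload(
    evaluate_fn: Callable[..., Any], upload: Callable[[Any], None]
) -> Callable[..., Any]:
    """Wrap an evaluate-style function so its finished result is also uploaded.

    Arguments and the return value pass through untouched; the uploader only
    sees the result after scoring has completed.
    """

    @functools.wraps(evaluate_fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = evaluate_fn(*args, **kwargs)  # scoring happens here, unchanged
        try:
            upload(result)  # read-only hand-off of the finished result
        except Exception:
            # An upload problem must never change the evaluation outcome.
            logger.warning("upload failed; evaluation result is unchanged", exc_info=True)
        return result

    return wrapper
```

Because the wrapper returns exactly what the wrapped function returned, callers see the same result whether or not the upload succeeds.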

What gets captured

For each evaluation run:

Run-level: identifier, dataset alias, hyperparameters, duration, evaluation cost, and start/end timestamps.
Per case: input, actual output, expected output, retrieval context, context, tools called, and pass/fail.
Per metric (per case): metric name, score, threshold, pass/fail, reason, evaluation model, evaluation cost, error, and verbose logs.
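The captured fields map onto a nested record: one run holds many cases, and each case holds many metric results. The dataclasses below are a hypothetical schema for illustration only. The field names are our own, `tools_called` is simplified to a list of tool names, and TestRelic's real wire format may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class MetricResult:
    name: str
    score: Optional[float]
    threshold: float
    success: bool  # pass/fail for this metric
    reason: Optional[str] = None
    evaluation_model: Optional[str] = None
    evaluation_cost: Optional[float] = None
    error: Optional[str] = None
    verbose_logs: Optional[str] = None


@dataclass
class CaseResult:
    input: str
    actual_output: Optional[str]
    expected_output: Optional[str] = None
    retrieval_context: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)
    tools_called: list[str] = field(default_factory=list)  # simplified: tool names only
    success: bool = True  # pass/fail for the whole case
    metrics: list[MetricResult] = field(default_factory=list)


@dataclass
class RunResult:
    identifier: str
    dataset_alias: Optional[str]
    hyperparameters: dict[str, Any]
    duration_s: float
    evaluation_cost: Optional[float]
    started_at: str  # ISO 8601 timestamp
    ended_at: str  # ISO 8601 timestamp
    cases: list[CaseResult] = field(default_factory=list)


def to_json(run: RunResult) -> str:
    """Serialize a run; asdict recurses into nested dataclasses and lists."""
    return json.dumps(asdict(run))
```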

Plus auto-detected context:

Git: branch, commit, commit message, and author.
CI: provider (GitHub Actions, GitLab CI, Jenkins, CircleCI) and run URL.
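Auto-detection of this kind usually relies on `git` commands and on environment variables that each CI provider sets. The sketch below is an assumption about how such detection could work, not the SDK's actual logic. `detect_git`, `detect_ci`, and the provider labels are hypothetical; the environment variables (`GITHUB_ACTIONS`, `GITLAB_CI`, `JENKINS_URL`, `CIRCLECI`, and their run-URL counterparts) are the ones those providers document.

```python
import os
import subprocess
from typing import Mapping, Optional


def _git(*args: str, cwd: Optional[str] = None) -> Optional[str]:
    """Run a git command and return stripped stdout, or None on any failure."""
    try:
        out = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=True, timeout=5
        )
    except (OSError, subprocess.SubprocessError):  # git missing, not a repo, timeout
        return None
    return out.stdout.strip() or None


def detect_git(cwd: Optional[str] = None) -> Optional[dict[str, Optional[str]]]:
    commit = _git("rev-parse", "HEAD", cwd=cwd)
    if commit is None:
        return None
    return {
        "branch": _git("rev-parse", "--abbrev-ref", "HEAD", cwd=cwd),
        "commit": commit,
        "message": _git("log", "-1", "--format=%s", cwd=cwd),
        "author": _git("log", "-1", "--format=%an", cwd=cwd),
    }


def detect_ci(env: Mapping[str, str] = os.environ) -> Optional[dict[str, Optional[str]]]:
    """Return the CI provider and run URL, or None outside a recognized CI."""
    if env.get("GITHUB_ACTIONS") == "true":
        server = env.get("GITHUB_SERVER_URL", "https://github.com")
        repo, run_id = env.get("GITHUB_REPOSITORY"), env.get("GITHUB_RUN_ID")
        url = f"{server}/{repo}/actions/runs/{run_id}" if repo and run_id else None
        return {"provider": "github_actions", "run_url": url}
    if env.get("GITLAB_CI") == "true":
        return {"provider": "gitlab_ci", "run_url": env.get("CI_PIPELINE_URL")}
    if env.get("JENKINS_URL"):
        return {"provider": "jenkins", "run_url": env.get("BUILD_URL")}
    if env.get("CIRCLECI") == "true":
        return {"provider": "circleci", "run_url": env.get("CIRCLE_BUILD_URL")}
    return None
```

On CI runners that check out a detached HEAD, `git rev-parse --abbrev-ref HEAD` returns `HEAD`, so a real implementation might prefer a provider variable such as `GITHUB_REF_NAME` for the branch.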

Offline queue

If an upload can't be delivered (the network is down or the TestRelic cloud is unreachable), the payload is written to a local queue at ~/.testrelic/queue/. The next successful run drains the queue, or you can replay it manually with testrelic drain. Because a failed upload is queued rather than raised, the SDK never fails your CI for reasons unrelated to your tests.
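A file-per-payload queue is one simple way to get this behavior. This is a minimal sketch assuming JSON payloads and a `send` callable that raises `OSError` when delivery fails (`urllib.error.URLError` is an `OSError` subclass, so `urllib`-based senders fit). `enqueue`, `drain`, `upload_or_queue`, and the file-naming scheme are hypothetical and are not the internals of testrelic drain.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Any, Callable

DEFAULT_QUEUE_DIR = Path.home() / ".testrelic" / "queue"

# A sender delivers one payload and raises OSError if delivery fails.
Send = Callable[[dict[str, Any]], None]


def enqueue(payload: dict[str, Any], queue_dir: Path = DEFAULT_QUEUE_DIR) -> Path:
    """Persist an undeliverable payload as one JSON file in the queue."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    # Time-prefixed names keep files sortable in roughly arrival order.
    path = queue_dir / f"{time.time_ns()}-{uuid.uuid4().hex}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    tmp.replace(path)  # rename is atomic on POSIX, so readers never see half a file
    return path


def drain(send: Send, queue_dir: Path = DEFAULT_QUEUE_DIR) -> int:
    """Replay queued payloads oldest-first; stop at the first failure."""
    if not queue_dir.is_dir():
        return 0
    sent = 0
    for path in sorted(queue_dir.glob("*.json")):
        payload = json.loads(path.read_text(encoding="utf-8"))
        try:
            send(payload)
        except OSError:
            break  # still offline; keep the remaining files for next time
        path.unlink()
        sent += 1
    return sent


def upload_or_queue(
    payload: dict[str, Any], send: Send, queue_dir: Path = DEFAULT_QUEUE_DIR
) -> bool:
    """Try to upload; on failure queue the payload instead of raising."""
    try:
        send(payload)
    except OSError:
        enqueue(payload, queue_dir)
        return False
    drain(send, queue_dir)  # a successful upload suggests we are back online
    return True
```

Stopping the drain at the first failure leaves the remaining files in place, so a flaky connection never loses a payload, and a crash mid-write leaves at most a stray `.tmp` file rather than a truncated `.json`.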

What it deliberately does not do

It doesn't run your metrics. Score computation stays in your process, where your LLM credentials live.
It doesn't proxy LLM calls. Your OPENAI_API_KEY (or whichever provider key you use) never touches TestRelic.
It doesn't intercept other uploaders. If you keep a Confident AI key set, DeepEval still uploads there too; TestRelic's upload happens in addition. To send to TestRelic only, unset the Confident AI key — see Migrating from Confident AI.
