Profile Judge

profile_judge is the profile evaluator in PatientHub. It uses an LLM plus a prompt-defined schema to inspect a generated client profile and return structured results.

Use Cases

  • Evaluate whether generated client profiles are coherent and internally consistent
  • Score profile quality with a prompt-defined rubric
  • Extract specific profile issues or contradictions
  • Benchmark different profile generators with the same evaluation schema

Notes

  • prompt_path must be provided by configuration. ProfileJudge itself does not hard-code client_profile.yaml.
  • Use a model that supports structured response schemas. The evaluator depends on structured output for response_format.
  • Unsupported paradigms are skipped when building the schema.
  • This evaluator is a good fit for judging profile quality before running a conversation simulation.

Overview

Property       Value
Key            profile_judge
Class          patienthub.evaluators.profile.ProfileJudge
Base Class     patienthub.evaluators.base.LLMJudge
Primary Input  A payload containing profile
Prompt Source  configs.prompt_path
Output Style   Prompt-driven structured JSON

How It Works

profile_judge is built on top of LLMJudge:

  • patienthub/evaluators/profile.py validates that profile data exists.
  • patienthub/evaluators/base.py loads the prompt, builds the schema, calls the model, and returns structured output.
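The presence check in profile.py amounts to a small guard; a minimal sketch, assuming a dict payload (the helper name and exact error message here are illustrative, not the real ProfileJudge API):

```python
def validate_payload(payload: dict) -> dict:
    """Return the profile object, raising if it is missing or empty."""
    profile = payload.get("profile")
    if not isinstance(profile, dict) or not profile:
        raise ValueError("profile_judge requires a non-empty 'profile' in the payload")
    return profile

print(validate_payload({"profile": {"name": "Alex"}}))  # {'name': 'Alex'}
```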

At runtime, the evaluator:

  1. Loads the YAML prompt from prompt_path
  2. Reads dimensions from that YAML
  3. Dynamically builds Pydantic response models for each dimension
  4. Renders the prompt with Jinja2 using the provided data
  5. Calls the chat model with response_format=<dimension model>
  6. Returns one structured result per dimension
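Steps 3-5 can be sketched with pydantic and Jinja2; the dimension spec and model name below are illustrative, and the real LLMJudge may build its response models differently:

```python
from jinja2 import Template
from pydantic import create_model

# Hypothetical dimension spec mirroring the prompt YAML.
dimension = {"name": "consistency", "paradigm": "scalar", "range": [1, 5]}

# Step 3: build a response model for the dimension (scalar -> score: int).
Model = create_model("Consistency", score=(int, ...))

# Step 4: render the system prompt with the provided data.
sys_prompt = Template("## Profile:\n{{ data.profile }}").render(
    data={"profile": {"name": "Alex"}}
)

# Step 5 would pass Model as response_format to the chat call; here we just
# validate a mocked model response against the schema.
result = Model.model_validate({"score": 4})
print(result.score)  # 4
```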

Configuration

Hydra Configuration

defaults:
  - _self_
  - evaluator: profile_judge

evaluator:
  prompt_path: data/prompts/evaluator/client_profile.yaml
  use_reasoning: false
  model_type: OPENAI
  model_name: gpt-4o

Python Usage

from omegaconf import OmegaConf
from patienthub.evaluators import get_evaluator
from patienthub.utils import load_json

session = load_json("data/sessions/default/badtherapist.json")

configs = OmegaConf.create({
    "agent_type": "profile_judge",
    "prompt_path": "data/prompts/evaluator/client_profile.yaml",
    "use_reasoning": False,
    "model_type": "OPENAI",
    "model_name": "gpt-4o",
})

evaluator = get_evaluator(configs=configs, lang="en")
results = evaluator.evaluate(session)
print(results)

Command Line

uv run python -m examples.evaluate \
  evaluator=profile_judge \
  evaluator.prompt_path=data/prompts/evaluator/client_profile.yaml \
  evaluator.model_type=OPENAI \
  evaluator.model_name=gpt-4o \
  input_dir=data/sessions/default/badtherapist.json \
  output_dir=data/evaluations/default/profile_judge_demo.json

Prompt Structure

profile_judge does not hard-code a specific prompt file. Instead, it reads whatever YAML file you provide in prompt_path.

A minimal prompt looks like this:

sys_prompt: |
  You are evaluating the quality of a generated client profile.

  ## Profile:
  {{data.profile}}

  Return the judgment in the required structured format.

dimensions:
  - name: consistency
    description: Whether the profile is internally consistent.
    paradigm: scalar
    range: [1, 5]

The bundled example prompt:

  • data/prompts/evaluator/client_profile.yaml
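Reading such a prompt file comes down to standard YAML parsing; a minimal sketch, assuming PyYAML (the inline string stands in for the file contents):

```python
import yaml

prompt_yaml = """
sys_prompt: |
  You are evaluating the quality of a generated client profile.
dimensions:
  - name: consistency
    paradigm: scalar
    range: [1, 5]
"""

spec = yaml.safe_load(prompt_yaml)
print([d["name"] for d in spec["dimensions"]])  # ['consistency']
```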

Supported Paradigms

Each dimension in the prompt must use one of the paradigms supported by LLMJudge:

Paradigm     Returned Field       Example
binary       label: bool          {"label": true}
scalar       score: int           {"score": 4}
categorical  label: Literal[...]  {"label": "very consistent"}
extraction   snippets: List[str]  {"snippets": ["Profile: ..."]}

If use_reasoning=true, each leaf result also includes a reasoning field.
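The paradigm-to-field mapping, the skipping of unsupported paradigms, and the optional reasoning field can be sketched as follows (the mapping table and helper are illustrative, not the real LLMJudge internals; real code would use per-dimension Literal[...] types for categorical):

```python
from typing import List

# Illustrative mapping from paradigm name to the (field, type) it returns.
PARADIGM_FIELDS = {
    "binary": ("label", bool),
    "scalar": ("score", int),
    "categorical": ("label", str),
    "extraction": ("snippets", List[str]),
}

def build_fields(paradigm: str, use_reasoning: bool) -> dict:
    """Return field specs for one dimension, skipping unsupported paradigms."""
    if paradigm not in PARADIGM_FIELDS:
        return {}  # unsupported paradigms are skipped when building the schema
    name, typ = PARADIGM_FIELDS[paradigm]
    fields = {name: typ}
    if use_reasoning:
        fields["reasoning"] = str  # each leaf result gains a reasoning field
    return fields

print(build_fields("scalar", True))  # score -> int, plus reasoning -> str
```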

Input Format

profile_judge expects a payload containing a profile object:

{
  "profile": {
    "name": "Alex",
    "history": "Has anxiety around family events.",
    "situation": "Invited to a cousin's wedding.",
    "emotion": ["anxious", "sad"]
  }
}

Output Format

The output shape is determined by your prompt's dimensions.

For the minimal prompt above, the result would look like:

{
  "consistency": {
    "score": 4
  }
}

If use_reasoning=true, the leaf result includes a short explanation:

{
  "consistency": {
    "score": 4,
    "reasoning": "The profile is mostly coherent and does not contain major internal contradictions."
  }
}

With the bundled prompt data/prompts/evaluator/client_profile.yaml, a typical result can contain multiple paradigms in one run:

{
  "consistency_binary": {
    "label": true
  },
  "consistency_categorical": {
    "label": "very consistent"
  },
  "consistency_scalar": {
    "score": 4
  },
  "consistency_extraction": {
    "snippets": [
      "The profile consistently describes anxiety around family events."
    ]
  }
}
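Because the output is plain JSON keyed by dimension, results like the one above can be post-processed in ordinary Python, for example to flag failing dimensions (the helper and the score threshold are illustrative, not part of PatientHub):

```python
results = {
    "consistency_binary": {"label": True},
    "consistency_scalar": {"score": 4},
}

def flag_issues(results: dict, min_score: int = 3) -> list:
    """Collect dimension names whose judgment falls below a chosen bar."""
    issues = []
    for name, payload in results.items():
        if payload.get("label") is False:
            issues.append(name)  # failed a binary check
        if "score" in payload and payload["score"] < min_score:
            issues.append(name)  # scalar score below threshold
    return issues

print(flag_issues(results))  # []
```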

See Also