SAPS

Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator

Venue: arXiv
Paper: arXiv

Overview

SAPS targets clinical consultation: the agent role-plays a patient based on real hospital cases, so that an evaluation can assess whether the doctor accurately diagnoses the condition.
SAPS introduces State Tracking and Memory Bank mechanisms, which make the patient simulator decide how much information to give, or whether to decline to answer, based strictly on what the doctor says in each turn.

Key Features

The State Tracker uses dedicated prompts to classify each doctor utterance in three stages:

Broad Category (Stage I): Classify the doctor's utterance as Inquiry, Advice, Demand, Other Topic, or End.
Specificity (Stage II): Determine whether the question is specific or broad.
Relevance (Stage III): Extract the relevant information from the complete medical record.

Based on the final state obtained by the State Tracker, SAPS selects the corresponding state_instruction prompt to generate the patient response.
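
The three stages above can be read as a small decision cascade. The sketch below is illustrative only and is not the SAPS implementation: the function `track_state`, the state label strings, and the three callables (`classify`, `is_specific`, `retrieve`) are hypothetical stand-ins for the stage prompts. It also assumes that only Inquiry, Advice, and Demand proceed to Stages II and III, which the text above does not state explicitly.

```python
from typing import Callable, Optional, Tuple

# Assumption: these categories continue to Stage II; the others end the cascade.
ANSWERABLE = ("Inquiry", "Advice", "Demand")
TERMINAL = ("Other Topic", "End")


def track_state(
    utterance: str,
    classify: Callable[[str], str],            # Stage I stand-in
    is_specific: Callable[[str], bool],        # Stage II stand-in
    retrieve: Callable[[str], Optional[str]],  # Stage III stand-in
) -> Tuple[str, Optional[str]]:
    """Return a (hypothetical) state label and any evidence found."""
    category = classify(utterance)
    if category in TERMINAL:
        return category, None
    if category not in ANSWERABLE:
        raise ValueError(f"unknown category: {category!r}")

    # Stage II: broad questions never reach the medical record.
    if not is_specific(utterance):
        return f"{category}/broad", None

    # Stage III: look for a matching fragment in the record.
    evidence = retrieve(utterance)
    if evidence:
        return f"{category}/specific/relevant", evidence
    return f"{category}/specific/no-record", None
```

In a real run each callable would wrap an LLM call with the corresponding stage prompt; passing them in as arguments keeps the routing logic testable without a model.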

How It Works

Load a Real Case Profile: Selects a patient case (medical record) as the simulation “ground truth”.
Track the Doctor’s Intent (State Tracking): For each doctor turn, classifies the utterance type and checks whether it is specific enough to answer.
Retrieve Only Relevant Evidence (Memory Bank): When needed, extracts the most relevant fragment(s) from the full medical record; otherwise withholds details to avoid over-disclosure.
Route to a State-Specific Behavior Prompt: Chooses a response policy/instruction prompt based on the final state (what to disclose / how to respond).
Generate Response with Conversation Context: Produces the patient reply conditioned on the selected instruction and multi-turn history, then updates the dialogue memory for the next turn.
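
The last two steps (routing to an instruction and generating with context) might look roughly like the following. This is a minimal sketch under stated assumptions, not the project's code: the `STATE_INSTRUCTIONS` texts, the state keys, `build_prompt`, `patient_turn`, and the `generate` callable (a stand-in for the LLM call) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

History = List[Tuple[str, str]]  # (speaker, text) pairs

# Hypothetical instruction texts keyed by hypothetical state labels.
STATE_INSTRUCTIONS: Dict[str, str] = {
    "specific_relevant": "Answer using only the evidence provided.",
    "specific_no_record": "Say that you have no information about this.",
    "broad": "Ask the doctor to be more specific.",
}


def build_prompt(
    state: str, evidence: Optional[str], history: History, doctor_utterance: str
) -> str:
    """Combine the state instruction, optional evidence, and dialogue history."""
    lines = [STATE_INSTRUCTIONS[state]]
    if evidence is not None:
        lines.append(f"Evidence: {evidence}")
    lines.extend(f"{speaker}: {text}" for speaker, text in history)
    lines.append(f"Doctor: {doctor_utterance}")
    lines.append("Patient:")
    return "\n".join(lines)


def patient_turn(
    state: str,
    evidence: Optional[str],
    history: History,
    doctor_utterance: str,
    generate: Callable[[str], str],  # stand-in for the model call
) -> str:
    """Produce one patient reply and record the exchange in the history."""
    reply = generate(build_prompt(state, evidence, history, doctor_utterance))
    history.append(("Doctor", doctor_utterance))
    history.append(("Patient", reply))
    return reply
```

Because evidence is only placed in the prompt when the tracker supplies it, a broad question gives the model nothing from the record to leak, which is the over-disclosure guard described above.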

Usage

CLI

uv run python -m examples.simulate client=saps therapist=user

Python

from omegaconf import OmegaConf
from patienthub.clients import get_client

config = OmegaConf.create(
    {
        "agent_type": "saps",
        "model_type": "OPENAI",
        "model_name": "gpt-4o",
        "temperature": 0.7,
        "max_tokens": 1024,
        "max_retries": 3,
        "data_path": "data/characters/SAPS.json",
        "data_idx": 0,
    }
)

client = get_client(configs=config, lang="en")
client.set_therapist({"name": "Doctor"})

response = client.generate_response(
    "Could you describe your main symptoms and when they started?"
)
print(response)

Configuration

Option	Description	Default
`data_path`	Path to character file	`data/characters/SAPS.json`
`data_idx`	Character index	`0`

Character Data Format

  {
    "id": 539,
    "patient_info": "Female, 62 years old, recurrence of symmetrical facial features three years post soft palate adhesion surgery. Normal jaw opening and mouth opening type, postoperative changes to the soft palate, no obvious masses, local scarring, local induration, ill-defined margins, no tenderness, no ulceration or bleeding, no erythema, intact permanent dentition, and no other apparent abnormalities. Several nodules less than 1 cm in diameter palpable in the submandibular area, no enlarged lymph nodes in the neck. Blood work dated 2015-09-16 05:29: White blood cell count 7.4 x 10^9/L (N); red blood cell count 3.44 x10^12/L (L); hemoglobin 103 g/L (L); hematocrit 0.304 (L); platelet count 236 x10^9/L (N); segmented neutrophils 63.5% (N). Serum chemistry dated 2015-09-16 07:17: serum potassium 3.11 mmol/L (L); serum sodium 144 mmol/L (N); serum chloride 107 mmol/L (H). Examination: Histological cryosection Result: “IIB” lymph node with tumor tissue present (+).Examination: Histological cryosection Specimens examined: LN \"Submandibular 1\" (one), \"Submandibular 2\" (one), both negative (-)"
  }
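
To inspect a record outside the client, the file can be read with the standard `json` module. The helper below is a hypothetical sketch: `load_character` is not part of the package, and it assumes the character file is a JSON array of objects shaped like the example above, with `data_idx` selecting one entry.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_character(data_path: Union[str, Path], data_idx: int = 0) -> Dict[str, Any]:
    """Load one character record (assumes the file holds a JSON array)."""
    records = json.loads(Path(data_path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise TypeError("expected a JSON array of character records")

    record = records[data_idx]  # raises IndexError if data_idx is out of range

    # The two fields shown in the example record above.
    missing = {"id", "patient_info"} - set(record)
    if missing:
        raise KeyError(f"record is missing fields: {sorted(missing)}")
    return record
```

For example, `load_character("data/characters/SAPS.json", 0)["patient_info"]` would return the free-text medical record used as the simulation ground truth, provided the file matches the assumed layout.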

Resources

Two test sets for evaluating doctor LLMs:

data/resources/SAPS/HospitalCases.json (in Chinese): Includes 50 real hospital cases.

data/resources/SAPS/MedicalExam.json (in English): Includes 150 cases selected from MedQA, MedMCQA, MMLU, SelfExam, and QMAX.
