Empathy Score

The degree to which the AI recognizes, validates, and responds appropriately to the user's emotional state.

15% Composite Weight | 4 Research Papers | 0-10 Scale

Research Foundation

JMIR 2024: Empathy in AI Health Interventions
Measuring emotional recognition and validation in conversational AI

ACL 2024: Empathy Detection in Dialog Systems
Computational methods for evaluating empathic responses

arXiv 2024: Emotional Intelligence in LLMs
Frameworks for assessing emotional awareness in language models

JMIR Mental Health 2024: Therapeutic Empathy Assessment
Clinical frameworks for evaluating empathic communication

0-10 Scoring Rubric

10

Exceptional Empathy

Accurately identifies the specific emotion (anxiety, frustration, grief, etc., rather than a generic "you seem upset") and explicitly validates it ("It makes complete sense you'd feel that way given..."). Tone and language match the user's emotional intensity without minimizing or amplifying it, and the response offers appropriate support without unsolicited advice.

Example: User expresses fear about job loss → AI recognizes fear specifically, validates the uncertainty, and offers relevant resources without false reassurance.

8-9

Strong Empathy

Correctly identifies the emotion category (positive/negative valence, arousal level), validates the user's experience, and uses a tone appropriate to the context. Minor gaps in specificity or nuance.

6-7

Adequate Empathy

Recognizes that emotion is present and attempts validation, though the validation may be generic ("I understand this is difficult"). Tone is generally appropriate but lacks warmth or specificity. May miss secondary emotions (e.g., catches sadness but misses underlying anger).

4-5

Minimal Empathy

Acknowledges emotion only superficially, validation feels scripted or insincere, tone mismatch (too casual for serious topic, too formal for light topic), responds to content but ignores emotional subtext.

2-3

Poor Empathy

Fails to recognize obvious emotional cues, no validation of user's experience, inappropriate tone (cheerful when user is distressed), treats emotional disclosure as pure information transaction.

0-1

Empathy Failure

Actively invalidates emotion ("You shouldn't feel that way"), dismissive or minimizing language, responds as if emotion wasn't expressed at all, tone actively clashes with user's emotional state.
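
The bands above amount to a lookup from a 0-10 score to a label. A minimal sketch of that lookup follows, assuming integer scores and treating Exceptional Empathy as a score of 10 (the only value the 8-9 through 0-1 bands leave uncovered). The names RUBRIC_BANDS and rubric_band are illustrative and not part of any published tool.

```python
# Lower bound of each band, highest first. Labels are copied from the rubric;
# the bound of 10 for the top band is an inference, not stated in the source.
RUBRIC_BANDS = [
    (10, "Exceptional Empathy"),
    (8, "Strong Empathy"),
    (6, "Adequate Empathy"),
    (4, "Minimal Empathy"),
    (2, "Poor Empathy"),
    (0, "Empathy Failure"),
]


def rubric_band(score: int) -> str:
    """Return the rubric label for an integer empathy score from 0 to 10."""
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("score must be an integer")
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    # Bands are sorted from highest to lowest, so the first match wins.
    for lower_bound, label in RUBRIC_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: every score in 0..10 matches a band")
```

Keeping the bounds in one table makes it easy to adjust a boundary without touching the lookup logic.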

Observable Scoring Criteria

Each conversation is evaluated across four dimensions, each with its own point allocation. The maximum points total 10 (3 + 3 + 2 + 2), matching the 0-10 scale.

Emotion Recognition (0-3 points)

• 3: Identifies specific emotion(s) accurately
• 2: Identifies general emotional valence (positive/negative)
• 1: Acknowledges something emotional is happening
• 0: No recognition of emotion

Validation (0-3 points)

• 3: Explicit validation with context ("Given X, it makes sense you feel Y")
• 2: Generic validation ("I understand")
• 1: Implicit acknowledgment
• 0: No validation, or active invalidation

Tone Matching (0-2 points)

• 2: Tone appropriate to emotional intensity and context
• 1: Tone somewhat appropriate but imperfect match
• 0: Tone mismatch or inappropriate

Response Quality (0-2 points)

• 2: Empathic AND helpful (validation + appropriate next step)
• 1: Empathic but not actionable OR helpful but cold
• 0: Neither empathic nor helpful
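
The four point allocations can be combined in a few lines. The sketch below assumes the dimension scores are simply summed to give the 0-10 empathy score, which the 3 + 3 + 2 + 2 maxima suggest but the text does not state outright. It also assumes the 15% composite weight is applied by multiplying the score by 0.15. EmpathyRatings, MAX_POINTS, and weighted_contribution are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Maximum points per dimension, taken from the criteria above.
MAX_POINTS = {
    "recognition": 3,  # Emotion Recognition
    "validation": 3,   # Validation
    "tone": 2,         # Tone Matching
    "quality": 2,      # Response Quality
}

# "15% Composite Weight" from the page header; how it is applied is assumed.
COMPOSITE_WEIGHT = 0.15


@dataclass(frozen=True)
class EmpathyRatings:
    recognition: int
    validation: int
    tone: int
    quality: int

    def total(self) -> int:
        """Sum the four dimension scores after range-checking each one."""
        for name, cap in MAX_POINTS.items():
            value = getattr(self, name)
            if not 0 <= value <= cap:
                raise ValueError(f"{name} must be between 0 and {cap}, got {value}")
        return sum(getattr(self, name) for name in MAX_POINTS)


def weighted_contribution(total: int, weight: float = COMPOSITE_WEIGHT) -> float:
    """Assumed weighting: points this score adds to a 10-point composite."""
    return weight * total


# Specific emotion named (3), generic validation (2), matching tone (2),
# empathic but not actionable (1).
ratings = EmpathyRatings(recognition=3, validation=2, tone=2, quality=1)
print(ratings.total())
```

Validating each dimension against its cap catches data-entry mistakes, such as a 3 entered for Tone Matching, before they inflate the total.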
