Consistency Score
The degree to which the AI maintains coherent information, personality, and logic across turns and sessions.
- W&B: AI Agent Evaluation Metrics and Best Practices
Framework for evaluating consistency in AI systems
- Dialzara: Metrics for Evaluating Conversational AI
Industry standards for conversational coherence
Fully Consistent
No contradictions across the entire conversation or session. Remembers and references previous statements accurately, maintains a stable personality and tone throughout, and stays logically coherent across all responses.
Example: The user asks about pricing in message 1 and asks a follow-up in message 10 → the AI gives pricing information that matches its earlier answer.
Strong Consistency
Minimal contradictions, quickly self-corrected when noticed. Good context retention across turns, stable personality and tone, logical flow maintained.
Adequate Consistency
Occasional minor contradictions that don't undermine the core message. Generally remembers context but may need reminders. Mostly stable personality with slight variations, logic mostly sound with occasional gaps.
Inconsistent
Multiple contradictions within a conversation, forgets important context from earlier turns. Personality and tone shift noticeably, logical gaps that require the user to re-explain.
Poor Consistency
Frequent contradictions that confuse the user, minimal context retention. Unstable personality (formal → casual → formal), logical incoherence across responses.
Incoherent
Direct contradictions within the same response, no context retention even within a few turns. Personality completely unstable, responses don't follow from the user's input.
Each conversation is scored on four dimensions with fixed point allocations, for a maximum of 10 points:
Factual Consistency (0-3 points)
- 3: No contradictory factual statements
- 2: Minor contradictions that don't affect core information
- 1: Some contradictions that create confusion
- 0: Major contradictions or incoherent information
Context Retention (0-3 points)
- 3: Accurately references and builds on all prior turns
- 2: Remembers most context, occasional gaps
- 1: Remembers only recent context (last 1-2 turns)
- 0: No context retention
Personality Stability (0-2 points)
- 2: Consistent tone, personality, and interaction style
- 1: Generally stable with minor variations
- 0: Unstable or contradictory personality
Logical Coherence (0-2 points)
- 2: All responses follow logically from context and prior statements
- 1: Mostly logical with occasional non-sequiturs
- 0: Illogical or contradictory reasoning
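
The four point allocations above add up to a 10-point total (3 + 3 + 2 + 2). A minimal Python sketch of that tally follows. The class name `ConsistencyRating`, its field names, and the `total` method are hypothetical, chosen here for illustration and not taken from either cited framework. The rubric does not say how totals map onto the six levels, so the sketch stops at the raw total.

```python
from dataclasses import dataclass

# Maximum points per dimension, as listed in the rubric above.
MAX_POINTS = {
    "factual_consistency": 3,
    "context_retention": 3,
    "personality_stability": 2,
    "logical_coherence": 2,
}


@dataclass(frozen=True)
class ConsistencyRating:
    """Per-dimension scores for one conversation (hypothetical container)."""

    factual_consistency: int
    context_retention: int
    personality_stability: int
    logical_coherence: int

    def __post_init__(self) -> None:
        # Reject scores outside the range the rubric allows for each dimension.
        for name, max_points in MAX_POINTS.items():
            value = getattr(self, name)
            if not isinstance(value, int) or not 0 <= value <= max_points:
                raise ValueError(
                    f"{name} must be an integer from 0 to {max_points}, got {value!r}"
                )

    def total(self) -> int:
        """Sum of the four dimension scores, between 0 and 10."""
        return sum(getattr(self, name) for name in MAX_POINTS)


# Illustrative ratings, not real evaluation data.
rating = ConsistencyRating(
    factual_consistency=3,
    context_retention=2,
    personality_stability=2,
    logical_coherence=1,
)
print(f"{rating.total()} / {sum(MAX_POINTS.values())}")
```

Validating each score against its own maximum keeps a mistyped value (say, a 3 for Personality Stability, which tops out at 2) from silently inflating the total.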