Guideline Matching

Guideline matching is Parlant's core differentiator—the system that "lifts the curse" of instruction overload by dynamically filtering guidelines to only those relevant for each response. This page explains how guidelines are categorized, matched, and resolved.

The Matching Challenge

Why Semantic Similarity Isn't Enough

Consider this guideline:

condition="The customer's card was already declined"
action="Do not offer the same payment method again"

Semantic similarity can identify that the conversation involves "cards" and "declining." However, it cannot reliably determine whether the decline already happened in this conversation versus the customer merely mentioning declines. Temporal reasoning requires more than vector similarity.

Why Single-Strategy Matching Fails

Different guidelines need different evaluation depths:

# Simple: Pattern matching is enough
condition="Customer asks about store hours"

# Complex: Requires history analysis
condition="You've already explained the return policy in this conversation"

# Very Complex: Requires journey state reasoning
condition="Customer completed step 2 of the onboarding flow"

Applying a uniform strategy to all guidelines either wastes computational resources on simple cases or provides inadequate handling for complex ones.

Why Relationships Matter

Guidelines interact with one another. One guideline might specify "offer a discount" while another states "never discount premium products." Without explicit relationship handling, the agent could violate one guideline while following another.

Parlant's Solution

Category-batched LLM evaluation with relationship resolution:

  1. Categorize guidelines by their evaluation needs
  2. Batch each category with specialized prompts
  3. Evaluate batches in parallel
  4. Resolve relationships between matched guidelines

The Matching Pipeline

Algorithm

ALGORITHM: Guideline Matching Pipeline

INPUT: all_guidelines, context, previous_matches, journey_state
OUTPUT: matched_guidelines with scores and rationales

1. PREDICT relevant journeys:
   IF journey_state has active journeys:
     - Prioritize guidelines related to active journeys
     - Include Top-K other journeys by semantic relevance
   ELSE:
     - Predict Top-K journeys likely to activate
     - Only consider guidelines for those journeys

2. PRUNE guidelines:
   - Include all guidelines for predicted/active journeys
   - Include all non-journey-scoped guidelines
   - Exclude guidelines for unpredicted inactive journeys
   - Result: Candidate set (typically 10-30% of total)

3. CATEGORIZE each candidate:
   - Check if observational (no action)
   - Check if previously applied in this session
   - Check if customer-dependent
   - Check if journey node
   - Check if disambiguation target
   - Assign to appropriate category

4. CREATE batches:
   - Group guidelines by category
   - Respect batch_size from OptimizationPolicy
   - Larger batches = fewer LLM calls but higher latency

5. EVALUATE batches in parallel:
   FOR each batch (concurrently):
     - Use category-specific prompt and ARQ structure
     - LLM evaluates each guideline's condition
     - Returns: [{ guideline, score (0-10), rationale, metadata }]

6. RESOLVE relationships:
   - Load entailed guidelines (if A matches, B implicitly matches)
   - Apply suppression (if A matches, B cannot match)
   - Enforce priorities (when both match, which wins)
   - Handle mutual exclusion

7. RETURN matched_guidelines

Stage 1: Journey Prediction

Journeys scope many guidelines. For example, 5 journeys with 20 guidelines each result in 100 journey-scoped guidelines, yet typically only 1-2 journeys are relevant at any given moment.

Journey prediction uses semantic similarity to estimate which journeys are likely to activate, allowing the matcher to skip evaluating guidelines for irrelevant journeys. This approach dramatically reduces the number of guidelines requiring LLM evaluation.
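
As a rough illustration of the idea, the sketch below ranks journeys by embedding similarity and keeps the Top-K. It is not Parlant's actual predictor; the stand-in vectors and the predict_top_k_journeys helper exist only for illustration.

import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def predict_top_k_journeys(context_vec, journey_vecs, k=2):
    # Rank journeys by similarity to the conversation context and keep the Top-K
    ranked = sorted(journey_vecs, key=lambda j: cosine(context_vec, journey_vecs[j]), reverse=True)
    return ranked[:k]

# Stand-in 3-dimensional embeddings for three journeys
journeys = {
    "returns": [0.9, 0.1, 0.0],
    "onboarding": [0.1, 0.9, 0.0],
    "billing": [0.2, 0.2, 0.9],
}
print(predict_top_k_journeys([0.8, 0.2, 0.1], journeys, k=2))  # ['returns', 'billing']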

Stage 2: Guideline Pruning

After journey prediction, pruning further reduces the candidate set:

  • Active journey guidelines: Always included in evaluation.
  • Predicted journey guidelines: Included speculatively based on prediction results.
  • Global (non-journey) guidelines: Always included in evaluation.
  • Inactive/unpredicted journey guidelines: Excluded from evaluation.

This strategy maintains evaluation focus while still enabling detection of new journey activations.
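
A minimal pruning sketch, assuming each guideline is a plain dict with an optional "journey" field (None meaning globally scoped). This is illustrative, not Parlant's internal data model.

def prune_guidelines(all_guidelines, active_journeys, predicted_journeys):
    relevant = set(active_journeys) | set(predicted_journeys)
    candidates = []
    for g in all_guidelines:
        journey = g.get("journey")
        if journey is None or journey in relevant:
            candidates.append(g)  # global, active, or predicted: keep for evaluation
        # guidelines scoped to unpredicted, inactive journeys are skipped entirely
    return candidates

guidelines = [
    {"id": "g1", "journey": None},          # global: always evaluated
    {"id": "g2", "journey": "returns"},     # active journey: evaluated
    {"id": "g3", "journey": "onboarding"},  # unpredicted, inactive: excluded
]
print([g["id"] for g in prune_guidelines(guidelines, {"returns"}, set())])  # ['g1', 'g2']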

Guideline Categories

Each category has a specialized prompt optimized for its evaluation needs.

Observational Guidelines

Purpose: Detect facts about the conversation without taking action.

condition="The customer mentioned a competitor product"
# No action - just records a fact

Evaluation needs: These guidelines require quick pattern matching with some reasoning to determine whether the conversation mentions a competitor and, if so, which one.

ARQ Structure:

{
  "guideline_id": "...",
  "condition": "The customer mentioned a competitor",
  "rationale": "Customer said 'I saw it cheaper on Amazon'",
  "applies": true
}

Observational guidelines are often used as dependencies for other guidelines.

Simple Actionable Guidelines

Purpose: Standard condition-action pairs for new behaviors.

condition="Customer asks about warranty"
action="Explain our 2-year warranty coverage"

Evaluation needs: These guidelines require moderate reasoning to determine whether the condition is met. The action is included in evaluation because conditions may reference it implicitly (for example, "As soon as it's done" requires understanding what "it" refers to).

ARQ Structure:

{
  "guideline_id": "...",
  "condition": "Customer asks about warranty",
  "action": "Explain our 2-year warranty coverage",
  "rationale": "Customer asked 'What if it breaks?'",
  "applies": true
}

Previously Applied Guidelines

Purpose: Determine if an already-applied guideline should apply again.

condition="Customer wants to return an item"
action="Ask for their order number"

Consider a scenario where the agent already asked for the order number but the customer did not respond. Determining whether to ask again requires analyzing conversation history.

Evaluation needs: These guidelines require history analysis and partial action tracking.

ARQ Structure:

{
  "guideline_id": "...",
  "condition": "Customer wants to return an item",
  "action": "Ask for their order number",
  "condition_met_again": true,
  "action_wasnt_taken": true,
  "should_reapply": true
}

The sequential structure forces explicit reasoning about re-application.
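
The decision the ARQ encodes can be read as a simple conjunction, sketched below with field names mirroring the JSON above (the dataclass itself is hypothetical, not part of Parlant's API).

from dataclasses import dataclass

@dataclass
class ReapplicationCheck:
    condition_met_again: bool  # the condition still (or again) holds in recent turns
    action_wasnt_taken: bool   # the requested action is still outstanding

    @property
    def should_reapply(self) -> bool:
        # Re-apply only if the condition holds AND the action remains unfulfilled,
        # e.g. the agent asked for an order number but the customer never provided it
        return self.condition_met_again and self.action_wasnt_taken

print(ReapplicationCheck(True, True).should_reapply)   # True
print(ReapplicationCheck(True, False).should_reapply)  # False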

Customer-Dependent Guidelines

Purpose: Actions that require customer response to complete.

condition="Customer wants to transfer money"
action="Ask for recipient name"

Determining whether the action completed requires checking if the customer provided the recipient name, which cannot be determined until the customer's next message arrives.

Evaluation needs: These guidelines require cross-turn analysis and special asynchronous handling.

These guidelines use a separate asynchronous evaluation after the agent responds, avoiding response delays while waiting for the determination.
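
The sketch below shows the general shape of that deferred check, assuming a hypothetical evaluate_completion coroutine standing in for the LLM call: the agent's reply goes out immediately, and the completion check runs only once the customer's next message arrives.

import asyncio

async def evaluate_completion(guideline, customer_message) -> bool:
    # Stand-in for the real LLM check ("did the customer provide the recipient name?")
    await asyncio.sleep(0)
    return "recipient" in customer_message.lower()

async def handle_turn(guideline, send_agent_reply, wait_for_customer):
    await send_agent_reply("Who would you like to send money to?")  # respond without waiting
    customer_message = await wait_for_customer()                    # arrives on the next turn
    return await evaluate_completion(guideline, customer_message)

async def demo():
    async def send(text): print("AGENT:", text)
    async def next_message(): return "Please send it to recipient Dana Levi"
    print(await handle_turn({"condition": "Customer wants to transfer money"}, send, next_message))

asyncio.run(demo())  # AGENT: Who would you like to send money to? / True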

Journey Node Selection

Purpose: Determine which step of a multi-step journey to execute.

When a journey is active, multiple nodes might seem applicable. Journey node selection reasons about:

  • Current journey state (which nodes have been visited)
  • Transition conditions (can we move to the next node?)
  • Backtracking (should we return to an earlier node?)

See Journeys for details on journey node matching.

Disambiguation Guidelines

Purpose: Resolve ambiguous customer intent.

condition="Customer says 'upgrade' without specifying what"
targets=[upgrade_subscription, upgrade_shipping, upgrade_product]

When multiple interpretations are possible, disambiguation creates a "virtual" guideline that prompts the customer to clarify their intent, but only among the specific targets defined in the configuration.
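
One way to picture the resulting "virtual" guideline is a clarification prompt built strictly from the configured targets, sketched below (the helper and the target descriptions are illustrative, not Parlant's API).

def build_clarification(targets: dict[str, str]) -> str:
    # Offer only the configured interpretations, never an open-ended question
    options = ", ".join(targets.values())
    return f"Just to confirm, which of these did you mean: {options}?"

targets = {
    "upgrade_subscription": "upgrade your subscription plan",
    "upgrade_shipping": "upgrade the shipping speed on your order",
    "upgrade_product": "upgrade to a newer product model",
}
print(build_clarification(targets))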

Relational Resolution

After LLM evaluation, relationships between guidelines are resolved:

Entailment

If guideline A matches, guideline B should also be considered matched.

# If "customer is VIP" matches...
guideline_a = condition="Customer is a VIP member"

# ...then "customer has loyalty status" also matches
guideline_b = condition="Customer has any loyalty status"

relationship = Entailment(source=a, target=b)

Entailed guidelines are added to the matched set without separate LLM evaluation.

Suppression

If guideline A matches, guideline B cannot match (even if it would otherwise).

guideline_a = condition="Customer explicitly declined upsell"
guideline_b = condition="Good opportunity for upsell"

relationship = Suppression(source=a, target=b)

Suppressed guidelines are removed from the matched set.

Priority

When both guidelines match, priority determines which takes precedence.

guideline_a = condition="Customer asks about pricing"
action="Quote standard pricing"

guideline_b = condition="Customer is enterprise tier"
action="Quote enterprise pricing"

relationship = Priority(high=b, low=a)

Lower-priority guidelines may be demoted or excluded depending on configuration.
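
Taken together, the three relationship types can be applied as a post-processing pass over the matched set. The sketch below uses simplified (source, target) tuples and guideline IDs; Parlant's internal representation is richer.

def resolve(matched: set[str], entailments, suppressions, priorities) -> set[str]:
    result = set(matched)

    # Entailment: if the source matched, the target is matched too (no extra LLM call)
    for source, target in entailments:
        if source in result:
            result.add(target)

    # Suppression: if the source matched, the target is removed even if it matched
    for source, target in suppressions:
        if source in result:
            result.discard(target)

    # Priority: when both matched, drop the lower-priority guideline
    # (a real implementation might demote it instead of excluding it)
    for high, low in priorities:
        if high in result and low in result:
            result.discard(low)

    return result

resolved = resolve(
    matched={"vip", "upsell_opportunity", "standard_pricing", "enterprise_pricing"},
    entailments=[("vip", "loyalty_status")],
    suppressions=[("declined_upsell", "upsell_opportunity")],  # no effect: source didn't match
    priorities=[("enterprise_pricing", "standard_pricing")],
)
print(sorted(resolved))  # ['enterprise_pricing', 'loyalty_status', 'upsell_opportunity', 'vip']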

Criticality-Based Processing

Guidelines have criticality levels that affect matching depth:

Level   | Matching Behavior             | Message Generation
HIGH    | Full evaluation, no shortcuts | ARQ with explicit acknowledgment
MEDIUM  | Standard evaluation           | ARQ in structured section
LOW     | May be pruned aggressively    | Standard inclusion

High-criticality guidelines (compliance, safety) always get thorough evaluation. Low-criticality guidelines (nice-to-have behaviors) may be pruned when resources are constrained.
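
A rough sketch of how criticality might gate pruning under resource pressure, with enum values mirroring the table above (the policy function itself is illustrative, not Parlant's implementation).

from enum import Enum

class Criticality(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

def may_prune(criticality: Criticality, under_resource_pressure: bool) -> bool:
    # HIGH and MEDIUM are always evaluated; LOW may be skipped when the budget is tight
    return criticality is Criticality.LOW and under_resource_pressure

print(may_prune(Criticality.LOW, under_resource_pressure=True))   # True
print(may_prune(Criticality.HIGH, under_resource_pressure=True))  # False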

Why This Design?

Why LLM Instead of Rules?

Rule-based matching approaches (such as regex and keyword lists) cannot handle natural language conditions like "customer seems frustrated" or "after you've verified their identity." LLM evaluation enables expressive, natural-language conditions that would be impractical to encode as rules.

Why Batching?

Making individual LLM calls for each guideline is prohibitively slow—100 guidelines would require 100 separate calls. Conversely, evaluating all guidelines in a single call results in excessive context length and reduced accuracy. Category batching provides the right balance: guidelines with similar evaluation needs are grouped together and processed in parallel.
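
The tradeoff is easy to see in a small, self-contained sketch: chunk the candidates into fixed-size batches and evaluate them concurrently, with evaluate_batch standing in for a single LLM call.

import asyncio

def create_batches(guidelines, batch_size):
    # 100 guidelines with batch_size=10 -> 10 LLM calls instead of 100
    return [guidelines[i:i + batch_size] for i in range(0, len(guidelines), batch_size)]

async def evaluate_batch(batch):
    await asyncio.sleep(0.1)  # stand-in for one LLM call's latency
    return [{"guideline": g, "score": 7} for g in batch]

async def main():
    guidelines = [f"g{i}" for i in range(100)]
    batches = create_batches(guidelines, batch_size=10)
    results = await asyncio.gather(*(evaluate_batch(b) for b in batches))
    print(len(batches), sum(len(r) for r in results))  # 10 100

asyncio.run(main())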

Why Categories?

Different guidelines require different prompts. A prompt optimized for determining whether a guideline should be reapplied includes fields that are irrelevant for evaluating whether a new condition matches. Specialized prompts improve accuracy without wasting tokens on unnecessary fields.

Tradeoffs

Choice          | Benefit                                                | Cost
LLM evaluation  | Enables natural language conditions                    | Introduces latency and computational cost
Batching        | Provides parallelism and reduces total calls           | May result in some accuracy loss compared to individual evaluation
Categories      | Enables specialized, optimized prompts                 | Increases implementation complexity
Journey pruning | Reduces the number of guidelines requiring evaluation  | May miss unexpected journey activations

What's Next

  • Journeys: How multi-step workflows integrate with matching
  • Tool Calling: How matched guidelines trigger tools
  • Debugging: How to trace why a guideline did or didn't match