Deployment Guide
Your Parlant agent works great locally, but production deployment requires a few key changes. This guide walks you through setting up Parlant on Kubernetes with proper authentication, persistence, and scaling—whether you're deploying to AWS, Azure, or another cloud provider.
What This Guide Covers
This guide focuses on the infrastructure and deployment aspects unique to taking Parlant from local development to production. For topics like authentication policies, frontend integration, and agentic design principles, we'll reference the relevant documentation sections.
By the end of this guide, you'll have:
- A containerized Parlant application
- A production-ready Kubernetes deployment
- MongoDB persistence configured
- Load balancing and HTTPS termination
- A scalable, secure production environment
Architecture Overview
A typical Parlant production deployment consists of the following key components:
- Load Balancer: Handles SSL termination and routes traffic to Parlant pods
- Parlant Pods: Stateless application containers (horizontally scalable)
- MongoDB: Persistent storage for sessions and customer data
- LLM Provider: External API for NLP services (OpenAI, Anthropic, etc.)
Prerequisites
Before you begin, ensure you have:
Local Tools:
- Python 3.10 or higher
- Docker installed and running
- kubectl CLI tool
- Cloud provider CLI (AWS CLI or Azure CLI)
- A code editor
Cloud Resources:
- Access to AWS EKS or Azure AKS (or another Kubernetes provider)
- New to EKS? See the AWS EKS Getting Started Guide
- New to AKS? See the Azure AKS Quickstart
- A MongoDB instance (MongoDB Atlas recommended, or managed MongoDB from your cloud provider)
- (Optional) A domain name for your agent
- (Optional) SSL certificate (can use Let's Encrypt or cloud provider certificates)
Knowledge Prerequisites:
- Basic understanding of Kubernetes concepts (pods, services, deployments)
- Familiarity with environment variables and configuration management
- Basic Docker knowledge
This guide assumes you have a working Parlant agent running locally. If you haven't built your agent yet, start with the Installation guide.
Understanding Parlant's Production Requirements
Stateless Architecture
Parlant's server is designed to be stateless, which means:
- All session state is stored in MongoDB, not in memory
- Multiple Parlant pods can run simultaneously without coordination
- You can scale horizontally by adding more pods
- Pods can be restarted or replaced without losing data
This design makes Parlant naturally suited for cloud deployment and Kubernetes orchestration.
Persistence Layer
Parlant requires two MongoDB collections:
- Sessions: Stores conversation state, events, and history
- Customers: Stores customer profiles and associated data
Both collections must be accessible from all Parlant pods with consistent connection strings.
Port Configuration
By default, Parlant's FastAPI server listens on port 8800. In production:
- Your load balancer accepts HTTPS traffic on port 443
- The load balancer forwards to Parlant pods on port 8800
- Kubernetes services handle internal routing
Step 1: Prepare Your Production Application
Create a Production Configuration File
Create a production_config.py file to centralize your production settings:
# production_config.py
import os
import parlant.sdk as p
# MongoDB Configuration
MONGODB_SESSIONS_URI = os.environ["MONGODB_SESSIONS_URI"]
MONGODB_CUSTOMERS_URI = os.environ["MONGODB_CUSTOMERS_URI"]
# NLP Provider Configuration
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")
# Server Configuration
SERVER_HOST = os.environ.get("SERVER_HOST", "0.0.0.0")
SERVER_PORT = int(os.environ.get("SERVER_PORT", "8800"))
# Choose your NLP service
NLP_SERVICE = p.NLPServices.openai # or p.NLPServices.anthropic
def get_mongodb_config():
"""Returns MongoDB configuration for Parlant."""
return {
"sessions_uri": MONGODB_SESSIONS_URI,
"customers_uri": MONGODB_CUSTOMERS_URI,
}
Update Your Main Application File
Modify your main application to use production configuration:
# main.py
import asyncio
import os
import parlant.sdk as p
from production_config import (
get_mongodb_config,
NLP_SERVICE,
SERVER_HOST,
SERVER_PORT
)
from auth import ProductionAuthPolicy # We'll create this next
async def configure_container(container: p.Container) -> p.Container:
"""Configure production-specific dependencies."""
# Set up production authorization
container[p.AuthorizationPolicy] = ProductionAuthPolicy(
secret_key=os.environ["JWT_SECRET_KEY"],
)
return container
async def main():
"""Initialize and run the Parlant server."""
# MongoDB configuration
mongodb_config = get_mongodb_config()
async with p.Server(
host=SERVER_HOST,
port=SERVER_PORT,
nlp_service=NLP_SERVICE,
configure_container=configure_container,
**mongodb_config
) as server:
# Create or retrieve your agent
agents = await server.list_agents()
if not agents:
agent = await server.create_agent(
name="Production Agent",
description="Your agent description here"
)
# Set up your guidelines, journeys, etc.
await setup_agent_behavior(agent)
# Start serving requests
await server.serve()
async def setup_agent_behavior(agent: p.Agent):
"""Configure your agent's behavior."""
# Your guidelines, journeys, tools, etc.
pass
if __name__ == "__main__":
asyncio.run(main())
Set Up Production Authorization
Create an auth.py file with your production authorization policy:
# auth.py
import parlant.sdk as p
class ProductionAuthPolicy(p.ProductionAuthorizationPolicy):
"""Production authorization with your custom rules."""
def __init__(self, secret_key: str):
super().__init__()
self.secret_key = secret_key
# Add your custom authorization logic here
For comprehensive guidance on implementing JWT authentication, rate limiting, M2M tokens, and custom authorization policies, see the API Hardening guide.
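As a concrete illustration of the kind of token handling such a policy performs, here is a minimal sketch using PyJWT (included in the requirements below). The helper names and claim set are illustrative only and are independent of Parlant's authorization interface:
# jwt_utils.py (illustrative sketch; not part of Parlant's API)
import datetime
import jwt  # PyJWT

ALGORITHM = "HS256"

def issue_token(secret_key: str, subject: str, ttl_minutes: int = 60) -> str:
    """Issue a short-lived token identifying a customer or service."""
    now = datetime.datetime.now(datetime.timezone.utc)
    payload = {
        "sub": subject,
        "iat": now,
        "exp": now + datetime.timedelta(minutes=ttl_minutes),
    }
    return jwt.encode(payload, secret_key, algorithm=ALGORITHM)

def validate_token(secret_key: str, token: str) -> str | None:
    """Return the token's subject if it is valid and unexpired, otherwise None."""
    try:
        claims = jwt.decode(token, secret_key, algorithms=[ALGORITHM])
        return claims.get("sub")
    except jwt.InvalidTokenError:  # also covers expired signatures
        return None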
Step 2: Containerize Your Application
Create an Optimized Dockerfile
Create a Dockerfile in your project root:
# Use Python 3.10 slim image
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements first (for better caching)
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose Parlant's default port
EXPOSE 8800
# Health check endpoint
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8800/health')" || exit 1
# Run the application
CMD ["python", "main.py"]
Create Requirements File
Your requirements.txt should include:
parlant>=3.0.0
pyjwt>=2.8.0
limits>=3.0.0
pymongo>=4.0.0
redis>=5.0.0
Build and Test Locally
Build your Docker image:
docker build -t parlant-agent:latest .
Test it locally with environment variables:
docker run -p 8800:8800 \
-e MONGODB_SESSIONS_URI="mongodb://localhost:27017/parlant_sessions" \
-e MONGODB_CUSTOMERS_URI="mongodb://localhost:27017/parlant_customers" \
-e OPENAI_API_KEY="your-key-here" \
-e JWT_SECRET_KEY="your-secret-here" \
parlant-agent:latest
Visit http://localhost:8800 to verify it's working.
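If you prefer a scripted check, the following sketch (standard library only) polls the health endpoint until the container responds; it assumes the /health endpoint referenced by the Dockerfile's HEALTHCHECK:
# wait_for_ready.py (simple local smoke test)
import sys
import time
import urllib.error
import urllib.request

URL = "http://localhost:8800/health"

for attempt in range(30):
    try:
        with urllib.request.urlopen(URL, timeout=5) as response:
            print(f"Ready after {attempt + 1} attempt(s): HTTP {response.status}")
            sys.exit(0)
    except (urllib.error.URLError, OSError):
        time.sleep(2)  # the container may still be starting

print("Server did not become ready in time", file=sys.stderr)
sys.exit(1)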
Optimize Image Size (Optional)
For production, consider a multi-stage build to reduce image size. For more on optimizing Docker builds, see Docker's multi-stage build documentation.
# Stage 1: Builder
FROM python:3.10-slim as builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
# Stage 2: Runtime
FROM python:3.10-slim
WORKDIR /app
# Copy only the dependencies from builder
COPY --from=builder /root/.local /root/.local
COPY . .
# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH
EXPOSE 8800
CMD ["python", "main.py"]
Step 3: Set Up MongoDB
You have two main options for MongoDB in production:
Option A: MongoDB Atlas (Recommended)
MongoDB Atlas is a fully managed service that handles backups, scaling, and maintenance. For a complete setup guide, see the official MongoDB Atlas Getting Started documentation.
Quick setup:
- Create a MongoDB Atlas account at https://www.mongodb.com/cloud/atlas
- Create a cluster (free tier works for development, paid tier for production)
- Set up database access with a user that has read/write permissions
- Configure network access for your Kubernetes cluster's IP range
- Get your connection string:
mongodb+srv://username:password@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority
You'll need two connection URIs; they can point to two separate databases or share a single database:
# Option 1: Two separate databases
MONGODB_SESSIONS_URI=mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant_sessions
MONGODB_CUSTOMERS_URI=mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant_customers
# Option 2: Same database, Parlant will create collections
MONGODB_SESSIONS_URI=mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant
MONGODB_CUSTOMERS_URI=mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant
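Before wiring these URIs into Kubernetes, it's worth confirming they work from your machine. A minimal sketch using pymongo, assuming the two environment variables above are set locally:
# check_mongodb.py (connectivity sanity check)
import os
import pymongo

for name in ("MONGODB_SESSIONS_URI", "MONGODB_CUSTOMERS_URI"):
    client = pymongo.MongoClient(os.environ[name], serverSelectionTimeoutMS=5000)
    client.admin.command("ping")  # raises if the cluster is unreachable
    print(f"{name}: connection OK")
    client.close()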
Option B: Self-Hosted MongoDB on Kubernetes
This option is for advanced users who need full control. For detailed guidance, see the official Kubernetes documentation on running MongoDB with StatefulSets. You'll need two manifests: a StatefulSet (with a headless Service) and a Secret holding the MongoDB credentials.
# mongodb-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
name: mongodb
spec:
ports:
- port: 27017
targetPort: 27017
clusterIP: None
selector:
app: mongodb
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mongodb
spec:
serviceName: "mongodb"
replicas: 1
selector:
matchLabels:
app: mongodb
template:
metadata:
labels:
app: mongodb
spec:
containers:
- name: mongodb
image: mongo:7.0
ports:
- containerPort: 27017
volumeMounts:
- name: mongodb-data
mountPath: /data/db
env:
- name: MONGO_INITDB_ROOT_USERNAME
valueFrom:
secretKeyRef:
name: mongodb-secret
key: username
- name: MONGO_INITDB_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mongodb-secret
key: password
volumeClaimTemplates:
- metadata:
name: mongodb-data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 20Gi
# mongodb-secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: mongodb-secret
type: Opaque
stringData:
username: admin
password: <your-secure-password>
connection-string: mongodb://admin:<your-secure-password>@mongodb:27017/
For production workloads, consider using a managed MongoDB service or implementing a proper replica set with backups, monitoring, and disaster recovery.
Step 4: Deploy to Kubernetes
Now we'll deploy Parlant to a Kubernetes cluster. We'll show examples for both AWS EKS and Azure AKS.
Create Kubernetes Secrets
First, create a Kubernetes Secret with your sensitive configuration:
# parlant-secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: parlant-secret
namespace: default
type: Opaque
stringData:
mongodb-sessions-uri: "mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant_sessions"
mongodb-customers-uri: "mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant_customers"
openai-api-key: "sk-..."
jwt-secret-key: "your-secure-random-string"
Apply it:
kubectl apply -f parlant-secret.yaml
For production, use a proper secrets management solution like AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault instead of storing secrets directly in YAML files. See Kubernetes Secrets best practices for comprehensive guidance on secret management.
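Whichever secret store you use, the jwt-secret-key value should be long and random. One simple way to generate it with the Python standard library:
# generate_secret.py
import secrets

# 64 random bytes, URL-safe encoded (roughly 86 characters)
print(secrets.token_urlsafe(64))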
Create ConfigMap for Non-Sensitive Config
# parlant-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: parlant-config
namespace: default
data:
SERVER_HOST: "0.0.0.0"
SERVER_PORT: "8800"
Apply it:
kubectl apply -f parlant-configmap.yaml
Create Deployment
# parlant-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: parlant
namespace: default
labels:
app: parlant
spec:
replicas: 3
selector:
matchLabels:
app: parlant
template:
metadata:
labels:
app: parlant
spec:
containers:
- name: parlant
image: your-registry/parlant-agent:latest
ports:
- containerPort: 8800
name: http
env:
- name: MONGODB_SESSIONS_URI
valueFrom:
secretKeyRef:
name: parlant-secret
key: mongodb-sessions-uri
- name: MONGODB_CUSTOMERS_URI
valueFrom:
secretKeyRef:
name: parlant-secret
key: mongodb-customers-uri
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: parlant-secret
key: openai-api-key
- name: JWT_SECRET_KEY
valueFrom:
secretKeyRef:
name: parlant-secret
key: jwt-secret-key
envFrom:
- configMapRef:
name: parlant-config
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /health
port: 8800
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health
port: 8800
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
Apply it:
kubectl apply -f parlant-deployment.yaml
Create Service
# parlant-service.yaml
apiVersion: v1
kind: Service
metadata:
name: parlant-service
namespace: default
spec:
type: ClusterIP
selector:
app: parlant
ports:
- port: 8800
targetPort: 8800
protocol: TCP
name: http
Apply it:
kubectl apply -f parlant-service.yaml
Set Up Ingress for Load Balancing
The ingress configuration differs slightly between AWS and Azure.
For AWS EKS, use the AWS Load Balancer Controller with an Application Load Balancer:
First, install the AWS Load Balancer Controller:
# Add the EKS chart repo
helm repo add eks https://aws.github.io/eks-charts
helm repo update
# Install the controller
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=your-cluster-name \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller
Create the Ingress:
# parlant-ingress-aws.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: parlant-ingress
namespace: default
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
alb.ingress.kubernetes.io/ssl-redirect: '443'
# If you have an ACM certificate:
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:region:account:certificate/xxxxx
spec:
rules:
- host: your-domain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: parlant-service
port:
number: 8800
Apply it:
kubectl apply -f parlant-ingress-aws.yaml
Your ALB will be created automatically. Get the address:
kubectl get ingress parlant-ingress
For Azure AKS, use the Application Gateway Ingress Controller or NGINX Ingress:
Option 1: Application Gateway Ingress Controller
# parlant-ingress-azure.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: parlant-ingress
namespace: default
annotations:
kubernetes.io/ingress.class: azure/application-gateway
appgw.ingress.kubernetes.io/ssl-redirect: "true"
# If you have a certificate in Key Vault:
appgw.ingress.kubernetes.io/appgw-ssl-certificate: "your-cert-name"
spec:
rules:
- host: your-domain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: parlant-service
port:
number: 8800
Option 2: NGINX Ingress (simpler for getting started)
First, install NGINX Ingress Controller:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz
Then create the ingress:
# parlant-ingress-nginx.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: parlant-ingress
namespace: default
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod # If using cert-manager
spec:
tls:
- hosts:
- your-domain.com
secretName: parlant-tls
rules:
- host: your-domain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: parlant-service
port:
number: 8800
Apply it:
kubectl apply -f parlant-ingress-nginx.yaml
Get the external IP:
kubectl get service nginx-ingress-ingress-nginx-controller -n ingress-nginx
Verify Deployment
Check that everything is running:
# Check pods
kubectl get pods -l app=parlant
# Check service
kubectl get service parlant-service
# Check ingress
kubectl get ingress parlant-ingress
# View logs
kubectl logs -l app=parlant --tail=100
# Follow logs in real-time
kubectl logs -l app=parlant -f
You should see each pod in the Running state with its containers ready (READY 1/1).
Step 5: Configure Your Frontend
Once your Parlant backend is deployed, connect your frontend to it.
For React applications, use the official parlant-chat-react widget pointing to your production URL. For custom integrations or other frameworks, see the Custom Frontend guide for detailed instructions on:
- Event-driven conversation API
- Session management
- Message handling
- Real-time updates with long polling
- CORS configuration
Step 6: Production Hardening
Implement Authorization Policy
Set up a production-ready authorization policy with JWT authentication and rate limiting. See the API Hardening guide for:
- Custom authorization policy implementation
- JWT token validation
- Rate limiting configuration
- M2M token support
Set Up Input Moderation
Implement input moderation to prevent abuse. See the Input Moderation guide for details on content filtering and safety checks.
Configure Human Handoff
For scenarios where the AI agent needs to escalate to a human agent, see the Human Handoff guide.
Network Policies
Create network policies to restrict traffic:
# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: parlant-network-policy
namespace: default
spec:
podSelector:
matchLabels:
app: parlant
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 8800
  egress:
  # Allow DNS lookups (needed to resolve external hostnames)
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
  # Allow HTTPS to external APIs (LLM provider, MongoDB Atlas) and MongoDB
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: TCP
      port: 443
    - protocol: TCP
      port: 27017
Step 7: Scaling and Performance
Horizontal Pod Autoscaling
Configure autoscaling based on CPU/memory usage:
# parlant-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: parlant-hpa
namespace: default
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: parlant
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Apply it:
kubectl apply -f parlant-hpa.yaml
Resource Management
The deployment above includes resource requests and limits. Adjust these based on your workload:
resources:
requests:
memory: "512Mi" # Minimum guaranteed
cpu: "250m" # 0.25 CPU cores
limits:
memory: "2Gi" # Maximum allowed
cpu: "1000m" # 1 CPU core max
MongoDB Performance
For MongoDB Atlas:
- Use an appropriate tier (M10+ for production)
- Enable connection pooling (handled automatically by Parlant)
- Set up monitoring and alerts
For self-hosted MongoDB:
- Use a replica set for high availability
- Configure appropriate WiredTiger cache size
- Set up regular backups
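Whichever option you run, reads benefit from indexes on frequently filtered fields. The exact collection and field names depend on Parlant's schema and how you split the databases, so treat the following as a hypothetical sketch rather than a prescription:
# create_indexes.py (illustrative; collection and field names are hypothetical)
import os
import pymongo

client = pymongo.MongoClient(os.environ["MONGODB_SESSIONS_URI"])
db = client.get_default_database()  # requires a database name in the URI

# Example: speed up "recent sessions per customer" lookups (adjust to the real schema)
db["sessions"].create_index(
    [("customer_id", pymongo.ASCENDING), ("created_at", pymongo.DESCENDING)]
)
print("Indexes:", list(db["sessions"].index_information()))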
Caching Strategies
Parlant's guideline matching engine includes built-in caching. For components that need shared state across replicas, such as rate limiting, back them with Redis so every pod sees the same counters:
# In your configure_container function
from limits.storage import RedisStorage
container[p.RateLimiter] = p.BasicRateLimiter(
storage=RedisStorage("redis://redis-host:6379"),
# ... other configuration
)
Step 8: Monitoring and Observability
Health Checks
Parlant exposes a /health endpoint. Monitor it:
curl https://your-domain.com/health
Expected response:
{"status": "healthy"}
Logging
View application logs:
# All pods
kubectl logs -l app=parlant --tail=100
# Specific pod
kubectl logs parlant-xxxxxxxxx-xxxxx --tail=100
# Stream logs
kubectl logs -l app=parlant -f
Metrics Collection
Install Prometheus and Grafana for metrics:
# Add Prometheus helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install Prometheus + Grafana
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace
Recommended Alerts
Set up alerts for:
- Pod restarts and crashes
- High memory/CPU usage
- Response time degradation
- MongoDB connection failures
- LLM API rate limits or errors
Step 9: CI/CD Integration
A typical CI/CD pipeline for Parlant builds the Docker image, pushes it to your container registry, and rolls the new tag out to the Kubernetes deployment. Below are examples for GitHub Actions and GitLab CI.
GitHub Actions Example
Create .github/workflows/deploy.yml:
name: Deploy to Production
on:
push:
branches:
- main
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v1
- name: Build and push Docker image
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
ECR_REPOSITORY: parlant-agent
IMAGE_TAG: ${{ github.sha }}
run: |
docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_REPOSITORY:latest
docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
- name: Update kubeconfig
run: aws eks update-kubeconfig --name your-cluster-name --region us-east-1
- name: Deploy to Kubernetes
run: |
kubectl set image deployment/parlant parlant=${{ steps.login-ecr.outputs.registry }}/parlant-agent:${{ github.sha }}
kubectl rollout status deployment/parlant
GitLab CI Example
Create .gitlab-ci.yml:
stages:
- build
- deploy
variables:
DOCKER_DRIVER: overlay2
IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
build:
stage: build
image: docker:latest
services:
- docker:dind
script:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
- docker build -t $IMAGE_TAG .
- docker push $IMAGE_TAG
- docker tag $IMAGE_TAG $CI_REGISTRY_IMAGE:latest
- docker push $CI_REGISTRY_IMAGE:latest
only:
- main
deploy:
stage: deploy
image: bitnami/kubectl:latest
script:
- kubectl config use-context your-cluster-context
- kubectl set image deployment/parlant parlant=$IMAGE_TAG
- kubectl rollout status deployment/parlant
only:
- main
Rolling Updates
Kubernetes handles rolling updates automatically. Configure the strategy:
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
This ensures zero-downtime deployments.
Step 10: Troubleshooting
Common Issues
Pods Not Starting
Check pod status and events:
kubectl describe pod parlant-xxxxxxxxx-xxxxx
Common causes:
- Image pull errors (check registry credentials)
- Resource limits (insufficient memory/CPU)
- Environment variable issues
MongoDB Connection Errors
Check connection string format and network access:
# Test from within a pod
kubectl exec -it parlant-xxxxxxxxx-xxxxx -- python -c "import pymongo; client = pymongo.MongoClient('your-connection-string'); print(client.server_info())"
Verify:
- Connection string format is correct
- MongoDB cluster allows connections from Kubernetes IP range
- Credentials are valid
Load Balancer Not Working
Check ingress status:
kubectl describe ingress parlant-ingress
Verify:
- Ingress controller is installed and running
- SSL certificate is configured correctly
- DNS is pointing to the load balancer
High Response Latency
Common causes and solutions:
- Guideline matching overhead: Review your agent's guidelines and optimize
- MongoDB performance: Check indexes and query performance
- LLM API latency: Consider using faster models or caching
- Insufficient resources: Scale up pods or increase resource limits
Check pod metrics:
kubectl top pods -l app=parlant
Memory Leaks
Monitor memory usage over time:
kubectl top pod parlant-xxxxxxxxx-xxxxx --use-protocol-buffers
If memory grows continuously, check:
- Large session histories (implement cleanup)
- Caching configuration
- Connection pool settings
Debug Mode
Enable debug logging by setting an environment variable:
env:
- name: LOG_LEVEL
value: "DEBUG"
Getting Help
If you encounter issues not covered here, you can:
- Check the GitHub Issues
- Join the Discord community
- Review the documentation
Production Checklist
Pre-Launch Checklist
Before going live, verify you have:
Security:
- Production authorization policy implemented
- JWT authentication configured
- Rate limiting enabled
- Secrets stored securely (not in version control)
- Network policies configured
- HTTPS/TLS enabled
- Input moderation configured
Reliability:
- MongoDB backups configured
- Health checks and probes configured
- Resource limits set appropriately
- Horizontal pod autoscaling enabled
- Multiple replicas running
Monitoring:
- Logging configured and accessible
- Metrics collection set up
- Alerts configured for critical issues
- Health check monitoring enabled
Operations:
- CI/CD pipeline configured
- Deployment documentation written
- Rollback procedure tested
- Disaster recovery plan documented
Performance:
- Load testing completed
- Resource allocation optimized
- MongoDB indexes configured
- Caching configured appropriately
Next Steps
Now that your Parlant agent is deployed, consider:
- Review Agentic Design Principles: See the Agentic Design Methodology guide for best practices on building effective agents
- Implement Monitoring: Set up comprehensive monitoring and alerting for production issues
- Plan for Scale: Test your deployment under expected load and adjust resources accordingly
- Iterate on Agent Behavior: Use production feedback to refine guidelines, journeys, and tools
- Document Your Setup: Maintain documentation for your team on deployment procedures and troubleshooting
Congratulations! You now have a production-ready Parlant deployment. Your AI agent is ready to handle real customer conversations at scale.