Deployment Guide
Your Parlant agent works great locally, but production deployment requires a few key changes. This guide walks you through setting up Parlant on Kubernetes with proper authentication, persistence, and scaling—whether you're deploying to AWS, Azure, or another cloud provider.
What This Guide Covers
This guide focuses on the infrastructure and deployment aspects unique to taking Parlant from local development to production. For topics like authentication policies, frontend integration, and agentic design principles, we'll reference the relevant documentation sections.
By the end of this guide, you'll have:
- A containerized Parlant application
- A production-ready Kubernetes deployment
- MongoDB persistence configured
- Load balancing and HTTPS termination
- A scalable, secure production environment
Architecture Overview
A typical Parlant production deployment consists of the following key components:
- Load Balancer: Handles SSL termination and routes traffic to Parlant pods
- Parlant Pods: Stateless application containers (horizontally scalable)
- MongoDB: Persistent storage for sessions and customer data
- LLM Provider: External API for NLP services (OpenAI, Anthropic, etc.)
Prerequisites
Before you begin, ensure you have:
Local Tools:
- Python 3.10 or higher
- Docker installed and running
- kubectl CLI tool
- Cloud provider CLI (AWS CLI or Azure CLI)
- A code editor
Cloud Resources:
- Access to AWS EKS or Azure AKS (or another Kubernetes provider)
- New to EKS? See the AWS EKS Getting Started Guide
- New to AKS? See the Azure AKS Quickstart
- A MongoDB instance (MongoDB Atlas recommended, or managed MongoDB from your cloud provider)
- (Optional) A domain name for your agent
- (Optional) SSL certificate (can use Let's Encrypt or cloud provider certificates)
Knowledge Prerequisites:
- Basic understanding of Kubernetes concepts (pods, services, deployments)
- Familiarity with environment variables and configuration management
- Basic Docker knowledge
This guide assumes you have a working Parlant agent running locally. If you haven't built your agent yet, start with the Installation guide.
Understanding Parlant's Production Requirements
Stateless Architecture
Parlant's server is designed to be stateless, which means:
- All session state is stored in MongoDB, not in memory
- Multiple Parlant pods can run simultaneously without coordination
- You can scale horizontally by adding more pods
- Pods can be restarted or replaced without losing data
This design makes Parlant naturally suited for cloud deployment and Kubernetes orchestration.
Persistence Layer
Parlant requires two MongoDB collections:
- Sessions: Stores conversation state, events, and history
- Customers: Stores customer profiles and associated data
Both collections must be accessible from all Parlant pods with consistent connection strings.
Port Configuration
By default, Parlant's FastAPI server listens on port 8800. In production:
- Your load balancer accepts HTTPS traffic on port 443
- The load balancer forwards to Parlant pods on port 8800
- Kubernetes services handle internal routing
Step 1: Prepare Your Production Application
Create a Production Configuration File
Create a production_config.py file to centralize your production settings:
# production_config.py
import os
import parlant.sdk as p
# MongoDB Configuration
MONGODB_SESSIONS_URI = os.environ["MONGODB_SESSIONS_URI"]
MONGODB_CUSTOMERS_URI = os.environ["MONGODB_CUSTOMERS_URI"]
# NLP Provider Configuration
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")
# Server Configuration
SERVER_HOST = os.environ.get("SERVER_HOST", "0.0.0.0")
SERVER_PORT = int(os.environ.get("SERVER_PORT", "8800"))
# Choose your NLP service
NLP_SERVICE = p.NLPServices.openai # or p.NLPServices.anthropic
def get_mongodb_config():
"""Returns MongoDB configuration for Parlant."""
return {
"sessions_uri": MONGODB_SESSIONS_URI,
"customers_uri": MONGODB_CUSTOMERS_URI,
}
Update Your Main Application File
Modify your main application to use production configuration:
# main.py
import asyncio
import os
import parlant.sdk as p
from production_config import (
get_mongodb_config,
NLP_SERVICE,
SERVER_HOST,
SERVER_PORT
)
from auth import ProductionAuthPolicy # We'll create this next
async def configure_container(container: p.Container) -> p.Container:
"""Configure production-specific dependencies."""
# Set up production authorization
container[p.AuthorizationPolicy] = ProductionAuthPolicy(
secret_key=os.environ["JWT_SECRET_KEY"],
)
return container
async def main():
"""Initialize and run the Parlant server."""
# MongoDB configuration
mongodb_config = get_mongodb_config()
async with p.Server(
host=SERVER_HOST,
port=SERVER_PORT,
nlp_service=NLP_SERVICE,
configure_container=configure_container,
**mongodb_config
) as server:
# Create or retrieve your agent
agents = await server.list_agents()
if not agents:
agent = await server.create_agent(
name="Production Agent",
description="Your agent description here"
)
# Set up your guidelines, journeys, etc.
await setup_agent_behavior(agent)
# Start serving requests
await server.serve()
async def setup_agent_behavior(agent: p.Agent):
"""Configure your agent's behavior."""
# Your guidelines, journeys, tools, etc.
pass
if __name__ == "__main__":
asyncio.run(main())
Set Up Production Authorization
Create an auth.py file with your production authorization policy:
# auth.py
import parlant.sdk as p
class ProductionAuthPolicy(p.ProductionAuthorizationPolicy):
"""Production authorization with your custom rules."""
def __init__(self, secret_key: str):
super().__init__()
self.secret_key = secret_key
# Add your custom authorization logic here
For comprehensive guidance on implementing JWT authentication, rate limiting, M2M tokens, and custom authorization policies, see the API Hardening guide.
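As a concrete illustration of the kind of token handling such a policy performs, here is a minimal sketch using PyJWT (included in the requirements below). The helper names and claim set are illustrative only and are independent of Parlant's authorization interface:
# jwt_utils.py (illustrative sketch; not part of Parlant's API)
import datetime
import jwt  # PyJWT

ALGORITHM = "HS256"

def issue_token(secret_key: str, subject: str, ttl_minutes: int = 60) -> str:
    """Issue a short-lived token identifying a customer or service."""
    now = datetime.datetime.now(datetime.timezone.utc)
    payload = {
        "sub": subject,
        "iat": now,
        "exp": now + datetime.timedelta(minutes=ttl_minutes),
    }
    return jwt.encode(payload, secret_key, algorithm=ALGORITHM)

def validate_token(secret_key: str, token: str) -> str | None:
    """Return the token's subject if it is valid and unexpired, otherwise None."""
    try:
        claims = jwt.decode(token, secret_key, algorithms=[ALGORITHM])
        return claims.get("sub")
    except jwt.InvalidTokenError:  # also covers expired signatures
        return None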
Step 2: Containerize Your Application
Create an Optimized Dockerfile
Create a Dockerfile in your project root:
# Use Python 3.10 slim image
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements first (for better caching)
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose Parlant's default port
EXPOSE 8800
# Health check endpoint
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8800/health')" || exit 1
# Run the application
CMD ["python", "main.py"]
Create Requirements File
Your requirements.txt should include:
parlant>=3.0.0
pyjwt>=2.8.0
limits>=3.0.0
pymongo>=4.0.0
redis>=5.0.0
Build and Test Locally
Build your Docker image:
docker build -t parlant-agent:latest .
Test it locally with environment variables:
docker run -p 8800:8800 \
-e MONGODB_SESSIONS_URI="mongodb://localhost:27017/parlant_sessions" \
-e MONGODB_CUSTOMERS_URI="mongodb://localhost:27017/parlant_customers" \
-e OPENAI_API_KEY="your-key-here" \
-e JWT_SECRET_KEY="your-secret-here" \
parlant-agent:latest
Visit http://localhost:8800 to verify it's working.
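If you prefer a scripted check, the following sketch (standard library only) polls the health endpoint until the container responds; it assumes the /health endpoint referenced by the Dockerfile's HEALTHCHECK:
# wait_for_ready.py (simple local smoke test)
import sys
import time
import urllib.error
import urllib.request

URL = "http://localhost:8800/health"

for attempt in range(30):
    try:
        with urllib.request.urlopen(URL, timeout=5) as response:
            print(f"Ready after {attempt + 1} attempt(s): HTTP {response.status}")
            sys.exit(0)
    except (urllib.error.URLError, OSError):
        time.sleep(2)  # the container may still be starting

print("Server did not become ready in time", file=sys.stderr)
sys.exit(1)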
Optimize Image Size (Optional)
For production, consider a multi-stage build to reduce image size. For more on optimizing Docker builds, see Docker's multi-stage build documentation.
# Stage 1: Builder
FROM python:3.10-slim as builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
# Stage 2: Runtime
FROM python:3.10-slim
WORKDIR /app
# Copy only the dependencies from builder
COPY --from=builder /root/.local /root/.local
COPY . .
# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH
EXPOSE 8800
CMD ["python", "main.py"]
Step 3: Set Up MongoDB
You have two main options for MongoDB in production:
Option A: MongoDB Atlas (Recommended)
MongoDB Atlas is a fully managed service that handles backups, scaling, and maintenance. For a complete setup guide, see the official MongoDB Atlas Getting Started documentation.
Quick setup:
- Create a MongoDB Atlas account at https://www.mongodb.com/cloud/atlas
- Create a cluster (free tier works for development, paid tier for production)
- Set up database access with a user that has read/write permissions
- Configure network access for your Kubernetes cluster's IP range
- Get your connection string:
mongodb+srv://username:password@cluster0.xxxxx.mongodb.net/?retryWrites=true&w=majority
You'll need two connection URIs; they can point to two separate databases or share a single database:
# Option 1: Two separate databases
MONGODB_SESSIONS_URI=mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant_sessions
MONGODB_CUSTOMERS_URI=mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant_customers
# Option 2: Same database, Parlant will create collections
MONGODB_SESSIONS_URI=mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant
MONGODB_CUSTOMERS_URI=mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant
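Before wiring these URIs into Kubernetes, it's worth confirming they work from your machine. A minimal sketch using pymongo, assuming the two environment variables above are set locally:
# check_mongodb.py (connectivity sanity check)
import os
import pymongo

for name in ("MONGODB_SESSIONS_URI", "MONGODB_CUSTOMERS_URI"):
    client = pymongo.MongoClient(os.environ[name], serverSelectionTimeoutMS=5000)
    client.admin.command("ping")  # raises if the cluster is unreachable
    print(f"{name}: connection OK")
    client.close()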
Option B: Self-Hosted MongoDB on Kubernetes
This option is for advanced users who need full control. For detailed guidance, see the official Kubernetes documentation on running MongoDB with StatefulSets. You'll need two manifests: a StatefulSet (with a headless Service) and a Secret holding the MongoDB credentials.
# mongodb-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
name: mongodb
spec:
ports:
- port: 27017
targetPort: 27017
clusterIP: None
selector:
app: mongodb
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mongodb
spec:
serviceName: "mongodb"
replicas: 1
selector:
matchLabels:
app: mongodb
template:
metadata:
labels:
app: mongodb
spec:
containers:
- name: mongodb
image: mongo:7.0
ports:
- containerPort: 27017
volumeMounts:
- name: mongodb-data
mountPath: /data/db
env:
- name: MONGO_INITDB_ROOT_USERNAME
valueFrom:
secretKeyRef:
name: mongodb-secret
key: username
- name: MONGO_INITDB_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mongodb-secret
key: password
volumeClaimTemplates:
- metadata:
name: mongodb-data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 20Gi
# mongodb-secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: mongodb-secret
type: Opaque
stringData:
username: admin
password: <your-secure-password>
connection-string: mongodb://admin:<your-secure-password>@mongodb:27017/
For production workloads, consider using a managed MongoDB service or implementing a proper replica set with backups, monitoring, and disaster recovery.
Step 4: Deploy to Kubernetes
Now we'll deploy Parlant to a Kubernetes cluster. We'll show examples for both AWS EKS and Azure AKS.
Create Kubernetes Secrets
First, create a Kubernetes Secret with your sensitive configuration:
# parlant-secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: parlant-secret
namespace: default
type: Opaque
stringData:
mongodb-sessions-uri: "mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant_sessions"
mongodb-customers-uri: "mongodb+srv://user:pass@cluster0.xxxxx.mongodb.net/parlant_customers"
openai-api-key: "sk-..."
jwt-secret-key: "your-secure-random-string"
Apply it:
kubectl apply -f parlant-secret.yaml
For production, use a proper secrets management solution like AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault instead of storing secrets directly in YAML files. See Kubernetes Secrets best practices for comprehensive guidance on secret management.
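Whichever secret store you use, the jwt-secret-key value should be long and random. One simple way to generate it with the Python standard library:
# generate_secret.py
import secrets

# 64 random bytes, URL-safe encoded (roughly 86 characters)
print(secrets.token_urlsafe(64))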
Create ConfigMap for Non-Sensitive Config
# parlant-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: parlant-config
namespace: default
data:
SERVER_HOST: "0.0.0.0"
SERVER_PORT: "8800"
Apply it:
kubectl apply -f parlant-configmap.yaml
Create Deployment
# parlant-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: parlant
namespace: default
labels:
app: parlant
spec:
replicas: 3
selector:
matchLabels:
app: parlant
template:
metadata:
labels:
app: parlant
spec:
containers:
- name: parlant
image: your-registry/parlant-agent:latest
ports:
- containerPort: 8800
name: http
env:
- name: MONGODB_SESSIONS_URI
valueFrom:
secretKeyRef:
name: parlant-secret
key: mongodb-sessions-uri
- name: MONGODB_CUSTOMERS_URI
valueFrom:
secretKeyRef:
name: parlant-secret
key: mongodb-customers-uri
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: parlant-secret
key: openai-api-key
- name: JWT_SECRET_KEY
valueFrom:
secretKeyRef:
name: parlant-secret
key: jwt-secret-key
envFrom:
- configMapRef:
name: parlant-config
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /health
port: 8800
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health
port: 8800
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
Apply it:
kubectl apply -f parlant-deployment.yaml
Create Service
# parlant-service.yaml
apiVersion: v1
kind: Service
metadata:
name: parlant-service
namespace: default
spec:
type: ClusterIP
selector:
app: parlant
ports:
- port: 8800
targetPort: 8800
protocol: TCP
name: http
Apply it:
kubectl apply -f parlant-service.yaml
Set Up Ingress for Load Balancing
The ingress configuration differs slightly between AWS and Azure.
For AWS EKS, use the AWS Load Balancer Controller with an Application Load Balancer:
First, install the AWS Load Balancer Controller:
# Add the EKS chart repo
helm repo add eks https://aws.github.io/eks-charts
helm repo update
# Install the controller
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=your-cluster-name \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller
Create the Ingress:
# parlant-ingress-aws.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: parlant-ingress
namespace: default
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
alb.ingress.kubernetes.io/ssl-redirect: '443'
# If you have an ACM certificate:
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:region:account:certificate/xxxxx
spec:
rules:
- host: your-domain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: parlant-service
port:
number: 8800
Apply it:
kubectl apply -f parlant-ingress-aws.yaml
Your ALB will be created automatically. Get the address:
kubectl get ingress parlant-ingress
For Azure AKS, use the Application Gateway Ingress Controller or NGINX Ingress:
Option 1: Application Gateway Ingress Controller
# parlant-ingress-azure.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: parlant-ingress
namespace: default
annotations:
kubernetes.io/ingress.class: azure/application-gateway
appgw.ingress.kubernetes.io/ssl-redirect: "true"
# If you have a certificate in Key Vault:
appgw.ingress.kubernetes.io/appgw-ssl-certificate: "your-cert-name"
spec:
rules:
- host: your-domain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: parlant-service
port:
number: 8800
Option 2: NGINX Ingress (simpler for getting started)
First, install NGINX Ingress Controller:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz
Then create the ingress:
# parlant-ingress-nginx.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: parlant-ingress
namespace: default
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod # If using cert-manager
spec:
tls:
- hosts:
- your-domain.com
secretName: parlant-tls
rules:
- host: your-domain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: parlant-service
port:
number: 8800
Apply it:
kubectl apply -f parlant-ingress-nginx.yaml
Get the external IP:
kubectl get service nginx-ingress-ingress-nginx-controller -n ingress-nginx
Verify Deployment
Check that everything is running:
# Check pods
kubectl get pods -l app=parlant
# Check service
kubectl get service parlant-service
# Check ingress
kubectl get ingress parlant-ingress
# View logs
kubectl logs -l app=parlant --tail=100
# Follow logs in real-time
kubectl logs -l app=parlant -f
You should see each pod in the Running state with its containers ready (READY 1/1).
Step 5: Configure Your Frontend
Once your Parlant backend is deployed, connect your frontend to it.
For React applications, use the official parlant-chat-react widget pointing to your production URL. For custom integrations or other frameworks, see the Custom Frontend guide for detailed instructions on:
- Event-driven conversation API
- Session management
- Message handling
- Real-time updates with long polling
- CORS configuration
Step 6: Production Hardening
Implement Authorization Policy
Set up a production-ready authorization policy with JWT authentication and rate limiting. See the API Hardening guide for:
- Custom authorization policy implementation
- JWT token validation
- Rate limiting configuration
- M2M token support
Set Up Input Moderation
Implement input moderation to prevent abuse. See the Input Moderation guide for details on content filtering and safety checks.
Configure Human Handoff
For scenarios where the AI agent needs to escalate to a human agent, see the Human Handoff guide.
Network Policies
Create network policies to restrict traffic:
# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: parlant-network-policy
namespace: default
spec:
podSelector:
matchLabels:
app: parlant
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 8800
  egress:
  # Allow DNS lookups (needed to resolve external hostnames)
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
  # Allow HTTPS to external APIs (LLM provider, MongoDB Atlas) and MongoDB
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: TCP
      port: 443
    - protocol: TCP
      port: 27017
Step 7: Scaling and Performance
Horizontal Pod Autoscaling
Configure autoscaling based on CPU/memory usage:
# parlant-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: parlant-hpa
namespace: default
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: parlant
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Apply it:
kubectl apply -f parlant-hpa.yaml
Resource Management
The deployment above includes resource requests and limits. Adjust these based on your workload:
resources:
requests:
memory: "512Mi" # Minimum guaranteed
cpu: "250m" # 0.25 CPU cores
limits:
memory: "2Gi" # Maximum allowed
cpu: "1000m" # 1 CPU core max
MongoDB Performance
For MongoDB Atlas:
- Use an appropriate tier (M10+ for production)
- Enable connection pooling (handled automatically by Parlant)
- Set up monitoring and alerts
For self-hosted MongoDB:
- Use a replica set for high availability
- Configure appropriate WiredTiger cache size
- Set up regular backups
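Whichever option you run, reads benefit from indexes on frequently filtered fields. The exact collection and field names depend on Parlant's schema and how you split the databases, so treat the following as a hypothetical sketch rather than a prescription:
# create_indexes.py (illustrative; collection and field names are hypothetical)
import os
import pymongo

client = pymongo.MongoClient(os.environ["MONGODB_SESSIONS_URI"])
db = client.get_default_database()  # requires a database name in the URI

# Example: speed up "recent sessions per customer" lookups (adjust to the real schema)
db["sessions"].create_index(
    [("customer_id", pymongo.ASCENDING), ("created_at", pymongo.DESCENDING)]
)
print("Indexes:", list(db["sessions"].index_information()))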
Caching Strategies
Parlant's guideline matching engine includes built-in caching. For components that need shared state across replicas, such as rate limiting, back them with Redis so every pod sees the same counters:
# In your configure_container function
from limits.storage import RedisStorage
container[p.RateLimiter] = p.BasicRateLimiter(
storage=RedisStorage("redis://redis-host:6379"),
# ... other configuration
)
Step 8: Monitoring and Observability
Health Checks
Parlant exposes a /health endpoint. Monitor it:
curl https://your-domain.com/health
Expected response:
{"status": "healthy"}
Logging
View application logs:
# All pods
kubectl logs -l app=parlant --tail=100
# Specific pod
kubectl logs parlant-xxxxxxxxx-xxxxx --tail=100
# Stream logs
kubectl logs -l app=parlant -f
Metrics Collection
Install Prometheus and Grafana for metrics:
# Add Prometheus helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install Prometheus + Grafana
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace
Recommended Alerts
Set up alerts for:
- Pod restarts and crashes
- High memory/CPU usage
- Response time degradation
- MongoDB connection failures
- LLM API rate limits or errors
Step 9: CI/CD Integration
A typical CI/CD pipeline for Parlant builds the Docker image, pushes it to your container registry, and rolls the new tag out to the Kubernetes deployment. Below are examples for GitHub Actions and GitLab CI.
GitHub Actions Example
Create .github/workflows/deploy.yml:
name: Deploy to Production
on:
push:
branches:
- main
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v1
- name: Build and push Docker image
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
ECR_REPOSITORY: parlant-agent
IMAGE_TAG: ${{ github.sha }}
run: |
docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_REPOSITORY:latest
docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
- name: Update kubeconfig
run: aws eks update-kubeconfig --name your-cluster-name --region us-east-1
- name: Deploy to Kubernetes
run: |
kubectl set image deployment/parlant parlant=${{ steps.login-ecr.outputs.registry }}/parlant-agent:${{ github.sha }}
kubectl rollout status deployment/parlant
GitLab CI Example
Create .gitlab-ci.yml:
stages:
- build
- deploy
variables:
DOCKER_DRIVER: overlay2
IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
build:
stage: build
image: docker:latest
services:
- docker:dind
script:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
- docker build -t $IMAGE_TAG .
- docker push $IMAGE_TAG
- docker tag $IMAGE_TAG $CI_REGISTRY_IMAGE:latest
- docker push $CI_REGISTRY_IMAGE:latest
only:
- main
deploy:
stage: deploy
image: bitnami/kubectl:latest
script:
- kubectl config use-context your-cluster-context
- kubectl set image deployment/parlant parlant=$IMAGE_TAG
- kubectl rollout status deployment/parlant
only:
- main
Rolling Updates
Kubernetes handles rolling updates automatically. Configure the strategy:
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
This ensures zero-downtime deployments.
Step 10: Troubleshooting
Common Issues
Pods Not Starting
Check pod status and events:
kubectl describe pod parlant-xxxxxxxxx-xxxxx
Common causes:
- Image pull errors (check registry credentials)
- Resource limits (insufficient memory/CPU)
- Environment variable issues
MongoDB Connection Errors
Check connection string format and network access:
# Test from within a pod
kubectl exec -it parlant-xxxxxxxxx-xxxxx -- python -c "import pymongo; client = pymongo.MongoClient('your-connection-string'); print(client.server_info())"
Verify:
- Connection string format is correct
- MongoDB cluster allows connections from Kubernetes IP range
- Credentials are valid
Load Balancer Not Working
Check ingress status:
kubectl describe ingress parlant-ingress
Verify:
- Ingress controller is installed and running
- SSL certificate is configured correctly
- DNS is pointing to the load balancer
High Response Latency
Common causes and solutions:
- Guideline matching overhead: Review your agent's guidelines and optimize
- MongoDB performance: Check indexes and query performance
- LLM API latency: Consider using faster models or caching
- Insufficient resources: Scale up pods or increase resource limits
Check pod metrics:
kubectl top pods -l app=parlant
Memory Leaks
Monitor memory usage over time:
kubectl top pod parlant-xxxxxxxxx-xxxxx --use-protocol-buffers
If memory grows continuously, check:
- Large session histories (implement cleanup)
- Caching configuration
- Connection pool settings
Debug Mode
Enable debug logging by setting an environment variable:
env:
- name: LOG_LEVEL
value: "DEBUG"
Getting Help
If you encounter issues not covered here, you can:
- Check the GitHub Issues
- Join the Discord community
- Review the documentation
Production Checklist
Pre-Launch Checklist
Before going live, verify you have:
Security:
- Production authorization policy implemented
- JWT authentication configured
- Rate limiting enabled
- Secrets stored securely (not in version control)
- Network policies configured
- HTTPS/TLS enabled
- Input moderation configured
Reliability:
- MongoDB backups configured
- Health checks and probes configured
- Resource limits set appropriately
- Horizontal pod autoscaling enabled
- Multiple replicas running
Monitoring:
- Logging configured and accessible
- Metrics collection set up
- Alerts configured for critical issues
- Health check monitoring enabled
Operations:
- CI/CD pipeline configured
- Deployment documentation written
- Rollback procedure tested
- Disaster recovery plan documented
Performance:
- Load testing completed
- Resource allocation optimized
- MongoDB indexes configured
- Caching configured appropriately
Next Steps
Now that your Parlant agent is deployed, consider:
- Review Agentic Design Principles: See the Agentic Design Methodology guide for best practices on building effective agents
- Implement Monitoring: Set up comprehensive monitoring and alerting for production issues
- Plan for Scale: Test your deployment under expected load and adjust resources accordingly
- Iterate on Agent Behavior: Use production feedback to refine guidelines, journeys, and tools
- Document Your Setup: Maintain documentation for your team on deployment procedures and troubleshooting
Congratulations! You now have a production-ready Parlant deployment. Your AI agent is ready to handle real customer conversations at scale.