
CI/CD & Testing

OpenClaw agents run in production 24/7, so testing matters. This guide covers the full lifecycle: testing skills locally, scanning for security issues, validating agent behavior, automating deployments, and monitoring in production.


Testing Skills

Dry-Run Testing

Test a skill against a trigger phrase without installing it:

# Test a skill file with a sample message
openclaw skill test ./my-skill.md "trigger phrase here"

# Test an installed skill
openclaw skill test ~/.openclaw/skills/daily-standup.md "run my standup"

The agent processes the message as if it came from a real channel, but doesn't send any replies or execute side effects. You see the full reasoning chain, tool calls, and generated response.
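A single phrase rarely covers a skill's triggers. The sketch below dry-runs one skill against a file of phrases, one per line. The function name and the phrases file are hypothetical, and it assumes `openclaw skill test` exits non-zero when a dry run fails, which this guide does not confirm.

```shell
# Sketch: dry-run one skill against several trigger phrases.
# Assumption: `openclaw skill test` exits non-zero when a dry run fails.
run_trigger_tests() {
  local skill="$1" phrases_file="$2"
  local failures=0 phrase

  # Read one phrase per line; skip blank lines and "#" comments.
  while IFS= read -r phrase || [ -n "$phrase" ]; do
    case "$phrase" in ''|'#'*) continue ;; esac
    # </dev/null keeps the command from consuming the phrases file.
    if openclaw skill test "$skill" "$phrase" </dev/null >/dev/null; then
      echo "PASS: $phrase"
    else
      echo "FAIL: $phrase"
      failures=$((failures + 1))
    fi
  done < "$phrases_file"

  # Non-zero return if any phrase failed, so CI can gate on it.
  [ "$failures" -eq 0 ]
}
```

Usage: `run_trigger_tests ./my-skill.md ./phrases.txt`.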

Validation

Check that a skill file has valid frontmatter and structure before deploying:

# Validate format and structure
openclaw clawhub validate ./my-skill.md

This checks:

  • Valid YAML frontmatter (name, version, description, trigger)
  • No syntax errors in the Markdown body
  • Required fields are present
  • Tool references are valid
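For a quick local pre-check before calling the validator, the required fields can be approximated with standard tools. This sketch is not the validator's actual logic; it assumes the frontmatter sits between the first two `---` lines and that the four fields appear as top-level `key: value` pairs. The function name is hypothetical.

```shell
# Sketch: confirm a skill file's frontmatter declares name, version,
# description, and trigger. Assumes `---`-delimited frontmatter with
# top-level "key: value" lines.
check_frontmatter() {
  local file="$1" field missing=0
  local frontmatter

  # Keep only the lines between the opening and closing `---`.
  frontmatter="$(awk '
    NR == 1 && $0 != "---" { exit }   # no frontmatter at all
    NR == 1 { next }                  # skip the opening delimiter
    $0 == "---" { exit }              # stop at the closing delimiter
    { print }
  ' "$file")"

  for field in name version description trigger; do
    if ! printf '%s\n' "$frontmatter" | grep -q "^${field}:"; then
      echo "missing field: $field"
      missing=1
    fi
  done

  [ "$missing" -eq 0 ]
}
```

Usage: `check_frontmatter ./my-skill.md && openclaw clawhub validate ./my-skill.md`.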

Heartbeat Dry-Run

Test your HEARTBEAT.md instructions without executing them:

# Preview what the heartbeat would do
openclaw heartbeat --now --dry-run

# Actually run one heartbeat cycle
openclaw heartbeat --now

Security Scanning

Built-In Scanners

OpenClaw includes built-in scanners that run automatically when skills are installed or published:

| Scanner | What It Checks | Since |
| --- | --- | --- |
| Static analysis | Pattern matching for known bad patterns | v2026.2.6 |
| VirusTotal | SHA-256 hash check + Code Insight (Gemini-powered) | v2026.2.6 |
| Daily re-scan | Active skills re-scanned for drift | v2026.2.6 |

# Scan all installed skills
openclaw security scan --all

# Scan a specific skill
openclaw security scan ./skill.md

# Check a ClawHub skill's security report before installing
openclaw clawhub security-report <skill-name>

# View full source before installing
openclaw clawhub view <skill-name>
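To illustrate what checking for drift means, the sketch below records a SHA-256 hash per skill file and later reports any file whose content changed. It is not how the built-in re-scan works; the function names and manifest location are hypothetical, and it relies on GNU coreutils (`sha256sum`, `sort -z`, `xargs -r`).

```shell
# Sketch: snapshot skill hashes, then detect later modifications.
snapshot_skills() {
  local skills_dir="$1" manifest="$2"
  # Hash every Markdown file; paths are stored relative to skills_dir.
  ( cd "$skills_dir" && find . -type f -name '*.md' -print0 \
      | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

check_drift() {
  local skills_dir="$1" manifest="$2"
  # The manifest is read on stdin so a relative path still resolves.
  # --quiet prints only mismatches; exit status is non-zero on a change.
  ( cd "$skills_dir" && sha256sum --check --quiet ) < "$manifest"
}
```

Usage: run `snapshot_skills ~/.openclaw/skills ~/skills.sha256` after a clean scan, then `check_drift ~/.openclaw/skills ~/skills.sha256` on a schedule. Files added after the snapshot are not reported by this sketch.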

Third-Party Scanners

| Tool | What It Does |
| --- | --- |
| Clawdex | Pre-installation check against Koi Security's malicious skills database |
| SkillGuard | File scanner for vulnerability patterns |
| SafeClaw Scanner | Detects prompt injections, backdoors, obfuscated code |
| Snyk mcp-scan | Free Python tool powered by Snyk ML |

Skill Workshop Gating

The Skill Workshop (v2026.6.1+) adds a proposal queue with scanner gating:

# Create a skill proposal (enters review queue)
openclaw skills workshop propose-create \
--name "deploy-helper" \
--description "Assists with deployments" \
--proposal ./PROPOSAL.md

# Scanner runs automatically at apply time
# Verdicts: Clean → applied, Suspicious → quarantined, Malicious → rejected

# Apply after review
openclaw skills workshop apply <proposal-id>

# Quarantine if suspicious
openclaw skills workshop quarantine <proposal-id> \
--reason "Unexpected external API calls"

Skills that fail scanning are blocked from activation. See the Skill Workshop guide for the full lifecycle.
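The verdict handling above amounts to a three-way branch. A minimal sketch, using the verdict strings from this section; the function name is hypothetical, and how a verdict is read back from the CLI is not shown here.

```shell
# Sketch: map a scanner verdict to the outcome described above.
action_for_verdict() {
  case "$1" in
    Clean)      echo "applied" ;;
    Suspicious) echo "quarantined" ;;
    Malicious)  echo "rejected" ;;
    # Treat anything unexpected as an error rather than guessing.
    *)          echo "unknown verdict: $1" >&2; return 2 ;;
  esac
}
```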


Integration Testing

Testing Agent Responses

Test how your agent responds to specific inputs:

# Single-prompt test (no persistent session)
openclaw chat "What's the status of the staging deployment?"

# Test with context injection
openclaw chat --context ./test-data.json "Analyze this data"

# Test without memory (clean slate)
openclaw chat --no-memory "What do you know about our deployment schedule?"

Conversation Regression Testing

Create a test script that validates agent behavior across key scenarios:

test-agent.sh
#!/bin/bash
set -e

echo "=== Skill Trigger Test ==="
openclaw skill test ~/.openclaw/skills/deploy-helper.md "deploy to staging"

echo "=== Security Scan ==="
openclaw security scan --all

echo "=== Health Check ==="
openclaw doctor

echo "=== Channel Connectivity ==="
openclaw channel status

echo "=== MCP Server Status ==="
openclaw mcp status

echo "=== Config Validation ==="
openclaw config list > /dev/null

echo "All tests passed."

Make the script executable and run it:

chmod +x test-agent.sh
./test-agent.sh
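With `set -e`, the script stops at the first failing check, so later checks never run. If a full report is more useful, a variant can run every check and summarize at the end. This is a sketch; `run_check` and `summarize` are hypothetical helper names.

```shell
# Sketch: run every check even if an earlier one fails, then summarize.
failed=""

run_check() {
  local name="$1"; shift
  echo "=== $name ==="
  # "$@" is the command to run; remember the name if it exits non-zero.
  if ! "$@"; then
    failed="$failed [$name]"
  fi
}

summarize() {
  if [ -z "$failed" ]; then
    echo "All tests passed."
    return 0
  fi
  echo "Failed:$failed"
  return 1
}
```

Usage: `run_check "Health Check" openclaw doctor`, `run_check "MCP Server Status" openclaw mcp status`, and so on, followed by `summarize` as the last line so the script's exit status reflects the result.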

Testing MCP Servers

Verify MCP server connections and tool availability:

# Check all MCP servers
openclaw mcp doctor

# Probe a specific server's tools
openclaw mcp probe github

# Test a server interactively
npx @modelcontextprotocol/inspector

Testing Plugins

# Check plugin health
openclaw plugins doctor

# Inspect a specific plugin's runtime state
openclaw plugins inspect workboard --runtime

# View plugin logs
openclaw logs --filter plugin --follow

GitHub Actions

PR Review Bot

Automatically review pull requests with your OpenClaw agent:

.github/workflows/openclaw-review.yml
name: OpenClaw PR Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install OpenClaw
        run: npm install -g openclaw

      - name: Review PR
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          openclaw chat --once \
            "Review this pull request. Check for bugs, security issues, \
            and style. Post your review as a GitHub comment."

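Tip: Give the token the narrowest scope that works. Reviewing needs only read access to repository contents, plus write access to pull requests so the agent can post its comment. It never needs push access.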

Skill Validation Pipeline

Validate skills on every push to your skills repository:

.github/workflows/validate-skills.yml
name: Validate Skills
on:
  push:
    paths:
      - 'skills/**'
  pull_request:
    paths:
      - 'skills/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install OpenClaw
        run: npm install -g openclaw

      - name: Validate all skills
        run: |
          for skill in skills/*.md; do
            echo "Validating $skill..."
            openclaw clawhub validate "$skill"
          done

      - name: Security scan
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          for skill in skills/*.md; do
            echo "Scanning $skill..."
            openclaw security scan "$skill"
          done
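The loops above have two edge cases: if `skills/` contains no `.md` files the glob stays literal and the command receives the string `skills/*.md`, and the default `bash -e` shell in GitHub Actions stops at the first failing file. A sketch that handles both, assuming `openclaw clawhub validate` exits non-zero on an invalid file; `validate_all` is a hypothetical helper name.

```shell
# Sketch: validate every skill, report all failures, reject empty sets.
validate_all() {
  local dir="$1" skill status=0 count=0
  for skill in "$dir"/*.md; do
    # With no matches the pattern is passed through unexpanded; skip it.
    [ -e "$skill" ] || continue
    count=$((count + 1))
    echo "Validating $skill..."
    # Record the failure but keep going so every file is reported.
    openclaw clawhub validate "$skill" || status=1
  done
  if [ "$count" -eq 0 ]; then
    echo "No skills found in $dir" >&2
    return 1
  fi
  return "$status"
}
```

Usage: `validate_all skills` as the body of the validation step.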

Deploy on Merge

Automatically deploy your agent when changes merge to main:

.github/workflows/deploy-agent.yml
name: Deploy Agent
on:
  push:
    branches: [main]
    paths:
      - 'config/**'
      - 'skills/**'
      - 'HEARTBEAT.md'
      - 'SOUL.md'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install OpenClaw
        run: npm install -g openclaw
      - name: Validate config
        # "|| true" makes this step advisory: it never fails the job.
        # Remove it to block deploys on a validation error.
        run: openclaw clawhub validate config/openclaw.json || true
      - name: Scan skills
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          for skill in skills/*.md; do
            openclaw security scan "$skill"
          done

  deploy:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to server
        env:
          DEPLOY_HOST: ${{ secrets.DEPLOY_HOST }}
          DEPLOY_KEY: ${{ secrets.DEPLOY_SSH_KEY }}
        run: |
          mkdir -p ~/.ssh
          echo "$DEPLOY_KEY" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key

          # Sync config and skills. "config/" (trailing slash) copies the
          # directory's contents; "skills" (no slash) copies the directory
          # itself, so skills land in ~/.openclaw/skills/.
          rsync -avz -e "ssh -i ~/.ssh/deploy_key -o StrictHostKeyChecking=no" \
            config/ skills HEARTBEAT.md SOUL.md \
            deploy@$DEPLOY_HOST:~/.openclaw/

          # Restart gateway
          ssh -i ~/.ssh/deploy_key deploy@$DEPLOY_HOST \
            "openclaw gateway restart"

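`StrictHostKeyChecking=no` accepts whatever host key the server presents, which leaves the deploy step open to interception. One alternative is to pin the server's key from a secret. A sketch, where `DEPLOY_HOST_KEY` is a hypothetical secret holding the server's public host key (key type plus base64 key, verified out of band) and `pin_host_key` is a hypothetical helper:

```shell
# Sketch: write a pinned host key instead of disabling host key checks.
pin_host_key() {
  local host="$1" key="$2" known_hosts="$3"
  mkdir -p "$(dirname "$known_hosts")"
  # known_hosts line format: "<host> <keytype> <base64-key>"
  printf '%s %s\n' "$host" "$key" >> "$known_hosts"
  chmod 600 "$known_hosts"
}
```

Usage: call `pin_host_key "$DEPLOY_HOST" "$DEPLOY_HOST_KEY" ~/.ssh/known_hosts` before the sync, then pass `-o StrictHostKeyChecking=yes` to `ssh` so an unexpected key aborts the deploy.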
Nightly Health Check

Run a full diagnostic suite on a schedule:

.github/workflows/nightly-health.yml
name: Nightly Health Check
on:
  schedule:
    - cron: '0 3 * * *'  # 3 AM UTC daily

jobs:
  health:
    runs-on: ubuntu-latest
    steps:
      - name: Health check
        env:
          AGENT_HOST: ${{ secrets.AGENT_HOST }}
          SSH_KEY: ${{ secrets.SSH_KEY }}
        run: |
          mkdir -p ~/.ssh
          echo "$SSH_KEY" > ~/.ssh/key
          chmod 600 ~/.ssh/key

          ssh -i ~/.ssh/key -o StrictHostKeyChecking=no \
            deploy@$AGENT_HOST "openclaw doctor && openclaw mcp doctor && openclaw plugins doctor"

      - name: Alert on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: '{"text": "OpenClaw health check failed! Check the logs."}'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
          # The payload above uses the incoming-webhook "text" format
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK

GitLab CI

.gitlab-ci.yml
stages:
  - validate
  - scan
  - deploy

validate-skills:
  stage: validate
  image: node:20
  script:
    - npm install -g openclaw
    - for skill in skills/*.md; do openclaw clawhub validate "$skill"; done
  only:
    changes:
      - skills/**

security-scan:
  stage: scan
  image: node:20
  script:
    - npm install -g openclaw
    # Scan the repository's skill files; a fresh CI container has no installed skills for --all to find
    - for skill in skills/*.md; do openclaw security scan "$skill"; done
  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
  only:
    changes:
      - skills/**

deploy:
  stage: deploy
  image: node:20
  script:
    - apt-get update && apt-get install -y rsync openssh-client
    - mkdir -p ~/.ssh
    - echo "$DEPLOY_SSH_KEY" > ~/.ssh/deploy_key
    - chmod 600 ~/.ssh/deploy_key
    # "skills" without a trailing slash copies the directory itself into ~/.openclaw/skills/
    - rsync -avz -e "ssh -i ~/.ssh/deploy_key -o StrictHostKeyChecking=no"
      config/ skills HEARTBEAT.md SOUL.md
      deploy@$DEPLOY_HOST:~/.openclaw/
    - ssh -i ~/.ssh/deploy_key deploy@$DEPLOY_HOST "openclaw gateway restart"
  only:
    - main
  when: manual

Docker Deployment

Basic Docker Compose

docker-compose.yml
services:
  openclaw:
    image: openclaw/openclaw:latest
    ports:
      - "127.0.0.1:18789:18789"
    volumes:
      - openclaw-data:/root/.openclaw
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "openclaw", "doctor"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  openclaw-data:

Production-Hardened Docker

docker-compose.prod.yml
services:
  openclaw:
    image: openclaw/openclaw:latest
    user: "1000:1000"
    read_only: true
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=64M
    volumes:
      - openclaw-data:/home/node/.openclaw:rw
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    networks:
      - openclaw-internal
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: "2.0"
    healthcheck:
      test: ["CMD", "openclaw", "doctor"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

networks:
  openclaw-internal:
    driver: bridge

volumes:
  openclaw-data:

Docker with Local Models

docker-compose.local-llm.yml
services:
  openclaw:
    image: openclaw/openclaw:latest
    environment:
      - OPENCLAW_BRAIN_PROVIDER=ollama
      - OLLAMA_HOST=http://ollama:11434
    depends_on:
      ollama:
        condition: service_healthy

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama-models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    healthcheck:
      # "ollama list" fails until the server answers. It avoids depending
      # on curl, which the stock ollama image may not include.
      test: ["CMD", "ollama", "list"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  ollama-models:

Kubernetes (Helm)

helm repo add openclaw https://serhanekicii.github.io/openclaw-helm
helm install openclaw openclaw/openclaw -f values.yaml

The Helm chart provides:

  • StatefulSet with Chromium sidecar for web scraping
  • Non-root, read-only root filesystems
  • Init containers for auto-installing ClawHub skills
  • ArgoCD and Stakater Reloader compatible
  • Network policies with deny-all-ingress defaults

Pre-Deployment Checks

Run these before every deployment or upgrade:

Backup

# Full backup
tar czf ~/openclaw-backup-$(date +%Y%m%d).tar.gz \
~/.openclaw/openclaw.json \
~/.openclaw/workspace/ \
~/.openclaw/memory/ \
~/.openclaw/skills/

| Path | Contents |
| --- | --- |
| openclaw.json | Configuration, API keys, security settings |
| workspace/ | SOUL.md, IDENTITY.md, USER.md |
| memory/ | Persistent memory |
| skills/ | Installed skills |
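A backup is only useful if it contains what the rollback needs. The sketch below lists an archive and confirms that the four paths above are present. `verify_backup` is a hypothetical helper, not an OpenClaw command.

```shell
# Sketch: confirm a backup archive contains the four expected paths.
verify_backup() {
  local archive="$1" listing entry missing=0
  # Fail early if the archive is unreadable or corrupt.
  listing="$(tar tzf "$archive")" || return 1
  for entry in .openclaw/openclaw.json .openclaw/workspace/ \
               .openclaw/memory/ .openclaw/skills/; do
    # tar stores absolute paths without the leading slash, so match
    # each entry as a fixed substring of the listing.
    if ! printf '%s\n' "$listing" | grep -q -F "$entry"; then
      echo "missing from backup: $entry"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage: `verify_backup ~/openclaw-backup-$(date +%Y%m%d).tar.gz` immediately after creating the archive.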

Validation Checklist

# 1. Health check
openclaw doctor

# 2. Security audit
openclaw security audit

# 3. Scan all skills
openclaw security scan --all

# 4. Check channels
openclaw channel list

# 5. Check MCP servers
openclaw mcp doctor

# 6. Check plugins
openclaw plugins doctor

# 7. Verify config loads
openclaw config list > /dev/null && echo "Config OK"

Upgrade Procedure

# 1. Backup (see above)

# 2. Check release notes
gh release view --repo openclaw/openclaw

# 3. Upgrade
npm update -g openclaw

# 4. Validate
openclaw doctor

# 5. Restart gateway
openclaw gateway restart

# 6. Verify channels reconnected
openclaw channel list

Rollback

If something breaks after an upgrade:

# Stop the agent
openclaw stop

# Restore from backup
tar xzf ~/openclaw-backup-YYYYMMDD.tar.gz -C /

# Downgrade to previous version
npm install -g openclaw@<previous-version>

# Restart
openclaw start

# Verify
openclaw doctor

Continuous Monitoring

Cron-Based Testing

Schedule recurring tests with OpenClaw's cron system:

# Nightly regression test
openclaw cron add "regression-test" \
--schedule "0 2 * * *" \
--message "Run the full test suite, compare against baseline, alert if regressions"

# Weekly integration check
openclaw cron add "integration-test" \
--schedule "0 3 * * 0" \
--message "Test all third-party integrations, verify API connectivity"

# Hourly health check during work hours
openclaw cron add "health-check" \
--schedule "0 9-17 * * 1-5" \
--message "Check API health, database connections, service status"

Heartbeat Monitoring

Use the heartbeat system for continuous self-monitoring:

~/.openclaw/HEARTBEAT.md
## System Health (every heartbeat cycle)

- Check that all channels are connected
- Verify MCP servers are responsive
- Monitor memory usage (alert if > 90%)
- Check disk usage (alert if > 85%)
- Review error logs since last heartbeat
- Send health summary to Telegram if any issues found
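HEARTBEAT.md holds natural-language instructions, but a threshold such as the disk check can also be backed by a small script the agent or a cron job runs. A sketch using POSIX `df`; the helper names are hypothetical and the 85% limit is the one from the list above.

```shell
# Sketch: integer percentage of the filesystem holding $1 that is in use.
disk_usage_pct() {
  # `df -P` prints a header line, then one line per filesystem;
  # column 5 is the usage as "NN%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# True when usage ($1) is strictly greater than the limit ($2).
over_limit() {
  [ "$1" -gt "$2" ]
}
```

Usage: `if over_limit "$(disk_usage_pct "$HOME")" 85; then echo "disk usage above 85%"; fi`.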

Production Diagnostics

# Real-time log monitoring
openclaw logs --follow

# Filter by component
openclaw logs --filter heartbeat --follow
openclaw logs --filter channel --follow

# Token usage
openclaw stats tokens

# Cost breakdown
openclaw gateway usage-cost

# Channel statistics
openclaw stats channels

# Per-heartbeat stats
openclaw stats heartbeat

Deep Security Audit

# Standard audit
openclaw security audit

# Deep audit (live WebSocket probe, browser exposure, plugin validation)
openclaw security audit --deep

# Auto-fix safe defaults (chmod, groupPolicy, logging)
openclaw security audit --fix

Lobster Workflow Shell

Lobster is OpenClaw's official workflow shell for typed CI/CD pipelines:

workflows/deploy-pipeline.yml
name: Deploy Pipeline
steps:
  - name: lint
    skill: code-lint
    input: "{{ files.changed }}"

  - name: test
    skill: run-tests
    needs: [lint]

  - name: security-scan
    skill: security-check
    needs: [lint]

  - name: deploy-staging
    skill: deploy-staging
    needs: [test, security-scan]

  - name: smoke-test
    skill: smoke-test
    needs: [deploy-staging]

  - name: deploy-production
    skill: deploy-production
    needs: [smoke-test]
    approval: required

Key features:

  • Typed, local-first macro engine
  • Approval gates for side-effect actions (deploy, publish)
  • Stateful workflows with persistence
  • Data shaping tools (where, pick, head)
  • Reduces token usage via composable automation

Patterns

Multi-Agent DevOps Pipeline

Use the Workboard to coordinate specialized agents:

PR opened
├─ Reviewer Agent → Code review + security check
├─ Tester Agent → Test suite execution
└─ Deployer Agent → Staging → approval gate → Production

Each agent works independently and updates its Workboard cards as it progresses. The deployer agent waits for both the review and test agents to finish before promoting to production.

See the Advanced Recipes guide for a full implementation.

Skill Version Control

Keep skills in a git repository for version control and CI:

skills-repo/
├── .github/
│   └── workflows/
│       └── validate.yml    # CI pipeline
├── skills/
│   ├── deploy-helper.md
│   ├── code-review.md
│   └── incident-response.md
└── README.md

On every push, CI validates structure and scans for security issues. On merge to main, skills are synced to the production agent.

Canary Deployment

Deploy to a subset of agents first, then roll out:

# Deploy to canary agent (10% traffic)
# -a recurses into skills/; without -r or -a, rsync skips directories
rsync -az skills/ canary-host:~/.openclaw/skills/
ssh canary-host "openclaw gateway restart"

# Monitor for 30 minutes
# Check error rates, response quality, channel stability

# If healthy, deploy to remaining agents
rsync -az skills/ prod-host-1:~/.openclaw/skills/
rsync -az skills/ prod-host-2:~/.openclaw/skills/
ssh prod-host-1 "openclaw gateway restart"
ssh prod-host-2 "openclaw gateway restart"
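The manual steps above can be wrapped in a gate that promotes only when the canary still passes its health check after the soak period. A sketch with hypothetical function names, reusing the `openclaw doctor` and `openclaw gateway restart` commands from this guide. It assumes `openclaw doctor` exits non-zero when unhealthy and does not cover the error-rate or response-quality checks.

```shell
# Sketch: deploy to a canary, wait, health-check it, then roll out.
deploy_host() {
  local host="$1"
  # The remote shell expands "~", so it stays inside the quotes.
  rsync -az skills/ "$host:~/.openclaw/skills/" &&
    ssh "$host" "openclaw gateway restart"
}

canary_rollout() {
  local canary="$1" wait_seconds="$2" host
  shift 2
  deploy_host "$canary" || return 1
  sleep "$wait_seconds"
  # Gate: promote only if the canary passes its health check.
  if ! ssh "$canary" "openclaw doctor"; then
    echo "canary unhealthy, stopping rollout" >&2
    return 1
  fi
  # The remaining arguments are the production hosts.
  for host in "$@"; do
    deploy_host "$host" || return 1
  done
}
```

Usage: `canary_rollout canary-host 1800 prod-host-1 prod-host-2` soaks for 30 minutes before promoting.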

Webhook-Triggered Deployments

Trigger agent actions from external CI/CD:

~/.openclaw/openclaw.json
{
  "webhooks": {
    "incoming": {
      "enabled": true,
      "secret": "${WEBHOOK_SECRET}",
      "endpoints": [
        {
          "path": "/deploy-complete",
          "message": "Deployment completed: {{body.service}} {{body.status}}. Run post-deploy checks."
        },
        {
          "path": "/ci-failure",
          "message": "CI failed for {{body.repo}} on branch {{body.branch}}. Error: {{body.error}}. Investigate and suggest fixes."
        }
      ]
    }
  }
}

# Trigger from your CI pipeline
curl -X POST http://your-agent:18789/webhook/deploy-complete \
-H "Content-Type: application/json" \
-H "X-Webhook-Secret: $WEBHOOK_SECRET" \
-d '{"service": "api-v2", "status": "success", "commit": "abc123"}'
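By default `curl` exits 0 even when the server answers with an HTTP error, so a rejected webhook would pass silently in CI. The sketch below wraps the call with standard curl flags that fail on HTTP errors, retry transient failures, and bound the request time. `notify_agent` and the `AGENT_URL` variable are hypothetical; the header and path follow the example above.

```shell
# Sketch: POST a JSON payload to an agent webhook and fail loudly.
# AGENT_URL is a hypothetical variable, e.g. http://your-agent:18789
notify_agent() {
  local endpoint="$1" payload="$2"
  # --fail: non-zero exit on HTTP 4xx/5xx; --retry: transient errors only.
  curl --fail --silent --show-error --retry 3 --max-time 10 \
    -X POST "${AGENT_URL}/webhook/${endpoint}" \
    -H "Content-Type: application/json" \
    -H "X-Webhook-Secret: ${WEBHOOK_SECRET}" \
    -d "$payload"
}
```

Usage: `notify_agent deploy-complete '{"service": "api-v2", "status": "success", "commit": "abc123"}'`.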

Checklist

A quick reference for production-ready deployments:

Before First Deploy

  • Config validated (openclaw doctor)
  • Security audit passed (openclaw security audit)
  • All skills scanned (openclaw security scan --all)
  • Channels tested individually
  • MCP servers probed (openclaw mcp doctor)
  • Backup script in place
  • Monitoring configured (heartbeat + health checks)
  • Rate limits set for all channels
  • Access controls configured (allowed_users, require_mention)

Before Every Update

  • Backup taken
  • Release notes reviewed
  • Skills re-validated after upgrade
  • openclaw doctor passes
  • Channels reconnected
  • Rollback plan ready

Ongoing

  • Nightly security scan (cron or CI)
  • Weekly integration test
  • Monthly credential rotation
  • Quarterly deep security audit

See Also