AI CI/CD with Harness: End-to-End Blueprint
AI delivery needs software discipline plus model-aware controls. This blueprint outlines a practical CI/CD architecture for teams running LLM-powered products with Harness.
Architecture overview
Developer PR -> CI build + unit tests -> Prompt checks + config validation -> Offline eval suite -> Security/policy validation -> Staging deployment -> Online canary verification -> Progressive prod rollout
Artifact model
- App artifact: API/service code.
- Prompt artifact: templates, system prompts, tool instructions.
- Evaluation artifact: dataset version + scoring outputs.
- Config artifact: model routing, fallback, budget limits.
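One way to make the four artifact types concrete is a small release manifest that pins one version of each and derives a single identifier for the combination. The sketch below is only an illustration: the `ReleaseManifest` class, its field names, and the sample version strings are assumptions, not part of Harness or of any particular system.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReleaseManifest:
    """Hypothetical manifest pinning one version of each artifact type."""

    app_image: str        # app artifact: API/service build
    prompt_version: str   # prompt artifact: templates, system prompts, tool instructions
    eval_dataset: str     # evaluation artifact: dataset version
    eval_report: str      # evaluation artifact: scoring outputs for that dataset
    config_version: str   # config artifact: model routing, fallback, budget limits

    def fingerprint(self) -> str:
        """Short, deterministic hash of the full artifact combination."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


# Illustrative values only; none of these identifiers come from a real system.
manifest = ReleaseManifest(
    app_image="chat-api:1.4.2",
    prompt_version="prompts-v37",
    eval_dataset="evalset-2024-05",
    eval_report="eval-run-812",
    config_version="routing-v12",
)
release_id = manifest.fingerprint()
```

Because the fingerprint changes whenever any single artifact changes, a prompt-only release gets its own identifier even when the app build is untouched.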
Release decision matrix
| Condition | Decision |
|---|---|
| Quality improves, cost stable | Promote |
| Quality stable, cost rises sharply | Manual approval required |
| Critical scenario regression | Reject |
| High latency during canary | Pause rollout |
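The matrix above can be encoded as a small function so the gate is testable. This is a minimal sketch under stated assumptions: the signal names, the 20% cost threshold, and the 2000 ms latency budget are placeholders to be replaced with your own values, and any case the table does not cover falls back to manual approval as a conservative default.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROMOTE = "promote"
    MANUAL_APPROVAL = "manual approval required"
    REJECT = "reject"
    PAUSE_ROLLOUT = "pause rollout"


@dataclass
class ReleaseSignals:
    quality_delta: float        # eval score change vs. baseline (positive = better)
    cost_delta_pct: float       # cost change vs. baseline, in percent
    critical_regression: bool   # any critical scenario failed
    canary_p95_latency_ms: Optional[float] = None  # None until the canary runs


def decide(
    signals: ReleaseSignals,
    *,
    cost_spike_pct: float = 20.0,       # assumed meaning of "rises sharply"
    latency_budget_ms: float = 2000.0,  # assumed canary latency budget
) -> Decision:
    # A critical scenario regression overrides every other signal.
    if signals.critical_regression:
        return Decision.REJECT
    # High latency during canary pauses the rollout.
    latency = signals.canary_p95_latency_ms
    if latency is not None and latency > latency_budget_ms:
        return Decision.PAUSE_ROLLOUT
    # A sharp cost increase needs a human decision, even if quality holds.
    if signals.cost_delta_pct >= cost_spike_pct:
        return Decision.MANUAL_APPROVAL
    # Quality improves with stable cost: promote.
    if signals.quality_delta > 0:
        return Decision.PROMOTE
    # Not covered by the matrix: default to a human decision (assumption).
    return Decision.MANUAL_APPROVAL
```

Checking the reject condition first keeps a cost or quality win from masking a critical regression.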
Suggested Harness stage layout
- Stage A: Build
- Stage B: Static checks (prompt, schema, policy)
- Stage C: Eval gate (critical + full suites)
- Stage D: Deploy Staging
- Stage E: Verify (latency, error rate, quality probe)
- Stage F: Canary + auto-rollback
- Stage G: Full rollout + post-deploy report
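The key property of this layout is that each stage gates the next, so a failed eval gate stops the release before anything is deployed. The sketch below illustrates that fail-fast ordering in plain Python; it is not Harness pipeline syntax, and `run_stages` and the lambda gates are hypothetical stand-ins for real stage logic.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a gate that returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failing gate."""
    passed: List[str] = []
    for name, gate in stages:
        if not gate():
            return False, passed
        passed.append(name)
    return True, passed


stages: List[Stage] = [
    ("A: Build", lambda: True),
    ("B: Static checks", lambda: True),
    ("C: Eval gate", lambda: False),  # simulated eval failure
    ("D: Deploy Staging", lambda: True),
    ("E: Verify", lambda: True),
    ("F: Canary + auto-rollback", lambda: True),
    ("G: Full rollout + post-deploy report", lambda: True),
]

ok, passed = run_stages(stages)
```

With the simulated failure at Stage C, only the build and static checks complete and no deployment stage runs.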
Observability signals to keep
- Prompt version in every request trace.
- Model route and fallback reason in logs.
- Token usage and cost by endpoint.
- User feedback score by intent category.
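These signals are easiest to keep when every request emits one structured record that carries all of them. The following sketch assumes a JSON-lines log; the field names and the `build_request_record`, `to_log_line`, and `cost_by_endpoint` helpers are hypothetical and should be mapped onto whatever tracing or logging stack you already run.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Optional


def build_request_record(
    *,
    trace_id: str,
    endpoint: str,
    prompt_version: str,            # prompt version in every request trace
    model_route: str,               # which model route served the request
    fallback_reason: Optional[str], # None when no fallback happened
    total_tokens: int,
    cost_usd: float,
    intent: str,                    # intent category for feedback breakdowns
    feedback_score: Optional[float] = None,  # filled in if the user rates the answer
) -> dict:
    """Collect the observability signals for one request into a flat record."""
    return {
        "trace_id": trace_id,
        "endpoint": endpoint,
        "prompt_version": prompt_version,
        "model_route": model_route,
        "fallback_reason": fallback_reason,
        "total_tokens": total_tokens,
        "cost_usd": cost_usd,
        "intent": intent,
        "feedback_score": feedback_score,
    }


def to_log_line(record: dict) -> str:
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(record, sort_keys=True)


def cost_by_endpoint(records: Iterable[dict]) -> Dict[str, float]:
    """Sum request cost per endpoint."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["endpoint"]] += record["cost_usd"]
    return dict(totals)
```

Keeping the prompt version and fallback reason on the same record as cost makes it possible to attribute a cost or quality shift to a specific prompt or routing change.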
Rollback strategy
- Roll back prompt/config first when app code is unchanged.
- Fall back to the previous stable model route.
- Throttle high-risk endpoints while the incident is open.
- Re-run critical eval suite before unpausing rollout.
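The rollback rules above can be turned into an ordered plan that an on-call engineer or an automation step follows. This is a sketch only: the `Release` fields and `plan_rollback` are hypothetical, and the branch for a changed app build (redeploying the whole stable release) is an assumption, since the rules only describe the case where app code is unchanged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Release:
    app_version: str
    prompt_version: str
    config_version: str
    model_route: str


def plan_rollback(
    current: Release, last_stable: Release, high_risk_endpoints: List[str]
) -> List[str]:
    """Return rollback steps in the order they should be executed."""
    steps: List[str] = []
    if current.app_version == last_stable.app_version:
        # App code unchanged: revert only prompt/config, the cheapest rollback.
        if current.prompt_version != last_stable.prompt_version:
            steps.append(f"revert prompt to {last_stable.prompt_version}")
        if current.config_version != last_stable.config_version:
            steps.append(f"revert config to {last_stable.config_version}")
    else:
        # Assumption: if the app changed too, redeploy the full stable release.
        steps.append(f"redeploy app {last_stable.app_version} with its prompt and config")
    if current.model_route != last_stable.model_route:
        steps.append(f"route traffic to previous stable model route {last_stable.model_route}")
    for endpoint in high_risk_endpoints:
        steps.append(f"throttle {endpoint} while the incident is open")
    # The rollout stays paused until the critical suite passes again.
    steps.append("re-run critical eval suite before unpausing rollout")
    return steps
```

Returning the plan as data, rather than executing it directly, lets the same logic drive a dry run, an approval step, or an automated rollback.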
Takeaway
AI CI/CD succeeds when release decisions are metric-driven and reversible. Harness provides the stage orchestration; your team must supply strong eval signals and clear rollback rules.