    How to Design a Production-Grade CAMEL Multi-Agent System with Planning, Tool Use, Self-Consistency, and Critique-Driven Refinement
    April 23, 2026

    In this tutorial, we implement an advanced agentic AI system using the CAMEL framework, orchestrating multiple specialized agents to collaboratively solve a complex task. We design a structured multi-agent pipeline consisting of a planner, researcher, writer, critic, and rewriter, each with clearly defined responsibilities and schema-constrained outputs. We integrate tool usage, self-consistency sampling, structured validation with Pydantic, and iterative critique-driven refinement to build a robust, research-backed technical brief generator. Through this process, we demonstrate how modern agent architectures combine planning, reasoning, external tool interaction, and autonomous quality control within a single coherent workflow.

    import os, sys, re, json, subprocess
    from typing import List, Dict, Any, Optional, Tuple

    def _pip_install(pkgs: List[str]):
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "-U"] + pkgs)

    _pip_install(["camel-ai[web_tools]~=0.2", "pydantic>=2.7", "rich>=13.7"])

    from pydantic import BaseModel, Field
    from rich.console import Console
    from rich.panel import Panel
    from rich.table import Table

    console = Console()

    def _get_colab_secret(name: str) -> Optional[str]:
        try:
            from google.colab import userdata
            v = userdata.get(name)
            return v if v else None
        except Exception:
            return None

    def ensure_openai_key():
        if os.getenv("OPENAI_API_KEY"):
            return
        v = _get_colab_secret("OPENAI_API_KEY")
        if v:
            os.environ["OPENAI_API_KEY"] = v
            return
        try:
            from getpass import getpass
            k = getpass("Enter OPENAI_API_KEY (input hidden): ").strip()
            if k:
                os.environ["OPENAI_API_KEY"] = k
        except Exception:
            pass

    ensure_openai_key()
    if not os.getenv("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set. Add it via Colab Secrets (OPENAI_API_KEY) or paste it when prompted.")

    We set up the execution environment and install all required dependencies directly within Colab. We securely configure the OpenAI API key using either Colab secrets or manual input. We also initialize the console utilities that allow us to render structured outputs cleanly during execution.

    from camel.models import ModelFactory
    from camel.types import ModelPlatformType, ModelType
    from camel.agents import ChatAgent
    from camel.toolkits import SearchToolkit

    def make_model(temperature: float = 0.2):
        return ModelFactory.create(
            model_platform=ModelPlatformType.OPENAI,
            model_type=ModelType.GPT_4O,
            model_config_dict={"temperature": float(temperature)},
        )

    def strip_code_fences(s: str) -> str:
        s = s.strip()
        s = re.sub(r"^```(?:json)?\s*", "", s, flags=re.IGNORECASE)
        s = re.sub(r"```\s*$", "", s)
        return s.strip()

    def extract_first_json_object(s: str) -> str:
        s2 = strip_code_fences(s)
        start = None
        stack = []
        for i, ch in enumerate(s2):
            if ch == "{":
                if start is None:
                    start = i
                stack.append("{")
            elif ch == "}":
                if stack:
                    stack.pop()
                    if not stack and start is not None:
                        return s2[start:i+1]
        m = re.search(r"\{[\s\S]*\}", s2)
        if m:
            return m.group(0)
        return s2

    We import the core CAMEL components and define the model factory used across all agents. We implement helper utilities to clean and extract JSON reliably from LLM responses. This ensures that our multi-agent pipeline remains structurally robust even when models return formatted text.

    class PlanTask(BaseModel):
        id: str = Field(..., min_length=1)
        title: str = Field(..., min_length=1)
        objective: str = Field(..., min_length=1)
        deliverable: str = Field(..., min_length=1)
        tool_hints: List[str] = Field(default_factory=list)
        risks: List[str] = Field(default_factory=list)

    class Plan(BaseModel):
        goal: str
        assumptions: List[str] = Field(default_factory=list)
        tasks: List[PlanTask]
        success_criteria: List[str] = Field(default_factory=list)

    class EvidenceItem(BaseModel):
        query: str
        notes: str
        key_points: List[str] = Field(default_factory=list)

    class Critique(BaseModel):
        score_0_to_10: float = Field(..., ge=0, le=10)
        strengths: List[str] = Field(default_factory=list)
        issues: List[str] = Field(default_factory=list)
        fix_plan: List[str] = Field(default_factory=list)

    class RunConfig(BaseModel):
        goal: str
        max_tasks: int = 5
        max_searches_per_task: int = 2
        max_revision_rounds: int = 1
        self_consistency_samples: int = 2

    DEFAULT_GOAL = "Create a concise, evidence-backed technical brief explaining CAMEL (the multi-agent framework), its core abstractions, and a practical recipe to build a tool-using multi-agent pipeline (planner/researcher/writer/critic) with safeguards."

    cfg = RunConfig(goal=DEFAULT_GOAL)

    search_tool = SearchToolkit().search_duckduckgo

    We define all structured schemas using Pydantic for planning, evidence, critique, and runtime configuration. We formalize the agent communication protocol so that every step is validated and typed. This allows us to transform free-form LLM outputs into predictable, production-ready data structures.
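The validate-with-fallback pattern used later in the pipeline (try `model_validate_json`, catch structured errors) can be seen in miniature with a standalone model. `MiniTask` and `MiniPlan` below are illustrative stand-ins for the real schemas, assuming Pydantic v2:

```python
from typing import List
from pydantic import BaseModel, Field, ValidationError

class MiniTask(BaseModel):
    id: str = Field(..., min_length=1)
    title: str = Field(..., min_length=1)

class MiniPlan(BaseModel):
    goal: str
    tasks: List[MiniTask]

# Well-formed JSON parses and validates in a single step.
plan = MiniPlan.model_validate_json(
    '{"goal": "demo", "tasks": [{"id": "T1", "title": "research"}]}'
)

# A schema violation (empty id) is rejected with a precise error location,
# which is what makes downstream agent outputs predictable.
try:
    MiniPlan.model_validate_json(
        '{"goal": "demo", "tasks": [{"id": "", "title": "research"}]}'
    )
    loc = None
except ValidationError as e:
    loc = e.errors()[0]["loc"]  # points into tasks[0].id
```

The typed error location is what lets an orchestrator decide whether to retry, repair, or discard a malformed agent reply.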

    planner_system = (
        "You are a senior agent architect. Produce a compact, high-leverage plan for achieving the goal.\n"
        "Return ONLY valid JSON that matches this schema:\n"
        "{{\"goal\": \"...\", \"assumptions\": [\"...\"], \"tasks\": "
        "[{{\"id\": \"T1\", \"title\": \"...\", \"objective\": \"...\", \"deliverable\": \"...\", "
        "\"tool_hints\": [\"...\"], \"risks\": [\"...\"]}}], "
        "\"success_criteria\": [\"...\"]}}\n"
        "Constraints: tasks length <= {max_tasks}. Each task should be executable with web search + reasoning."
    ).format(max_tasks=cfg.max_tasks)

    planner = ChatAgent(system_message=planner_system, model=make_model(0.1))

    researcher = ChatAgent(
        system_message=(
            "You are a meticulous research agent. Use the web search tool when useful.\n"
            "You must:\n"
            "- Search for authoritative sources (docs, official repos) first.\n"
            "- Write notes that are directly relevant to the task objective.\n"
            "- Return ONLY valid JSON:\n"
            "{\"query\": \"...\", \"notes\": \"...\", \"key_points\": [\"...\"]}\n"
            "Do not include markdown code fences."
        ),
        model=make_model(0.2),
        tools=[search_tool],
    )

    writer = ChatAgent(
        system_message=(
            "You are a technical writer agent. You will be given a goal, a plan, and evidence notes.\n"
            "Write a deliverable that is clear, actionable, and concise.\n"
            "Include:\n"
            "- A crisp overview\n"
            "- Key abstractions and how they connect\n"
            "- A practical implementation recipe\n"
            "- Minimal caveats/limitations\n"
            "Do NOT fabricate citations. If evidence is thin, state uncertainty.\n"
            "Return plain text only."
        ),
        model=make_model(0.3),
    )

    critic = ChatAgent(
        system_message=(
            "You are a strict reviewer. Evaluate the draft against the goal, correctness, and completeness.\n"
            "Return ONLY valid JSON:\n"
            "{\"score_0_to_10\": 0.0, \"strengths\": [\"...\"], \"issues\": [\"...\"], \"fix_plan\": [\"...\"]}\n"
            "Do not include markdown code fences."
        ),
        model=make_model(0.0),
    )

    rewriter = ChatAgent(
        system_message=(
            "You are a revising editor. Improve the draft based on critique. Preserve factual accuracy.\n"
            "Return the improved draft as plain text only."
        ),
        model=make_model(0.25),
    )

    We construct the specialized agents: planner, researcher, writer, critic, and rewriter. We define their system roles carefully to enforce task boundaries and structured behavior. This establishes the modular multi-agent architecture that enables collaboration and iterative refinement.
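One refinement worth noting: instead of hand-writing the JSON template inside each system prompt, the schema string could be derived from the Pydantic model itself, so the prompt and the validator never drift apart. The `schema_prompt` helper below is a hypothetical variant, not part of the tutorial's code, assuming Pydantic v2's `model_json_schema`:

```python
import json
from typing import List
from pydantic import BaseModel, Field

class Critique(BaseModel):
    score_0_to_10: float = Field(..., ge=0, le=10)
    strengths: List[str] = Field(default_factory=list)
    issues: List[str] = Field(default_factory=list)
    fix_plan: List[str] = Field(default_factory=list)

def schema_prompt(model_cls: type, role: str) -> str:
    # Embed the model's auto-generated JSON Schema so the agent's instructions
    # and the downstream validator share one source of truth.
    schema = json.dumps(model_cls.model_json_schema(), indent=2)
    return (
        f"{role}\n"
        "Return ONLY valid JSON conforming to this JSON Schema "
        "(no markdown fences):\n" + schema
    )

prompt = schema_prompt(Critique, "You are a strict reviewer.")
```

A prompt built this way automatically tracks schema changes, e.g. adding a field to `Critique` updates every agent that uses it.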

    def plan_goal(goal: str) -> Plan:
        resp = planner.step("GOAL:\n" + goal + "\n\nReturn JSON plan now.")
        raw = resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content
        js = extract_first_json_object(raw)
        try:
            return Plan.model_validate_json(js)
        except Exception:
            return Plan.model_validate(json.loads(js))

    def research_task(task: PlanTask, goal: str, k: int) -> EvidenceItem:
        prompt = (
            "GOAL:\n" + goal + "\n\nTASK:\n" + task.model_dump_json(indent=2) + "\n\n"
            f"Perform research. Use at most {k} web searches. First search official documentation or GitHub if relevant."
        )
        resp = researcher.step(prompt)
        raw = resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content
        js = extract_first_json_object(raw)
        try:
            return EvidenceItem.model_validate_json(js)
        except Exception:
            return EvidenceItem.model_validate(json.loads(js))

    def draft_with_self_consistency(goal: str, plan: Plan, evidence: List[Tuple[PlanTask, EvidenceItem]], n: int) -> str:
        packed_evidence = []
        for t, ev in evidence:
            packed_evidence.append({
                "task_id": t.id,
                "task_title": t.title,
                "objective": t.objective,
                "notes": ev.notes,
                "key_points": ev.key_points
            })
        payload = {
            "goal": goal,
            "assumptions": plan.assumptions,
            "tasks": [t.model_dump() for t in plan.tasks],
            "evidence": packed_evidence,
            "success_criteria": plan.success_criteria,
        }
        drafts = []
        for _ in range(max(1, n)):
            resp = writer.step("INPUT:\n" + json.dumps(payload, ensure_ascii=False, indent=2))
            txt = resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content
            drafts.append(txt.strip())
        if len(drafts) == 1:
            return drafts[0]
        chooser = ChatAgent(
            system_message=(
                "You are a selector agent. Choose the best draft among candidates for correctness, clarity, and actionability.\n"
                "Return ONLY the winning draft text, unchanged."
            ),
            model=make_model(0.0),
        )
        resp = chooser.step("GOAL:\n" + goal + "\n\nCANDIDATES:\n" + "\n\n---\n\n".join([f"[DRAFT {i+1}]\n{d}" for i, d in enumerate(drafts)]))
        return (resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content).strip()

    We implement the orchestration logic for planning, research, and self-consistent drafting. We aggregate structured evidence and generate multiple candidate drafts to improve robustness. We then select the best draft through an additional evaluation agent, simulating ensemble-style reasoning.
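The chooser above spends an extra LLM call on selection. As a cheap, deterministic fallback, one could instead pick the candidate that best agrees with the others, e.g. by mean pairwise token overlap. This is an illustrative heuristic, not part of the tutorial's pipeline:

```python
from typing import List

def consensus_pick(drafts: List[str]) -> str:
    # Score each draft by its summed Jaccard similarity to the other drafts,
    # then return the one closest to the "consensus" of the ensemble.
    def jaccard(a: set, b: set) -> float:
        return len(a & b) / len(a | b) if a | b else 0.0

    token_sets = [set(d.lower().split()) for d in drafts]
    best_i, best_score = 0, -1.0
    for i, ti in enumerate(token_sets):
        score = sum(jaccard(ti, tj) for j, tj in enumerate(token_sets) if j != i)
        if score > best_score:
            best_i, best_score = i, score
    return drafts[best_i]

drafts = [
    "CAMEL coordinates planner researcher writer critic agents",
    "CAMEL coordinates planner researcher writer agents",
    "CAMEL coordinates planner writer agents",
]
winner = consensus_pick(drafts)  # the middle draft overlaps most with both others
```

Token overlap is crude compared with an LLM judge, but it costs nothing, is reproducible, and degrades gracefully when the drafts are near-duplicates.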

    def critique_text(goal: str, draft: str) -> Critique:
        resp = critic.step("GOAL:\n" + goal + "\n\nDRAFT:\n" + draft + "\n\nReturn critique JSON now.")
        raw = resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content
        js = extract_first_json_object(raw)
        try:
            return Critique.model_validate_json(js)
        except Exception:
            return Critique.model_validate(json.loads(js))

    def revise(goal: str, draft: str, critique: Critique) -> str:
        resp = rewriter.step(
            "GOAL:\n" + goal +
            "\n\nCRITIQUE:\n" + critique.model_dump_json(indent=2) +
            "\n\nDRAFT:\n" + draft +
            "\n\nRewrite now."
        )
        return (resp.msgs[0].content if hasattr(resp, "msgs") else resp.msg.content).strip()

    def pretty_plan(plan: Plan):
        tab = Table(title="Agent Plan", show_lines=True)
        tab.add_column("ID", style="bold")
        tab.add_column("Title")
        tab.add_column("Objective")
        tab.add_column("Deliverable")
        for t in plan.tasks:
            tab.add_row(t.id, t.title, t.objective, t.deliverable)
        console.print(tab)

    def run(cfg: RunConfig):
        console.print(Panel.fit("CAMEL Advanced Agentic Tutorial Runner", style="bold"))
        plan = plan_goal(cfg.goal)
        pretty_plan(plan)

        evidence = []
        for task in plan.tasks[: cfg.max_tasks]:
            ev = research_task(task, cfg.goal, cfg.max_searches_per_task)
            evidence.append((task, ev))

        console.print(Panel.fit("Drafting (self-consistency)", style="bold"))
        draft = draft_with_self_consistency(cfg.goal, plan, evidence, cfg.self_consistency_samples)

        for r in range(cfg.max_revision_rounds + 1):
            crit = critique_text(cfg.goal, draft)
            console.print(Panel.fit(f"Critique round {r+1}: score {crit.score_0_to_10:.1f}/10", style="bold"))
            if crit.strengths:
                console.print(Panel("Strengths:\n- " + "\n- ".join(crit.strengths), title="Strengths"))
            if crit.issues:
                console.print(Panel("Issues:\n- " + "\n- ".join(crit.issues), title="Issues"))
            if crit.fix_plan:
                console.print(Panel("Fix plan:\n- " + "\n- ".join(crit.fix_plan), title="Fix plan"))
            if crit.score_0_to_10 >= 8.5 or r >= cfg.max_revision_rounds:
                break
            draft = revise(cfg.goal, draft, crit)

        console.print(Panel.fit("FINAL DELIVERABLE", style="bold green"))
        console.print(draft)

    run(cfg)

    We implement the critique-and-revision loop to enforce quality control. We score the draft, identify weaknesses, and iteratively refine it as needed. Finally, we execute the full pipeline, producing a structured, research-backed deliverable through coordinated collaboration among agents.
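The loop's exit policy (accept once the critic scores 8.5 or higher, or when the revision budget is exhausted) can be factored into a pure predicate, which makes the policy easy to unit-test independently of any LLM calls; a small sketch:

```python
def should_stop(score: float, round_idx: int, max_rounds: int,
                threshold: float = 8.5) -> bool:
    # Stop when the critic is satisfied or the revision budget is spent.
    return score >= threshold or round_idx >= max_rounds
```

Keeping control-flow decisions in plain functions like this is a useful habit in agent pipelines: the expensive, stochastic parts stay isolated, while the policy itself remains deterministic and verifiable.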

    In conclusion, we built a production-style CAMEL-based multi-agent system that goes far beyond simple prompt chaining. We structured agent communication through validated schemas, incorporated web search tools for grounded reasoning, applied self-consistency to improve output reliability, and enforced quality using an internal critic loop. By combining these advanced ideas, we showed how we can construct scalable, modular, and reliable agentic pipelines suitable for real-world AI applications.
