Brainberg
Master Production-Ready AI Agents: Evaluate & Ship With Confidence
AI Integration & Application · Meetup · Free · Online

Wed 1 Jul · 20:00
< 50 attendees

About this event

Evaluating & Shipping Production-Ready AI Agents

Demos are easy. Production is where AI agents break.
Join Kwasi Ankomah, Lead AI Architect at SambaNova Systems (15+ years cross-industry experience), for the finale of our agentic AI series - a deep dive into the evaluation discipline that separates fragile demos from production-ready AI agents.

What you'll learn:
🔹 Why traditional software testing fails for non-deterministic, multi-step agents - and what actually works instead
🔹 The 4 evaluator types every production agent needs: rule-based, LLM-as-a-judge, trajectory, and recovery-from-failure
🔹 How to combine evaluators into a scorecard + regression gate that runs in CI on every prompt, model, tool, or architecture change
🔹 The state-of-the-art LangSmith workflow - datasets, experiments, and trace-based evaluation that catches failure/retry loops inside subagents
🔹 The open-source alternative: Langfuse + OpenTelemetry for full observability
🔹 Online evaluation and pass^k reliability - how to score live traffic and build real statistical confidence in agent performance
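
For anyone new to pass^k: it is commonly defined as the probability that an agent succeeds on all k independent attempts at the same task, which makes it a stricter reliability measure than a single pass rate. Here is a minimal sketch, assuming the usual combinatorial estimator C(c, k) / C(n, k) computed from n recorded trials with c successes. The function names are hypothetical and the session may use different tooling.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate pass^k for one task: the chance that k trials drawn
    (without replacement) from the recorded trials are all successes."""
    if not 0 < k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    if not 0 <= num_successes <= num_trials:
        raise ValueError("num_successes must be between 0 and num_trials")
    # comb(c, k) is 0 when c < k, so the estimate is 0.0 in that case.
    return comb(num_successes, k) / comb(num_trials, k)


def mean_pass_hat_k(results: dict[str, list[bool]], k: int) -> float:
    """Average the per-task estimate over a dataset.

    `results` maps a (hypothetical) task id to the pass/fail outcome
    of each repeated trial on that task.
    """
    scores = [pass_hat_k(len(trials), sum(trials), k) for trials in results.values()]
    return sum(scores) / len(scores)
```

With 9 successes in 10 trials, `pass_hat_k(10, 9, 1)` gives 0.9 while `pass_hat_k(10, 9, 3)` drops to 0.7, which is why an agent that looks fine on a single run can still be unreliable across repeated runs.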

Why does this matter?
Multi-agent systems are non-deterministic: the same input can produce different intermediate steps, different tool calls, and different outputs. That's what makes them powerful, and exactly why traditional QA can't keep up. Without structured evaluation, every prompt tweak or model swap is a gamble on production stability.
This session is built live, on a real system, using actual traces and telemetry, not slides.
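
To give a flavour of what a scorecard plus regression gate can look like, here is a minimal sketch in plain Python. It is not the speaker's implementation and does not use LangSmith or Langfuse. The run record shape (`output`, `tool_calls`), the two evaluators, and the tolerance value are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable

# Hypothetical run record: {"output": str, "tool_calls": list[str]}
Evaluator = Callable[[dict], float]


def answer_not_empty(run: dict) -> float:
    """Rule-based check: the agent produced a non-blank final answer."""
    return 1.0 if run["output"].strip() else 0.0


def no_retry_loop(run: dict, max_repeats: int = 2) -> float:
    """Trajectory check: no tool is called more than `max_repeats`
    times in a row, a simple proxy for a failure/retry loop."""
    streak, previous = 0, None
    for call in run["tool_calls"]:
        streak = streak + 1 if call == previous else 1
        if streak > max_repeats:
            return 0.0
        previous = call
    return 1.0


def build_scorecard(runs: list[dict], evaluators: dict[str, Evaluator]) -> dict[str, float]:
    """Average each evaluator's score over all runs."""
    return {name: mean(fn(run) for run in runs) for name, fn in evaluators.items()}


def regression_gate(baseline: dict[str, float], candidate: dict[str, float],
                    tolerance: float = 0.02) -> list[str]:
    """Return one message per metric that dropped by more than `tolerance`."""
    failures = []
    for name, base_score in baseline.items():
        new_score = candidate.get(name, 0.0)
        if new_score < base_score - tolerance:
            failures.append(f"{name}: {base_score:.2f} -> {new_score:.2f}")
    return failures


EVALUATORS = {"answer_not_empty": answer_not_empty, "no_retry_loop": no_retry_loop}
```

In CI, a job along these lines would score the candidate runs, compare them with the stored baseline scorecard, and exit non-zero if `regression_gate` returns any messages.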

Who should come?

  • AI/ML engineers working with LangGraph, CrewAI, or similar frameworks
  • Data scientists and architects running production LLM systems
  • Technical leads evaluating agent observability and CI tooling

Familiarity with supervisor-subagent patterns helps but isn't required (catch up via Session 5 of this series).

Bring your questions. If you've ever shipped an agent and wondered whether it'll hold up, this one's for you.

Source: meetup