
CAIML #42
About this event
CAIML #42 will take place on May 19, 2026, at MobiLab Solutions.
We will have two talks, with additional time for networking.
Talk 1: Witold Czaplewski (Full Stack Engineer at MobiLab Solutions): AI Engineering on Databricks
Writing a prompt is easy. Building a production-ready GenAI system is not. In this session, we explore what AI engineering means in practice and how Databricks supports the full lifecycle of GenAI applications, from experimentation to deployment, monitoring, and governance. We cover the core concepts behind production AI systems, including the roles of software engineering, data engineering, and data science. The session walks through the end-to-end GenAI lifecycle on Databricks, including MLflow, batch and real-time inference, prompt management, and serving models or agents. Finally, we address governance, security, and responsible AI, with examples related to the EU AI Act, transparency, data governance, and operational risk management. A short live demo complements the discussion and shows how these capabilities come together in a practical setup.
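For readers who want a feel for the experiment-tracking part of that lifecycle, here is a minimal, illustrative sketch of logging a prompt and its settings with MLflow. It is not taken from the talk; the experiment name, model endpoint, prompt text, and metric value are placeholders.

```python
# Illustrative sketch only: a minimal MLflow run that records a prompt template,
# its generation parameters, and an offline evaluation score. All names and
# values below are invented placeholders, not material from the talk.
import mlflow

mlflow.set_experiment("genai-prompt-experiments")  # hypothetical experiment name

with mlflow.start_run(run_name="summarization-prompt-v1"):
    mlflow.log_param("model", "databricks-meta-llama-3-70b-instruct")  # placeholder endpoint name
    mlflow.log_param("temperature", 0.1)
    mlflow.log_text("Summarize the following document:\n{document}", "prompt_template.txt")
    mlflow.log_metric("answer_relevance", 0.87)  # placeholder offline-eval score
```

Tracking prompts and parameters alongside metrics is what later makes comparison, monitoring, and governance of GenAI experiments possible, which is the thread the talk follows through to deployment.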
Talk 2: Simonas Cerniauskas (Machine Learning Engineer at tisix.io): Building Reliable AI Agents: A DSPy-based Quality Assurance Framework
As publishers increasingly adopt AI agents for content generation and analysis, ensuring output quality and reliability becomes critical. This talk introduces a novel quality assurance framework built with DSPy that addresses the unique challenges of evaluating AI agents in publishing workflows. Using real-world examples from newsroom implementations, I will demonstrate how to design and implement systematic testing pipelines that verify factual accuracy, content consistency, and compliance with editorial standards. Attendees will learn practical techniques for building reliable agent evaluation systems that go beyond simple metrics to ensure AI-generated content meets professional publishing standards.

This presentation addresses one of the most pressing challenges in professional publishing today: ensuring quality and reliability when deploying AI agents in editorial environments. We'll take a deep dive into how DSPy's programmatic approach to language model development can be leveraged to create robust testing and validation pipelines that meet the demanding standards of modern newsrooms. The discussion begins by exploring the current landscape of AI evaluation in publishing workflows, examining why traditional testing approaches fall short when dealing with language models, and identifying the specific quality requirements unique to journalistic and editorial content.

We'll then move into a detailed technical exploration of solutions built with DSPy, demonstrating how to design modular evaluation pipelines, implement publishing-specific metrics, and create automated systems for fact-checking and consistency validation. Special attention will be given to the integration of knowledge graphs for reference-based evaluation and the incorporation of these systems into broader MLOps workflows.

To ground these concepts in reality, we'll examine a detailed case study of implementing this framework in an actual newsroom environment. This will include practical discussions of handling various content types, along with strategies for managing test data and evaluation criteria. We'll share real-world performance monitoring approaches and concrete improvement strategies that have proven successful in production environments.

The presentation concludes with hard-won insights and best practices, including practical strategies for finding the right balance between automated testing and human review, effective approaches to handling edge cases, and methods for scaling quality assurance processes across diverse content teams. Throughout the talk, we'll share code examples and practical implementations that attendees can adapt for their own projects.
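As a rough illustration of the kind of DSPy-based checks described above (not the speaker's actual framework), the sketch below defines a small fact-checking module and a pass/fail metric of the sort an automated QA pipeline could use. The signature fields, model name, and example content are assumptions made purely for illustration.

```python
# Illustrative sketch only: a tiny DSPy program that judges a generated draft
# against reference facts, plus a simple metric that could feed an evaluation
# pipeline. Field names, the model, and the test case are placeholders.
import dspy

lm = dspy.LM("openai/gpt-4o-mini")  # placeholder model; any DSPy-supported LM works
dspy.configure(lm=lm)

class FactCheck(dspy.Signature):
    """Judge whether the draft is consistent with the reference facts."""
    reference_facts: str = dspy.InputField()
    draft: str = dspy.InputField()
    verdict: str = dspy.OutputField(desc="'supported' or 'unsupported'")

fact_checker = dspy.ChainOfThought(FactCheck)

def qa_metric(example, prediction, trace=None):
    # Pass/fail metric: the draft counts as acceptable only if the judge
    # labels it as supported by the reference facts.
    return prediction.verdict.strip().lower() == "supported"

# One hand-written test case (placeholder content) to exercise the checker.
case = dspy.Example(
    reference_facts="The company was founded in 2019 in Cologne.",
    draft="Founded in 2019, the Cologne-based company expanded across Europe.",
).with_inputs("reference_facts", "draft")

pred = fact_checker(reference_facts=case.reference_facts, draft=case.draft)
print(qa_metric(case, pred))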
We will share an agenda soon.
Source: meetup