Maxim AI release notes

πŸ”‰ Tracing and evaluation support for LiveKit Agents

 

New

  

LiveKit is a powerful platform for building real-time video, audio, and data applications. You can now integrate Maxim's Observability suite with your LiveKit voice agents to capture detailed insights into conversation flows, function calls, and performance metrics in real time.

With just 3 lines of code, you can:

  • Trace multi-turn voice recordings for granular evaluations and observability.
  • Automatically capture the details of LLM calls and tool/function calls.
  • Monitor entire session recordings and transcripts in a unified view.
  • Debug and optimize your voice AI agents with an interactive Gantt chart of the entire session.

Learn more about our SDK integration for LiveKit to get started.
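As a rough sketch, the setup looks like the following (the import path and helper name here are assumed from the Maxim Python SDK and may differ slightly; the SDK docs have the exact call):

from maxim import Maxim
from maxim.logger.livekit import instrument_livekit  # assumed import path; confirm in the LiveKit integration docs

# Create a Maxim logger (assumes MAXIM_API_KEY and MAXIM_LOG_REPO_ID are set in the environment)
logger = Maxim().logger()

# Instrument LiveKit Agents so sessions, LLM calls, and tool calls are traced automatically
instrument_livekit(logger)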

🧠 Gemini 2.5 model family is live on Maxim!

 

New

  

Google’s latest Gemini 2.5 models are now available on Maxim. Access Gemini 2.5 Pro, Flash, and Pro Experimental -- offering advanced reasoning capabilities, faster response times, and improved efficiency for your experimentation and eval workflows.

Start using these models via the Google provider on Maxim:
βœ… Go to Settings > Models > Select Google provider > Add Gemini 2.5 Pro, Flash, or Pro Experimental

πŸ“Š Revamped Log Dashboard

 

New

  

We've made the logs dashboard more customizable than ever with dynamic charts and custom metric widgets, giving you centralized control over the metrics that matter most for your agents' performance.

Key highlights:

πŸ“ˆ Dynamic charts: Create custom charts to visualize key metrics like evaluation scores and trace counts. Debug directly from these charts and drill down into logs for faster root cause analysis.

πŸ” Advanced aggregations: Apply functions like Sum and Average to gain collective insights on metrics, and use Group by to aggregate logs by model, tag, and other attributes for deeper analysis.

πŸ“§ Routine email overviews: Configure daily, weekly, or monthly email summaries to stay informed about your application's performance trends without constant manual checks.

Start by going to the "Dashboards" window and creating a "Custom logs dashboard".

⏱️ Visualize Agentic flows with Waterfall View in Logs

 

New

  

We've introduced a new waterfall view for logs, providing a clear, step-by-step visualization of your workflow's execution. See how sessions, traces, generations, and other components progress over time, with a breakdown of the time spent at each step and their order of occurrence. This enhancement helps you:

  • Identify performance bottlenecks across single and multi-turn workflows.
  • Understand the sequence of events in your agentic system.
  • Click on any step's bar to access detailed information about that step, making debugging easier.


πŸ“ Easy Annotations in Evaluation run report

 

Improvement

  

You can now add human scores and annotations, and rewrite outputs, directly from the entry detail view -- no more jumping between the detail view and the report view to add human ratings.

Easily navigate across entries, review agent responses, and assign scores -- all in one seamless, unified interface.

πŸ› οΈ Debug multi-turn simulations with the new Logs panel

 

New

  

The new Logs panel in the Simulation window provides deeper insights into your HTTP endpoint interactions, offering greater visibility when simulating and debugging multi-turn conversations with your agent.

Simply add console.log() statements in the Scripts tab, inside the preScript, postScript, preSimulation, or postSimulation functions depending on the stage you want to log, and start capturing the parameters you want to monitor.

πŸš€ CrewAI and Maxim integration!

 

New

  

We’re excited to announce our native integration with CrewAI – bringing powerful evaluation & observability capabilities to every agent builder, with just one line of code!

Here's what you get out of the box:

  • End-to-end agent tracing: Track your agent’s complete lifecycle, including tool calls, agent trajectories, and decision flows effortlessly.
  • Performance analytics + evals: Run detailed evaluations on full traces or individual nodes for single- and multi-turn interactions, as well as simulations for real-world testing.
  • Built-in alerting: Set triggers on errors, cost, token usage, user feedback, or latency, and get real-time alerts via Slack or PagerDuty.

Get started now: CrewAI Docs (Maxim integration)
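The one-line setup looks roughly like this (import path assumed from the Maxim Python SDK; the CrewAI docs linked above have the exact call):

from maxim import Maxim
from maxim.logger.crewai import instrument_crewai  # assumed import path; confirm in the CrewAI integration docs

# Instrument CrewAI once at startup; agent runs, tool calls, and task flows are then traced to Maxim
instrument_crewai(Maxim().logger())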

πŸ§ͺ New evals for multi-turn and SQL-based use cases!

 

New

  

We’ve added a new set of evaluators (LLM-as-a-judge and statistical) to help you ship high-quality AI applications, with a strong focus on evals for agentic and NL-to-SQL workflows. Key highlights:

  • Multi-turn evals: Evaluate if an agent successfully completes user tasks, makes correct tool choices, executes and completes the required steps, and follows the correct trajectory to achieve user goals.

  • SQL evals: Validate the syntax and adherence to DB schema, and evaluate the correctness of SQL queries generated from natural language input.

  • Tool call evals: Check whether the model selected the correct tool with the right parameters, and measure how accurately it called the expected tools.

You can add these to your workspaces from the Evaluator Store and start using them right away!

πŸ”­ Public API for OpenTelemetry trace ingestion

 

New

  

You can now send your OpenTelemetry GenAI traces directly to Maxim with a single-line code change, unlocking comprehensive LLM observability. Maxim supports semantic conventions for generative AI systems, so you can set up observability for your LLM workflows with minimal setup.

from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# maxim_api_key and maxim_repo_id come from your Maxim workspace settings
tracer_provider = trace_sdk.TracerProvider()
span_processor = SimpleSpanProcessor(OTLPSpanExporter(
    endpoint="https://api.getmaxim.ai/v1/otel",
    headers={
        "x-maxim-api-key": f"{maxim_api_key}",
        "x-maxim-repo-id": f"{maxim_repo_id}",
    },
))
tracer_provider.add_span_processor(span_processor)
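
For instance, a span that follows the GenAI semantic conventions might look like the sketch below (the model name and token counts are placeholders):

tracer = tracer_provider.get_tracer(__name__)

# A chat-completion span using GenAI semantic-convention attribute names
with tracer.start_as_current_span("chat gpt-4o") as span:
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.request.model", "gpt-4o")   # placeholder model name
    span.set_attribute("gen_ai.usage.input_tokens", 120)   # placeholder token counts
    span.set_attribute("gen_ai.usage.output_tokens", 48)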

See our Ingesting via OTLP Endpoint documentation for details.

πŸ–₯️ Revamped Prompt Comparison UX!

 

New

  

We've simplified the UX for prompt comparison - no more switching to a different window. You can now compare multiple prompts and versions side by side within the Prompt Playground.

Click the '+' in the header of any prompt to add others for comparison, and run comparison tests directly from the same window to evaluate which prompt performs best for your use case.