Maxim AI release notes

🚀 xAI's Grok 4 model is live on Maxim!

 

New

  

Grok 4, xAI’s latest flagship LLM, is now available on Maxim. Access powerful capabilities like PhD‑level reasoning, a 256k token context window, and advanced math performance to supercharge your experimentation and evaluation workflows.

✅ Start using Grok 4 via the xAI provider on Maxim: go to Settings > Models > select the xAI provider > add Grok 4.

🚚 No More 1MB Log Size Limit – Unlimited Log Ingestion

 

Improvement

  

We’re excited to announce the removal of the 1MB size limit for log uploads! Previously, logs larger than 1MB were truncated or rejected, but now you can push logs of any size directly into Maxim — perfect for handling large traces, detailed debug output, or verbose agent sessions.

What’s changed?

  • Unlimited log size: Ingest logs of any size from your SDKs or integrations. No more splitting or trimming required.
  • Efficient storage & retrieval: Large logs are now seamlessly stored and indexed for fast access.
  • Partial display in UI: For performance and usability, only a snippet of large logs is shown in the timeline/table view. Need the whole thing? Just click “View full version” to open the complete log details in a new tab.
  • Automatic detection: Your existing implementation stays as is; the SDK now automatically detects large logs and handles them accordingly. Just update to the latest SDK version and you're good to go!

This unlocks deeper observability for complex workflows and ensures you never lose critical context due to log size constraints.

🤖 AI-powered Simulations in Prompt Playground

 

New

  

Testing, automating, and scaling your workflows is now easier than ever: simulate multi-turn interactions with your prompt and assess how it performs in scenarios with user follow-ups.

Now you can bring your own prompts, connect MCP tools, integrate RAG pipelines, and launch thousands of scenario simulations in Maxim AI with a single click.

Scheduled run support for simulations using HTTP endpoints

 

New

  

We've added support for scheduling automated runs against the HTTP endpoints where your AI agent is deployed. You can now configure periodic simulations by providing an HTTP endpoint, attaching a dataset, and choosing one or more evaluators. The system triggers these simulations at the defined interval, enabling continuous performance tracking and regression detection over time without manual intervention.
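To make the setup concrete, here is a minimal sketch of an agent endpoint that a scheduled run could target, built with FastAPI. The request and response shapes shown here are assumptions for illustration only, not the exact contract Maxim expects; check the simulation docs for the real schema.

```python
# Hypothetical agent endpoint for scheduled simulation runs.
# The request/response fields (messages, response) are assumptions made for
# illustration; consult the Maxim simulation docs for the expected contract.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class AgentRequest(BaseModel):
    # Assumed shape: prior turns plus the latest user message.
    messages: list[dict]


class AgentResponse(BaseModel):
    response: str


@app.post("/agent")
def run_agent(req: AgentRequest) -> AgentResponse:
    # Replace this with your agent logic (LLM call, tool use, RAG, etc.).
    last_user_message = req.messages[-1].get("content", "") if req.messages else ""
    return AgentResponse(response=f"You said: {last_user_message}")
```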


Jinja2 Enhancements and Column Type Editing

 

Improvement

  
  1. Improved Jinja2 Parsing: We've enhanced our Jinja2 parser to deliver more precise variable extraction and cleaner, more reliable template rendering (see the example after this list).

  2. Flexible Column Type Editing: You can now modify the data types of columns in your datasets (excluding file-type columns), giving you greater control over schema customization.
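For a sense of what variable extraction means in practice, here's a quick illustration using the standard jinja2 library; this is the general mechanism, not Maxim's internal parser:

```python
# Variable extraction and rendering with the standard jinja2 library.
from jinja2 import Environment, meta

env = Environment()
source = "Hello {{ user.name }}, your order {{ order_id }} has shipped."

# Extract the undeclared (template) variables.
print(meta.find_undeclared_variables(env.parse(source)))  # {'user', 'order_id'}

# Render the template with concrete values.
print(env.from_string(source).render(user={"name": "Ada"}, order_id="A-42"))
```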

⚡️ Bifrost: The fastest LLM gateway

 

New

  

We're excited to announce the public release of Bifrost, the fastest, most scalable LLM gateway out there. We've engineered Bifrost specifically for high-throughput, production-grade AI systems, and we've optimized performance at every level:

🔸 ~0 heap allocation during live requests (configurable)
🔸 Actor pattern to avoid fetching the config at request time
🔸 Full use of Go’s concurrency primitives
🔸 Lightweight plugin system to keep the core minimal
🔸 Support for multiple transport protocols (HTTP, gRPC)

And it’s:
🔹 Open source
🔹 Written in pure Go (A+ code quality report)
🔹 40x lower overhead (based on LiteLLM’s published benchmarks)
🔹 9.5x faster, ~54x lower P99 latency, and uses 68% less memory than LiteLLM
🔹 Built-in Prometheus observability
🔹 Plugin store for easy extensibility

Check out our GitHub repo to get started. Read more about Bifrost benchmarks in our blog.


🚀 Mistral AI and Maxim integration!

 

New

  

We're excited to announce our single-line integration with Mistral, enabling fast, efficient, and observable LLM workflows powered by Maxim's eval and observability stack. Here's what you get out of the box (a quick usage sketch follows the list below):

  • Traceability for LLM calls: Capture complete request-response traces with prompts, parameters, and metadata across all your Mistral calls.

  • Run evaluations: Run custom or out-of-the-box evals on every Mistral interaction, including multi-turn chains, to measure quality, coherence, safety, and more.

  • Usage & performance dashboards: Track latency, cost, token usage, and output metrics in real time.

  • Smart alerting: Set up Slack/PagerDuty alerts based on spikes in latency, failure rates, cost thresholds, or feedback scores.
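To give a feel for the integration, here is a minimal sketch in Python. The Mistral calls use the official mistralai SDK, while the Maxim-side instrumentation line is illustrative only; the actual helper name and import path live in the integration docs linked below.

```python
import os

from mistralai import Mistral  # official Mistral Python SDK

# Hypothetical one-liner: the real Maxim instrumentation helper and its import
# path may differ; see the Mistral + Maxim integration docs linked below.
# from maxim.logger.mistral import instrument_mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
# instrument_mistral(client)  # the "single line" that enables Maxim tracing (illustrative)

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Summarize our Q2 launch notes."}],
)
print(response.choices[0].message.content)
```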

📘 Get started now: Mistral + Maxim Integration Docs

🔐📁 Enhanced Security for CSV Exports

 

New

  

We've enhanced our data export functionality with built-in protection against CSV injection (formula injection) vulnerabilities. You can now safely export logs, datasets, and other data in CSV format, ensuring that even untrusted input won't trigger formulas or pose security risks when opened in spreadsheet tools like Excel or Google Sheets.
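As background, the standard mitigation for formula injection is to neutralize cells that start with formula-triggering characters before writing the CSV. The sketch below shows the general technique, not necessarily Maxim's exact implementation:

```python
import csv
import io

# Leading characters that spreadsheet apps may interpret as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def sanitize_cell(value):
    """Prefix risky cells with a single quote so Excel/Sheets treat them as text."""
    if isinstance(value, str) and value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


rows = [
    ["user_input", "score"],
    ['=HYPERLINK("http://evil.example","click me")', 0.9],
]

buf = io.StringIO()
writer = csv.writer(buf)
for row in rows:
    writer.writerow([sanitize_cell(cell) for cell in row])
print(buf.getvalue())
```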

🛠️ Expected Tool Calls column in Datasets

 

New

  

The Expected Tool Calls column allows you to specify the tools you expect an agent to use in a scenario, ensuring the AI agent is choosing and invoking the correct tools as part of its reasoning process.

This column has been improved with the addition of powerful combinators for flexible evaluation of agent behavior:

  • inAnyOrder: Allows you to specify multiple tool calls that can be executed in any sequence while still being considered valid. This is perfect for scenarios where the order of operations doesn't matter.

  • anyOne: Enables you to define alternative tool calls where any single one satisfies the requirement. This is ideal for cases where multiple approaches can achieve the same outcome.

This provides greater flexibility when evaluating agent behavior, particularly in complex scenarios with multiple valid solution paths. Learn more about the Expected Tool Calls column.
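As an illustration, values using these combinators might look roughly like the following. The exact column schema is an assumption here, so treat the field names as placeholders and refer to the docs for the supported format:

```python
# Illustrative only: the exact Expected Tool Calls schema may differ; see the docs.

# Both tool calls must happen, in any sequence.
expected_in_any_order = {
    "inAnyOrder": [
        {"name": "search_orders"},
        {"name": "get_customer_profile"},
    ]
}

# Any single one of these tool calls satisfies the expectation.
expected_any_one = {
    "anyOne": [
        {"name": "send_email"},
        {"name": "send_sms"},
    ]
}
```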


💬 Conversation History column in Datasets!

 

New

  

You can now define a "Conversation History" column in your test datasets to include prior multi-turn interactions between the user and LLM alongside your "Input" while running prompt tests.

Passing chat history to the LLM provides critical context, enabling it to understand the ongoing dialogue rather than treating each input as an isolated query. This allows you to simulate real-world interactions more accurately. Here’s how it works:

  • The Conversation History is sent to the LLM in sequence with the Prompt version and the Input column data.
  • The history must be formatted as a JSON array containing messages with roles (user, assistant, or tool) and the corresponding content.
  • The content can be either a simple string or a structured array supporting text and image attachments.
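For example, a Conversation History cell could contain something like the snippet below. The roles and the string/array content options follow the description above; the attachment field names are an assumption based on common chat-message formats.

```python
# Illustrative Conversation History value: a JSON array of prior turns.
conversation_history = [
    {"role": "user", "content": "I was charged twice for order #1042."},
    {"role": "assistant", "content": "Sorry about that! Let me pull up the order."},
    {
        "role": "user",
        # Structured content: text plus an image attachment (field names assumed).
        "content": [
            {"type": "text", "text": "Here's the receipt."},
            {"type": "image_url", "image_url": {"url": "https://example.com/receipt.png"}},
        ],
    },
]
```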

Read more about the Conversation History column.
