Maxim AI release notes

🚀 xAI's Grok 4 model is live on Maxim!

 

New

  

Grok 4, xAI’s latest flagship LLM, is now available on Maxim. Access powerful capabilities like PhD‑level reasoning, a 256k token context window, and advanced math performance to supercharge your experimentation and evaluation workflows.

✅ Start using Grok 4 via the xAI provider on Maxim: go to Settings > Models > select the xAI provider > add Grok 4.

🚚 No More 1MB Log Size Limit – Unlimited Log Ingestion

 

Improvement

  

We’re excited to announce the removal of the 1MB size limit for log uploads! Previously, logs larger than 1MB were truncated or rejected, but now you can push logs of any size directly into Maxim — perfect for handling large traces, detailed debug output, or verbose agent sessions.

What’s changed?

  • Unlimited log size: Ingest logs of any size from your SDKs or integrations. No more splitting or trimming required.
  • Efficient storage & retrieval: Large logs are now seamlessly stored and indexed for fast access.
  • Partial display in UI: For performance and usability, only a snippet of large logs is shown in the timeline/table view. Need the whole thing? Just click “View full version” to open the complete log details in a new tab.
  • Automatic detection: Your existing implementation stays as is; the SDK now automatically detects large logs and handles them accordingly. Just update to the latest SDK version and you're good to go!

This unlocks deeper observability for complex workflows and ensures you never lose critical context due to log size constraints.

🤖 AI-powered Simulations in Prompt Playground

 

New

  

Testing, automating, and scaling your workflows is now easier than ever: simulate multi-turn interactions with your prompt and assess how it performs in scenarios with user follow-ups.

Now you can bring your own prompts, connect MCP tools, integrate RAG pipelines, and launch thousands of scenario simulations in Maxim AI with a single click.

Scheduled run support for simulations using HTTP endpoints

 

New

  

We've added support for scheduling automated runs against the HTTP endpoints where your AI agent is deployed. You can now configure periodic simulations by providing an HTTP endpoint, attaching a dataset, and choosing one or more evaluators. The system triggers these simulations at the defined interval, enabling continuous performance tracking and regression detection over time without manual intervention.
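To make the setup concrete, here is a minimal sketch of an agent endpoint that a scheduled run could target, built with FastAPI. The request and response shapes shown here are assumptions for illustration only, not the exact contract Maxim expects; check the simulation docs for the real schema.

```python
# Hypothetical agent endpoint for scheduled simulation runs.
# The request/response fields (messages, response) are assumptions made for
# illustration; consult the Maxim simulation docs for the expected contract.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class AgentRequest(BaseModel):
    # Assumed shape: prior turns plus the latest user message.
    messages: list[dict]


class AgentResponse(BaseModel):
    response: str


@app.post("/agent")
def run_agent(req: AgentRequest) -> AgentResponse:
    # Replace this with your agent logic (LLM call, tool use, RAG, etc.).
    last_user_message = req.messages[-1].get("content", "") if req.messages else ""
    return AgentResponse(response=f"You said: {last_user_message}")
```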


Jinja2 Enhancements and Column Type Editing

 

Improvement

  
  1. Improved Jinja2 Parsing: We've enhanced our Jinja2 parser to deliver more precise variable extraction and cleaner, more reliable template rendering (see the example after this list).

  2. Flexible Column Type Editing: You can now modify the data types of columns in your datasets (excluding file-type columns), giving you greater control over schema customization.
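For a sense of what variable extraction means in practice, here's a quick illustration using the standard jinja2 library; this is the general mechanism, not Maxim's internal parser:

```python
# Variable extraction and rendering with the standard jinja2 library.
from jinja2 import Environment, meta

env = Environment()
source = "Hello {{ user.name }}, your order {{ order_id }} has shipped."

# Extract the undeclared (template) variables.
print(meta.find_undeclared_variables(env.parse(source)))  # {'user', 'order_id'}

# Render the template with concrete values.
print(env.from_string(source).render(user={"name": "Ada"}, order_id="A-42"))
```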

⚡️ Bifrost: The fastest LLM gateway

 

New

  

We're excited to announce the public release of Bifrost, the fastest, most scalable LLM gateway out there. We've engineered Bifrost specifically for high-throughput, production-grade AI systems, and we've optimized performance at every level:

🔸 ~0 heap allocation during live requests (configurable)
🔸 Actor pattern to avoid fetching the config at request time
🔸 Full use of Go’s concurrency primitives
🔸 Lightweight plugin system to keep the core minimal
🔸 Support for multiple transport protocols (HTTP, gRPC)

And it’s:
🔹 Open source
🔹 Written in pure Go (A+ code quality report)
🔹 40x lower overhead (based on LiteLLM’s published benchmarks)
🔹 9.5x faster, ~54x lower P99 latency, and uses 68% less memory than LiteLLM
🔹 Built-in Prometheus observability
🔹 Plugin store for easy extensibility

Check out our GitHub repo to get started. Read more about Bifrost benchmarks in our blog.


🚀 Mistral AI and Maxim integration!

 

New

  

We're excited to announce our single-line integration with Mistral, enabling fast, efficient, and observable LLM workflows powered by Maxim's eval and observability stack. Here's what you get out of the box (a quick usage sketch follows the list below):

  • Traceability for LLM calls: Capture complete request-response traces with prompts, parameters, and metadata across all your Mistral calls.

  • Run evaluations: Run custom or out-of-the-box evals on every Mistral interaction, including multi-turn chains, to measure quality, coherence, safety, and more.

  • Usage & performance dashboards: Track latency, cost, token usage, and output metrics in real time.

  • Smart alerting: Set up Slack/PagerDuty alerts based on spikes in latency, failure rates, cost thresholds, or feedback scores.
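To give a feel for the integration, here is a minimal sketch in Python. The Mistral calls use the official mistralai SDK, while the Maxim-side instrumentation line is illustrative only; the actual helper name and import path live in the integration docs linked below.

```python
import os

from mistralai import Mistral  # official Mistral Python SDK

# Hypothetical one-liner: the real Maxim instrumentation helper and its import
# path may differ; see the Mistral + Maxim integration docs linked below.
# from maxim.logger.mistral import instrument_mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
# instrument_mistral(client)  # the "single line" that enables Maxim tracing (illustrative)

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Summarize our Q2 launch notes."}],
)
print(response.choices[0].message.content)
```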

📘 Get started now: Mistral + Maxim Integration Docs

🔐📁 Enhanced Security for CSV Exports

 

New

  

We've enhanced our data export functionality with built-in protection against CSV injection (formula injection) vulnerabilities. You can now safely export logs, datasets, and other data in CSV format, ensuring that even untrusted input won't trigger formulas or pose security risks when opened in spreadsheet tools like Excel or Google Sheets.
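As background, the standard mitigation for formula injection is to neutralize cells that start with formula-triggering characters before writing the CSV. The sketch below shows the general technique, not necessarily Maxim's exact implementation:

```python
import csv
import io

# Leading characters that spreadsheet apps may interpret as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def sanitize_cell(value):
    """Prefix risky cells with a single quote so Excel/Sheets treat them as text."""
    if isinstance(value, str) and value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


rows = [
    ["user_input", "score"],
    ['=HYPERLINK("http://evil.example","click me")', 0.9],
]

buf = io.StringIO()
writer = csv.writer(buf)
for row in rows:
    writer.writerow([sanitize_cell(cell) for cell in row])
print(buf.getvalue())
```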

🛠️ Expected Tool Calls column in Datasets

 

New

  

The Expected Tool Calls column allows you to specify the tools you expect an agent to use in a scenario, ensuring the AI agent is choosing and invoking the correct tools as part of its reasoning process.

This column has been improved with the addition of powerful combinators for flexible evaluation of agent behavior:

  • inAnyOrder: Allows you to specify multiple tool calls that can be executed in any sequence while still being considered valid. This is perfect for scenarios where the order of operations doesn't matter.

  • anyOne: Enables you to define alternative tool calls where any single one satisfies the requirement. This is ideal for cases where multiple approaches can achieve the same outcome.

This provides greater flexibility when evaluating agent behavior, particularly in complex scenarios with multiple valid solution paths. Learn more about the Expected Tool Calls column.
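As an illustration, values using these combinators might look roughly like the following. The exact column schema is an assumption here, so treat the field names as placeholders and refer to the docs for the supported format:

```python
# Illustrative only: the exact Expected Tool Calls schema may differ; see the docs.

# Both tool calls must happen, in any sequence.
expected_in_any_order = {
    "inAnyOrder": [
        {"name": "search_orders"},
        {"name": "get_customer_profile"},
    ]
}

# Any single one of these tool calls satisfies the expectation.
expected_any_one = {
    "anyOne": [
        {"name": "send_email"},
        {"name": "send_sms"},
    ]
}
```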


💬 Conversation History column in Datasets!

 

New

  

You can now define a "Conversation History" column in your test datasets to include prior multi-turn interactions between the user and LLM alongside your "Input" while running prompt tests.

Passing chat history to the LLM provides critical context, enabling it to understand the ongoing dialogue rather than treating each input as an isolated query. This allows you to simulate real-world interactions more accurately. Here’s how it works:

  • The Conversation History is sent to the LLM in sequence with the Prompt version and the Input column data.
  • The history must be formatted as a JSON array containing messages with roles (user, assistant, or tool) and the corresponding content.
  • The content can be either a simple string or a structured array supporting text and image attachments.
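For example, a Conversation History cell could contain something like the snippet below. The roles and the string/array content options follow the description above; the attachment field names are an assumption based on common chat-message formats.

```python
# Illustrative Conversation History value: a JSON array of prior turns.
conversation_history = [
    {"role": "user", "content": "I was charged twice for order #1042."},
    {"role": "assistant", "content": "Sorry about that! Let me pull up the order."},
    {
        "role": "user",
        # Structured content: text plus an image attachment (field names assumed).
        "content": [
            {"type": "text", "text": "Here's the receipt."},
            {"type": "image_url", "image_url": {"url": "https://example.com/receipt.png"}},
        ],
    },
]
```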

Read more about the Conversation History column.
