Maxim AI release notes
www.getmaxim.ai

🪵 Log errors in your AI workflow using the Maxim SDK

 

New

  

Along with logging your sessions, traces, spans (generation, retrieval, etc.), and tool calls, you can now log errors in the traces of your AI application and track them using Maxim.

With a single log statement, you can capture errors at each step (generation, retrieval, tool call, etc.) of your workflow to simplify debugging. Learn more.

# Example: record an error on a generation step. Here `generation` is the
# generation object previously created on a trace via the Maxim logger.
generation.error(GenerationError(
    message="Rate limit exceeded. Please try again later.",
    type="RateLimitError",
    code="429",
))

πŸ› οΈ Tool call column in test runs

 

New

  

If you've attached tools to your prompts, you can now see which tools were called by your model during the evaluation run for each entry in your test dataset.

This new column in the test run report provides increased visibility into how your agentic workflows are functioning, making it easier to debug.


🔌 MCP Clients on Maxim

 

New

  

Maxim now supports the Model Context Protocol (MCP), enabling your agents to interact with external tools, access real-time data, and perform actions. Here's what you can do with MCP clients:

  • Connect to popular MCP providers like Composio and Gumloop, or use your own custom MCP server.
  • Automatically import tools from the MCP servers directly into your workspace.
  • Execute tool calls directly from MCP servers in your AI interactions and testing via the prompt playground.
  • Monitor connection status and logs for easy debugging.

Your AI agents can now send emails, create GitHub issues, search the web, and more - all through natural language.


✅ Vertex AI provider and evaluators are live on Maxim!

 

New

  
  • We've added support for one more provider on Maxim: Vertex AI, bringing the total to 13.
  • Alongside this addition, we've added 15 new Vertex AI evaluators to the evaluator store:
    • Vertex Exact Match – Checks if the model's output exactly matches the expected answer.
    • Vertex Bleu – Measures n-gram overlap between the generated and reference texts for translation tasks.
    • Vertex Rouge – Evaluates content overlap between output and reference, especially for summaries.
    • Vertex Fluency – Assesses how naturally and grammatically correct the text reads.
    • Vertex Coherence – Evaluates logical consistency and flow across the generated response.
    • Vertex Safety – Flags potentially harmful, toxic, or unsafe content in the output.
    • Vertex Groundedness – Verifies if the response stays factual and rooted in provided context.
    • Vertex Fulfillment – Measures how well the output satisfies the user prompt or intent.
    • Vertex Summarization Quality – Holistic quality score for summaries, balancing coverage, fluency, and faithfulness.
    • Vertex Summarization Helpfulness – Assesses whether the summary effectively aids user understanding.
    • Vertex Summarization Verbosity – Evaluates whether the summary is overly verbose or too brief.
    • Vertex Question Answering Quality – Overall score for QA responses across relevance, correctness, and clarity.
    • Vertex Question Answering Relevance – Checks if the answer is contextually relevant to the input question.
    • Vertex Question Answering Helpfulness – Assesses if the answer improves understanding or provides value.
    • Vertex Question Answering Correctness – Evaluates factual accuracy and truthfulness of QA responses.
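To build intuition for what the string-overlap evaluators in this list measure, here is a rough sketch of exact match and a BLEU-style unigram precision. This is an illustration of the underlying ideas only, not the actual Vertex AI evaluator implementations:

```python
# Illustrative only: rough analogues of exact match and BLEU-style
# unigram precision, not the actual Vertex AI evaluators.
from collections import Counter


def exact_match(output: str, expected: str) -> float:
    """1.0 if the model output matches the expected answer exactly."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def unigram_precision(output: str, reference: str) -> float:
    """Fraction of output tokens that also appear in the reference,
    with clipped counts, as in BLEU's modified n-gram precision."""
    out_tokens = output.lower().split()
    if not out_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(c, ref_counts[t]) for t, c in Counter(out_tokens).items())
    return matched / len(out_tokens)
```

Clipping the counts prevents a degenerate output like "the the the" from scoring highly just by repeating a common reference token.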

📊 Live Dashboards on Maxim!

 

New

  

Monitor how your application's quality scores change across experiments and in production. Build dashboards with custom charts tailored to your needs, and gain full control over the analysis of your logs and performance metrics.

Key features:

📈 Custom logs dashboard: Visualize production logs with custom charts. Filter logs by errors, performance metrics (cost, latency, etc.), and quality metrics (clarity, tone, etc.).
🔍 Test run comparison: Create live dashboards for your test runs, letting you track live trends of evaluation metrics (bias, toxicity, etc.) across different runs.

This feature provides a centralized view of your application's performance for better analysis and decision making.

🔄 Introducing Prompt Partials for Efficient Prompt Management!

 

New

  

Prompt partials are versioned, reusable text blocks you can directly reference in the prompt playground. Key benefits:

📦 Reusability: Store commonly used content snippets as partials and reference them in prompts using {{partials.<name>.<version>}}, saving time and effort.

🔄 Independent Iteration: Update a partial once without having to modify every prompt across Maxim, ensuring consistency.
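As a rough illustration of how a templating layer can resolve such references, here is a minimal sketch; this is the general idea only, not Maxim's implementation, and the partial name and registry are hypothetical:

```python
# Hypothetical sketch of resolving {{partials.<name>.<version>}} references.
# Not Maxim's actual implementation; names below are made up for illustration.
import re

PARTIALS = {
    ("tone_guidelines", "v2"): "Respond in a friendly, concise tone.",
}

def resolve_partials(prompt: str) -> str:
    """Replace each {{partials.<name>.<version>}} tag with its stored text."""
    pattern = re.compile(r"\{\{partials\.([\w-]+)\.([\w-]+)\}\}")
    return pattern.sub(lambda m: PARTIALS[(m.group(1), m.group(2))], prompt)
```

With this setup, `resolve_partials("{{partials.tone_guidelines.v2}} Summarize the ticket.")` expands the reference in place, so updating the stored partial changes every prompt that references it.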

🧊 GPT-4.1 is live on Maxim!

 

New

  

OpenAI's latest GPT-4.1 model is now available on Maxim. Leverage its improved reasoning and lower latency to design custom evaluators and run smarter prompt experiments.

Start using this model via the OpenAI or Azure provider:
✅ Go to Settings > Models > Select OpenAI or Azure provider > Add GPT-4.1

🧩 Snowflake Data Connector!

 

New

  

We are excited to introduce our new Snowflake data connector, which builds on our full OpenTelemetry compatibility. It lets you seamlessly stream all incoming commit logs directly into your Snowflake cluster.

Here's what you can expect:

📅 Structured Timeline: Enjoy a well-organized timeline of your logs for easy tracking and analysis.
🔍 Full Log Fidelity: Access complete and detailed commit logs, ensuring no data is lost or overlooked.

📦 Maxim AI's Python SDK version 3.4.12 is live

 

New

  

Here's what's new:

  • Connect to Bedrock with a single line of code using the Maxim Python SDK.
  • Design custom evaluators and prompt experiments using Bedrock models.
  • Check out this cookbook to get started.

🔗 Prompt chains: Now more powerful

 

New

  

Maxim's revamped Prompt Chains lets you prototype every step of your complex agentic workflow with greater clarity and control, right from our more intuitive UI. Key highlights:

🔀 Create parallel chains to execute concurrent or conditional tasks.
🔁 Flexibly transfer data between ports: choose whether outputs act as variables, inputs, or context in other blocks.
🧪 Experiment with prompts directly within the prompt block and publish new versions instantly.

Our new UI is designed for cross-functional teams to dive in and start building with ease.