Maxim AI release notes

📧 Log Repo Stats Emails

New

We now offer Log Repo Stats Emails, enabling you to configure recipients for weekly email updates on your log repository statistics. These emails, sent every Monday, provide a comprehensive overview of key metrics to help you stay informed and monitor performance.

Log repo stats mail.gif

Here’s what you get with Log Repo Stats Emails:

Traces overview: See how many traces were logged during the week.
User feedback summary: Get insights into the average user feedback score.
Latency reports: Monitor latency trends and other performance metrics.
Periodic updates: Receive a concise, automated summary in your inbox every Monday.

Stay on top of your log statistics effortlessly with Log Repo Stats Emails, ensuring better visibility and control over your data.

{{Jinja 2}} variables are now supported on Maxim

New

We now support Jinja 2 variables; with this enhancement, you can use {{ }} double curly braces to seamlessly insert variables anywhere, making your workflows, prompts, datasets, and configurations more flexible and customizable.

Jinja 2.gif

Here’s what you can do:

Dynamic prompts: Personalize and adapt your prompts by adding variables like {{user_name}} or {{context}} for tailored responses.
Flexible workflows: Streamline workflows by dynamically injecting variables.
Dataset customization: Include variables directly within datasets.

We are adding support for other jinja 2 commands in upcoming releases.

✅ New Async Eval Changes

New

async eval.gif

We introduce Async evaluation filters, empowering you to refine and customize log evaluations on the platform. This new enhancement allows you to filter evaluated logs based on specific criteria, enabling more precise evaluation workflows and streamlined dataset curation.

Here’s what you can do with new changes:

Apply advanced filters: Refine your evaluation results by specifying conditions like Punctuation Evaluator > 5 to focus only on the entries you need.
Seamless dataset integration: Directly add filtered evaluation results to a dataset, simplifying the curation process for further analysis.

📤 Dataset export feature

New

Maxim launches the Dataset Export feature, giving you the ability to curate and export high-quality datasets directly from the platform. With this new feature, you can now easily curate datasets from various sources and export them for use elsewhere.

dataset export.gif

Here’s what you can do with the Dataset Export feature:

Curate from human evaluation: Select entries with human ratings greater than 8 out of 10 to create a high-quality dataset based on human evaluation.
Curate from Async evaluations: Easily compile data from asynchronous evaluations.
Curate from Logs: Choose specific entries from your logs to build a tailored dataset that fits your needs.
Curate from Test Runs: Select entries performing well on a particular evaluator and add them to a dataset for more targeted analysis.
Export as CSV: Once your dataset is curated, export it as a CSV file to use in other tools or workflows.

With the Dataset Export feature, you can now seamlessly create and share customized datasets for further use, improving flexibility in your data workflows.

🔎 OmniSearch & saved filters

New

We are introducing OmniSearch with Saved Filters, which is designed to make log analysis faster and more efficient. With this update, the search bar becomes a powerful tool for filtering and searching through logs effortlessly.

Here’s what you can do with OmniSearch:

Search and filter together: Place the cursor in the search bar and instantly access a list of filtering options depending on your logs. You can filter your logs based on conditions like "user feedback > 5" and search for specific terms simultaneously.
Save your search configurations: After configuring your search, you can save the filter setup for future use, so you don’t have to recreate it each time.
Quick access to recent searches: The Omni bar will display your last three searches, allowing for quicker access to past configurations.

🏛️ Add structured output in prompts

Maxim now supports structured outputs in prompts with the JSON Schema option for the Response Format parameter. With this update, you can:

Define JSON schema: Supply a JSON Schema via json_schema to structure model responses for precise data formatting.
Model compatibility: This feature is available for models that support structured output, enabling streamlined data handling and integration.

structured output.gif

🛠️ Tool call accuracy with tool schema

New

Maxim now introduces a new prompt tool type called Schema. This feature lets users directly input their tool schema to evaluate tool call accuracy.

Here's what you can do :

Direct schema input: Provide your tool's schema directly, streamlining the setup process.
Evaluate tool call accuracy: Assess the accuracy of tool call responses based on the provided schema.

✨Latest Claude Sonnet 3.5 is now live on Maxim

New

You can use Cluade Sonnet 3.5 (latest version) on Maxim now via Anthropic provider.

Claude sonnet 3.5.gif

🧑‍🔧 Add human annotations with the revamped flow

New

Maxim has revamped the human annotation process. With this update, you can:

Select sample rate: Choose a sample rate to specify that only a certain percentage of entries must be human-annotated.
Custom filters: Use custom filters to send only entries that meet specific criteria for human annotation, for example, those with toxicity scores greater than 0.8.
Self-annotate option: The self-annotate option allows you to annotate entries that have already been run, enabling a more tailored evaluation process.

💬 Add multiple messages to a Prompt version

New

With this new update, users can now save multiple user and assistant messages in the published version of prompts, allowing users to maintain a complete flow of their prompt structure.

Key benefits include:

Save and publish iterative prompts: Users can now save prompts with iterative or few-shot prompting and publish them as different versions, enabling more flexible prompt management.
Refine outputs for complex cases: This is particularly useful in complex cases where a single prompt may not yield the best answer, as multiple prompts help achieve more refined results.