Maxim AI release notes
www.getmaxim.ai

πŸ’Ύ Evaluation presets for reusing your test configs

 

New

  
  • To streamline your testing, Maxim now supports saving a preset for your testing configuration and can be reused across the workspace.
    • A preset is combination of
      • Dataset
      • RAG source
      • Set of evaluators
  • A test config preset can be used for all the entities in Maxim (Workflow, Prompt chains and Prompts).
  • To create a preset,
    • Go to any workflow/prompt/prompt chain
    • Click on presets and click on β€œCreate new preset”.
    • Or create a test config and click on β€œSave as a preset”.

presets.gif
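For illustration only, the sketch below shows the three pieces a preset bundles as a plain Python dict; the dataset, RAG source, and evaluator names are made up, and this is not Maxim's internal format.

```python
# Illustrative only: the three pieces a test config preset bundles.
# The dataset, RAG source, and evaluator names here are made up.
preset = {
    "dataset": "customer-support-dataset",
    "rag_source": "docs-index",
    "evaluators": ["Clarity", "Toxicity", "Tools Calling Accuracy"],
}
print(preset["evaluators"])
```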

βœ… Pass-fail criteria for evaluators

 

New

  

Adding evaluation thresholds is helpful for quick decisions. Today, we are shipping workspace-specific pass-fail criteria on custom and built-in evaluators to speed up decision-making.

How to set pass-fail criteria?

  1. Go to any evaluator in your workspace.
  2. You will see a section called Pass criteria.
  3. The first value applies at the entry level, i.e., in the image below, a test run entry is marked as passed if its "Clarity" score is >= 0.8.
  4. The second value applies at the test run report level, i.e., "Clarity" is marked as passed if 80% of the entries are marked as passed.

image.png

You can view the pass-fail result in the top section of the Test Run Report.

image.png
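To make the two thresholds concrete, here is a minimal sketch (not Maxim's implementation) of how the entry-level and report-level criteria interact, using the "Clarity" example above.

```python
# Minimal sketch (not Maxim's implementation) of the two-level pass-fail check.
ENTRY_THRESHOLD = 0.8    # an entry passes if its "Clarity" score is >= 0.8
REPORT_THRESHOLD = 0.8   # the evaluator passes if >= 80% of entries pass

def entry_passed(score: float) -> bool:
    return score >= ENTRY_THRESHOLD

def evaluator_passed(scores: list[float]) -> bool:
    if not scores:
        return False
    passed = sum(entry_passed(s) for s in scores)
    return passed / len(scores) >= REPORT_THRESHOLD

clarity_scores = [0.9, 0.85, 0.7, 0.95, 0.8]
print(evaluator_passed(clarity_scores))  # True: 4 of 5 entries (80%) passed
```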

πŸŒ… Prompt chains now support image inputs

 

New

  
  • Prompt chain playground now supports user messages with image inputs.

prompt chain image.gif

βŒ— Datasets editor got a major upgrade

 

New

  

We have launched an all-new Excel-like editing experience for our datasets.

  • Directly copy and paste from and to Excel sheets.
  • Shortcuts compatibility.
  • Add/remove columns based on your use cases.

Supported column types

  1. Input - This goes as the user message in test runs.
  2. Expected Output.
  3. Expected Tool Calls.
  4. Variable - These are variables used in inputs, workflows, prompts, prompt chains, and prompt experiments.
  5. Images - These go as attachments with the user input if the selected model supports them.
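For illustration, the sketch below shows one hypothetical dataset row using the column types above; the values and exact cell formats are assumptions, not Maxim's documented schema.

```python
# Hypothetical dataset row illustrating the supported column types.
# All values are made up; the exact cell format Maxim expects may differ.
row = {
    "Input": "What is the weather in Paris tomorrow?",           # sent as the user message
    "Expected Output": "A short, accurate forecast for Paris.",  # reference answer
    "Expected Tool Calls": ["get_weather"],                       # tools the model should call
    "city": "Paris",                                              # a Variable column used in prompts/workflows
    "Images": ["paris-map.png"],                                  # attached if the selected model supports vision
}
print(row["Input"])
```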

Datasets.gif

GPT-4o is available on Maxim

 

New

  
  • We have enabled GPT-4o model.
  • Along with GPT-4o, we have enabled
    • Llama-3 (8b), (70b) on Groq

πŸ”— Visual Prompt Chains now have a new home

 

New

  
  • You can access them easily by clicking on the Prompts menu on the top bar.

image.png

 

Improvement

  
  • Test run console logs got some visual uplift.

image.png

🏷️ Now create Prompt Version with multiple messages

 

New

  
  • We have updated prompt version management to handle multiple messages in a version and return them in the same order they were selected using the SDK.

prompt-version.gif

πŸ‘©β€πŸ”¬ All new Prompt Experiments now support vision models

 

New

  
  • Our Prompt Experiment tool has been given a makeover to help you compare prompts side-by-side like never before. You're going to love the new and improved version!

promt-experiments.gif

πŸ“¦ Meta Llama 3 is available via Together Provider

 

New

  
  • We have enabled Meta Llama 3 on Together provider. Both Llama 3 (8B) and Llama 3 (70B) are available to use.
  • You can add them to the platform by going to Settings > Model Config > Together.

image.png

πŸ“ž Tools Calling Accuracy is now available in the evaluator store

 

New

  
  • We have added a new evaluator called Tools Calling Accuracy in our evaluator store that helps you find function calling accuracy for a given model and prompt.

  • Install "Tools Calling Accuracy" evaluator from Evaluator store.

install-evaluator.gif

  • Create a dataset with "Expected Tool Calls" column

save-tool-accuracy-dataset.gif
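For illustration, an "Expected Tool Calls" cell could describe the call(s) the model is expected to make, along the lines of the sketch below; the function name, arguments, and schema are assumptions rather than Maxim's documented format.

```python
# Hedged sketch of an "Expected Tool Calls" cell for a weather prompt.
# The function name, arguments, and schema here are assumptions.
expected_tool_calls = [
    {
        "name": "get_current_weather",                          # call the model is expected to make
        "arguments": {"location": "Paris", "unit": "celsius"},
    }
]
print(expected_tool_calls)
```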

  • Trigger the test run

test-run.gif