New
- For simplicity and accuracy, we configure numeric scores from custom LLM-as-a-judge evaluators on a 1–5 scale.
- For reporting purposes, however, you may want to normalize these scores to a 0–1 range.
- This release adds a setting to scale-based LLM-as-a-judge evaluators that normalizes scores consistently across test runs and online evaluations.
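These notes do not say exactly how the normalization is computed. A common approach, assumed here, is linear min-max scaling: normalized = (score - min) / (max - min). On a 1–5 scale this maps 1 to 0, 3 to 0.5, and 5 to 1. The sketch below illustrates that assumption only. `normalize_score` is a hypothetical helper, not part of the product's API or settings.

```python
def normalize_score(score: float, min_score: float = 1.0, max_score: float = 5.0) -> float:
    """Linearly rescale a judge score from [min_score, max_score] to [0, 1].

    Assumption: the evaluator setting uses min-max scaling; the actual
    product behavior may differ.
    """
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    if not (min_score <= score <= max_score):
        raise ValueError(f"score {score} is outside [{min_score}, {max_score}]")
    # Shift so the lowest score becomes 0, then divide by the range width.
    return (score - min_score) / (max_score - min_score)


if __name__ == "__main__":
    # Print each possible 1-5 score alongside its normalized value.
    for s in [1, 2, 3, 4, 5]:
        print(s, normalize_score(s))
```

Under this assumption, each step on the 1–5 scale adds 0.25 to the normalized value, so scores from different evaluators or runs can be compared on the same 0–1 axis.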