Calculating custom aggregate metrics for evals #1413

Open
@olinguyen

Description

Question

Hello!

What's the recommendation for implementing custom aggregate metrics like precision/recall for evals?
There's an existing ReportCaseAggregate, but that seems specific to calculating the average of scores.

There are a few workarounds off the top of my head:

  1. Implement my own EvaluationReport and override the dataset's evaluate() function to produce an aggregate report
  2. Write my own custom script to calculate metrics from the list of cases, ignoring any Evaluator (see the sketch after this list)
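A minimal sketch of option 2, assuming a hypothetical Case container whose output and expected_output fields stand in for whatever the real report cases expose (the actual pydantic-evals attribute names may differ):

```python
from dataclasses import dataclass

from sklearn.metrics import precision_score, recall_score


@dataclass
class Case:
    # Hypothetical stand-in for a report case; the real ReportCase
    # carries more fields and may use different attribute names.
    output: int
    expected_output: int


def aggregate_precision_recall(cases: list[Case]) -> dict[str, float]:
    """Compute dataset-level precision/recall instead of a per-case average."""
    y_pred = [c.output for c in cases]
    y_true = [c.expected_output for c in cases]
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
    }


cases = [
    Case(output=1, expected_output=1),
    Case(output=1, expected_output=0),
    Case(output=0, expected_output=0),
]
print(aggregate_precision_recall(cases))  # {'precision': 0.5, 'recall': 1.0}
```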

Ideally, it seems we should be able to have an Evaluator with a compute function that runs on a list of predictions and labels, similar to sklearn's precision_score(y_true, y_pred) or Hugging Face evaluate's compute(predictions, references).
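For illustration only, a hypothetical interface along those lines might look like the protocol below; none of these names exist in pydantic-evals today, they just make the request concrete:

```python
from typing import Protocol, Sequence


class AggregateEvaluator(Protocol):
    """Hypothetical: runs once over the whole dataset rather than once per
    case, mirroring sklearn's metric functions and Hugging Face evaluate."""

    def compute(
        self, predictions: Sequence[object], references: Sequence[object]
    ) -> dict[str, float]:
        ...


class PrecisionRecall:
    """Example implementation of the hypothetical protocol."""

    def compute(self, predictions, references):
        from sklearn.metrics import precision_score, recall_score

        return {
            "precision": precision_score(references, predictions),
            "recall": recall_score(references, predictions),
        }
```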

Additional Context

No response

Metadata

Labels

evals, question (further information is requested)
