evals.evaluators#
- class HallucinationEvaluator(model)#
Bases:
LLMEvaluator
Leverages an LLM to evaluate whether a response (stored under an “output” column) is a hallucination given a query (stored under an “input” column) and one or more retrieved documents (stored under a “reference” column).
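A minimal usage sketch is shown below. The import paths, the OpenAIModel wrapper, and its constructor arguments are assumptions that may vary by version; the record is a plain dict standing in for a Record mapping, using the “input”, “output”, and “reference” keys described above.

```python
from phoenix.evals import HallucinationEvaluator, OpenAIModel  # assumed import path

# Assumed LLM wrapper; constructor arguments may differ across versions.
model = OpenAIModel(model="gpt-4o-mini")
evaluator = HallucinationEvaluator(model)

# The evaluator reads the "input", "output", and "reference" keys of the record.
record = {
    "input": "When was the company founded?",
    "output": "The company was founded in 1998.",
    "reference": "The company was founded in 2001 in Austin, Texas.",
}
label, score, explanation = evaluator.evaluate(record, provide_explanation=True)
print(label, score, explanation)
```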
- class LLMEvaluator(model, template)#
Bases:
object
Leverages an LLM to evaluate individual records.
- async aevaluate(record, provide_explanation=False, use_function_calling_if_available=True, verbose=False)#
Evaluates a single record.
- Parameters:
record (Record) – The record to evaluate.
provide_explanation (bool, optional) – Whether to provide an explanation.
use_function_calling_if_available (bool, optional) – If True and the model supports it, use function calling to constrain the LLM’s output. With function calling, the LLM is instructed to return its response as a structured JSON object, which is easier to parse.
verbose (bool, optional) – Whether to print verbose output.
- Returns:
- A tuple containing:
label
score (if scores for each label are specified by the template)
explanation (if requested)
- Return type:
Tuple[str, Optional[float], Optional[str]]
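Because aevaluate is a coroutine, several records can be scored concurrently. The sketch below assumes an already constructed evaluator (see the class-level examples in this module) and plain dicts as records.

```python
import asyncio

async def evaluate_all(evaluator, records):
    # One aevaluate coroutine per record; gather returns the
    # (label, score, explanation) tuples in the same order as the records.
    return await asyncio.gather(
        *(evaluator.aevaluate(record, provide_explanation=True) for record in records)
    )

# results = asyncio.run(evaluate_all(evaluator, records))
```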
- evaluate(record, provide_explanation=False, use_function_calling_if_available=True, verbose=False)#
Evaluates a single record.
- Parameters:
record (Record) – The record to evaluate.
provide_explanation (bool, optional) – Whether to provide an explanation.
use_function_calling_if_available (bool, optional) – If True and the model supports it, use function calling to constrain the LLM’s output. With function calling, the LLM is instructed to return its response as a structured JSON object, which is easier to parse.
verbose (bool, optional) – Whether to print verbose output.
- Returns:
- A tuple containing:
label
score (if scores for each label are specified by the template)
explanation (if requested)
- Return type:
Tuple[str, Optional[float], Optional[str]]
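The base class can also be used directly with a custom template. The sketch below assumes a ClassificationTemplate importable from phoenix.evals.templates that accepts rails and template arguments (an assumption that may not match every version); the rails and placeholders are purely illustrative.

```python
from phoenix.evals import LLMEvaluator, OpenAIModel  # assumed import path
from phoenix.evals.templates import ClassificationTemplate  # assumed import path

# Hypothetical classification template: label whether the answer is polite.
template = ClassificationTemplate(
    rails=["polite", "impolite"],
    template=(
        "Given the question:\n{input}\n\nand the answer:\n{output}\n\n"
        "Respond with exactly one word: polite or impolite."
    ),
)
evaluator = LLMEvaluator(model=OpenAIModel(model="gpt-4o-mini"), template=template)
label, score, explanation = evaluator.evaluate(
    {"input": "Where is the office?", "output": "Figure it out yourself."},
    provide_explanation=True,
)
```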
- class QAEvaluator(model)#
Bases:
LLMEvaluator
Leverages an LLM to evaluate whether a response (stored under an “output” column) is correct or incorrect given a query (stored under an “input” column) and one or more retrieved documents (stored under a “reference” column).
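The record layout is the same as for HallucinationEvaluator; only the judgment differs. A brief sketch, reusing the assumed model wrapper from the first example:

```python
from phoenix.evals import QAEvaluator  # assumed import path

evaluator = QAEvaluator(model)
label, score, explanation = evaluator.evaluate(
    {
        "input": "Which city hosts the headquarters?",
        "output": "The headquarters are in Berlin.",
        "reference": "The company is headquartered in Berlin, Germany.",
    },
    provide_explanation=True,
)
```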
- class RelevanceEvaluator(model)#
Bases:
LLMEvaluator
Leverages an LLM to evaluate whether a retrieved document (stored under a “reference” column) is relevant or irrelevant to the corresponding query (stored under the “input” column).
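A brief sketch with the “input” and “reference” keys described above, again reusing the assumed model wrapper:

```python
from phoenix.evals import RelevanceEvaluator  # assumed import path

evaluator = RelevanceEvaluator(model)
label, score, explanation = evaluator.evaluate(
    {
        "input": "How do I reset my password?",
        "reference": "To reset your password, open Settings and choose Reset password.",
    },
    provide_explanation=True,
)
```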
- class SQLEvaluator(model)#
Bases:
LLMEvaluator
Leverages an LLM to evaluate whether a generated SQL query (stored under the “query_gen” column) and a response (stored under the “response” column) appropriately answer a question (stored under the “question” column).
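A brief sketch with the “question”, “query_gen”, and “response” keys described above:

```python
from phoenix.evals import SQLEvaluator  # assumed import path

evaluator = SQLEvaluator(model)
label, score, explanation = evaluator.evaluate(
    {
        "question": "How many orders were placed in 2023?",
        "query_gen": (
            "SELECT COUNT(*) FROM orders "
            "WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';"
        ),
        "response": "There were 1,204 orders placed in 2023.",
    },
    provide_explanation=True,
)
```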
- class SummarizationEvaluator(model)#
Bases:
LLMEvaluator
Leverages an LLM to evaluate whether a summary (stored under an “output” column) provides an accurate synopsis of an input document (stored under an “input” column).
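A brief sketch with the “input” and “output” keys described above:

```python
from phoenix.evals import SummarizationEvaluator  # assumed import path

evaluator = SummarizationEvaluator(model)
label, score, explanation = evaluator.evaluate(
    {
        "input": "Full text of the article to be summarized ...",
        "output": "A one-sentence summary of the article.",
    },
    provide_explanation=True,
)
```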
- class ToxicityEvaluator(model)#
Bases:
LLMEvaluator
Leverages an LLM to evaluate whether the string stored under the “input” column contains racist, sexist, chauvinistic, biased, or otherwise toxic content.
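A brief sketch; only the “input” key is read:

```python
from phoenix.evals import ToxicityEvaluator  # assumed import path

evaluator = ToxicityEvaluator(model)
label, score, explanation = evaluator.evaluate(
    {"input": "Text to screen for toxic content."},
    provide_explanation=True,
)
```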