evals.classify

llm_classify(data, model, template, rails, data_processor=None, system_instruction=None, verbose=False, use_function_calling_if_available=True, provide_explanation=False, include_prompt=False, include_response=False, include_exceptions=False, max_retries=10, exit_on_error=True, run_sync=False, concurrency=None, progress_bar_format=get_tqdm_progress_bar_formatter('llm_classify'))

Classifies each input row of the dataframe using an LLM. Returns a pandas.DataFrame where the first column is named label and contains the classification labels. An optional column named explanation is added when provide_explanation=True.

Parameters:
  • data (Union[pd.DataFrame, List[Any]]) – A collection of data which can contain template variables and other information necessary to generate evaluations. If passed a DataFrame, there must be column names that match the template variables. If passed a list, the elements of the list will be mapped to the template variables in the order that the template variables are defined.

  • model (BaseEvalModel) – An LLM model class.

  • template (Union[ClassificationTemplate, PromptTemplate, str]) – The prompt template as either an instance of PromptTemplate, ClassificationTemplate or a string. If a string, the variable names should be surrounded by curly braces so that a call to .format can be made to substitute variable values.

  • rails (List[str]) – A list of strings representing the possible output classes of the model’s predictions.

  • data_processor (Optional[Callable[[T], T]]) – An optional callable that is used to process the input data before it is mapped to the template variables. This callable is passed a single element of the input data and can return either a pandas.Series with indices corresponding to the template variables or an iterable of values that will be mapped to the template variables in the order that the template variables are defined.

  • system_instruction (Optional[str], optional) – An optional system message.

  • verbose (bool, optional) – If True, prints detailed info to stdout such as model invocation parameters and details about retries and snapping to rails. Default False.

  • use_function_calling_if_available (bool, default=True) – If True, use function calling (if available) as a means to constrain the LLM outputs. With function calling, the LLM is instructed to provide its response as a structured JSON object, which is easier to parse.

  • provide_explanation (bool, default=False) – If True, provides an explanation for each classification label. A column named explanation is added to the output dataframe.

  • include_prompt (bool, default=False) – If True, includes a column named prompt in the output dataframe containing the prompt used for each classification.

  • include_response (bool, default=False) – If True, includes a column named response in the output dataframe containing the raw response from the LLM.

  • max_retries (int, optional) – The maximum number of times to retry on exceptions. Defaults to 10.

  • exit_on_error (bool, default=True) – If True, stops processing evals after all retries are exhausted on a single eval attempt. If False, all evals are attempted before returning, even if some fail.

  • run_sync (bool, default=False) – If True, forces synchronous request submission. Otherwise evaluations will be run asynchronously if possible.

  • concurrency (Optional[int], default=None) – The number of concurrent evals if async submission is possible. If not provided, a recommended default concurrency is set on a per-model basis.

  • progress_bar_format (Optional[str]) – An optional format for the progress bar. If not specified, defaults to: llm_classify |{bar}| {n_fmt}/{total_fmt} ({percentage:3.1f}%) | ⏳ {elapsed}<{remaining} | {rate_fmt}{postfix}. If None is passed explicitly, the progress bar is disabled.
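
The way data is matched to template variables (by column name for a DataFrame row, by position for a list) can be illustrated without calling an LLM. The following is a minimal sketch using only the standard library, not the library's implementation; it assumes a string template with variables in curly braces, and the helper names template_variables and render_prompt are hypothetical.

```python
from string import Formatter
from typing import Any, List, Mapping, Sequence, Union


def template_variables(template: str) -> List[str]:
    """Return the variable names of a format string, in order of first appearance."""
    names: List[str] = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


def render_prompt(template: str, record: Union[Mapping[str, Any], Sequence[Any]]) -> str:
    """Fill a template from a mapping (like a DataFrame row) or a positional sequence.

    Hypothetical helper for illustration only.
    """
    names = template_variables(template)
    if isinstance(record, Mapping):
        # Match by name; keys unrelated to the template are ignored.
        values = {name: record[name] for name in names}
    else:
        # Match by position, in the order the variables are defined.
        values = dict(zip(names, record))
    return template.format(**values)


TEMPLATE = "Question: {question}\nAnswer: {answer}\nIs the answer relevant?"

by_name = render_prompt(
    TEMPLATE, {"question": "What is 2 + 2?", "answer": "4", "extra": "ignored"}
)
by_position = render_prompt(TEMPLATE, ["What is 2 + 2?", "4"])
```

Both calls produce the same prompt text, which is why the element order matters when a list is passed instead of a DataFrame.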

Returns:

A dataframe where the label column (at column position 0) contains the classification labels. If provide_explanation=True, an additional column named explanation contains the explanation for each label. The dataframe has the same length and index as the input dataframe. The classification label values come from the entries in the rails argument, or are “NOT_PARSABLE” if the model’s output could not be parsed. The output dataframe also includes three additional columns: exceptions, execution_status, and execution_seconds, which contain details about execution errors that may have occurred during classification as well as the total runtime of each classification (in seconds).

Return type:

pandas.DataFrame
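
The constraint that every label is either one of the rails or “NOT_PARSABLE” can be pictured with a small sketch. This is one plausible way to snap raw model text to a rail, not the library's actual parsing logic; the helper name snap_to_rails and its whole-word matching rule are assumptions made for illustration.

```python
import re
from typing import List

NOT_PARSABLE = "NOT_PARSABLE"  # sentinel value named in the return description


def snap_to_rails(raw_output: str, rails: List[str]) -> str:
    """Map raw model text to exactly one rail, else NOT_PARSABLE (hypothetical helper)."""
    text = raw_output.strip().lower()
    # Whole-word matching keeps "relevant" from matching inside "irrelevant".
    matches = [
        rail
        for rail in rails
        if re.search(r"\b" + re.escape(rail.lower()) + r"\b", text)
    ]
    # Zero matches or several competing matches are both treated as unparsable.
    return matches[0] if len(matches) == 1 else NOT_PARSABLE


RAILS = ["relevant", "irrelevant"]

clear_label = snap_to_rails("The answer is relevant.", RAILS)
negative_label = snap_to_rails("Irrelevant", RAILS)
ambiguous_label = snap_to_rails("It could be relevant or irrelevant.", RAILS)
missing_label = snap_to_rails("I cannot say.", RAILS)
```

Treating ambiguous output as unparsable, rather than picking the first rail found, keeps the label column restricted to values the caller can trust.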

run_evals(dataframe, evaluators, provide_explanation=False, use_function_calling_if_available=True, verbose=False, concurrency=None)

Applies a list of evaluators to a dataframe. Outputs a list of dataframes in which each dataframe contains the outputs of the corresponding evaluator applied to the input dataframe.

Parameters:
  • dataframe (DataFrame) – A pandas dataframe in which each row represents a record to be evaluated. All template variable names must appear as column names in the dataframe (extra columns unrelated to the template are permitted).

  • evaluators (List[LLMEvaluator]) – A list of evaluators.

  • provide_explanation (bool, optional) – If True, provides an explanation for each evaluation. A column named “explanation” is added to each output dataframe.

  • use_function_calling_if_available (bool, optional) – If True, use function calling (if available) as a means to constrain the LLM outputs. With function calling, the LLM is instructed to provide its response as a structured JSON object, which is easier to parse.

  • verbose (bool, optional) – If True, prints detailed info to stdout such as model invocation parameters and details about retries and snapping to rails.

  • concurrency (Optional[int], default=None) – The number of concurrent evals if async submission is possible. If not provided, a recommended default concurrency is set on a per-model basis.

Returns:

A list of dataframes, one for each evaluator, all of which have the same number of rows as the input dataframe.

Return type:

List[DataFrame]
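
Because run_evals returns a plain list with one dataframe per evaluator, callers typically need to pair each result with the evaluator that produced it. The sketch below shows one way to do that. The wrapper run_named_evals and its run_evals_fn parameter are hypothetical, and the import path phoenix.evals is an assumption about the installed package layout; only the keyword arguments dataframe, evaluators, and provide_explanation come from the signature documented above.

```python
from typing import Any, Callable, Dict, List, Optional


def run_named_evals(
    dataframe: Any,
    named_evaluators: Dict[str, Any],
    run_evals_fn: Optional[Callable[..., List[Any]]] = None,
) -> Dict[str, Any]:
    """Run evaluators and key each output by a caller-chosen name (hypothetical helper)."""
    if run_evals_fn is None:
        # Assumed import path; adjust it to match the installed package.
        from phoenix.evals import run_evals as run_evals_fn
    names = list(named_evaluators)
    results = run_evals_fn(
        dataframe=dataframe,
        evaluators=[named_evaluators[name] for name in names],
        provide_explanation=True,
    )
    # One output per evaluator, in the order the evaluators were passed.
    return dict(zip(names, results))
```

Accepting run_evals_fn as an argument lets the pairing logic be exercised with a stub, without model credentials or network access.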