Inferences / Schema#

phoenix.Inferences#

class Inferences(dataframe, schema, name=None)#

Bases: object

A dataset for analysis in Phoenix. Used to construct a Phoenix session via px.launch_app.

Typical usage example:

primary_inferences = px.Inferences(
    dataframe=production_dataframe, schema=schema, name="primary"
)
Parameters:
  • dataframe (pandas.DataFrame) – The pandas dataframe containing the data to analyze.

  • schema (phoenix.Schema) – The schema of the dataset. Maps dataframe columns to the appropriate model inference dimensions (features, predictions, actuals).

  • name (str, optional) – The name of the dataset. If not provided, a random name will be generated. A name is helpful for identifying the dataset in the application.

Returns:

inferences – The Inferences object that can be used in a Phoenix session

Return type:

Inferences

Examples

Define an inferences object ds from a pandas dataframe df and a schema object schema by running:

ds = px.Inferences(df, schema)

Alternatively, provide a name for the inferences that will appear in the application:

ds = px.Inferences(df, schema, name="training")

ds is then passed as the primary or reference argument to launch_app.
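
For example, to launch the app with ds as the primary inferences (ref_ds here is a hypothetical second inferences object for the reference split):

session = px.launch_app(primary=ds, reference=ref_ds)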

property dataframe#
property name#
property schema#

phoenix.Schema#

class Schema(
    prediction_id_column_name: str | None = None,
    id_column_name: str | None = None,
    timestamp_column_name: str | None = None,
    feature_column_names: List[str] | None = None,
    tag_column_names: List[str] | None = None,
    prediction_label_column_name: str | None = None,
    prediction_score_column_name: str | None = None,
    actual_label_column_name: str | None = None,
    actual_score_column_name: str | None = None,
    prompt_column_names: EmbeddingColumnNames | RetrievalEmbeddingColumnNames | None = None,
    response_column_names: str | EmbeddingColumnNames | None = None,
    document_column_names: EmbeddingColumnNames | None = None,
    embedding_feature_column_names: Dict[str, EmbeddingColumnNames] | None = None,
    excluded_column_names: List[str] | None = None,
)#

Bases: object
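
A Schema holds no data itself; it only tells Phoenix which dataframe columns map to which model dimensions. A minimal sketch of constructing one (the column names here are hypothetical and should match your dataframe):

import phoenix as px

schema = px.Schema(
    prediction_id_column_name="prediction_id",
    timestamp_column_name="timestamp",
    feature_column_names=["age", "state"],
    prediction_label_column_name="predicted_label",
    actual_label_column_name="actual_label",
)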

phoenix.EmbeddingColumnNames#

class EmbeddingColumnNames(vector_column_name, raw_data_column_name=None, link_to_data_column_name=None)#

Bases: Dict[str, Any]

A dataclass that holds the column names for an embedding feature. An embedding feature is a feature represented by a vector; the vector is a representation of unstructured data, such as text or an image.
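
To declare an embedding feature, point an EmbeddingColumnNames instance at the vector column (and optionally the raw data it encodes) and register it under a feature name in the schema. A brief sketch, with hypothetical column names:

import phoenix as px

schema = px.Schema(
    embedding_feature_column_names={
        "review_embedding": px.EmbeddingColumnNames(
            vector_column_name="review_vector",
            raw_data_column_name="review_text",
        ),
    },
)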

phoenix.TraceDataset#

class TraceDataset(dataframe, name=None, evaluations=())#

Bases: object

A TraceDataset is a wrapper around a dataframe that holds a flattened representation of spans. The collection of spans traces the LLM application’s execution.

Typical usage example:

import phoenix as px
from phoenix.trace.utils import json_lines_to_df

with open("trace.jsonl", "r") as f:
    trace_ds = px.TraceDataset(json_lines_to_df(f.readlines()))
px.launch_app(trace=trace_ds)
__init__(dataframe, name=None, evaluations=())#

Constructs a TraceDataset from a dataframe of spans. Optionally takes in evaluations for the spans in the dataset.

Parameters:
  • dataframe (pandas.DataFrame) – The pandas dataframe containing the tracing data. Each row is a flattened representation of a span.

  • name (str, optional) – The name used to identify the dataset in the application. If not provided, a random name will be generated.

  • evaluations (Optional[Iterable[SpanEvaluations]]) – An optional iterable of evaluations for the spans in the dataset. If provided, the evaluations can be materialized into a unified dataframe as annotations.
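
Evaluations can be attached at construction time. A sketch, assuming SpanEvaluations is importable from phoenix.trace and takes an eval_name plus a dataframe indexed by span id (the dataframe contents and names here are illustrative):

import pandas as pd
import phoenix as px
from phoenix.trace import SpanEvaluations

# Hypothetical evaluations dataframe, indexed by span id.
evals_df = pd.DataFrame(
    {"label": ["correct"], "score": [1.0]},
    index=pd.Index(["span-1"], name="context.span_id"),
)

trace_ds = px.TraceDataset(
    spans_df,  # a spans dataframe, e.g. built via json_lines_to_df above
    name="my-traces",
    evaluations=[SpanEvaluations(eval_name="correctness", dataframe=evals_df)],
)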

get_evals_dataframe()#

Creates a flat dataframe of all the evaluations for the dataset.

get_spans_dataframe(include_evaluations=True)#

Converts the dataset to a dataframe of spans. If evaluations are included, they are merged into the dataframe.

Parameters:

include_evaluations (bool) – If True, the evaluations are merged into the dataframe.
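
Both accessors return pandas dataframes. For example:

spans_df = trace_ds.get_spans_dataframe(include_evaluations=False)
evals_df = trace_ds.get_evals_dataframe()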

name#

A human-readable name for the dataset.