Inferences / Schema#
phoenix.Inferences#
- class Inferences(dataframe, schema, name=None)#
Bases:
object
A dataset used for analysis in phoenix. Used to construct a phoenix session via px.launch_app.
Typical usage example:
primary_inferences = px.Inferences(
    dataframe=production_dataframe,
    schema=schema,
    name="primary",
)
- Parameters:
dataframe (pandas.DataFrame) – The pandas dataframe containing the data to analyze
schema (phoenix.Schema) – the schema of the dataset. Maps dataframe columns to the appropriate model inference dimensions (features, predictions, actuals).
name (str, optional) – The name of the dataset. If not provided, a random name will be generated. A name is helpful for identifying the dataset in the application.
- Returns:
inferences – The Inferences object that can be used in a phoenix session
- Return type:
Inferences
Examples
Define inferences ds from a pandas dataframe df and a schema object schema by running:
ds = px.Inferences(df, schema)
Alternatively, provide a name for the inferences that will appear in the application:
ds = px.Inferences(df, schema, name="training")
ds is then passed as the primary or reference argument to launch_app.
- property dataframe#
- property name#
- property schema#
phoenix.Schema#
- class Schema(
    prediction_id_column_name: str | None = None,
    id_column_name: str | None = None,
    timestamp_column_name: str | None = None,
    feature_column_names: List[str] | None = None,
    tag_column_names: List[str] | None = None,
    prediction_label_column_name: str | None = None,
    prediction_score_column_name: str | None = None,
    actual_label_column_name: str | None = None,
    actual_score_column_name: str | None = None,
    prompt_column_names: EmbeddingColumnNames | RetrievalEmbeddingColumnNames | None = None,
    response_column_names: str | EmbeddingColumnNames | None = None,
    document_column_names: EmbeddingColumnNames | None = None,
    embedding_feature_column_names: Dict[str, EmbeddingColumnNames] | None = None,
    excluded_column_names: List[str] | None = None,
)#
Bases:
object
phoenix.EmbeddingColumnNames#
- class EmbeddingColumnNames(vector_column_name, raw_data_column_name=None, link_to_data_column_name=None)#
Bases:
Dict[str, Any]
A dataclass to hold the column names for the embedding features. An embedding feature is a feature that is represented by a vector. The vector is a representation of unstructured data, such as text or an image.
phoenix.TraceDataset#
- class TraceDataset(dataframe, name=None, evaluations=())#
Bases:
object
A TraceDataset is a wrapper around a dataframe that is a flattened representation of spans. The collection of spans traces the LLM application's execution.
Typical usage example:
from phoenix.trace.utils import json_lines_to_df

with open("trace.jsonl", "r") as f:
    trace_ds = TraceDataset(json_lines_to_df(f.readlines()))
px.launch_app(trace=trace_ds)
- __init__(dataframe, name=None, evaluations=())#
Constructs a TraceDataset from a dataframe of spans. Optionally takes in evaluations for the spans in the dataset.
- Parameters:
dataframe (pandas.DataFrame) – The pandas dataframe containing the tracing data. Each row of which is a flattened representation of a span.
name (str, optional) – The name used to identify the dataset in the application. If not provided, a random name will be generated.
evaluations (Optional[Iterable[SpanEvaluations]]) – An optional iterable of evaluations for the spans in the dataset. If provided, the evaluations can be materialized into a unified dataframe as annotations.
- get_evals_dataframe()#
Creates a flat dataframe of all the evaluations for the dataset.
- get_spans_dataframe(include_evaluations=True)#
Converts the dataset to a dataframe of spans. If evaluations are included, they are merged into the dataframe.
- Parameters:
include_evaluations (bool) – If True, evaluations are merged into the returned dataframe.
- name#
A human readable name for the dataset.