evals.models.litellm#

class LiteLLMModel(default_concurrency=20, _verbose=False, _rate_limiter=<factory>, model='gpt-3.5-turbo', temperature=0.0, max_tokens=256, top_p=1, num_retries=0, request_timeout=60, model_kwargs=<factory>, model_name=None)#

Bases: BaseModel

An interface for using LLM models via the LiteLLM library.

This class wraps the LiteLLM library for use with Phoenix LLM evaluations. Requires the litellm package to be installed.

⚠️ Warning: Due to the number of supported models and variations in rate limit handling, we do not catch rate limit exceptions and throttle requests.

Supports Async: ❌

litellm provides an async interface for making LLM calls. However, because we cannot reliably catch and throttle requests when encountering rate limit errors, we do not make requests asynchronously through litellm, to avoid exceeding rate limits.

Parameters:
  • model (str) – The model name to use.

  • temperature (float, optional) – Sampling temperature to use. Defaults to 0.0.

  • max_tokens (int, optional) – Maximum number of tokens to generate in the completion. Defaults to 256.

  • top_p (float, optional) – Total probability mass of tokens to consider at each step. Defaults to 1.

  • num_retries (int, optional) – Maximum number of times to retry the request if a RateLimitError, OpenAIError, or ServiceUnavailableError occurs. Defaults to 0.

  • request_timeout (int, optional) – Maximum number of seconds to wait when retrying. Defaults to 60.

  • model_kwargs (Dict[str, Any], optional) – Model specific params. Defaults to an empty dict.

Example

# configuring a local llm via litellm
import os

from phoenix.evals import LiteLLMModel

os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"

model = LiteLLMModel(model="ollama/llama3")
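The documented constructor parameters can also be combined when targeting a hosted provider. The snippet below is a minimal sketch: the "gpt-4o-mini" model name and the model_kwargs values are illustrative assumptions, not values required by Phoenix.

# configuring a hosted model with the documented constructor parameters
# (model name and model_kwargs values are illustrative assumptions)
from phoenix.evals import LiteLLMModel

model = LiteLLMModel(
    model="gpt-4o-mini",        # any model name supported by litellm
    temperature=0.0,            # sampling temperature
    max_tokens=256,             # maximum completion tokens
    num_retries=3,              # retries on RateLimitError, OpenAIError, ServiceUnavailableError
    request_timeout=60,         # seconds to wait when retrying
    model_kwargs={"seed": 42},  # model-specific params passed through to litellm
)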
model_name = None#

Deprecated since version 3.0.0.

Use model instead. This will be removed in a future release.
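If existing code still passes the deprecated model_name argument, switching to model is a one-line change; this sketch assumes the same Ollama setup shown in the example above.

# before (deprecated)
model = LiteLLMModel(model_name="ollama/llama3")

# after
model = LiteLLMModel(model="ollama/llama3")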