Documentation Index
Fetch the complete documentation index at: https://docs.plexe.ai/llms.txt
Use this file to discover all available pages before exploring further.
This page provides comprehensive reference documentation for the core classes and functions in the Plexe Python library.
Core Classes
Model
The primary class in the Plexe library, representing a machine learning model.
class Model:
def __init__(
self,
intent: str,
input_schema: Type[BaseModel] | Dict[str, type] = None,
output_schema: Type[BaseModel] | Dict[str, type] = None,
constraints: List[Constraint] = None,
distributed: bool = False
)
Parameters:
| Parameter | Type | Description | |
|---|
intent | str | Natural language description of what the model should do. | |
input_schema | `Type[BaseModel] | Dict[str, type]` | Schema defining input data structure. Can be a dictionary or Pydantic model. Default: None. |
output_schema | `Type[BaseModel] | Dict[str, type]` | Schema defining output data structure. Can be a dictionary or Pydantic model. Default: None. |
constraints | List[Constraint] | List of constraints the model should adhere to. Default: None. | |
distributed | bool | Whether to use distributed execution (Ray) when building. Default: False. | |
Methods:
build
def build(
self,
datasets: List[Union[pd.DataFrame, DatasetGenerator]],
provider: Union[str, ProviderConfig] = "openai/gpt-4o-mini",
timeout: Optional[int] = None,
max_iterations: Optional[int] = None,
run_timeout: int = 1800,
callbacks: Optional[List[Callback]] = None,
verbose: bool = False,
chain_of_thought: bool = True
) -> None
Builds the model using the provided datasets and configuration.
Parameters:
| Parameter | Type | Description |
|---|
datasets | List[Union[pd.DataFrame, DatasetGenerator]] | List of pandas DataFrames or DatasetGenerator objects for training data. |
provider | Union[str, ProviderConfig] | LLM provider to use, either as a string (“openai/gpt-4o-mini”) or ProviderConfig object. Default: “openai/gpt-4o-mini”. |
timeout | Optional[int] | Maximum total time (in seconds) for the entire build process (all iterations). |
max_iterations | Optional[int] | Maximum number of iterations to attempt. |
run_timeout | int | Maximum time (in seconds) for each individual training run. Default: 1800 (30 minutes). |
callbacks | Optional[List[Callback]] | List of callback objects for monitoring the build process. |
verbose | bool | Whether to display detailed agent logs. Default: False. |
chain_of_thought | bool | Whether to enable verbose output of agent reasoning. Default: True. |
Returns: None
Note: At least one of timeout or max_iterations must be provided.
predict
def predict(
self,
x: Dict[str, Any],
validate_input: bool = False,
validate_output: bool = False
) -> Dict[str, Any]
Makes a prediction using the trained model.
Parameters:
| Parameter | Type | Description |
|---|
x | Dict[str, Any] | Input data for prediction. |
validate_input | bool | Whether to validate input against schema. Default: False. |
validate_output | bool | Whether to validate output against schema. Default: False. |
Returns: Dict[str, Any] - Prediction result
get_state
def get_state(self) -> str
Returns the current state of the model.
Returns: str representing model state: "draft", "building", "ready", or "error"
def get_metadata(self) -> Dict[str, Any]
Returns metadata about the model.
Returns: Dictionary containing metadata
get_metrics
def get_metrics(self) -> Optional[Dict[str, Any]]
Returns metrics for the trained model if available.
Returns: Dictionary containing metrics or None
describe
def describe(self) -> ModelDescription
Returns a detailed description of the model.
Returns: ModelDescription object
DatasetGenerator
Class for generating synthetic data or augmenting existing data.
class DatasetGenerator:
def __init__(
self,
description: str,
provider: str,
schema: Type[BaseModel] | Dict[str, type] = None,
data: pd.DataFrame = None
) -> None
def generate(self, num_samples: int):
"""Generates synthetic data if a provider is available."""
Constructor Parameters:
| Parameter | Type | Description | |
|---|
description | str | Human-readable description of the dataset. | |
provider | str | LLM provider used for synthetic data generation. | |
schema | `Type[BaseModel] | Dict[str, type]` | The schema the data should match, if any. Default: None. |
data | pd.DataFrame | A dataset of real data on which to base the generation, if available. Default: None. | |
Methods:
| Method | Description |
|---|
generate(num_samples: int) | Generates the specified number of synthetic data samples. |
data property | Returns the dataset as a pandas DataFrame. |
Callback
Base class for callbacks that monitor the build process.
class Callback:
def on_build_start(self, info: BuildStateInfo) -> None:
pass
def on_iteration_start(self, info: BuildStateInfo) -> None:
pass
def on_iteration_end(self, info: BuildStateInfo) -> None:
pass
def on_build_end(self, info: BuildStateInfo) -> None:
pass
See Callbacks Reference for more details.
Constraint
Represents rules or conditions that the model should satisfy.
class Constraint:
def __init__(self, condition: Callable[[Any, Any], bool], description: str)
Parameters:
| Parameter | Type | Description |
|---|
condition | Callable[[Any, Any], bool] | Function that evaluates the constraint. |
description | str | Human-readable description of the constraint. |
Core Functions
save_model
def save_model(model: Model, path: str | Path) -> str
Saves a trained model to a tar archive.
Parameters:
| Parameter | Type | Description | |
|---|
model | Model | The model to save. | |
path | `str | Path` | Path where the model should be saved. Must end with .tar.gz. |
Returns: str - Path where the model was saved
load_model
def load_model(path: str | Path) -> Model
Instantiate a model from a tar archive.
Parameters:
| Parameter | Type | Description | |
|---|
path | `str | Path` | Path to the saved model archive (.tar.gz file). |
Returns: Model - The loaded model
def configure_logging(
level: Union[str, int] = logging.INFO,
file: Optional[str] = None
) -> None
Configures logging for the Plexe library.
Parameters:
| Parameter | Type | Description |
|---|
level | Union[str, int] | Logging level (from logging module or as string). |
file | Optional[str] | Path to a log file. If provided, logs will be written to this file in addition to stdout. |
Returns: None
Enums and Constants
ModelState
Enum representing the possible states of a model. The get_state() method returns the string value.
class ModelState(Enum):
DRAFT = "draft" # Initial state, before building
BUILDING = "building" # During the build process
READY = "ready" # Build complete, model ready for use
ERROR = "error" # Build failed
Note: When checking model state, compare against the string values:
if model.get_state() == "ready":
# Ready to make predictions
Provider Configuration
ProviderConfig
Class for configuring different LLM providers for different agent roles (imported from plexe.internal.common.provider).
from plexe.internal.common.provider import ProviderConfig
class ProviderConfig:
def __init__(
self,
default_provider: str,
orchestrator_provider: Optional[str] = None,
research_provider: Optional[str] = None,
engineer_provider: Optional[str] = None,
ops_provider: Optional[str] = None,
tool_provider: Optional[str] = None
)
Parameters:
| Parameter | Type | Description |
|---|
default_provider | str | Default provider for all roles (e.g., “openai/gpt-4o-mini”). |
orchestrator_provider | Optional[str] | Provider for the orchestrator agent (coordinates the overall process). |
research_provider | Optional[str] | Provider for the research agent (analyzes problems and plans solutions). |
engineer_provider | Optional[str] | Provider for the engineer agent (generates training code). |
ops_provider | Optional[str] | Provider for the ops agent (generates inference code). |
tool_provider | Optional[str] | Provider for tool agents (performs specialized tasks). |
Metric
Class representing a performance metric for a model.
class Metric:
def __init__(
self,
name: str,
value: float = None,
comparator: MetricComparator = None,
is_worst: bool = False
)
Parameters:
| Parameter | Type | Description |
|---|
name | str | Name of the metric (e.g., “accuracy”, “rmse”). |
value | float | Numeric value of the metric. Default: None. |
comparator | MetricComparator | Comparison logic for determining which metric values are better. Default: None. |
is_worst | bool | Whether this is the worst possible value for the metric. Default: False. |
MetricComparator
Encapsulates comparison logic for metrics.
class MetricComparator:
def __init__(
self,
comparison_method: ComparisonMethod,
target: float = None,
epsilon: float = 1e-9
)
Parameters:
| Parameter | Type | Description |
|---|
comparison_method | ComparisonMethod | The method used to compare metrics (HIGHER_IS_BETTER, LOWER_IS_BETTER, etc.). |
target | float | The target value for TARGET_IS_BETTER comparisons. Default: None. |
epsilon | float | Small value used for floating point comparisons. Default: 1e-9. |
Type Hints
The library uses the following type hints:
SchemaType = Union[Dict[str, Any], Type[BaseModel]]
DatasetType = Union[pd.DataFrame, DatasetGenerator]
ProviderType = Union[str, ProviderConfig]
Usage Example
import plexe
import pandas as pd
# Load data
df = pd.read_csv("housing.csv")
# Create model
model = plexe.Model(
intent="Predict house prices based on features",
input_schema={"square_footage": float, "bedrooms": int, "bathrooms": float},
output_schema={"price": float}
)
# Build model
model.build(
datasets=[df],
provider="openai/gpt-4o-mini",
max_iterations=3,
timeout=600,
run_timeout=180,
chain_of_thought=True,
verbose=False
)
# Make prediction
prediction = model.predict({"square_footage": 2000, "bedrooms": 3, "bathrooms": 2})
print(f"Predicted price: {prediction}")
# Save model
save_path = plexe.save_model(model, "housing_model.tar.gz")
# Load model
loaded_model = plexe.load_model(save_path)
For more details on specific components, see the other reference sections: