This page provides comprehensive reference documentation for the core classes and functions in the Plexe Python library.

Core Classes

Model

The primary class in the Plexe library, representing a machine learning model.

class Model:
    def __init__(
        self,
        intent: str,
        input_schema: Type[BaseModel] | Dict[str, type] = None,
        output_schema: Type[BaseModel] | Dict[str, type] = None,
        constraints: List[Constraint] = None,
        distributed: bool = False
    )

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| intent | str | Natural language description of what the model should do. |
| input_schema | `Type[BaseModel] \| Dict[str, type]` | Schema defining input data structure. Can be a dictionary or Pydantic model. Default: None. |
| output_schema | `Type[BaseModel] \| Dict[str, type]` | Schema defining output data structure. Can be a dictionary or Pydantic model. Default: None. |
| constraints | List[Constraint] | List of constraints the model should adhere to. Default: None. |
| distributed | bool | Whether to use distributed execution (Ray) when building. Default: False. |

Methods:

build

def build(
    self,
    datasets: List[Union[pd.DataFrame, DatasetGenerator]],
    provider: Union[str, ProviderConfig] = "openai/gpt-4o-mini",
    timeout: Optional[int] = None,
    max_iterations: Optional[int] = None,
    run_timeout: int = 1800,
    callbacks: Optional[List[Callback]] = None,
    verbose: bool = False,
    chain_of_thought: bool = True
) -> None

Builds the model using the provided datasets and configuration.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| datasets | List[Union[pd.DataFrame, DatasetGenerator]] | List of pandas DataFrames or DatasetGenerator objects for training data. |
| provider | Union[str, ProviderConfig] | LLM provider to use, either as a string ("openai/gpt-4o-mini") or ProviderConfig object. Default: "openai/gpt-4o-mini". |
| timeout | Optional[int] | Maximum total time (in seconds) for the entire build process (all iterations). |
| max_iterations | Optional[int] | Maximum number of iterations to attempt. |
| run_timeout | int | Maximum time (in seconds) for each individual training run. Default: 1800 (30 minutes). |
| callbacks | Optional[List[Callback]] | List of callback objects for monitoring the build process. |
| verbose | bool | Whether to display detailed agent logs. Default: False. |
| chain_of_thought | bool | Whether to enable verbose output of agent reasoning. Default: True. |

Returns: None

Note: At least one of timeout or max_iterations must be provided.
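
The note above can be checked before a build is started. The following is a minimal sketch: `check_build_budget` is a hypothetical helper, not part of Plexe, and the rule that `run_timeout` should not exceed `timeout` is an assumption added for illustration. How Plexe itself reports a missing limit is not stated on this page.

```python
from typing import Optional


def check_build_budget(
    timeout: Optional[int],
    max_iterations: Optional[int],
    run_timeout: int = 1800,
) -> None:
    """Hypothetical pre-flight check mirroring the documented build() limits."""
    # Documented rule: at least one of the two limits must be given.
    if timeout is None and max_iterations is None:
        raise ValueError("Provide at least one of timeout or max_iterations")
    # Assumption (not from the reference): a single training run that may
    # outlast the whole build is treated as a configuration mistake.
    if timeout is not None and run_timeout > timeout:
        raise ValueError("run_timeout should not exceed timeout")


# Both calls pass: each supplies at least one limit.
check_build_budget(timeout=600, max_iterations=None, run_timeout=180)
check_build_budget(timeout=None, max_iterations=3)
```

Running a check like this first fails fast on a bad configuration instead of part-way into a build.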

predict

def predict(
    self,
    x: Dict[str, Any],
    validate_input: bool = False,
    validate_output: bool = False
) -> Dict[str, Any]

Makes a prediction using the trained model.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| x | Dict[str, Any] | Input data for prediction. |
| validate_input | bool | Whether to validate input against schema. Default: False. |
| validate_output | bool | Whether to validate output against schema. Default: False. |

Returns: Dict[str, Any] - Prediction result
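
To make "validate against schema" concrete for dictionary schemas, here is a rough sketch of such a check. `schema_errors` is a hypothetical function written for this page and is not how Plexe validates internally. Accepting an `int` where a `float` is expected is an assumption.

```python
from typing import Any, Dict, List


def schema_errors(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Hypothetical check of one record against a Dict[str, type] schema."""
    errors: List[str] = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
            continue
        value = record[field]
        # Assumption: an int is acceptable where a float is expected.
        # bool is excluded because it is a subclass of int in Python.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            continue
        if not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Report fields the schema does not know about.
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


input_schema = {"square_footage": float, "bedrooms": int, "bathrooms": float}
good = schema_errors({"square_footage": 2000, "bedrooms": 3, "bathrooms": 2}, input_schema)
bad = schema_errors({"square_footage": "big", "bedrooms": 3}, input_schema)
print(good)
print(bad)
```

The first record passes with no errors. The second is reported for a wrongly typed `square_footage` and a missing `bathrooms` field.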

get_state

def get_state(self) -> str

Returns the current state of the model.

Returns: str representing model state: "draft", "building", "ready", or "error"

get_metadata

def get_metadata(self) -> Dict[str, Any]

Returns metadata about the model.

Returns: Dictionary containing metadata

get_metrics

def get_metrics(self) -> Optional[Dict[str, Any]]

Returns metrics for the trained model if available.

Returns: Dictionary containing metrics or None

describe

def describe(self) -> ModelDescription

Returns a detailed description of the model.

Returns: ModelDescription object

DatasetGenerator

Class for generating synthetic data or augmenting existing data.

class DatasetGenerator:
    def __init__(
        self,
        description: str,
        provider: str,
        schema: Type[BaseModel] | Dict[str, type] = None,
        data: pd.DataFrame = None
    ) -> None
    
    def generate(self, num_samples: int):
        """Generates synthetic data if a provider is available."""

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| description | str | Human-readable description of the dataset. |
| provider | str | LLM provider used for synthetic data generation. |
| schema | `Type[BaseModel] \| Dict[str, type]` | The schema the data should match, if any. Default: None. |
| data | pd.DataFrame | A dataset of real data on which to base the generation, if available. Default: None. |

Methods:

| Method | Description |
| --- | --- |
| generate(num_samples: int) | Generates the specified number of synthetic data samples. |
| data (property) | Returns the dataset as a pandas DataFrame. |

Callback

Base class for callbacks that monitor the build process.

class Callback:
    def on_build_start(self, info: BuildStateInfo) -> None:
        pass

    def on_iteration_start(self, info: BuildStateInfo) -> None:
        pass

    def on_iteration_end(self, info: BuildStateInfo) -> None:
        pass

    def on_build_end(self, info: BuildStateInfo) -> None:
        pass

See Callbacks Reference for more details.
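
As an illustration of subclassing, the sketch below times each iteration by overriding two of the four hooks. The `Callback` class here is a local stand-in that copies the hooks listed above so the example runs on its own. In real code you would subclass Plexe's `Callback`. `TimingCallback` is a hypothetical name, and the `info` argument is left unused because the fields of `BuildStateInfo` are not described on this page.

```python
import time
from typing import Any, List, Optional


class Callback:
    """Local stand-in with the four documented hooks (all no-ops)."""

    def on_build_start(self, info: Any) -> None:
        pass

    def on_iteration_start(self, info: Any) -> None:
        pass

    def on_iteration_end(self, info: Any) -> None:
        pass

    def on_build_end(self, info: Any) -> None:
        pass


class TimingCallback(Callback):
    """Hypothetical callback that records how long each iteration takes."""

    def __init__(self) -> None:
        self.iteration_seconds: List[float] = []
        self._started: Optional[float] = None

    def on_iteration_start(self, info: Any) -> None:
        self._started = time.perf_counter()

    def on_iteration_end(self, info: Any) -> None:
        if self._started is not None:
            self.iteration_seconds.append(time.perf_counter() - self._started)
            self._started = None


# Drive the hooks by hand; during a real build the library would call them.
timing = TimingCallback()
timing.on_build_start(None)
for _ in range(2):
    timing.on_iteration_start(None)
    timing.on_iteration_end(None)
timing.on_build_end(None)
print(len(timing.iteration_seconds))
```

Because the base hooks are no-ops, a subclass only needs to override the events it cares about.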

Constraint

Represents rules or conditions that the model should satisfy.

class Constraint:
    def __init__(self, condition: Callable[[Any, Any], bool], description: str)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| condition | Callable[[Any, Any], bool] | Function that evaluates the constraint. |
| description | str | Human-readable description of the constraint. |
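
The reference does not say what the two arguments passed to `condition` are. The sketch below assumes they are the model input and the model output, in that order. That is an assumption, not documented behavior. The `Constraint` class is a local stand-in mirroring the constructor above so the example runs without Plexe installed.

```python
from typing import Any, Callable, Dict


class Constraint:
    """Local stand-in mirroring the documented constructor."""

    def __init__(self, condition: Callable[[Any, Any], bool], description: str):
        self.condition = condition
        self.description = description


def price_is_non_negative(inputs: Dict[str, Any], outputs: Dict[str, Any]) -> bool:
    # Assumption: first argument is the model input, second the model output.
    return outputs["price"] >= 0


non_negative = Constraint(
    condition=price_is_non_negative,
    description="Predicted price must not be negative",
)

ok = non_negative.condition({"bedrooms": 3}, {"price": 250000.0})
violated = non_negative.condition({"bedrooms": 3}, {"price": -1.0})
print(ok, violated)
```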

Core Functions

save_model

def save_model(model: Model, path: str | Path) -> str

Saves a trained model to a gzip-compressed tar archive (.tar.gz).

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| model | Model | The model to save. |
| path | `str \| Path` | Path where the model should be saved. Must end with .tar.gz. |

Returns: str - Path where the model was saved
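
Since the path must end with .tar.gz, it can help to normalize user-supplied paths before saving. The helper below is a hypothetical convenience built on `pathlib`; it is not part of Plexe.

```python
from pathlib import Path
from typing import Union


def ensure_tar_gz(path: Union[str, Path]) -> Path:
    """Hypothetical helper: return the path with a .tar.gz ending."""
    p = Path(path)
    name = p.name
    if name.endswith(".tar.gz"):
        return p  # already in the required form
    if name.endswith(".tar"):
        return p.with_name(name + ".gz")  # model.tar -> model.tar.gz
    return p.with_name(name + ".tar.gz")  # housing -> housing.tar.gz


print(ensure_tar_gz("models/housing"))
print(ensure_tar_gz("housing_model.tar.gz"))
```

The result can be passed as the `path` argument, since both `str` and `Path` are accepted.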

load_model

def load_model(path: str | Path) -> Model

Instantiates a model from a saved tar archive.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| path | `str \| Path` | Path to the saved model archive (.tar.gz file). |

Returns: Model - The loaded model

configure_logging

def configure_logging(
    level: Union[str, int] = logging.INFO,
    file: Optional[str] = None
) -> None

Configures logging for the Plexe library.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| level | Union[str, int] | Logging level (from logging module or as string). |
| file | Optional[str] | Path to a log file. If provided, logs will be written to this file in addition to stdout. |

Returns: None
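
The behavior described above, a level given as a name or number plus an optional file written in addition to stdout, can be reproduced with the standard `logging` module alone. The sketch below is only an illustration of that behavior, not Plexe's implementation. `configure_demo_logging` and the logger name `plexe_demo` are hypothetical.

```python
import logging
import sys
import tempfile
from pathlib import Path
from typing import Optional, Union


def configure_demo_logging(
    level: Union[str, int] = logging.INFO,
    file: Optional[str] = None,
) -> logging.Logger:
    """Stdlib-only illustration of 'stdout plus optional file' logging."""
    logger = logging.getLogger("plexe_demo")  # hypothetical logger name
    logger.setLevel(level)  # accepts both "DEBUG" and logging.DEBUG
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(sys.stdout))
    if file is not None:
        logger.addHandler(logging.FileHandler(file))
    return logger


log_path = Path(tempfile.mkdtemp()) / "build.log"
logger = configure_demo_logging("DEBUG", str(log_path))
logger.debug("build started")  # goes to stdout and to build.log
for handler in logger.handlers:
    handler.flush()
```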

Enums and Constants

ModelState

Enum representing the possible states of a model. The get_state() method returns the string value.

class ModelState(Enum):
    DRAFT = "draft"        # Initial state, before building
    BUILDING = "building"  # During the build process
    READY = "ready"        # Build complete, model ready for use
    ERROR = "error"        # Build failed

Note: When checking model state, compare against the string values:

if model.get_state() == "ready":
    # Ready to make predictions
    pass  # e.g. call model.predict(...) here
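
A variation on the string comparison above is to convert the returned string back into the enum, so that a misspelled state name raises an error instead of silently comparing as unequal. The enum below is copied from this section so the sketch runs on its own. `is_ready` is a hypothetical helper.

```python
from enum import Enum


class ModelState(Enum):
    DRAFT = "draft"
    BUILDING = "building"
    READY = "ready"
    ERROR = "error"


def is_ready(state: str) -> bool:
    """Hypothetical helper: interpret the string returned by get_state()."""
    # ModelState(state) looks the member up by value and raises ValueError
    # for strings that are not one of the four documented states.
    return ModelState(state) is ModelState.READY


print(is_ready("ready"))
print(is_ready("building"))
```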

Provider Configuration

ProviderConfig

Class for configuring different LLM providers for different agent roles (imported from plexe.internal.common.provider).

from plexe.internal.common.provider import ProviderConfig

class ProviderConfig:
    def __init__(
        self,
        default_provider: str,
        orchestrator_provider: Optional[str] = None,
        research_provider: Optional[str] = None,
        engineer_provider: Optional[str] = None,
        ops_provider: Optional[str] = None,
        tool_provider: Optional[str] = None
    )

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| default_provider | str | Default provider for all roles (e.g., "openai/gpt-4o-mini"). |
| orchestrator_provider | Optional[str] | Provider for the orchestrator agent (coordinates the overall process). |
| research_provider | Optional[str] | Provider for the research agent (analyzes problems and plans solutions). |
| engineer_provider | Optional[str] | Provider for the engineer agent (generates training code). |
| ops_provider | Optional[str] | Provider for the ops agent (generates inference code). |
| tool_provider | Optional[str] | Provider for tool agents (performs specialized tasks). |
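
Since `default_provider` is described as the default for all roles, a role-specific argument left as None presumably falls back to it. The sketch below illustrates that fallback with a local stand-in. `RoleProviders` and its `resolve` method are hypothetical and are not part of `ProviderConfig`. The provider string "openai/gpt-4o" is only an example value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoleProviders:
    """Stand-in for ProviderConfig, illustrating per-role fallback only."""

    default_provider: str
    orchestrator_provider: Optional[str] = None
    research_provider: Optional[str] = None
    engineer_provider: Optional[str] = None
    ops_provider: Optional[str] = None
    tool_provider: Optional[str] = None

    def resolve(self, role: str) -> str:
        # Assumption: a role left as None uses default_provider.
        return getattr(self, f"{role}_provider") or self.default_provider


config = RoleProviders(
    default_provider="openai/gpt-4o-mini",
    engineer_provider="openai/gpt-4o",  # example override for one role
)
print(config.resolve("engineer"))
print(config.resolve("ops"))
```

Overriding a single role in this way uses a different model for that role only, while every other role keeps the default.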

Performance Metrics

Metric

Class representing a performance metric for a model.

class Metric:
    def __init__(
        self, 
        name: str, 
        value: float = None, 
        comparator: MetricComparator = None,
        is_worst: bool = False
    )

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| name | str | Name of the metric (e.g., "accuracy", "rmse"). |
| value | float | Numeric value of the metric. Default: None. |
| comparator | MetricComparator | Comparison logic for determining which metric values are better. Default: None. |
| is_worst | bool | Whether this is the worst possible value for the metric. Default: False. |

MetricComparator

Encapsulates comparison logic for metrics.

class MetricComparator:
    def __init__(
        self, 
        comparison_method: ComparisonMethod, 
        target: float = None, 
        epsilon: float = 1e-9
    )

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| comparison_method | ComparisonMethod | The method used to compare metrics (HIGHER_IS_BETTER, LOWER_IS_BETTER, etc.). |
| target | float | The target value for TARGET_IS_BETTER comparisons. Default: None. |
| epsilon | float | Small value used for floating point comparisons. Default: 1e-9. |
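
To show how the three parameters might interact, here is a sketch of one plausible comparison rule. It is not Plexe's implementation. The `ComparisonMethod` enum is a local stand-in using the three member names mentioned above with made-up values, and `is_better` is a hypothetical function. Treating differences within `epsilon` as ties is an assumption.

```python
from enum import Enum
from typing import Optional


class ComparisonMethod(Enum):
    # Member names come from the table above; the values are placeholders.
    HIGHER_IS_BETTER = "higher_is_better"
    LOWER_IS_BETTER = "lower_is_better"
    TARGET_IS_BETTER = "target_is_better"


def is_better(
    a: float,
    b: float,
    method: ComparisonMethod,
    target: Optional[float] = None,
    epsilon: float = 1e-9,
) -> bool:
    """Return True if a is strictly better than b under the given method."""
    if method is ComparisonMethod.TARGET_IS_BETTER:
        if target is None:
            raise ValueError("target is required for TARGET_IS_BETTER")
        # Closer to the target is better, so compare distances instead.
        a, b = abs(a - target), abs(b - target)
        method = ComparisonMethod.LOWER_IS_BETTER
    if abs(a - b) <= epsilon:
        return False  # assumption: values within epsilon count as a tie
    if method is ComparisonMethod.HIGHER_IS_BETTER:
        return a > b
    return a < b


print(is_better(0.92, 0.90, ComparisonMethod.HIGHER_IS_BETTER))
print(is_better(3.1, 2.8, ComparisonMethod.LOWER_IS_BETTER))
print(is_better(0.48, 0.60, ComparisonMethod.TARGET_IS_BETTER, target=0.5))
```

The epsilon tolerance keeps floating point noise from being treated as a real improvement between two runs.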

Type Hints

The library uses the following type hints:

SchemaType = Union[Dict[str, Any], Type[BaseModel]]
DatasetType = Union[pd.DataFrame, DatasetGenerator]
ProviderType = Union[str, ProviderConfig]

Usage Example

import plexe
import pandas as pd

# Load data
df = pd.read_csv("housing.csv")

# Create model
model = plexe.Model(
    intent="Predict house prices based on features",
    input_schema={"square_footage": float, "bedrooms": int, "bathrooms": float},
    output_schema={"price": float}
)

# Build model
model.build(
    datasets=[df],
    provider="openai/gpt-4o-mini",
    max_iterations=3,
    timeout=600,
    run_timeout=180,
    chain_of_thought=True,
    verbose=False
)

# Make prediction
prediction = model.predict({"square_footage": 2000.0, "bedrooms": 3, "bathrooms": 2.0})
print(f"Predicted price: {prediction['price']}")

# Save model
save_path = plexe.save_model(model, "housing_model.tar.gz")

# Load model
loaded_model = plexe.load_model(save_path)

For more details on specific components, see the other reference sections: