This page provides comprehensive reference documentation for the core classes and functions in the Plexe Python library.

Core Classes

Model

The primary class in the Plexe library, representing a machine learning model.
class Model:
    def __init__(
        self,
        intent: str,
        input_schema: Type[BaseModel] | Dict[str, type] = None,
        output_schema: Type[BaseModel] | Dict[str, type] = None,
        constraints: List[Constraint] = None,
        distributed: bool = False
    )
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `intent` | `str` | Natural language description of what the model should do. |
| `input_schema` | `Type[BaseModel] \| Dict[str, type]` | Schema defining the input data structure. Can be a dictionary or a Pydantic model. Default: None. |
| `output_schema` | `Type[BaseModel] \| Dict[str, type]` | Schema defining the output data structure. Can be a dictionary or a Pydantic model. Default: None. |
| `constraints` | `List[Constraint]` | List of constraints the model should adhere to. Default: None. |
| `distributed` | `bool` | Whether to use distributed execution (Ray) when building. Default: False. |
Methods:

build

def build(
    self,
    datasets: List[Union[pd.DataFrame, DatasetGenerator]],
    provider: Union[str, ProviderConfig] = "openai/gpt-4o-mini",
    timeout: Optional[int] = None,
    max_iterations: Optional[int] = None,
    run_timeout: int = 1800,
    callbacks: Optional[List[Callback]] = None,
    verbose: bool = False,
    chain_of_thought: bool = True
) -> None
Builds the model using the provided datasets and configuration.
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `datasets` | `List[Union[pd.DataFrame, DatasetGenerator]]` | List of pandas DataFrames or DatasetGenerator objects to use as training data. |
| `provider` | `Union[str, ProviderConfig]` | LLM provider to use, either as a string (`"openai/gpt-4o-mini"`) or a ProviderConfig object. Default: `"openai/gpt-4o-mini"`. |
| `timeout` | `Optional[int]` | Maximum total time (in seconds) for the entire build process (all iterations). Default: None. |
| `max_iterations` | `Optional[int]` | Maximum number of iterations to attempt. Default: None. |
| `run_timeout` | `int` | Maximum time (in seconds) for each individual training run. Default: 1800 (30 minutes). |
| `callbacks` | `Optional[List[Callback]]` | List of callback objects for monitoring the build process. Default: None. |
| `verbose` | `bool` | Whether to display detailed agent logs. Default: False. |
| `chain_of_thought` | `bool` | Whether to enable verbose output of agent reasoning. Default: True. |
Returns: None
Note: At least one of timeout or max_iterations must be provided.
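The note above can be enforced before starting a potentially long build. The following is a minimal sketch of that rule; `check_build_budget` is a hypothetical helper, not part of Plexe, and how the library itself reacts when both limits are missing is not specified on this page.

```python
from typing import Optional


def check_build_budget(
    timeout: Optional[int] = None,
    max_iterations: Optional[int] = None,
) -> None:
    """Hypothetical pre-flight check mirroring the documented build() rule."""
    # Documented rule: at least one of timeout or max_iterations is required.
    if timeout is None and max_iterations is None:
        raise ValueError("Provide at least one of timeout or max_iterations")


check_build_budget(timeout=600, max_iterations=3)  # both set: passes silently
check_build_budget(max_iterations=3)               # one set: passes silently
```

Calling the helper right before `model.build(...)` surfaces a missing limit immediately rather than partway through a build.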

predict

def predict(
    self,
    x: Dict[str, Any],
    validate_input: bool = False,
    validate_output: bool = False
) -> Dict[str, Any]
Makes a prediction using the trained model.
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `x` | `Dict[str, Any]` | Input data for prediction. |
| `validate_input` | `bool` | Whether to validate input against the schema. Default: False. |
| `validate_output` | `bool` | Whether to validate output against the schema. Default: False. |
Returns: Dict[str, Any] - Prediction result
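What `validate_input` and `validate_output` check internally is not described here. As an illustration only, the sketch below shows one way a dictionary could be checked against a `Dict[str, type]` schema before calling `predict`. The helper `check_against_schema` and its rules (exact field set, ints accepted for float fields) are assumptions, not Plexe's behavior.

```python
from typing import Any, Dict


def check_against_schema(x: Dict[str, Any], schema: Dict[str, type]) -> None:
    """Hypothetical check: raise ValueError if x does not match a dict schema."""
    missing = set(schema) - set(x)
    extra = set(x) - set(schema)
    if missing or extra:
        raise ValueError(
            f"missing fields: {sorted(missing)}, unexpected fields: {sorted(extra)}"
        )
    for name, expected in schema.items():
        value = x[name]
        # Assumption: a non-bool int is acceptable where a float is expected.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            continue
        if not isinstance(value, expected):
            raise ValueError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )


schema = {"square_footage": float, "bedrooms": int, "bathrooms": float}
check_against_schema({"square_footage": 2000, "bedrooms": 3, "bathrooms": 2}, schema)
```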

get_state

def get_state(self) -> str
Returns the current state of the model.
Returns: str representing the model state: "draft", "building", "ready", or "error"

get_metadata

def get_metadata(self) -> Dict[str, Any]
Returns metadata about the model.
Returns: Dictionary containing metadata

get_metrics

def get_metrics(self) -> Optional[Dict[str, Any]]
Returns metrics for the trained model, if available.
Returns: Dictionary containing metrics, or None

describe

def describe(self) -> ModelDescription
Returns a detailed description of the model.
Returns: ModelDescription object

DatasetGenerator

Class for generating synthetic data or augmenting existing data.
class DatasetGenerator:
    def __init__(
        self,
        description: str,
        provider: str,
        schema: Type[BaseModel] | Dict[str, type] = None,
        data: pd.DataFrame = None
    ) -> None
    
    def generate(self, num_samples: int):
        """Generates synthetic data if a provider is available."""
Constructor Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `description` | `str` | Human-readable description of the dataset. |
| `provider` | `str` | LLM provider used for synthetic data generation. |
| `schema` | `Type[BaseModel] \| Dict[str, type]` | The schema the data should match, if any. Default: None. |
| `data` | `pd.DataFrame` | A dataset of real data on which to base the generation, if available. Default: None. |
Methods:
| Method | Description |
| --- | --- |
| `generate(num_samples: int)` | Generates the specified number of synthetic data samples. |
| `data` (property) | Returns the dataset as a pandas DataFrame. |

Callback

Base class for callbacks that monitor the build process.
class Callback:
    def on_build_start(self, info: BuildStateInfo) -> None:
        pass

    def on_iteration_start(self, info: BuildStateInfo) -> None:
        pass

    def on_iteration_end(self, info: BuildStateInfo) -> None:
        pass

    def on_build_end(self, info: BuildStateInfo) -> None:
        pass
See Callbacks Reference for more details.
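As an illustration of the four hooks, the sketch below records how long each iteration takes. It is written as a standalone class so it runs without the library; in real use it would presumably subclass `Callback`, whose import path is not shown on this page. The `info` argument is left untouched because the fields of `BuildStateInfo` are not documented here.

```python
import time
from typing import Any, List, Optional


class TimingCallback:
    """Records the wall-clock duration of each build iteration.

    Standalone for illustration; in real use it would subclass Plexe's Callback.
    """

    def __init__(self) -> None:
        self.durations: List[float] = []
        self._started: Optional[float] = None

    def on_build_start(self, info: Any) -> None:
        # Reset state so the same instance can be reused across builds.
        self.durations.clear()
        self._started = None

    def on_iteration_start(self, info: Any) -> None:
        self._started = time.perf_counter()

    def on_iteration_end(self, info: Any) -> None:
        # Ignore an end event that has no matching start event.
        if self._started is not None:
            self.durations.append(time.perf_counter() - self._started)
            self._started = None

    def on_build_end(self, info: Any) -> None:
        print(f"{len(self.durations)} iteration(s), {sum(self.durations):.2f}s total")
```

Once the base class is changed to `Callback`, an instance could be passed to `build` through the `callbacks` parameter, for example `callbacks=[TimingCallback()]`.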

Constraint

Represents rules or conditions that the model should satisfy.
class Constraint:
    def __init__(self, condition: Callable[[Any, Any], bool], description: str)
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `condition` | `Callable[[Any, Any], bool]` | Function that evaluates the constraint. |
| `description` | `str` | Human-readable description of the constraint. |
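A condition is any two-argument function returning a bool. This page does not say what the two arguments are. The sketch below assumes they are the model input and the model output, and `price_is_non_negative` is a hypothetical example name.

```python
from typing import Any, Dict


def price_is_non_negative(inputs: Dict[str, Any], outputs: Dict[str, Any]) -> bool:
    """Condition with the documented Callable[[Any, Any], bool] shape."""
    # Assumption: the first argument is the model input, the second the model output.
    # A missing "price" key is treated as satisfying the constraint.
    return outputs.get("price", 0.0) >= 0.0


# With Plexe installed, the function would be wrapped roughly like this:
# constraint = Constraint(
#     condition=price_is_non_negative,
#     description="Predicted price must be non-negative",
# )
print(price_is_non_negative({"bedrooms": 3}, {"price": 250000.0}))  # True
```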

Core Functions

save_model

def save_model(model: Model, path: str | Path) -> str
Saves a trained model to a tar archive.
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `model` | `Model` | The model to save. |
| `path` | `str \| Path` | Path where the model should be saved. Must end with .tar.gz. |
Returns: str - Path where the model was saved
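Because the path must end with .tar.gz, it can help to normalize user-supplied paths before saving. A minimal sketch using only `pathlib`; `ensure_tar_gz` is a hypothetical helper, not a Plexe function:

```python
from pathlib import Path
from typing import Union


def ensure_tar_gz(path: Union[str, Path]) -> Path:
    """Hypothetical helper: return path with a .tar.gz suffix, appending it if absent."""
    p = Path(path)
    if p.name.endswith(".tar.gz"):
        return p
    # Append rather than replace, so "model.v2" becomes "model.v2.tar.gz".
    return p.with_name(p.name + ".tar.gz")


print(ensure_tar_gz("models/housing_model"))
```

The result can then be passed as the `path` argument of `save_model`.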

load_model

def load_model(path: str | Path) -> Model
Instantiates a model from a tar archive.
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `path` | `str \| Path` | Path to the saved model archive (.tar.gz file). |
Returns: Model - The loaded model

configure_logging

def configure_logging(
    level: Union[str, int] = logging.INFO,
    file: Optional[str] = None
) -> None
Configures logging for the Plexe library.
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `level` | `Union[str, int]` | Logging level (from the logging module or as a string). Default: logging.INFO. |
| `file` | `Optional[str]` | Path to a log file. If provided, logs are written to this file in addition to stdout. Default: None. |
Returns: None
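To make the two parameters concrete, here is a rough standard-library equivalent of the behavior described above. It is an illustrative stand-in, not Plexe's implementation. The function name, the logger name `"plexe"`, and the log format are all assumptions.

```python
import logging
import sys
from typing import List, Optional, Union


def configure_logging_sketch(
    level: Union[str, int] = logging.INFO,
    file: Optional[str] = None,
    logger_name: str = "plexe",  # assumption: the library logs under this name
) -> logging.Logger:
    """Illustrative stand-in taking the same level and file parameters."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)  # accepts ints and upper-case names such as "DEBUG"
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    handlers: List[logging.Handler] = [logging.StreamHandler(sys.stdout)]
    if file is not None:
        # The file handler is added alongside stdout, not instead of it.
        handlers.append(logging.FileHandler(file))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


log = configure_logging_sketch(level="DEBUG")
log.debug("logging configured")
```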

Enums and Constants

ModelState

Enum representing the possible states of a model. The get_state() method returns the string value.
class ModelState(Enum):
    DRAFT = "draft"        # Initial state, before building
    BUILDING = "building"  # During the build process
    READY = "ready"        # Build complete, model ready for use
    ERROR = "error"        # Build failed
Note: When checking model state, compare against the string values:
if model.get_state() == "ready":
    # Ready to make predictions
    pass

Provider Configuration

ProviderConfig

Class for configuring different LLM providers for different agent roles (imported from plexe.internal.common.provider).
from plexe.internal.common.provider import ProviderConfig

class ProviderConfig:
    def __init__(
        self,
        default_provider: str,
        orchestrator_provider: Optional[str] = None,
        research_provider: Optional[str] = None,
        engineer_provider: Optional[str] = None,
        ops_provider: Optional[str] = None,
        tool_provider: Optional[str] = None
    )
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `default_provider` | `str` | Default provider for all roles (e.g., `"openai/gpt-4o-mini"`). |
| `orchestrator_provider` | `Optional[str]` | Provider for the orchestrator agent (coordinates the overall process). |
| `research_provider` | `Optional[str]` | Provider for the research agent (analyzes problems and plans solutions). |
| `engineer_provider` | `Optional[str]` | Provider for the engineer agent (generates training code). |
| `ops_provider` | `Optional[str]` | Provider for the ops agent (generates inference code). |
| `tool_provider` | `Optional[str]` | Provider for tool agents (performs specialized tasks). |
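Since `default_provider` is described as the default for all roles, a role-specific provider presumably overrides it only when set. The sketch below models that fallback with a stand-in dataclass. `RoleProviders` and `for_role` are hypothetical and are not part of Plexe, and the second provider string is only an example value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoleProviders:
    """Hypothetical stand-in mirroring ProviderConfig's constructor arguments."""

    default_provider: str
    orchestrator_provider: Optional[str] = None
    research_provider: Optional[str] = None
    engineer_provider: Optional[str] = None
    ops_provider: Optional[str] = None
    tool_provider: Optional[str] = None

    def for_role(self, role: str) -> str:
        # Assumption: an unset role-specific provider falls back to the default.
        specific = getattr(self, f"{role}_provider")
        return specific if specific is not None else self.default_provider


config = RoleProviders(
    default_provider="openai/gpt-4o-mini",
    engineer_provider="openai/gpt-4o",
)
print(config.for_role("engineer"))      # openai/gpt-4o
print(config.for_role("orchestrator"))  # openai/gpt-4o-mini
```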

Performance Metrics

Metric

Class representing a performance metric for a model.
class Metric:
    def __init__(
        self, 
        name: str, 
        value: float = None, 
        comparator: MetricComparator = None,
        is_worst: bool = False
    )
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `name` | `str` | Name of the metric (e.g., "accuracy", "rmse"). |
| `value` | `float` | Numeric value of the metric. Default: None. |
| `comparator` | `MetricComparator` | Comparison logic for determining which metric values are better. Default: None. |
| `is_worst` | `bool` | Whether this is the worst possible value for the metric. Default: False. |

MetricComparator

Encapsulates comparison logic for metrics.
class MetricComparator:
    def __init__(
        self, 
        comparison_method: ComparisonMethod, 
        target: float = None, 
        epsilon: float = 1e-9
    )
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `comparison_method` | `ComparisonMethod` | The method used to compare metrics (HIGHER_IS_BETTER, LOWER_IS_BETTER, etc.). |
| `target` | `float` | The target value for TARGET_IS_BETTER comparisons. Default: None. |
| `epsilon` | `float` | Small value used for floating point comparisons. Default: 1e-9. |
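The three parameters suggest comparison logic along the following lines. This is a sketch under stated assumptions, not the library's code. The local `ComparisonMethod` enum only reuses the member names mentioned above, `is_better` is a hypothetical function, and treating differences within `epsilon` as ties is an assumption.

```python
from enum import Enum, auto
from typing import Optional


class ComparisonMethod(Enum):
    # Local stand-in; member names come from this page, the real enum may define more.
    HIGHER_IS_BETTER = auto()
    LOWER_IS_BETTER = auto()
    TARGET_IS_BETTER = auto()


def is_better(
    a: float,
    b: float,
    method: ComparisonMethod,
    target: Optional[float] = None,
    epsilon: float = 1e-9,
) -> bool:
    """Return True if a is strictly better than b; differences within epsilon are ties."""
    if method is ComparisonMethod.TARGET_IS_BETTER:
        if target is None:
            raise ValueError("target is required for TARGET_IS_BETTER")
        # Closer to the target is better, so compare distances (lower wins).
        a, b = abs(a - target), abs(b - target)
        method = ComparisonMethod.LOWER_IS_BETTER
    if abs(a - b) <= epsilon:
        return False
    return a > b if method is ComparisonMethod.HIGHER_IS_BETTER else a < b


print(is_better(0.92, 0.88, ComparisonMethod.HIGHER_IS_BETTER))             # True
print(is_better(4.0, 6.5, ComparisonMethod.TARGET_IS_BETTER, target=5.0))  # True
```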

Type Hints

The library uses the following type hints:
SchemaType = Union[Dict[str, Any], Type[BaseModel]]
DatasetType = Union[pd.DataFrame, DatasetGenerator]
ProviderType = Union[str, ProviderConfig]

Usage Example

import plexe
import pandas as pd

# Load data
df = pd.read_csv("housing.csv")

# Create model
model = plexe.Model(
    intent="Predict house prices based on features",
    input_schema={"square_footage": float, "bedrooms": int, "bathrooms": float},
    output_schema={"price": float}
)

# Build model
model.build(
    datasets=[df],
    provider="openai/gpt-4o-mini",
    max_iterations=3,
    timeout=600,
    run_timeout=180,
    chain_of_thought=True,
    verbose=False
)

# Make prediction
prediction = model.predict({"square_footage": 2000, "bedrooms": 3, "bathrooms": 2})
print(f"Predicted price: {prediction}")

# Save model
save_path = plexe.save_model(model, "housing_model.tar.gz")

# Load model
loaded_model = plexe.load_model(save_path)
For more details on specific components, see the other reference sections.