Plexe (plexe) provides a way to build machine learning models from natural language descriptions. Understanding these core concepts will help you use it effectively.
Model (plexe.Model)
This is the central class you interact with. A Model object represents a machine learning task and, once built, the resulting trained model.
- Initialization: You create a Model by specifying its intent and, optionally, its input_schema, output_schema, and constraints.
- State: A Model progresses through the states DRAFT (initial), BUILDING (during the build() call), READY (successfully built), and ERROR (build failed).
- Building: The build() method triggers the agentic workflow that generates, trains, and evaluates the model based on the intent and the provided data.
- Prediction: Once the model is READY, the predict() method uses it to make predictions on new data.
- Persistence: save_model() and load_model() let you store and retrieve trained models.
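The lifecycle above can be pictured as a small state machine. The sketch below is hypothetical: ToyModel and ModelState are illustrative stand-ins, not plexe classes, and a plain train callable replaces the agentic workflow so the state changes are visible without any LLM calls.

```python
from enum import Enum
from typing import Any, Callable, Optional


class ModelState(Enum):
    """Hypothetical mirror of the four lifecycle states listed above."""

    DRAFT = "draft"
    BUILDING = "building"
    READY = "ready"
    ERROR = "error"


class ToyModel:
    """Illustrative stand-in for plexe.Model; not the library's implementation."""

    def __init__(self, intent: str) -> None:
        self.intent = intent
        self.state = ModelState.DRAFT
        self._predictor: Optional[Callable[[Any], Any]] = None

    def build(self, train: Callable[[], Callable[[Any], Any]]) -> None:
        # `train` stands in for the agentic workflow and returns a predictor.
        self.state = ModelState.BUILDING
        try:
            self._predictor = train()
        except Exception:
            self.state = ModelState.ERROR
            raise
        self.state = ModelState.READY

    def predict(self, x: Any) -> Any:
        if self.state is not ModelState.READY or self._predictor is None:
            raise RuntimeError(f"predict() needs a READY model, state is {self.state.name}")
        return self._predictor(x)


if __name__ == "__main__":
    model = ToyModel("Double the input number")
    print(model.state.name)  # DRAFT
    model.build(lambda: (lambda x: 2 * x))
    print(model.state.name, model.predict(21))  # READY 42
```

A failing build leaves the object in ERROR, and predict() refuses to run until the state is READY.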
Intent
The intent is a natural language string describing what you want the machine learning model to do. It’s the primary instruction given to the Plexe agent system.
- Example: "Predict the likelihood of customer churn based on their recent activity and subscription plan."
- Clarity is Key: A clear and specific intent leads to better model planning and results. Include context about the goal, inputs, and outputs where possible.
Schemas (input_schema, output_schema)
Schemas define the structure and data types of the model’s expected inputs and outputs.
- Purpose: They ensure data consistency and help the agents understand the data format.
- Formats: Can be provided as Pydantic models (recommended) or Python dictionaries.
- Inference: If schemas are not provided, Plexe attempts to infer them from the datasets supplied to the build call, using LLM analysis to identify the likely target variable(s). Explicitly defining schemas is often more reliable.
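Assuming the dictionary form maps each field name to a Python type, the sketch below checks a record against such a schema to show what a schema pins down. The schema_errors helper and the churn field names are hypothetical and are not Plexe's own validation logic; a Pydantic model would carry the same information with richer type support.

```python
from typing import Any, Dict, List

# Dictionary-form schemas: field name -> Python type (field names are illustrative).
input_schema: Dict[str, type] = {"tenure_months": int, "plan": str, "logins_last_30d": int}
output_schema: Dict[str, type] = {"churn_probability": float}


def schema_errors(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return human-readable problems with `record`; an empty list means it conforms."""
    errors: List[str] = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        # Note: isinstance(True, int) is True, so booleans pass int fields here.
        elif not isinstance(record[field], expected):
            got = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {got}")
    # Fields the schema does not declare are reported too, in sorted order.
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


if __name__ == "__main__":
    print(schema_errors({"tenure_months": "12", "plan": "pro"}, input_schema))
```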
Build Process (model.build())
This method orchestrates the core model creation workflow.
- Inputs: Requires datasets (a list of Pandas DataFrames or DatasetGenerator objects) and a provider configuration (the LLM to use). Optional arguments include max_iterations, timeout, callbacks, and others.
- Agent System (PlexeAgent): Internally, build invokes a multi-agent system (PlexeAgent) made up of specialized roles:
  - Orchestrator: Manages the overall workflow and delegates tasks.
  - ML Researcher: Analyzes the problem and proposes solution plans.
  - ML Engineer: Writes and refines the model training code based on the plan.
  - ML Ops Engineer: Generates the inference code for the final model.
  - (Tool Agents): Smaller, specialized agents perform tasks such as metric selection, schema inference, code validation, and execution using defined “tools”.
- Iteration: The system may try multiple approaches, up to max_iterations, to find the best model according to the selected performance metric. Each attempt involves planning, code generation, execution, and evaluation.
- Output: On completion, build updates the Model object’s state (READY or ERROR) and populates its predictor, metric, metadata, artifacts, and source code attributes.
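The iteration step can be sketched as a bounded best-of-N search. In the hypothetical search_best below, each candidate plan is scored by an evaluate callable (standing in for the generate, train, and evaluate cycle), the loop stops at max_iterations or once timeout seconds have elapsed, and the best score wins; higher_is_better covers error metrics such as RMSE. Skipping failed attempts and checking the deadline between attempts are assumptions of this sketch, not documented Plexe behavior.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    plan: str
    score: float


def search_best(
    plans: List[str],
    evaluate: Callable[[str], float],
    max_iterations: int,
    timeout: float,
    higher_is_better: bool = True,
) -> Optional[Attempt]:
    """Try at most `max_iterations` plans within `timeout` seconds and keep the best one."""
    deadline = time.monotonic() + timeout
    best: Optional[Attempt] = None
    for plan in plans[:max_iterations]:
        if time.monotonic() >= deadline:
            break  # out of time: stop and return the best result so far
        try:
            # Stand-in for code generation, training, and evaluation of one plan.
            score = evaluate(plan)
        except Exception:
            continue  # in this sketch, a failed attempt is simply skipped
        if best is None or (score > best.score if higher_is_better else score < best.score):
            best = Attempt(plan, score)
    return best


if __name__ == "__main__":
    scores = {"logistic regression": 0.81, "gradient boosting": 0.87, "random forest": 0.85}
    print(search_best(list(scores), scores.__getitem__, max_iterations=3, timeout=60.0))
    # → Attempt(plan='gradient boosting', score=0.87)
```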
Provider (provider, ProviderConfig)
Specifies the large language model(s) (LLMs) used by the agent system.
- Simple String: "openai/gpt-4o-mini" or "anthropic/claude-3-haiku-20240307". This model is used for most agent tasks.
- ProviderConfig Object: Allows assigning different models to different agent roles (Orchestrator, Researcher, Engineer, Ops, Tool) for fine-grained control over cost and capability.
- Dependencies: Relies on LiteLLM for multi-provider support. Requires the appropriate API keys to be set as environment variables.
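A sketch of the fallback idea behind per-role configuration, under stated assumptions: RoleModels, model_for, and split_model_id are hypothetical names, not the ProviderConfig API, whose real field names may differ. Roles left unset fall back to a default model, and identifiers follow the "provider/model" form shown above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

ROLES = ("orchestrator", "research", "engineer", "ops", "tool")


@dataclass
class RoleModels:
    """Hypothetical per-role model mapping; unset roles fall back to `default`."""

    default: str
    orchestrator: Optional[str] = None
    research: Optional[str] = None
    engineer: Optional[str] = None
    ops: Optional[str] = None
    tool: Optional[str] = None


def model_for(role: str, provider: Union[str, RoleModels]) -> str:
    """Pick the model identifier an agent role should use."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    if isinstance(provider, str):
        return provider  # in this sketch, a simple string is used for every role
    return getattr(provider, role) or provider.default


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split a "provider/model" identifier at the first slash."""
    vendor, _, name = model_id.partition("/")
    return vendor, name


if __name__ == "__main__":
    cfg = RoleModels(default="openai/gpt-4o-mini", engineer="anthropic/claude-3-haiku-20240307")
    for role in ROLES:
        print(role, "->", model_for(role, cfg))
```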
Datasets (datasets, DatasetGenerator)
Data used for training and evaluation.
- Input: Pass data to model.build() as a list of Pandas DataFrames.
- DatasetGenerator: A class (plexe.DatasetGenerator) for defining dataset requirements, including generating synthetic data from a schema or augmenting existing data using an LLM provider. Pass instances of this class in the datasets list if needed.
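DatasetGenerator relies on an LLM provider. The stand-in below swaps in plain seeded random sampling, purely to show the shape of schema-driven generation: synthesize is a hypothetical helper, not part of plexe, and its value ranges and category labels are arbitrary.

```python
import random
from typing import Any, Callable, Dict, List


def synthesize(schema: Dict[str, type], n: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Generate `n` placeholder rows matching a dict schema, using random values (no LLM)."""
    rng = random.Random(seed)  # seeded so the same call returns the same rows
    makers: Dict[type, Callable[[], Any]] = {
        int: lambda: rng.randint(0, 100),
        float: lambda: round(rng.random(), 3),
        str: lambda: rng.choice(["basic", "pro", "enterprise"]),
        bool: lambda: rng.random() < 0.5,
    }
    # Unsupported field types raise KeyError rather than guessing a value.
    return [{field: makers[typ]() for field, typ in schema.items()} for _ in range(n)]


if __name__ == "__main__":
    for row in synthesize({"tenure_months": int, "plan": str, "churned": bool}, n=3, seed=42):
        print(row)
```

Rows in this list-of-dicts form can be turned into a DataFrame with pandas.DataFrame(rows).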
Callbacks (plexe.Callback, MLFlowCallback)
Classes that allow you to hook into the build() process lifecycle (on_build_start, on_iteration_start, on_iteration_end, on_build_end).
- Purpose: Used for logging, monitoring, custom artifact handling, etc.
- Built-in: Includes MLFlowCallback for easy integration with MLflow tracking. The ChainOfThoughtModelCallback provides verbose agent step logging when chain_of_thought=True is set in build().
- Custom: You can create your own callbacks by inheriting from plexe.Callback.
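The hook lifecycle can be sketched with local stand-ins: BuildCallback, EventRecorder, and toy_build are hypothetical, and the arguments that plexe.Callback hooks actually receive are not reproduced here; a plain dict stands in for them. What the sketch shows is the order of calls: build start, then a start/end pair per iteration, then build end.

```python
from typing import Any, Dict, List

Info = Dict[str, Any]


class BuildCallback:
    """Hypothetical stand-in for a callback base class; every hook defaults to a no-op."""

    def on_build_start(self, info: Info) -> None:
        pass

    def on_iteration_start(self, info: Info) -> None:
        pass

    def on_iteration_end(self, info: Info) -> None:
        pass

    def on_build_end(self, info: Info) -> None:
        pass


class EventRecorder(BuildCallback):
    """Custom callback that records the order in which hooks fire."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_build_start(self, info: Info) -> None:
        self.events.append("build_start")

    def on_iteration_start(self, info: Info) -> None:
        self.events.append(f"iteration_start:{info['iteration']}")

    def on_iteration_end(self, info: Info) -> None:
        self.events.append(f"iteration_end:{info['iteration']}")

    def on_build_end(self, info: Info) -> None:
        self.events.append("build_end")


def toy_build(callbacks: List[BuildCallback], iterations: int) -> None:
    """Toy build loop showing where each hook is called."""
    for cb in callbacks:
        cb.on_build_start({"iterations": iterations})
    for i in range(iterations):
        for cb in callbacks:
            cb.on_iteration_start({"iteration": i})
        # ... planning, code generation, execution, and evaluation would happen here ...
        for cb in callbacks:
            cb.on_iteration_end({"iteration": i})
    for cb in callbacks:
        cb.on_build_end({"iterations": iterations})


if __name__ == "__main__":
    recorder = EventRecorder()
    toy_build([recorder], iterations=2)
    print(recorder.events)
```

Overriding only the hooks you need keeps a custom callback small, since the base class supplies no-op defaults for the rest.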
Constraints (plexe.Constraint)
Represents rules or conditions that a model’s input/output pairs should satisfy. Constraints enable you to specify business rules and validation criteria for your model’s behavior.
- Definition: Create constraints with a condition function that takes input and output data and returns a boolean.
- Composability: Combine constraints using logical operators (& for AND, | for OR, ~ for NOT).
- Usage Example:
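A minimal sketch of how condition functions and the &, |, ~ operators can fit together, using Python operator overloading. ToyConstraint is a hypothetical local class, not the plexe.Constraint implementation, and the pricing rules and field names are illustrative.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


class ToyConstraint:
    """Hypothetical stand-in: wraps condition(inputs, outputs) -> bool and supports &, |, ~."""

    def __init__(self, condition: Callable[[Record, Record], bool], description: str = "") -> None:
        self.condition = condition
        self.description = description

    def evaluate(self, inputs: Record, outputs: Record) -> bool:
        return bool(self.condition(inputs, outputs))

    def __and__(self, other: "ToyConstraint") -> "ToyConstraint":
        return ToyConstraint(
            lambda i, o: self.evaluate(i, o) and other.evaluate(i, o),
            f"({self.description} AND {other.description})",
        )

    def __or__(self, other: "ToyConstraint") -> "ToyConstraint":
        return ToyConstraint(
            lambda i, o: self.evaluate(i, o) or other.evaluate(i, o),
            f"({self.description} OR {other.description})",
        )

    def __invert__(self) -> "ToyConstraint":
        return ToyConstraint(lambda i, o: not self.evaluate(i, o), f"NOT {self.description}")


# Example business rules for a pricing model (field names are illustrative).
positive_price = ToyConstraint(lambda i, o: o["price"] > 0, "price > 0")
within_budget = ToyConstraint(lambda i, o: o["price"] <= i["budget"], "price <= budget")
is_vip = ToyConstraint(lambda i, o: i.get("vip", False), "customer is VIP")

# Parentheses matter: in Python, & binds more tightly than |.
rule = positive_price & (within_budget | is_vip)

if __name__ == "__main__":
    print(rule.description)
    print(rule.evaluate({"budget": 100, "vip": True}, {"price": 150}))  # True
```

Each operator returns a new constraint object, so composed rules can themselves be combined further.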

