The model.build() method requires data to train and evaluate the machine learning model. You can provide this data in two main ways:
- List of Pandas DataFrames: The simplest way for tabular data.
- List of plexe.DatasetGenerator objects: Useful for generating synthetic data or augmenting existing datasets.
Using Pandas DataFrames
If your data is already loaded into Pandas DataFrames, pass a list containing these DataFrames to the datasets argument of model.build().
Plexe assigns default names (dataset_0, dataset_1, and so on) to these DataFrames and uses them during the build process. If schemas are not provided, they will be inferred from these DataFrames.
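As a minimal sketch of the DataFrame route, the snippet below prepares two small tables and collects them into the list that would go to the datasets argument. The column names and values are made up for illustration. The default_names helper is hypothetical and only mirrors the dataset_0, dataset_1 naming mentioned above; it is not part of plexe. The intent text and the provider string inside build_model are likewise assumptions and should be checked against the plexe version you have installed.

```python
import pandas as pd

# Two small illustrative tables (made-up values).
customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 51, 27]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "amount": [20.0, 35.5, 12.0]})

# The list that would be passed as the `datasets` argument.
datasets = [customers, orders]


def default_names(frames):
    """Hypothetical helper (not a plexe function): name frames by position."""
    return {f"dataset_{i}": frame for i, frame in enumerate(frames)}


named = default_names(datasets)


def build_model(frames):
    """Sketch of the build call. Nothing runs until this function is called,
    and calling it needs plexe installed plus credentials for the provider."""
    import plexe  # imported lazily so the data preparation above runs without it

    # No schemas are passed here, so they would be inferred from the DataFrames.
    model = plexe.Model(intent="Predict the order amount for a customer")
    # The provider string is an assumption; use one your setup supports.
    model.build(datasets=frames, provider="openai/gpt-4o-mini")
    return model
```

Calling build_model(datasets) would start the actual build; the data preparation above it runs on its own with only pandas installed.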
Using DatasetGenerator
The DatasetGenerator class allows you to define requirements for a dataset, potentially generating synthetic data or augmenting existing data using an LLM provider.
Use Case 1: Generating purely synthetic data
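A rough sketch of purely synthetic generation might look like the following. The constructor arguments (description, provider, schema), the generate method, the provider string, and the example schema are all assumptions rather than confirmed API, so verify them against your installed plexe version before relying on them. The calls are wrapped in a function so that nothing contacts an LLM provider until you invoke it.

```python
# Column names mapped to Python types; this schema is purely illustrative.
SCHEMA = {"headline": str, "sentiment": int}


def generate_and_build(num_samples=100, provider="openai/gpt-4o-mini"):
    """Sketch only: needs plexe installed and credentials for the provider."""
    import plexe  # imported lazily so defining this function has no side effects

    # Assumed constructor arguments; check them against your plexe version.
    generator = plexe.DatasetGenerator(
        description="News headlines labelled with a sentiment score",
        provider=provider,
        schema=SCHEMA,
    )
    # Assumed method: ask the provider for `num_samples` synthetic rows.
    generator.generate(num_samples)

    model = plexe.Model(intent="Predict sentiment from a news headline")
    # The generator object is passed in the datasets list in place of a DataFrame.
    model.build(datasets=[generator], provider=provider)
    return model
```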
Existing data can also be supplied to a DatasetGenerator and potentially used as a base for generating more samples (although the exact augmentation mechanism within build needs confirmation).