Provide training and validation data to the model build process using Pandas DataFrames or DatasetGenerator.
The `model.build()` method requires data to train and evaluate the machine learning model. You can provide this data in two main ways:
Pandas DataFrames: Passed directly through the `datasets` argument of `model.build()`.
`plexe.DatasetGenerator` objects: Useful for generating synthetic data or augmenting existing datasets.
Names (e.g., `dataset_0`, `dataset_1`) are assigned internally to these DataFrames, which are then used during the build process. If schemas are not provided, they will be inferred from these DataFrames.
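The naming and schema inference described above can be pictured with a small pandas-only sketch. This is an illustration under stated assumptions, not plexe's actual implementation: the helpers `name_datasets` and `infer_schema` are hypothetical, the column names and values are made up for the example, and the real inference logic inside `build` may differ.

```python
import pandas as pd


def name_datasets(frames):
    """Hypothetical helper: map positional names (dataset_0, dataset_1, ...) to DataFrames."""
    return {f"dataset_{i}": df for i, df in enumerate(frames)}


def infer_schema(df):
    """Hypothetical helper: derive a column -> dtype-name mapping from a DataFrame."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}


# Illustrative data only; the columns are assumptions made for this example.
train_df = pd.DataFrame({"sqft": [850, 1200], "price": [210000.0, 325000.0]})
val_df = pd.DataFrame({"sqft": [990], "price": [250000.0]})

# Names follow the order in which the DataFrames appear in the list.
named = name_datasets([train_df, val_df])

# When no schema is supplied, one can be read off each DataFrame's dtypes.
schemas = {name: infer_schema(df) for name, df in named.items()}

# With plexe installed and an LLM provider configured, the same list would
# be handed to the build step through the `datasets` argument, e.g.
#   model.build(datasets=[train_df, val_df])
```

Passing the DataFrames as an ordered list keeps the positional names predictable, which helps when reading build logs that refer to `dataset_0` or `dataset_1`.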
DatasetGenerator
The `DatasetGenerator` class allows you to define requirements for a dataset, potentially generating synthetic data or augmenting existing data using an LLM provider.
Use Case 1: Generating purely synthetic data
Use Case 2: Augmenting existing data
You can also provide existing data to `DatasetGenerator` and potentially use it as a base for generating more samples, although the exact augmentation mechanism within `build` needs confirmation.