Detailed reference documentation for dataset handling in the Plexe Python library.
Datasets are passed to a model through the `datasets` parameter of `model.build()`:
The `DatasetGenerator` class allows you to generate synthetic data or augment existing data using LLMs.
| Parameter | Type | Description |
|---|---|---|
| `description` | `str` | Human-readable description of the dataset |
| `provider` | `str` | LLM provider used for synthetic data generation |
| `schema` | `Type[BaseModel] \| Dict[str, type]` | The schema the data should match, if any. Can be a Pydantic model or a dictionary. |
| `data` | `pd.DataFrame` | A dataset of real data on which to base the generation, if available |
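The dictionary form of `schema` maps column names to Python types. As a rough illustration of what it means for data to "match" such a schema, the sketch below checks plain-dict rows against one. The helper `find_schema_violations` and the house-price columns are hypothetical and used only for illustration; this is not how Plexe validates data internally.

```python
from typing import Any, Dict, List


def find_schema_violations(schema: Dict[str, type], rows: List[Dict[str, Any]]) -> List[str]:
    """Return readable problems for rows that do not match a dict-style schema.

    Hypothetical helper for illustration only; not part of the Plexe API.
    """
    problems: List[str] = []
    expected = set(schema)
    for i, row in enumerate(rows):
        # Column-level checks: every schema column present, nothing extra.
        missing = expected - set(row)
        extra = set(row) - expected
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        # Value-level checks: each present value has the declared type.
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                problems.append(
                    f"row {i}: column {column!r} expected {expected_type.__name__}, "
                    f"got {type(row[column]).__name__}"
                )
    return problems


# Hypothetical dict schema of the kind the table describes.
schema = {"house_size": float, "bedrooms": int, "price": float}
rows = [
    {"house_size": 120.5, "bedrooms": 3, "price": 350000.0},
    {"house_size": 80.0, "bedrooms": "two", "price": 210000.0},
]
print(find_schema_violations(schema, rows))
# → ["row 1: column 'bedrooms' expected int, got str"]
```

A Pydantic model expresses the same column-to-type mapping but can also carry per-field descriptions and value constraints, which a bare dictionary cannot.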
`description` and `schema` provided. The description should give clear guidance about:
When defining a schema for a `DatasetGenerator`, use Pydantic's `Field` attributes to provide rich information:
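As an illustration, here is a schema whose fields carry descriptions and value constraints through `Field`. The model name `HouseListing` and its fields are hypothetical, and the code assumes Pydantic v2 (`model_json_schema()`); under Pydantic v1 the equivalent call is `.schema()`.

```python
from pydantic import BaseModel, Field


class HouseListing(BaseModel):
    """Hypothetical schema; field names and descriptions are illustrative only."""

    house_size: float = Field(..., description="Interior floor area in square metres", gt=0)
    bedrooms: int = Field(..., description="Number of bedrooms", ge=0, le=20)
    neighbourhood: str = Field(..., description="Neighbourhood name, e.g. 'Riverside'")
    price: float = Field(..., description="Sale price in USD", ge=0)


# Pydantic v2 collects each field's description and constraints into a JSON schema.
schema = HouseListing.model_json_schema()
print(schema["properties"]["price"])

# The same constraints are enforced when a row is validated.
listing = HouseListing(house_size=95.0, bedrooms=2, neighbourhood="Riverside", price=280000.0)
print(listing.bedrooms)
```

Per the parameter table, a class like this can be passed directly as the `schema` argument of `DatasetGenerator`. How Plexe turns these descriptions into prompts for the LLM is not shown here.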
`datasets` parameter:
`DatasetGenerator` objects are provided