datasets#

Step objects return their results as OutputDataset or OutputIterableDataset objects, which are available under the step's output attribute.

Tip

You never need to construct a dataset object yourself. They are returned as output from Step objects. If you need to convert in-memory Python data or data in files to a DataDreamer dataset object, see the DataSource steps available in datadreamer.steps.

Accessing Columns#

To access a column on a dataset object, index it by column name with the __getitem__ operator: step.output['column_name']. This returns an OutputDatasetColumn or OutputIterableDatasetColumn column object that can be passed to the inputs argument of a Step.
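The pattern described above can be sketched with small stand-in classes. This is only an illustration, not DataDreamer's implementation: ToyStep, ToyOutput, and ToyColumn are hypothetical names, and the real column objects carry more state than shown here. The point is that indexing the output by column name yields an object that still knows which step it came from, so it can be handed to another step as an input.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """Hypothetical stand-in for a column object: remembers its step and name."""

    step: "ToyStep"
    name: str

    def values(self) -> list:
        # Look the data up on the originating step.
        return self.step.rows[self.name]


@dataclass
class ToyOutput:
    """Hypothetical stand-in for the dataset object exposed as `step.output`."""

    step: "ToyStep"

    def __getitem__(self, key: str) -> ToyColumn:
        if key not in self.step.rows:
            raise KeyError(key)
        return ToyColumn(step=self.step, name=key)


@dataclass
class ToyStep:
    """Hypothetical stand-in for a Step: column data plus optional inputs."""

    name: str
    rows: dict = field(default_factory=dict)
    inputs: dict = field(default_factory=dict)

    @property
    def output(self) -> ToyOutput:
        return ToyOutput(step=self)


source = ToyStep("source", rows={"question": ["What is 2 + 2?", "Name a prime."]})
question_column = source.output["question"]

# Hand the column to a downstream step through its `inputs` argument.
downstream = ToyStep("downstream", inputs={"prompts": question_column})
```

Because the column keeps a reference to its step, the downstream step can trace where its input came from without the caller passing the step separately.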

class datadreamer.datasets.OutputDataset(step, dataset, pickled=False)

Bases: OutputDatasetMixin

property dataset: Dataset

The underlying Hugging Face Dataset.

property num_rows: int

The number of rows in the dataset.

__getitem__(key)

Get a row or column from the dataset.

Parameters:

key (int | slice | str | Iterable[int]) – A row index, a slice, or an iterable of row indices to get rows, or the name of the column to get.

Return type:

Any

Returns:

The row or column from the dataset.
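To make the accepted key types concrete, here is a minimal sketch of how a __getitem__ method might dispatch on int, slice, str, and Iterable[int] keys. ToyTable is a hypothetical class, not DataDreamer's code, and its return shapes (a dict for one row, a dict of lists for several rows, a plain list for a column) are an assumption modeled on Hugging Face Dataset indexing. The real objects return column objects for string keys, as described under Accessing Columns.

```python
from collections.abc import Iterable


class ToyTable:
    """Hypothetical column-oriented table used to illustrate key dispatch."""

    def __init__(self, columns: dict):
        self.columns = columns

    def __getitem__(self, key):
        # str must be checked before Iterable, because a str is itself iterable.
        if isinstance(key, str):
            return self.columns[key]  # column name -> whole column
        if isinstance(key, int):
            # single index -> one row as a dict
            return {name: values[key] for name, values in self.columns.items()}
        if isinstance(key, slice):
            # slice -> several rows, stored column-wise
            return {name: values[key] for name, values in self.columns.items()}
        if isinstance(key, Iterable):
            # iterable of indices -> the selected rows, stored column-wise
            indices = list(key)
            return {
                name: [values[i] for i in indices]
                for name, values in self.columns.items()
            }
        raise TypeError(f"unsupported key type: {type(key).__name__}")


table = ToyTable({"id": [1, 2, 3], "text": ["a", "b", "c"]})
first_row = table[0]
last_two = table[1:]
picked = table[[0, 2]]
text_column = table["text"]
```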

property column_names: list[str]

The column names in the dataset.

property num_columns: int

The number of columns in the dataset.

property step: Step

The step that produced the dataset.

class datadreamer.datasets.OutputDatasetColumn(step, dataset, pickled=False)

Bases: OutputDatasetColumnMixin, OutputDataset

class datadreamer.datasets.OutputIterableDataset(step, dataset, pickled=False, total_num_rows=None)

Bases: OutputDatasetMixin

property dataset: IterableDataset

The underlying Hugging Face IterableDataset.

property num_rows: None | int

The number of rows in the dataset, or None if the total is not known.
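Since num_rows is typed as None | int here, calling code should handle both cases. The helper below is a hypothetical sketch (format_progress is not part of DataDreamer) and assumes that None means the total row count is unknown, which the type annotation suggests but the reference does not state.

```python
from typing import Optional


def format_progress(rows_done: int, num_rows: Optional[int]) -> str:
    """Report progress, falling back to a bare count when the total is unknown."""
    if num_rows is None:
        return f"{rows_done} rows processed (total unknown)"
    # Guard against division by zero for an empty dataset.
    percent = 100 * rows_done / num_rows if num_rows else 100.0
    return f"{rows_done}/{num_rows} rows processed ({percent:.0f}%)"
```

Checking for None explicitly, rather than relying on truthiness, keeps an empty dataset (0 rows) distinct from one of unknown size.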

class datadreamer.datasets.OutputIterableDatasetColumn(step, dataset, pickled=False, total_num_rows=None)

Bases: OutputDatasetColumnMixin, OutputIterableDataset