datadreamer

DataDreamer Sessions

You can run prompting, synthetic data generation, and training workflows within a DataDreamer session using a context manager like so:

from datadreamer import DataDreamer

with DataDreamer('./output/'):
    # ... run steps or trainers here ...
    pass  # placeholder so the empty block is valid Python

Inside the with block, you can run any Step or Trainer. DataDreamer automatically organizes, caches, and saves the results of each step or trainer run within the session to the output folder.

In-Memory Sessions

Optionally, you can run DataDreamer fully in memory, without saving anything to disk, by passing ':memory:' as the output_folder_path argument, as in DataDreamer(':memory:').

Sessions in Interactive Environments

As an alternative to a Python context manager (with block), you can structure your code with start() and stop() to achieve the same result. The context manager is still the recommended approach. start() and stop() are mainly useful when you run DataDreamer in a Jupyter or Google Colab notebook or another interactive environment.

from datadreamer import DataDreamer

dd = DataDreamer('./output/')
dd.start()
# ... run steps or trainers here ...
dd.stop()
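A with block runs its exit logic even when the body raises an exception. The closest start()/stop() equivalent therefore wraps the work in try/finally, so a failing notebook cell does not leave the session running. The sketch below demonstrates this with a hypothetical stand-in class, ToySession, which only records which methods were called. It is not DataDreamer's implementation. It assumes only that entering and leaving the session's with block corresponds to start() and stop(), as described above.

```python
class ToySession:
    """Hypothetical stand-in for a session object; it only records lifecycle calls."""

    def __init__(self, output_folder_path: str) -> None:
        self.output_folder_path = output_folder_path
        self.events: list[str] = []

    def start(self) -> None:
        self.events.append("start")

    def stop(self) -> None:
        self.events.append("stop")

    # Context-manager protocol: entering calls start(), leaving calls stop().
    def __enter__(self) -> "ToySession":
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.stop()
        return False  # do not swallow the exception


def failing_work(session: ToySession) -> None:
    """Simulates a notebook cell that errors partway through."""
    session.events.append("work")
    raise RuntimeError("cell failed")


# Style 1: context manager. stop() still runs although the body raises.
with_session = ToySession("./output/")
try:
    with with_session:
        failing_work(with_session)
except RuntimeError:
    pass

# Style 2: explicit start()/stop(); try/finally gives the same guarantee.
manual_session = ToySession("./output/")
manual_session.start()
try:
    failing_work(manual_session)
except RuntimeError:
    pass
finally:
    manual_session.stop()

print(with_session.events)    # ['start', 'work', 'stop']
print(manual_session.events)  # ['start', 'work', 'stop']
```

The same try/finally shape should carry over to the documented start() and stop() methods of a DataDreamer object.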

Caching

DataDreamer caches the results of each step or trainer run within a session to the output folder. If a session is interrupted and re-run, DataDreamer will automatically load the results of previously completed steps from disk and resume where it left off.
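A toy model can show the resume behavior. Each named step writes its result to a file in the output folder. On a re-run, that file is loaded instead of recomputing the step. The helper run_cached_step and its one-JSON-file-per-step layout are hypothetical illustrations of the idea. They are not DataDreamer's cache format or API.

```python
import json
import os
import tempfile
from typing import Any, Callable


def run_cached_step(output_folder: str, name: str, compute: Callable[[], Any]) -> Any:
    """Hypothetical helper: reuse a saved result if this step already finished."""
    path = os.path.join(output_folder, f"{name}.json")
    if os.path.exists(path):
        # Step completed in an earlier run: load it from disk and skip the work.
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    result = compute()
    # Write to a temporary file, then rename, so an interrupted write never
    # leaves a partial file that would be mistaken for a finished result.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    os.replace(tmp_path, path)
    return result


calls: list[str] = []


def generate() -> list[str]:
    calls.append("generate")
    return ["example 1", "example 2"]


def interrupted() -> list[str]:
    raise KeyboardInterrupt  # simulate the session being interrupted


with tempfile.TemporaryDirectory() as folder:
    # First run: "generate" completes, then the next step is interrupted.
    run_cached_step(folder, "generate", generate)
    try:
        run_cached_step(folder, "filter", interrupted)
    except KeyboardInterrupt:
        pass
    # Re-run: "generate" is loaded from disk instead of being recomputed.
    data = run_cached_step(folder, "generate", generate)

print(data, calls)  # ['example 1', 'example 2'] ['generate']
```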

datadreamer.__version__

The installed version of DataDreamer.

Type:

str

class datadreamer.DataDreamer(output_folder_path, verbose=None, log_level=None, log_date=False, hf_log=False)

Bases: object

Constructs a DataDreamer session.

Parameters:
  • output_folder_path (str) – The path of the output folder where the results of each step or trainer run within the session are organized, cached, and saved. Pass ':memory:' to run fully in memory without writing to disk.

  • verbose (Optional[bool], default: None) – Whether or not to print verbose logs.

  • log_level (Optional[int], default: None) – The logging level to use, given as a standard Python logging level (for example, logging.DEBUG or logging.INFO).

  • log_date (bool, default: False) – Whether or not to include the date and time in the logs.

  • hf_log (default: False) – Whether to show Hugging Face library logs within the session. By default (False), DataDreamer overrides and silences overly verbose Hugging Face logs. Set it to True to debug issues related to Hugging Face libraries.

static initialized()

Queries whether or not a DataDreamer session is currently active.

Return type:

bool

Returns:

Whether or not a DataDreamer session is currently active.

static get_output_folder_path()

Gets the output folder path of the current DataDreamer session.

Return type:

str

Returns:

The output folder path of the current DataDreamer session.
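Static queries like initialized() and get_output_folder_path() let any code, such as a helper that saves results, check the active session without being passed a session object. The toy model below shows one way this can work. It keeps the active session's path in a class attribute. ToySessionState, save_location, and the choice to raise an error outside a session are hypothetical. DataDreamer's own bookkeeping and its behavior outside a session may differ.

```python
from typing import Optional


class ToySessionState:
    """Hypothetical model: the active session's path lives in a class attribute."""

    _active_path: Optional[str] = None

    def __init__(self, output_folder_path: str) -> None:
        self.output_folder_path = output_folder_path

    def __enter__(self) -> "ToySessionState":
        ToySessionState._active_path = self.output_folder_path
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        ToySessionState._active_path = None
        return False

    @staticmethod
    def initialized() -> bool:
        return ToySessionState._active_path is not None

    @staticmethod
    def get_output_folder_path() -> str:
        # The toy raises outside a session; this is an assumption, not documented behavior.
        if ToySessionState._active_path is None:
            raise RuntimeError("No session is active.")
        return ToySessionState._active_path


def save_location(step_name: str) -> str:
    """Hypothetical helper that only works inside a session."""
    if not ToySessionState.initialized():
        raise RuntimeError("Run this inside a session.")
    return f"{ToySessionState.get_output_folder_path()}/{step_name}"


before = ToySessionState.initialized()   # False
with ToySessionState("./output"):
    inside = save_location("generate")   # './output/generate'
after = ToySessionState.initialized()    # False
```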

start()

Starts a DataDreamer session. This is an alternative to using a Python context manager, which remains the recommended approach. This method is useful in interactive environments where a with block is impossible or cumbersome.

stop()

Stops a DataDreamer session. This is an alternative to using a Python context manager, which remains the recommended approach. This method is useful in interactive environments where a with block is impossible or cumbersome.

Subpackages