datadreamer
DataDreamer Sessions
You can run prompting, synthetic data generation, and training workflows within a DataDreamer session using a context manager like so:
from datadreamer import DataDreamer

with DataDreamer('./output/'):
    # ... run steps or trainers here ...
Inside the with block, you can run any Step or Trainer you want. DataDreamer will automatically organize, cache, and save the results of each step run within a session to the output folder.
In-Memory Sessions
Optionally, you can run DataDreamer fully in-memory, without saving anything to disk, by passing ':memory:' as the output_folder_path argument, like with DataDreamer(':memory:'):.
Sessions in Interactive Environments
As an alternative to using a Python context manager (with block), you can also structure your code with start() and stop() to achieve the same result. Using the context manager, however, is recommended and preferred. Using start() and stop() may be useful if you want to run DataDreamer in a Jupyter or Google Colab notebook or another interactive environment.
from datadreamer import DataDreamer
dd = DataDreamer('./output/')
dd.start()
# ... run steps or trainers here ...
dd.stop()
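The relationship between the two forms can be illustrated with a small stand-in class. This is a sketch only: ToySession is a hypothetical name, it does not import datadreamer, and it is not DataDreamer's implementation. It assumes the context manager simply delegates to start() and stop(), which matches the description above but is not confirmed by the source.

```python
class ToySession:
    """Hypothetical stand-in for a session object (not DataDreamer itself)."""

    _active = None  # the currently running session, if any

    def __init__(self, output_folder_path):
        self.output_folder_path = output_folder_path

    def start(self):
        if ToySession._active is not None:
            raise RuntimeError("a session is already running")
        ToySession._active = self

    def stop(self):
        ToySession._active = None

    # Assumption: the context manager protocol delegates to start() and stop().
    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stop()
        return False  # do not swallow exceptions raised in the body

    @staticmethod
    def initialized():
        return ToySession._active is not None


# Context manager form: stop() runs even if the body raises.
with ToySession("./output/"):
    inside_with = ToySession.initialized()
after_with = ToySession.initialized()

# Explicit form, e.g. spread across notebook cells: pair start() with stop().
session = ToySession("./output/")
session.start()
try:
    inside_explicit = ToySession.initialized()
finally:
    session.stop()
after_explicit = ToySession.initialized()
```

The practical difference is cleanup: the with block guarantees the session is stopped, while the explicit form relies on stop() being called, which is one reason the context manager is the preferred form.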
Caching
DataDreamer caches the results of each step or trainer run within a session to the output folder. If a session is interrupted and re-run, DataDreamer will automatically load the results of previously completed steps from disk and resume where it left off.
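The resume behavior can be pictured with a toy cache. This is a sketch under stated assumptions, not DataDreamer's actual mechanism: run_cached is a hypothetical helper, and keying results by step name and storing them as JSON files are illustrative choices. The ':memory:' branch mirrors the in-memory mode by never touching disk.

```python
import json
import tempfile
from pathlib import Path


def run_cached(output_folder_path, step_name, compute):
    """Hypothetical helper: return (result, loaded_from_cache) for a named step."""
    if output_folder_path == ":memory:":
        # In-memory mode: nothing is written to or read from disk.
        return compute(), False
    folder = Path(output_folder_path)
    folder.mkdir(parents=True, exist_ok=True)
    cache_file = folder / f"{step_name}.json"
    if cache_file.exists():
        # A previous run already finished this step; reuse its saved result.
        return json.loads(cache_file.read_text()), True
    result = compute()
    cache_file.write_text(json.dumps(result))
    return result, False


calls = []  # records how many times the step body actually executes


def expensive_step():
    calls.append(1)
    return {"rows": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmp:
    first, first_cached = run_cached(tmp, "my_step", expensive_step)
    # Simulates re-running the script: the step is loaded, not recomputed.
    second, second_cached = run_cached(tmp, "my_step", expensive_step)

memory_result, memory_cached = run_cached(":memory:", "my_step", expensive_step)
```

In this sketch the second call with the same folder and step name returns the saved result without running expensive_step again, while the ':memory:' call always recomputes.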
- class datadreamer.DataDreamer(output_folder_path, verbose=None, log_level=None, log_date=False, hf_log=False)
Bases: object
Constructs a DataDreamer session.
- Parameters:
  - output_folder_path (str): The output folder path to organize, cache, and save results of each step or trainer run within a session.
  - verbose (Optional[bool], default: None): Whether or not to print verbose logs.
  - log_level (Optional[int], default: None): The logging level to use (DEBUG, INFO, etc.).
  - log_date (bool, default: False): Whether or not to include the date and time in the logs.
  - hf_log (default: False): Whether to override and silence overly verbose Hugging Face logs within the session. Defaults to True. Set to False to debug issues related to Hugging Face libraries.
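Since log_level is typed as an integer, it may help to see what the level names resolve to. The sketch below assumes DEBUG, INFO, and so on refer to the constants in Python's standard logging module; the source does not state this explicitly.

```python
import logging

# Standard library logging levels are plain integers.
levels = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "ERROR")
}
print(levels)
```

Under that assumption, a value such as logging.DEBUG would be what is passed as log_level.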
- static initialized()
Queries whether or not a DataDreamer session is currently active.
- Return type: bool
- Returns:
Whether or not a DataDreamer session is currently active.
- static get_output_folder_path()
Gets the output folder path of the current DataDreamer session.
- Return type: str
- Returns:
The output folder path of the current DataDreamer session.
- start()
Starts a DataDreamer session. This is an alternative to using a Python context manager. Using the context manager, however, is recommended and preferred. This method might be useful if you want to run DataDreamer in an interactive environment where a with block is not possible or is cumbersome.
- stop()
Stops a DataDreamer session. This is an alternative to using a Python context manager. Using the context manager, however, is recommended and preferred. This method might be useful if you want to run DataDreamer in an interactive environment where a with block is not possible or is cumbersome.