Running Models on Multiple GPUs#
There are various ways to run models on multiple GPUs in DataDreamer.
Large LLMs on Multiple GPUs#
To split a large model that cannot fit on a single GPU, you can set the device_map parameter of the HFTransformers class to 'auto'. This will automatically split the model, layer by layer, across your available GPUs. You can also manually specify how and where the model should be split.
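To make the layer-by-layer splitting concrete, here is a minimal, self-contained sketch of the idea behind 'auto' placement. The `auto_device_map` helper below is hypothetical (it is not DataDreamer's or Accelerate's actual implementation): it greedily assigns layers, in order, to GPUs so each GPU stays near an equal share of the total memory.

```python
# Hypothetical illustration of what device_map='auto' does conceptually:
# assign a model's layers to GPUs in order, balancing by per-layer memory.
# This is NOT DataDreamer's actual implementation.
def auto_device_map(layer_sizes, num_gpus):
    """Greedily assign layers (in order) to GPUs, moving to the next GPU
    once the current one exceeds an equal share of the total memory."""
    total = sum(layer_sizes)
    budget = total / num_gpus
    device_map, gpu, used = {}, 0, 0
    for i, size in enumerate(layer_sizes):
        if used + size > budget and gpu < num_gpus - 1:
            gpu, used = gpu + 1, 0
        device_map[f"layer.{i}"] = gpu  # layer i lives on this GPU
        used += size
    return device_map

# Four equal-sized layers over two GPUs: first two land on GPU 0, rest on GPU 1
print(auto_device_map([1, 1, 1, 1], num_gpus=2))
```

A manually specified device_map works the same way in spirit: it is a mapping from module names to device indices, which you write by hand instead of computing automatically.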
Smaller Models#
For smaller models, the ParallelLLM wrapper takes in multiple LLM objects and behaves like a single unified LLM object that can then be passed to a step like Prompt. ParallelLLM will run any inputs it receives against all of the models in parallel. This is useful for running smaller models on multiple GPUs, as each LLM passed to the wrapper can be on a different GPU. Your model must be able to fit on a single GPU for this to work.
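The dispatch pattern behind this kind of wrapper can be sketched in plain Python. The `parallel_generate` function below is a hypothetical stand-in for ParallelLLM, not DataDreamer's actual implementation: it splits a batch of prompts round-robin across the wrapped models, runs each chunk in its own thread, and reassembles the outputs in the original order.

```python
# Conceptual sketch of data-parallel dispatch across multiple models
# (hypothetical stand-in for ParallelLLM, not DataDreamer's real code).
from concurrent.futures import ThreadPoolExecutor

def parallel_generate(models, prompts):
    num = len(models)
    # Round-robin split: model i gets prompts i, i+num, i+2*num, ...
    chunks = [prompts[i::num] for i in range(num)]
    with ThreadPoolExecutor(max_workers=num) as pool:
        results = list(
            pool.map(lambda mc: [mc[0](p) for p in mc[1]], zip(models, chunks))
        )
    # Interleave the per-model outputs back into the original prompt order
    out = [None] * len(prompts)
    for i, chunk_out in enumerate(results):
        out[i::num] = chunk_out
    return out

# Two toy "models" stand in for LLMs that would each live on a different GPU
models = [lambda p: p.upper(), lambda p: p.upper()]
print(parallel_generate(models, ["a", "b", "c"]))  # → ['A', 'B', 'C']
```

Because each wrapped model handles only its own chunk, each one must hold a full copy of the weights, which is why the model must fit on a single GPU for this approach.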
Similarly, we have parallelization wrappers for other types of models, like ParallelEmbedder, ParallelRetriever, etc.