Running Models on Multiple GPUs#
There are various ways to run models on multiple GPUs in DataDreamer.
Large LLMs on Multiple GPUs#
To split a large model that cannot fit on a single GPU, you can set the device_map parameter of the HFTransformers class to 'auto'. This will automatically split the model, layer by layer, across your available GPUs. You can also manually specify how and where the model should be split.
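To make the layer-by-layer splitting concrete, here is a minimal, self-contained sketch of the idea behind 'auto' placement. The `auto_device_map` helper below is hypothetical (it is not DataDreamer's or Accelerate's actual implementation): it greedily assigns layers, in order, to GPUs so each GPU stays near an equal share of the total memory.

```python
# Hypothetical illustration of what device_map='auto' does conceptually:
# assign a model's layers to GPUs in order, balancing by per-layer memory.
# This is NOT DataDreamer's actual implementation.
def auto_device_map(layer_sizes, num_gpus):
    """Greedily assign layers (in order) to GPUs, moving to the next GPU
    once the current one exceeds an equal share of the total memory."""
    total = sum(layer_sizes)
    budget = total / num_gpus
    device_map, gpu, used = {}, 0, 0
    for i, size in enumerate(layer_sizes):
        if used + size > budget and gpu < num_gpus - 1:
            gpu, used = gpu + 1, 0
        device_map[f"layer.{i}"] = gpu  # layer i lives on this GPU
        used += size
    return device_map

# Four equal-sized layers over two GPUs: first two land on GPU 0, rest on GPU 1
print(auto_device_map([1, 1, 1, 1], num_gpus=2))
```

A manually specified device_map works the same way in spirit: it is a mapping from module names to device indices, which you write by hand instead of computing automatically.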
Smaller Models#
For smaller models, the ParallelLLM wrapper takes in multiple LLM objects and behaves like a single unified LLM object that can then be passed to a step like Prompt. ParallelLLM will run any inputs it receives against all of the models in parallel. This is useful for running smaller models on multiple GPUs, as each LLM passed to the wrapper can be on a different GPU. Your model must be able to fit on a single GPU for this to work.
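The dispatch pattern behind this kind of wrapper can be sketched in plain Python. The `parallel_generate` function below is a hypothetical stand-in for ParallelLLM, not DataDreamer's actual implementation: it splits a batch of prompts round-robin across the wrapped models, runs each chunk in its own thread, and reassembles the outputs in the original order.

```python
# Conceptual sketch of data-parallel dispatch across multiple models
# (hypothetical stand-in for ParallelLLM, not DataDreamer's real code).
from concurrent.futures import ThreadPoolExecutor

def parallel_generate(models, prompts):
    num = len(models)
    # Round-robin split: model i gets prompts i, i+num, i+2*num, ...
    chunks = [prompts[i::num] for i in range(num)]
    with ThreadPoolExecutor(max_workers=num) as pool:
        results = list(
            pool.map(lambda mc: [mc[0](p) for p in mc[1]], zip(models, chunks))
        )
    # Interleave the per-model outputs back into the original prompt order
    out = [None] * len(prompts)
    for i, chunk_out in enumerate(results):
        out[i::num] = chunk_out
    return out

# Two toy "models" stand in for LLMs that would each live on a different GPU
models = [lambda p: p.upper(), lambda p: p.upper()]
print(parallel_generate(models, ["a", "b", "c"]))  # → ['A', 'B', 'C']
```

Because each wrapped model handles only its own chunk, each one must hold a full copy of the weights, which is why the model must fit on a single GPU for this approach.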
Similarly, we have parallelization wrappers for other types of models, like ParallelEmbedder, ParallelRetriever, etc.