Concept of Data Parallelism
Authors: Jinwon Kim
Data Parallelism is a widely used technique for training deep learning models in parallel. The training data is distributed across multiple processing units, such as GPUs, each of which holds a full copy of the model parameters. Each unit independently computes gradients on its own subset of the data; the gradients are then aggregated (typically averaged with an all-reduce) and used to apply the same parameter update on every unit, keeping the replicas synchronized. This parallelizes the most expensive part of training and can substantially accelerate training on large datasets.
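As a minimal sketch of this loop, the example below uses PyTorch's `DistributedDataParallel`; the model, dataset, and hyperparameters are placeholders chosen for illustration, not part of OSLO:

```python
# Minimal data-parallel training sketch with PyTorch DDP.
# Launch with: torchrun --nproc_per_node=<num_gpus> train_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])       # set by torchrun
    torch.cuda.set_device(local_rank)

    # Each process holds a full copy of the model parameters.
    model = torch.nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Placeholder dataset; DistributedSampler gives each rank a disjoint subset.
    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    loader = DataLoader(dataset, batch_size=32, sampler=DistributedSampler(dataset))

    for x, y in loader:
        x, y = x.cuda(local_rank), y.cuda(local_rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()    # DDP all-reduces (averages) gradients across ranks here
        optimizer.step()   # every rank applies the same averaged update

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```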
OSLO supports the Zero Redundancy Optimizer (ZeRO), which removes the memory redundancy of plain data parallelism by partitioning optimizer states (and optionally gradients and parameters) across the data-parallel workers, making it easy to scale deep learning models.
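OSLO's own API is not shown here; as an illustration of the ZeRO idea (stage 1, sharded optimizer states), PyTorch ships `torch.distributed.optim.ZeroRedundancyOptimizer`, which wraps a standard optimizer so each rank stores only its shard of the optimizer state:

```python
# Illustration of the ZeRO idea with PyTorch's built-in wrapper, not OSLO's API.
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer


def build_sharded_optimizer(ddp_model):
    # Each rank keeps Adam states for only its slice of the parameters
    # instead of a full redundant copy on every GPU.
    return ZeroRedundancyOptimizer(
        ddp_model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-3,
    )
```

Used in place of the plain `torch.optim.Adam` in the sketch above, this cuts per-GPU optimizer memory roughly by the number of data-parallel ranks while leaving the training loop unchanged.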