Today's distributed memory machines are typically multiple interconnected many-core shared memory nodes. The most common programming model to progam distributed memory machines is MPI, whereas the most common programming model to program shared memory machines is OpenMP. One approach to use the full parallelism of these machines, and to guarantee overlap of communication with computation, is mixing MPI with OpenMP.
However, when a shared-memory programming model is used on top of MPI, the MPI library must be initialized with MPI_THREAD_MULTIPLE
to provide full thread-safety, causing several performance-related issues, such as internal locking.
Recently, Partitioned Communication [1] was accepted to the MPI 4.0 standard to address some of these issues. Partitioned Communication is a form of Persistent Communication, which means that the communication pattern is static and can be reused, for example along multiple iterations of a loop. The communication buffer of partitioned communications is divided into multiple parts, each handled typically by a separate thread.
Another promising approach is Software Offloading [2], where one or more of the cores of the many-core node are dedicated only to communication. Compute threads forward their communication operations to a thread running on the dedicated communication core, which allows to avoid the internal locking and to guarantee progress in non-blocking communication for improved overlap of communication and computation.
Performance matters! During this thesis you can:
We have developed a high-performance C++ library for software offloading at LMU named libMMCSO
on top of MPI.
However, Partitioned Communication, as well as the recently proposed Partitioned Collective Communication [3], are not implemented yet in libMMCSO
. Additionally, the benefits of using Partitioned Communication with Offloading compared to other communication modes remain an open question.
In the context of this thesis, you will:
libMMCSO
libMMCSO
libMMCSO
with other communcation modes in micro-benchmarks and suitable use-caseslibMMCSO
as a plugin.