IEEE 10- Optimized Scheduling for Group Communication in Data Parallelism


Group communication significantly influences the
performance of data-parallel applications. Nevertheless, an
important factor that influences the efficiency of group
communication is often neglected: a large communication idle
time may occur when there is node contention and when message
lengths differ within a single communication
step. Group communication scheduling has attracted
increasing attention. Previous work either cannot completely
avoid communication conflicts or focuses only on special
cases. This paper develops a universal and efficient
scheduling strategy for the case where array
distributions are block-cyclic. Based on the proof of the recursive
theorems of communication table elements, this strategy
generates a communication scheduling table in which each column
is a permutation of the receiving node numbers in a
communication step, and messages of similar size are placed
in the same communication step as far as possible. This means that
our strategy not only avoids inter-processor contention but
also minimizes the actual communication cost in each communication
step. Finally, experimental results show that our strategy has
better performance than the general method and the
implementation of all-to-all based scheduling, and greedy