Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called "dance hall" architectures, in which shared memory locations are equally far from all processors. Contributors: John M. Mellor-Crummey (author) and Michael L. Scott (author), Computer Science Department.
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs.
Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale.
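For contrast, here is a minimal sketch (using C11 atomics; not code from the paper, and the names tas_lock_t, tas_acquire, and tas_release are hypothetical) of the kind of conventional busy-wait lock the abstract has in mind: every waiter spins on the same shared word, so each failed attempt and each release generates coherence and interconnect traffic that grows with the number of waiting processors.

```c
#include <stdatomic.h>

/* Sketch of a conventional test-and-set spin lock (not from the paper).
 * Every waiter spins on the same shared word, so each failed attempt and
 * each release generates coherence/interconnect traffic proportional to
 * the number of waiting processors. */
typedef struct { atomic_flag held; } tas_lock_t;   /* hypothetical name */

static void tas_acquire(tas_lock_t *l) {
    /* all waiters hammer the single shared location */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* spin */
}

static void tas_release(tas_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```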
We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory.
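As an illustration of the local-spinning idea, the following sketch in C11 atomics (the names spin_flag_t, wait_locally, and wake are hypothetical, not taken from the paper) has each waiter spin only on its own flag, while the processor handing off control performs exactly one remote write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-processor flag illustrating local spinning.
 * Each waiter spins only on its own flag (ideally allocated in memory
 * local to that processor, or kept in its cache by coherence); the
 * processor handing off control performs exactly one remote write. */
typedef struct {
    atomic_bool go;
} spin_flag_t;

static void wait_locally(spin_flag_t *my_flag) {
    /* spinning traffic stays local to the waiter */
    while (!atomic_load_explicit(&my_flag->go, memory_order_acquire))
        ;   /* spin */
    atomic_store_explicit(&my_flag->go, false, memory_order_relaxed);
}

static void wake(spin_flag_t *their_flag) {
    /* the single remote write that terminates the remote spin */
    atomic_store_explicit(&their_flag->go, true, memory_order_release);
}
```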
We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction.
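The lock described here is the list-based queue lock now commonly known as the MCS lock. The following is a minimal C11 sketch of that style of lock, not the paper's own code: this common formulation uses compare-and-swap in the release path for convenience, whereas the paper also discusses how to release using only swap (fetch-and-store), so treat it as an approximation under those assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* MCS-style list-based queue lock, sketched with C11 atomics.
 * Each processor spins on a flag in its own queue node; the lock word
 * itself is touched only by swap/CAS, so an acquisition causes O(1)
 * remote references regardless of how many processors are contending. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool locked;              /* true while this node must spin */
} qnode_t;

typedef struct {
    _Atomic(qnode_t *) tail;         /* last node in queue, NULL if free */
} mcs_lock_t;

void mcs_acquire(mcs_lock_t *lock, qnode_t *me) {
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    /* swap-with-memory: atomically append ourselves to the queue */
    qnode_t *pred = atomic_exchange_explicit(&lock->tail, me,
                                             memory_order_acq_rel);
    if (pred != NULL) {
        /* lock is held: arm our flag, link in, and spin locally */
        atomic_store_explicit(&me->locked, true, memory_order_relaxed);
        atomic_store_explicit(&pred->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            ;   /* spin only on our own node */
    }
}

void mcs_release(mcs_lock_t *lock, qnode_t *me) {
    qnode_t *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* no visible successor: try to swing the tail back to NULL */
        qnode_t *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;                  /* queue empty; lock is now free */
        /* a successor is mid-enqueue: wait for it to link itself */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;   /* brief wait */
    }
    /* one remote write ends the successor's spin and passes the lock */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

Passing a per-processor qnode_t into acquire and release is what keeps each spin on a distinct, locally-cached (or locally-allocated) location, matching the flag-variable idea described above.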