Tuesday, March 11, 2008

Richard Cole's Parallel Merge Sort.

[Excerpted from an article entitled with Simulation and Visualization Tools for Teaching Parallel Merge Sort by Robin Trahan and Susan Rodger]
Copyright (C) 1993 ACM Press.

Overview of Parallel Merge Sort

We give a brief description of the algorithm and refer the reader to [1] for the complete description. Cole's parallel merge sort achieves O(log n) running time using n processors by pipelining the merges of sorted lists through a complete binary tree which we call the flow tree. The flow tree is not an actual data structure; it describes the flow path of the data. By pipelining the merges, information from the previous merge can be used to speed up the current merge. With this additional information, merging two sorted lists in the pipeline takes constant time.

Initially, the elements to be sorted are found at the leaf nodes of the flow tree. Let u be an internal node of the tree. During the execution of the algorithm, u will contain sorted subsets of the elements initially found at the leaf nodes of the subtree rooted at u. We will refer to the sorted subset associated with node u as u's current list. The task of u is to compute a current list containing all the elements originally at the leaf nodes of its subtree.

The algorithm consists of 3logn stages. At each stage, an internal node is active if the node has received some elements in the pipeline but has not yet received all the elements in its subtree. Evey active node u calculates a new current list that will contain twice as many elements as its previous current list in the following manner. First u saves its current list by copying it into what we call u's previous list. Then the two children of u pass up a sorted sample of their current list, and u merges these sample lists to create a new current list. This process takes place at all active nodes in parallel and is carried out by processors assigned to each element of the previous, the current, and the sample lists.

The sample passed up usually consists of every fourth element of a node's current list. If a node's current list contains all the elements initially at the leaves of its subtree, then in the next three consecutive stages the node passes every fourth, then every second, and then every element of its list to its parent. After a node has passed up every element it is no longer active and the processors assigned to that node can be reallocated.

To merge two sorted lists, the rank (or position) of an element in the resulting merged list is calculated. An element's rank in a list is a count of the number of elements in the list whose values are less than or equal to its own value. The rank of an element in the resulting merged list is the sum of its rank in its own list and its rank in the other list. To perform the merge of two sample lists in constant time, Cole maintains information that allows the elements in the sample lists to quickly calculate their ranks in the resulting merged list. This information consists of the ranks of the elements from the previous list in the sample lists. Using these ranks, an element in one sample list can determine an interval of consecutive elements in the other sample list that its value lies within. Cole proves that the number of elements within this interval is at most three, and calls it the 3-cover property. Therefore, in constant time an element in a sample list can rank itself in the other sample list by examining the elements within the appropriate interval in the other list.

Blog: Processor Architect

http://babyworm.net/tatter/

프로세서, SoC, ASIC 설계 + alpha.

Creative Stuff.

http://creativestuff.co.kr/

Tron

http://www.tron.org/index.html T-Engline 소스 코드 공개