Tsinghua University's Zhixin Ba, Haichang Zhou, Huai Zhang and Zhenxiao Yang plagiarized an IEEE conference paper



All follow-ups · Add a follow-up · New Threads (XYS) Reading Forum http://www.xys.org/cgi-bin/mainpage.pl

Posted by: BerkeleyWolf on 2005-3-11, 03:52:47:


BerkeleyWolf


BerkeleyWolf has made a detailed comparison of a plagiarized paper published by IEEE. The plagiarizing paper, by Zhixin Ba, Haichang Zhou, Huai Zhang and Zhenxiao Yang, who list their affiliation as Tsinghua University, appeared in the 2000 Fourth International Conference on High-Performance Computing in the Asia-Pacific Region, Volume 1. The plagiarized paper, by Natawut Nupairoj and Lionel M. Ni, appeared in the 1994 Scalable Parallel Libraries Conference. The 2000 paper is a rather sparse 3 pages; the 1994 paper is a rather dense 8 pages. In this post, text not enclosed in 【 】 is the full text of the plagiarizing 2000 paper (figures, tables and formulas are omitted due to XYS format limitations). Text enclosed in 【 】 is from the 1994 paper, showing only the passages corresponding to the 2000 paper. Because of limitations in PDF-to-TXT conversion, individual words may contain errors.


Professor Lionel M. Ni, who is still at Michigan State, has been notified of this plagiarism by e-mail. (http://www.cse.msu.edu/~ni/)

The plagiarism was first discovered and reported by anarch, who posted it on MITBBS. Thanks to XYS reader Mr. Chen (full name withheld as we do not have his permission) for providing the electronic text of both papers.


Performance Evaluation of some MPI Implementations
on Workstation Clusters
Zhixin Ba, Haichang Zhou, Huai Zhang and Zhenxiao Yang
High performance computing center
Cernet, Tsinghua University, 100084
bazx@chpcc.edu.cn

http://csdl.computer.org/comp/proceedings/hpc/2000/0589/01/05890392abs.htm
The Fourth International Conference on High-Performance Computing in the Asia-Pacific Region-Volume 1
May 14 - 17, 2000
Beijing, China


【Performance Evaluation of Some MPI Implementations on
Workstation Clusters *
Natawut Nupairoj and Lionel M. Ni
Department of Computer Science
Michigan State University
East Lansing, MI 48824-1027
{nupairoj, ni}@cps.msu.edu
http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=376999
Proceedings of the 1994 Scalable Parallel Libraries Conference

Abstract
Message Passing Interface (MPI) has already become a standard of the communication library for distributed-memory computing systems.
【Message Passing Interface (MPI) is an attempt to standardize the communication library for distributed-memory computing systems. 】

Since the release of the new versions of MPI specification, several MPI implementations have been made public available.
【Since the release of the recent MPI specification, several MPI implementations have been made publicly available.】

Different implementations employ different approaches. Since the performance of communication is extremely crucial to message-passing based applications, it is critical to select an appropriate MPI implementation.

【Different implementations employ different approaches, and thus, the performance of each implementation may vary. Since the performance of communication is extremely crucial to message-passing based applications, selecting an appropriate MPI implementation becomes critical. 】

Our study is intended to provide a guideline on how to submit a task and how to perform such a task, economically and effectively, on workstation clusters in high performance computing.

【Our study is intended to provide a guideline on how to perform such a task on workstation clusters which are known to be an economical and effective platform in high performance computing.】

We investigate several MPI aspects including its implementations, supporting hardware environment and derived datatype which affect the communication performance. In the end, our results point out the strength and weakness of different implementations on our experimental system.

【We investigate several MPI aspects including its functionalities and performance. Our results also point out the strength and weakness of each implementation on our experimental system. 】

1. Introduction
In our study, four popular MPI implementations, shown in Figure 1, are considered. Our testing environment is based on an IBM SP2 system interconnected via both Ethernet and a high performance switch. The high performance switch can provide up to 100 Mbps per channel.

【Our testing environment consists of 6 DEC Alpha workstations interconnected via both Ethernet and a DEC GIGAswitch. The DEC GIGA switch can provide up to 100 Mbps per channel. 】


Figure 1. The model of the communication modes

【identical to figure in 1994 paper: Figure 1. The model of the communication modes】

We have developed a set of benchmarks to evaluate the performance of both point-to-point and collective communication services. These benchmark programs include:

【We have developed a set of benchmarks to evaluate the performance of both point-to-point and collective communication services. These benchmark programs include: 】

1. Ping: to measure the peak performance of the point-to-point communication over a communication channel;

【1. Ping: to measure the peak performance of the point-to-point communication over a communication channel; 】


2. PingPong: to evaluate the end-to-end communication latency which include the effect of the communication protocol;
【2. PingPong: to evaluate the end-to-end communication latency which includes the effect of the communication protocol; and】


3. Collective: to evaluate the performance of some collective communication, including broadcast, and barrier synchronization.
【3. Collective: to evaluate the performance of some collective communication, including broadcast, and barrier synchronization.】
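[Editor's note: the three benchmarks follow the usual timing-loop pattern. As a rough sketch of the PingPong measurement (this is not code from either paper; a local socket pair stands in for the MPI send/receive channel so the example is self-contained):]

```python
import socket
import threading
import time

def pingpong(msg_size: int, rounds: int = 100) -> float:
    """Estimate one-way latency (seconds) for messages of msg_size bytes.

    A socket pair stands in for the MPI channel; with real MPI one would
    use matching sends and receives between two ranks instead.
    """
    a, b = socket.socketpair()
    payload = b"x" * msg_size

    def echo():
        # The "pong" side: receive each message and send it straight back.
        for _ in range(rounds):
            data = b.recv(msg_size, socket.MSG_WAITALL)
            b.sendall(data)

    t = threading.Thread(target=echo)
    t.start()
    start = time.perf_counter()
    for _ in range(rounds):
        # The "ping" side: send, then wait for the echo.
        a.sendall(payload)
        a.recv(msg_size, socket.MSG_WAITALL)
    elapsed = time.perf_counter() - start
    t.join()
    a.close(); b.close()
    # Each round is a full round trip, so halve for the one-way estimate.
    return elapsed / (2 * rounds)
```

This measures end-to-end latency including protocol overhead, which is exactly why the papers use PingPong rather than one-sided Ping for that metric.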

The rest of this paper is organized as follows. In Section 2, we discuss the model and performance metrics used in our study. Section 3 presents experimental results. In Section 4, we conclude the paper.

Due to space limitation, only partial results are presented in this paper.
【Due to space limitation, only partial results are presented. Interested readers may refer to [6] for additional performance results. 】

2 Model and Metrics

【3 Model and Metrics 】

2.1 Measurement Model

【3.1 Measurement Model】

Figure 2. The model of the communication

【identical to 1994 paper: Figure 2. The measurement model 】

2.2 Performance Metrics

【3.2 Performance Metrics】

We run our benchmark programs under different communication models and focus on the differences in communication performance between these models. The following two metrics are sufficient for the evaluation.

【Comparing two communication systems requires measuring several metrics. In our study, we compare the implementation of different communication libraries. Thus, only two metrics are sufficient for the evaluation.】

Communication latency (t)
The communication latency (t) is defined to be the time that a process spends when it sends or receives (or both) a message. The communication latency is proportional to the message size, which is given by

【We define the communication latency (t ) to be the time that a process has to spend when it sends or receives (or both) a message. The communication latency is proportional to the message size which is given by 】

t = t_s + n × t_t + ⌈n/p⌉ × t_p   (1)

【t = t_s + n × t_t + ⌈n/p⌉ × t_p   (1)】

where t_s is the start-up latency, which is fixed for each message, n indicates the size of the message, t_t is the transmission latency (usually much less than t_s), and t_p is the packaging latency. The start-up latency also includes the fixed cost of system call and initialization overhead.

【where t_s is the start-up latency which is fixed for each message, n indicates the size of the message, t_t is the transmission latency (usually much less than t_s), and t_p is the packetization latency. The start-up latency also includes the fixed cost of system call and initialization overhead.】
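[Editor's note: for illustration only, Equation (1) can be evaluated directly. The code is not from either paper, and all parameter values are hypothetical:]

```python
import math

def latency(n: int, t_s: float, t_t: float, t_p: float, p: int) -> float:
    """Communication latency per Equation (1): t = t_s + n*t_t + ceil(n/p)*t_p.

    n   - message size
    t_s - fixed start-up latency (system call + initialization overhead)
    t_t - per-unit transmission latency (usually much less than t_s)
    t_p - per-packet packaging latency
    p   - packet size
    """
    return t_s + n * t_t + math.ceil(n / p) * t_p

# Hypothetical parameters: for a 1-byte message the cost is dominated by t_s.
T_S, T_T, T_P, P = 500e-6, 0.08e-6, 20e-6, 4096
small = latency(1, T_S, T_T, T_P, P)       # almost entirely start-up cost
large = latency(10_000, T_S, T_T, T_P, P)  # transmission terms now matter
```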

Channel throughput (ρ)
Channel throughput (ρ) or bandwidth is the rate at which the network can deliver data (usually in Mbits per second). It is widely used among the vendors because of its simplicity. We use this metric when we compare the performance of different message sizes. The throughput can be directly computed from the communication latency by

【The channel throughput (ρ) or bandwidth is the rate at which the network can deliver data (usually in Mbits per second). It is widely used among the vendors because of its simplicity. We use this metric when we compare the performance of different message sizes. The throughput can be directly computed from the communication latency by】


/####### formula 2 is omitted here ########### (2)
【identical formula 】
if we substitute t with Equation (1 ), the throughput becomes
【if we substitute t with Equation (1 ), the throughput becomes 】

/####### formula 3 is omitted here ########### (3)
【identical formula 】


So the peak throughput will be limited to (10^-6)/t_t when the message size is infinite.
【Thus the peak throughput will be limited to (10^-6)/t_t when the message size is infinite.】

Furthermore, the maximum throughput that can be achieved is defined as the sustained throughput.
By sending messages as fast as possible, such as in the buffered mode, we can compute the sustained throughput from Equation (2).
【We further define the sustained throughput as the maximum throughput that can be achieved. By injecting messages to the communication channel as fast as possible, such as repeatedly sending messages in the buffered mode, we can compute the sustained throughput from Equation (2).】
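[Editor's note: again for illustration only, the relationship between latency and throughput can be checked numerically. The reconstruction of Equation (2) as ρ = n / (t × 10^6), the choice of units, and all parameter values are assumptions of this sketch, not taken from either paper:]

```python
import math

def latency(n, t_s, t_t, t_p, p):
    # Equation (1): t = t_s + n*t_t + ceil(n/p)*t_p
    return t_s + n * t_t + math.ceil(n / p) * t_p

def throughput(n, t):
    # Assumed form of Equation (2): rho = n / (t * 10**6), scaled so that
    # the n -> infinity limit matches the stated peak of 10**-6 / t_t
    # when the packaging term t_p is neglected.
    return n / (t * 10**6)

# Hypothetical parameters with t_p = 0, so the peak is exactly 1e-6 / t_t.
t_s, t_t, t_p, p = 500e-6, 0.08e-6, 0.0, 4096
peak = 1e-6 / t_t
# Throughput approaches, but never reaches, the peak as messages grow:
for n in (1_000, 100_000, 10_000_000):
    assert throughput(n, latency(n, t_s, t_t, t_p, p)) < peak
```

The fixed start-up cost t_s is why the papers observe poor throughput for small messages: it is amortized only as n grows.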

2.3 Communication Parameters
【3.3 Communication Parameters】

There may be a relationship between some communication parameters and communication performance. The communication performance can be greatly improved when appropriate values are set for these parameters. In our benchmarks, we focus on two parameters: the message size and the buffer size.
【Some communication parameters may have dramatic impact on the communication performance. The communication performance can be greatly improved when appropriate values are used for the parameters. In our benchmarks, we study two major parameters: message size and the buffer size.】

3 Experiments
3.1 Testing Environment
In our study, we perform our experiments on an IBM SP2 workstation cluster, which consists of 28 RS/6000 nodes interconnected via a network and high performance switches, including 4 broad nodes with 512 Mbytes of main memory and 24 narrow nodes with 256 Mbytes of main memory. The parallel programs are run on the narrow nodes. The operating system is AIX 4.1.5.


3.2 Experiments Results
In this section, we mainly present the results from our experiments and then analyze these results. Each data point in our results is the average of 10 test runs; the maximum message length is 10 Kbytes.

Figure 3. Sending Latency (short messages)
4 Conclusion
【7 Conclusion】

In this paper, we discuss the performance of some MPI implementations publicly available on workstation clusters. From the analysis, we can see that the software overhead is very high and plays a very important role in the overall network throughput.

【In this paper, we discuss the evaluation of some MPI implementations which are currently publicly available on workstation clusters. Our results indicate that the software overhead is very high and has to be greatly reduced in order to fully exploit the bandwidth of the high-speed switch. 】

Among all these modes, we suggest selecting the buffered mode as the best communication mode on the IBM SP2 machine, because this mode can efficiently exploit the bandwidth of the high performance switches and thus improve the overall communication throughput of programs. When choosing this communication mode, however, one should be careful because it requires a great deal of memory.

For end-to-end communication, the standard send and receive functions can be replaced with the sendrecv function, which simplifies the program and prevents communication deadlock.

Since the space of this paper is limited, we cannot discuss performing end-to-end communication with non-blocking rather than blocking communication functions. Moreover, some other communication facilities MPI provides, such as non-contiguous datatypes and pack/unpack, will be discussed later.

【Because of time limitation, we could not conduct an extensive set of experiments on different distribution of non-contiguous datatypes. But our initial results based on simple vector datatype show that the cost of sending non-contiguous datatype is not much higher than sending contiguous datatype of the same size. Further investigation on the impact of the noncontiguous datatype is needed. We are also investigating the performance of other collective communication services.】


References
1. M.P.I. Forum. MPI: A Message-Passing Interface Standard, Mar. 1994.
2. W. Gropp, R. Lusk, T. Skjellum, and N. Doss. Portable MPI Model Implementation. Argonne National Laboratory, July 1994.
3. N. Nupairoj and L. M. Ni. "Performance evaluation of some MPI implementations." Tech. Rep. MSUCPS-ACS-94, Department of Computer Science, Michigan State University, Sept. 1994.




