Parallel programming models

送交者: AA 于 2005-9-28, 09:27:47:

Recently for solely personaly interests, I am thinking to introduce some new programming model to Java in order to better support parallel computer (or multicore processor, or CMP with SMT). After some studies, I found there is no universal solution for all parallel applications. While this finding is nothing new, but my understanding is, all the different parallel programming models can be transformed back and forth into each other.

Traditionally there are two main programming models for parallel applications, one is threading (e.g. pthread, Java monitor), the other is message passing (e.g. MPI, PVM). Both are very primitive. Pthread just exposes the lowest level of execution unit (thread of control) to programmer with some well-defined APIs. MPI is the same thing except that it has to deal with data communications between threads over network.

DSM (Distributed Shared Memory, or called SVM when it's invented) tries to hide the message passing from programmer by providing a layer of runtime that automatically retrieves data remotely. DSM itself is not a programming model, it's still threading model.

A new model that wants to simplify parallel programming is data parallel support in language (e.g., OpenMP, HPF). It doesn't require programmer explicitly create threads;instead, one just needs to tell where the data parallelisms are. It has a problem that data parallelisms are not always available or expressible.

Pthread model is usually considered to expose task parallelisms, but actually not, because programmer need extra efforts to express the task parallelisms correctly. The main effort is locking for critical sections. This is too primitive to have all the synchronizations done by programmer.

One significant improvement over thread is CSP (communicating sequential processes). CSP doesn't see the parallelisms from control flow point of view; it sees it from data flow point of view basically. It expresses data communications between data processing units explicitly. Here communications are replacing the tricky thread synchronizations, so the CSP model is very well backed by mathematical theory.

But CSP is not much easier either for parallel programming. One reason is it tries to cover all of the parallelism forms: task parallelism, data parallelism, pipeline parallelism. One simplified form of CSP is Process Network (PN), which tries to cover only pipeline parallelism. That is, processes communicating through channels that mostly have only single reader and single writer.

The idea of PN is like a processor pipeline that partitions a task into substages; when data items come into the task successively, the substages can process different data items in parallel. This model is pretty common actually. For example, in an application server, requests from clients come in for transation processing. The processing can be partitioned into pipeline stages, so that multiple transactions can be processed in parallel.

Pthread model is very well supported by current multicore processor, because thread of control (or trace) is also the basic execution unit of the processor hardware. PN model is hard to run fast on these processors, because it usually involves large volume of data movement and can't capture the locality properties of applications. Now my question is, is it possible to transform a PN modeled program into a Pthread modeled program (automatically)? In this way, one can program parallel applications easily in PN while still achieve the similar performance as Pthread. (btw, Pthread is not necessarily always outperfoming PN, just in common cases.) I realized the answer is yes though I haven't proved it. I will do it when time permits, or my friends or anybody who is interested in it will prove it...

所有跟贴:

I think the solution is in hardware. - 筋斗云 (536 bytes) 2005-9-30, 00:12:33
- Maybe you are right. - AA (527 bytes) 2005-9-30, 01:55:52
Does the sync cost lots of resource? - xj (407 bytes) 2005-9-28, 10:22:05
- Yes, but that's not the concern of programming model - AA (3662 bytes) 2005-9-28, 23:18:07
  - CORBA is more close to concurrent programming than - steven (305 bytes) 2005-9-29, 12:04:51
    - hehe, the claim is for parallel programming, you'd be dataflow oriented - AA (3521 bytes) 2005-9-29, 23:15:19

加跟贴

所有跟贴·加跟贴·新语丝科技论坛