This page last updated on 3/6/00

ECE 520.426

Parallel Processing Architecture

3 Credits

Spring 2000, meeting in room Barton-114, Thursday 1:30-4:00 PM

Instructor: Robert Jenkins, Senior Lecturer, ECE.

Office: Barton Hall 404

Telephone: (410) 516-7380

E-Mail: robert.jenkins@jhunix.hcf.jhu.edu

Course Catalog Description

Textbooks and other class material

Course Objectives

Course Syllabus Outline

Grading

Ethical Standards

Homework and Schedule

Notes and Supplementary Material - Class handouts and other stuff. (Correction made 2/8/00)  


Description

A study of parallel hardware/software computing architectures. Topics include: performance measures; parallel computer models; scalability; fine-grained versus coarse-grained parallelism; instruction level parallelism and VLIW architectures; linear and non-linear pipelining; vector machines; structures, interconnect, and algorithms for SIMD and MIMD multiprocessor and multicomputer systems.

Prerequisites:

520.422 Computer Architecture, or 600.333-334 Computer Systems, or equivalent background.

A working knowledge of computer programming and applications.

 

Textbook Required

1. Hwang, Advanced Computer Architecture - parallelism, scalability, programmability, McGraw-Hill, 1993, ISBN 0-07-031622-8

2. Classic published papers from the literature as reading assignments, e.g.



Course Objectives

This is a lecture course whose overall objective is to give students a thorough grounding in the computing architectures and algorithms associated with parallel processing. A primary objective is an understanding of how parallelism is limited by architectural features, by computational algorithms, and by compiled code. Mapping algorithms onto parallel machines receives a good deal of attention.

A second objective is to create an understanding of the tradeoffs involved in parallel computer design, along with an understanding of the methods used to evaluate performance and obtain performance data. Particular emphasis is given to modeling the operation of specific parallel machines and to the scalability of performance on a given algorithm.
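As a rough illustration of what "scalability of performance on a given algorithm" can mean, the sketch below compares the fixed-load (Amdahl) and fixed-time (scaled) speedup models listed in the Week 3 topics. The function names and the 5% sequential fraction are assumptions chosen for illustration, not course material.

```python
def amdahl_speedup(n, alpha):
    """Fixed-load speedup on n processors.

    alpha is the fraction of the work that must run sequentially.
    """
    return 1.0 / (alpha + (1.0 - alpha) / n)


def fixed_time_speedup(n, alpha):
    """Fixed-time (scaled workload) speedup on n processors.

    The parallel part of the workload is assumed to grow with n so that
    the run time stays constant.
    """
    return alpha + (1.0 - alpha) * n


if __name__ == "__main__":
    alpha = 0.05  # assumed sequential fraction, for illustration only
    for n in (1, 16, 256, 1024):
        print(n, round(amdahl_speedup(n, alpha), 2),
              round(fixed_time_speedup(n, alpha), 2))
```

Under the fixed-load model the speedup can never exceed 1/alpha, whereas the fixed-time model grows linearly with the number of processors.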

Linear and non-linear pipelining are treated thoroughly, with the objective of giving the student a background in the design and application of pipelines for high-throughput digital hardware, such as high-speed pipelined memory, floating-point arithmetic, recursive filters, or graphics processing.
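To give a flavor of the latency analysis used for non-linear pipelines, here is a minimal sketch that derives the forbidden latencies and the collision vector from a reservation table. The three-stage table and all names are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def forbidden_latencies(table):
    """Return the set of forbidden latencies of a reservation table.

    table maps each stage name to the clock cycles in which one
    initiation uses that stage. Two initiations collide if they are
    separated by the distance between any two marks in the same row.
    """
    forbidden = set()
    for cycles in table.values():
        for a, b in combinations(sorted(cycles), 2):
            forbidden.add(b - a)
    return forbidden


def collision_vector(table):
    """Return the collision vector as a bit string C_m ... C_1.

    Bit C_i is 1 when latency i is forbidden; m is the largest
    forbidden latency.
    """
    forbidden = forbidden_latencies(table)
    m = max(forbidden, default=0)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


# Hypothetical 3-stage, 8-cycle reservation table (cycles are 1-based).
EXAMPLE_TABLE = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}

if __name__ == "__main__":
    print(sorted(forbidden_latencies(EXAMPLE_TABLE)))
    print(collision_vector(EXAMPLE_TABLE))
```

For this assumed table the forbidden latencies are 2, 4, 5 and 7, so any new initiation must avoid those distances from an earlier one.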

A final explicit goal, and not the least, is to prepare students well enough in the contemporary issues of parallel computation to read and understand the current literature. In such a rapidly evolving field, achieving this goal is considered a major success.

 



Course Syllabus and Schedule

Dates

Lecture Topics

Week 1 -

Overview, definition of terms, architectural classification schemes, measures of performance, space-time diagrams, multiprocessor models - e.g. PRAM, UMA, COMA, PSP model, example problem.

Week 2 -

Data and control dependencies, parallelism conditions, Bernstein conditions, grain size versus latency, grain packing, multiprocessor scheduling.

Week 3 -

Mean performance, harmonic versus arithmetic mean for MIPS rate, workload distribution and software parallelism, Amdahl's law, scalability - fixed load, fixed time, fixed memory.

Week 4 -

Introduction to pipelining, pipeline classifications - linear, non-linear, static, multifunctional, pipeline clocking and control, pipeline performance - speedup, efficiency, and throughput, computational-pipeline hazards and collisions, reservation and latency analysis of non-linear pipelines.

Week 5 -

Latency analysis review and examples, delay stage effects, memory interleaving, memory pipelining - S, C and C/S access structures and performance.

Week 6 -

Instruction pipelining, pipeline hazards and interlocks, effect of stalls on CPI and speedup, pre-fetching and data forwarding, static code reordering, delayed branches and NOP inserts.

Week 7 -

2 1/2 hour midterm exam.

Week 8 -

Extensive midterm discussion, floating point and arithmetic pipelines, pipelined adders and multipliers, CSA trees, begin discussion of instruction level parallelism (ILP)

Week 9 -

Continue ILP discussion, VLIW architectures, superscalar and superpipeline architectures, vector machines and vector processing, architectural classes and example vector machines, vector/scalar performance

Week 10 -

Vectorized software, compound vector functions and pipeline chaining, space-time diagrams of pipelined vector operations, vectorized FORTRAN-90 constructs, pipelined recursion, matrix multiply end-to-end example

Week 11 -

Introduction to SIMD machines, distributed versus shared memory, PSP model of SIMD computers, routing networks and routing functions - rings, binary tree, Illiac mesh, hypercube, and shuffle-exchange geometries

Week 12 -

Circuit switched versus packet switched, wormhole routing, multi-stage versus single stage recirculation hardware, blocking and non-blocking multistage networks - Benes, Omega, Baseline, etc., network control algorithm

Week 13 -

Parallel algorithm implementations - e.g. matrix multiply, parallel sorting, FFT on a hypercube, forward substitution on a shared memory system, linear recurrence on a mesh, matrix transpose on a shuffle network

Week 14 -

Review, more example architectures - MPP, associative processors, 2 1/2 hour final exam.
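The routing functions named under Week 11 can be sketched on node addresses as follows. This is a minimal illustration that assumes N = 2^n nodes for the cube and shuffle functions and a perfect-square N for the Illiac mesh; the function names are made up for this sketch.

```python
import math


def cube(x, i):
    """Hypercube routing function: flip address bit i of node x."""
    return x ^ (1 << i)


def shuffle(x, n):
    """Perfect shuffle: rotate the n-bit address of node x left by one bit."""
    mask = (1 << n) - 1
    return ((x << 1) & mask) | (x >> (n - 1))


def exchange(x):
    """Exchange: flip the least significant address bit."""
    return x ^ 1


def illiac_neighbors(x, num_nodes):
    """Illiac mesh neighbors of node x: x+1, x-1, x+r, x-r (mod N), r = sqrt(N)."""
    r = math.isqrt(num_nodes)
    return sorted({(x + 1) % num_nodes, (x - 1) % num_nodes,
                   (x + r) % num_nodes, (x - r) % num_nodes})


def hypercube_route(src, dst, n):
    """Dimension-ordered path from src to dst in an n-dimensional hypercube."""
    path, cur = [src], src
    for i in range(n):
        if ((cur ^ dst) >> i) & 1:  # addresses differ in bit i
            cur = cube(cur, i)
            path.append(cur)
    return path


if __name__ == "__main__":
    print(hypercube_route(0b000, 0b101, 3))
    print(illiac_neighbors(0, 16))
```

The path length returned by `hypercube_route` equals the number of address bits in which source and destination differ.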



Grading

Two 2.5-hour exams, a midterm and a final, each worth 100 points toward the final grade.

Approx. 11 homework assignments, each worth 3 points if turned in. Weekly homework assignments are due the following week, and will be gone over in class. Homework will be accepted 1 week late if a student misses class for a valid reason.

There may be an in-class or take-home quiz problem that will be worth 30-40 points.

Students must do the homework assignments themselves, as the problems are important to learning the course material, and are similar to the problems on the exams.

 

Ethics

As an Engineering student at this University, you carry the obligation to uphold the highest standard of academic and professional integrity. For this class, you are expected to follow the general University guidelines regarding ethical behavior. Unless given specific instructions to the contrary, it is not permitted to collaborate with other students in the class when solving homework or graded take-home projects. It is not permitted, under any circumstances, to consult or plagiarize past homework or take-home design problems. Cheating during an exam is considered a serious violation of ethical integrity. Academic misconduct will be reported to the University's academic ethics board for further consideration. For more information, please refer to the following material:

The Johns Hopkins University Undergraduate and Graduate Programs Catalogue, 1998-1999, pp. 31, 32, 40.

The Johns Hopkins University Undergraduate Advising Manual, Fall 1998, pp. 36, 37.

http://www.jhu.edu/~wse1/student/eng101_97/geninfo.html

Students with questions and/or concerns regarding the ethics board or its policies can contact the board at ethicsbd@jhunix.hcf.jhu.edu.



Homework and Reading Assignments (updated 3/6/00)

ACA = Advanced Computer Architecture, by Kai Hwang

| Wk No. | Date | Problems | Due | Reading (Notes handed out in class are also posted below) |
|---|---|---|---|---|
| 1 | 1/27 | 1.4, 1.6, 1.8 | 2/3 | ACA Chap. 1, Handout Notes on CPI/MIPS, Study notes on Architectures and Models |
| 2 | 2/3 | 2.5, 2.6, 2.7 | 2/10 | ACA Chap. 2 (skip Section 2.4, to be covered later). SIMD/MIMD PRAM example given in class |
| 3 | 2/10 | Hand-out prob. on workload distributions, plus 2.9, 3.1, 3.3 | 2/17 | ACA Chap. 3 |
| 4 | 2/17 | 3.7, 6.1, 6.8 | 2/24 | Finish ACA Chap. 3. Review Sect. 4.1 and 4.2. Start Chap. 6 (Sect. 6.1, 6.2) |
| 5 | 2/24 | 6.10, 6.18, plus | 3/2 | Continue ACA Chap. 6. Also Sect. 5.3, on shared memory systems, and Sect. 8.1.2, on C/S access |
| 6 | 3/2 | 6.11, problems in class handout | 3/9 | ACA Section 6.3, Class handout on Instruction Pipelining |
| 7 | 3/9 | | | Midterm given this week. |
| 8 | 3/16 | | | |
| 9 | 3/30 | | | |
| 10 | 4/6 | | | |
| 11 | 4/13 | | | |
| 12 | 4/20 | | | |
| 13 | | | | |

Notes and Examples

1. PRAM example (Errors corrected and additions made 2/8/00)

2. Notes on Architectures and Performance Models (Additions made 2/8/00)

 

Things you should know for the Midterm

1. General classes and properties of parallel machines - Flynn's classifications and Hwang's classification of parallel systems, distributed versus shared memory, generic block diagrams of various classes.

2. Performance models (PRAM, PSP, UMA, COMA, etc.) and how to apply them to specific cases.

3. Conditions for parallelism - how to apply them, what limits parallelism, SW versus HW parallelism.

4. Parallelism profile and workload distribution - how to generate them, how to use them, space-time diagrams.

5. Grain size and grain packing, how granularity affects latency, what contributes to latency.

6. Harmonic vs. arithmetic mean performance; performance measures - CPI, MIPS rate, throughput, asymptotic and N-node speedup, efficiency, redundancy, memory bandwidth, parallelism overhead.

7. Scalability of parallel algorithms and machines, Amdahl's law, Fixed load versus fixed time versus memory bounded speedup.

8. General principles of linear and non-linear pipelining, pipeline hazards, reservation and latency analysis, delay stages for improved performance.

9. Interleaved memory systems and pipelined memory access, C-access versus S-access versus CS-access, how to estimate pipelined memory bandwidth for contiguous and random address streams.

10. Instruction execution pipelining, pipeline hazards and their effect on CPI and speedup, hardware and compiler strategies for stall avoidance.

11. All examples illustrated by the homework.
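As a study aid for items 6 and 10 above, the following minimal sketch contrasts arithmetic and harmonic mean rates and shows how stall cycles affect CPI and pipeline speedup. The function names and sample numbers are assumptions for illustration; the speedup model assumes an ideal k-stage pipeline compared against an unpipelined machine that needs k cycles per instruction.

```python
def arithmetic_mean_rate(rates):
    """Arithmetic mean of execution rates (e.g. MIPS) over several programs."""
    return sum(rates) / len(rates)


def harmonic_mean_rate(rates):
    """Harmonic mean of execution rates.

    This equals total work divided by total time when every program
    executes the same amount of work.
    """
    return len(rates) / sum(1.0 / r for r in rates)


def pipeline_cpi(stalls_per_instruction, ideal_cpi=1.0):
    """Effective CPI of a pipeline: ideal CPI plus average stall cycles."""
    return ideal_cpi + stalls_per_instruction


def pipeline_speedup(k, stalls_per_instruction):
    """Speedup of a k-stage pipeline over an unpipelined k-cycle machine."""
    return k / pipeline_cpi(stalls_per_instruction)


if __name__ == "__main__":
    rates = [10.0, 40.0]  # assumed MIPS rates of two equal-length programs
    print(arithmetic_mean_rate(rates), harmonic_mean_rate(rates))
    print(pipeline_speedup(5, 0.25))
```

With rates of 10 and 40 MIPS the arithmetic mean is 25 MIPS, but the harmonic mean is only 16 MIPS, because the slower program dominates the total run time.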

 
