# PRINCE OF SONGKLA UNIVERSITY FACULTY OF ENGINEERING

Midterm Examination: Semester 2 Date: 29 February 2016 Subject Number: 242-441 Subject Title: Advanced Computer Architecture and Organization

Academic Year: 2015

**Room**: S817

**Time**: 9.00 – 12.00 (3 hours)

Exam Duration: 3 hours

This paper has 13 pages, 4 questions and 150 marks (25%).

## Authorised Materials:

- Writing instruments (e.g. pens, pencils).
- Textbooks, a notebook, handouts, and dictionaries are permitted.

## **Instructions to Students:**

- Scan all the questions before answering so that you can manage your time better.
- Answers **must** be written in **Thai**.
- Write your name and ID on every page.
- Any unreadable parts will be considered wrong.

When drawing diagrams or coding, use good layout, and short comments; marks will not be deducted for minor syntax errors.

## Cheating in this examination

Failed in this subject and courses dropped for next Lowest punishment: semester.

Expelled. Highest punishment:

| NO    | Time (Min) | Marks | Collected | NO | Time (Min) | Marks | Collected |
|-------|------------|-------|-----------|----|------------|-------|-----------|
| 1     | 20         | 15    |           | 3  | 50         | 42    |           |
| 2     | 70         | 60    |           | 4  | 35         | 33    |           |
| Total | 175        | 150   |           |    | 25%        |       | 1         |

**Question 1 Fundamentals of Quantitative Design and Analysis** (15 marks; 20 minutes)

Explain the following issues.

a) Explain the differences how an 8-bit processor and a 16-bit processor add two 16-bit integers.
 (3 marks)

b) Compare *Data Parallelism* and *Task Parallelism*. (2 marks)

c) A server failed after 52 weeks due to hard disk failure and assuming that it takes an hour to remove and replace the hard disk. Find Mean time to failure, Mean time to repair (MTTF), Mean time to repair (MTTR), and Mean time between failures (MTBF).

d) Give examples of the *Principle of Locality* and *Principle of Parallelism* (2 marks)

e) If the new processor is 8 times faster than the original process, and we assume that the original processor is busy with computation 50% of the time and is waiting for I/O 50% of the time, what is the *overall speedup* gained by incorporating the enhancement? Also, what is the *upper bound* of the overall speedup? (5 marks)

Name\_\_\_\_\_

2

#### **Question 2 Memory Management**

#### (60 marks; 70 minutes)

Answer the following questions.

hame

a) Use the following picture to explain *Hit Time* and *Miss Penalty*. (5 marks)



------

d) What are Causes of misses? Also, explain how they happen. (3 marks) e) Giving Reads Priority over Writes on Misses means serving reads before writes have been completed. Compare how to implement it when using Write-Through and Write-Back strategies in cache. (2 marks) f) What are Three Advantages of Virtual Memory? (3 marks) g) What are the differences between Hypervisor Type I and II? (2 marks) h) According to 2:1 Cache Rule, the miss rate of a direct mapped cache of 64 Kbytes equals to the miss rate of 2-way set associative cache of what size? (2 marks) i) From the following code fragment, identify the problem and suggest a suitable solution. (2 marks) for (i=5; i>0; i--)for (j=50000; j>0; j--)a[i][j] = f(a[i][j]): Name\_\_\_\_ 

j) From the following code fragment, identify the problem and suggest a suitable solution. (4 marks)
for (i=0; i< 500000; i++)</li>

for (j =0; j < 500000; j++) { total = 0; for (k=0; k<500000; k++) total = total+ F[k][j]/G[i][k]; H[i][j] = total;

ļ

k) Give an example of Loop Fusion in code fragment. (2 marks)

1) How to improve Cache Performance?

Name\_\_\_\_

(3 marks)

-----

| m)     | m) How can larger block size help reduce cache misses? (3 marks)               |  |  |  |  |
|--------|--------------------------------------------------------------------------------|--|--|--|--|
| n)     | Why increasing block size does not help when the cache size is larger? (1 mark |  |  |  |  |
| o)<br> | What are 4 disadvantages of higher associativity? (4 mark)                     |  |  |  |  |
| p)     | What are requirements of a Virtual Machine Monitor? (5 marks)                  |  |  |  |  |
|        |                                                                                |  |  |  |  |
| q)     | How does Table Look-Aside Buffer work with Page Table Entry? (5 marks)         |  |  |  |  |
|        |                                                                                |  |  |  |  |
|        |                                                                                |  |  |  |  |
|        |                                                                                |  |  |  |  |
|        |                                                                                |  |  |  |  |

D

Name\_

## **Question 3 Instruction-level Parallelism**

(42 marks; 50 minutes)

Compare the following items.

a) Both Complex Instruction Set Computer (CISC) and Reduced Instruction Set Computer (RISC) try to enable new High Level Languages programs to be compiled and executed efficiently. Explain their different approaches to do so. (2 marks)

b) What are differences among Structural, Data and Control Hazards? (3 marks)

c) Compare Branch History Table (BHT), 2-bit Scheme Dynamic Branch Prediction, and Correlated Predictor. (3 marks)

d) Demonstrate how we can improve the performance of the following program. (2 marks)

| • • • • • • • • •          |    |  |  |  |
|----------------------------|----|--|--|--|
| for (i=500000; i>0; i=i-1) |    |  |  |  |
|                            |    |  |  |  |
| a[i] = b[i] / c[i];        |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
|                            |    |  |  |  |
| Name                       | ID |  |  |  |

- e) Give two examples each from the following code of true data dependencies, anti-dependencies, and output dependencies. (3 marks)
- S1: LD F1,0(R1)
- S2: SUBD F3,F1,F4
- S3: ADDD F5,F1,F3
- S4: LD F4,0(R2)
- S5: ADDI R2,R2,#4
- S6: MULT F5,F0,F4
- S7: LD F4,0(R2)

- f) Explain how we can improve the performance of the following program and also demonstrate the improve code. (4 marks)
- I: sub r0, r2, r4
- J: add r6, r0, r4
- K: div r0, r2, r4
- L: mul r0, r2, r4

g) Which kind of branch prediction works best for the below code? (2 marks) for (i=100000; i>0; i=i-1)

x[i] = x[i] + s;

h) Which kind of branch prediction is suitable for the below code? (2 marks) for (i=100; i>0; i--)

for (j=3: j>0: j--) a[i] = b[j] \* s:

hante\_\_\_\_\_

i) Consider the following instructions, fill in the first five cycles (20 marks)

LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1 SUBI R1 R1 #8 LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1 SUBI R1 R1 #8



# Clock Cycle 0



D

Clock Cycle 1

Name



## Clock Cycle 2







\_\_'C\_\_

Clock Cycle 4

Name\_\_\_\_



Clock Cycle 5

hame.....

#### **Question 4 Data-Level Parallelism**

#### (33 marks; 35 minutes)

Explain the following concepts in Data-Level Parallelism and also give examples.

a) Discuss and compare the following SIMD architectures: Vector Architectures. Multimedia Extensions, and Graphics Processor Units. (9 marks)

| h). Commend Wester Chaining   | (4 montra) |
|-------------------------------|------------|
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |
| b) Convoy and Vector Chaining | (4 marks)  |

:D

- c) Suppose that the total number of data is 2048 and the size of Vector Length is 64, derive the following code using Vector Length Registers. (4 marks)
- for (i=0; i< 2048; i++)

C[i] = A[i] + B[i];

d) From the following code, explain how to apply Vector Mask Register. (4 marks) for (i = 0; i <=256; i++)</li>

if (A[i] != 0)

A[i] = 8\*A[i];

e) Give an example of code that has more than 1 stride. (4 marks)

\_\_\_\_\_

Name\_\_\_\_

f) From the following scatter and gather operations, identify a related code fragment in a high-level language. (2 marks)

Use LVI/SVI: load/store vector indexed/gather

LV Vm, Rm

LVI Va, (Ra+Vm)

LV Vn, Rn

Niame\_\_\_\_

LVI Vb, (Rb+Vn)

ADDVV.D Va, Va, Vb

 g) Compare Vector Processor, SIMD Multimedia Extension and GPU according to the following list (6 marks)

| List                 | Vector<br>Processor | SIMD Multimedia<br>Extension | GPU |
|----------------------|---------------------|------------------------------|-----|
| CPU                  |                     |                              |     |
| Vector Mask Register |                     |                              |     |
| Scalar Register      |                     |                              |     |
| Vector Register      |                     |                              |     |

----End of Examination----

Pichaya Tandayya Lecturer

\_\_\_\_C\_\_\_