INHIBITORS OF VECTORIZATION
dependency
ambiguous subscript
Some IF statements
READ or WRITE statement
no vector array reference on the left-hand side of
an equal sign
non constant stride through memory
subroutine call
function subprogram reference
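The loops below are a minimal sketch of three of the inhibitors listed above: an ambiguous subscript, a non-constant stride, and a function subprogram reference. The program name INHIB, the function HALF, and the arrays are hypothetical and chosen only for illustration. Whether a particular compiler vectorizes any of these loops depends on the compiler and its release.

```fortran
      PROGRAM INHIB
! Hypothetical loops, one per inhibitor. All names are illustrative.
      INTEGER N
      PARAMETER (N = 8)
      REAL A(2*N), B(N), HALF
      INTEGER IX(N), I, K
      EXTERNAL HALF
      DO 5 I = 1,N
         A(I) = REAL(I)
         A(I+N) = 0.0
         B(I) = 1.0
         IX(I) = N + 1 - I
    5 CONTINUE
! K stands in for a value that would normally arrive at run time
      K = 1
! Ambiguous subscript: with K unknown, A(I+K) may or may not
! overlap elements that are read in later passes
      DO 10 I = 1,N
         A(I+K) = A(I) + B(I)
   10 CONTINUE
! Non-constant stride: elements are reached through index array IX
      DO 20 I = 1,N
         A(IX(I)) = B(I)
   20 CONTINUE
! Function subprogram reference inside the loop
      DO 30 I = 1,N
         A(I) = HALF(B(I))
   30 CONTINUE
      PRINT *, (A(I), I=1,N)
      END

      REAL FUNCTION HALF(X)
      REAL X
      HALF = 0.5*X
      RETURN
      END
```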
PROCESSING ORDER
Example: DO 10 I = 1,3
A(I) = B(I) + C(I)
D(I) = E(I) + F(I)
10 CONTINUE
SCALAR
A (1) = B (1) + C (1)
D (1) = E (1) + F (1)
A (2) = B (2) + C (2)
D (2) = E (2) + F (2)
A (3) = B (3) + C (3)
D (3) = E (3) + F (3)
VECTOR
A (1) = B (1) + C (1)
A (2) = B (2) + C (2)
A (3) = B (3) + C (3)
D (1) = E (1) + F (1)
D (2) = E (2) + F (2)
D (3) = E (3) + F (3)
Note: In this example the results are the same.
At the end of the loop, A and D hold the same
contents whether scalar or vector processing
was used.
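As a rough check of the note above, the following sketch runs the two statements first in scalar order and then in vector order and compares the results. The program name ORDER, the input values, and the AS/DS/AV/DV arrays are assumptions made for this illustration. The vector order is only emulated with ordinary loops.

```fortran
      PROGRAM ORDER
! Hypothetical check: run the two statements in scalar order and
! in (emulated) vector order, then compare the results.
      REAL B(3), C(3), E(3), F(3)
      REAL AS(3), DS(3), AV(3), DV(3)
      INTEGER I
      LOGICAL SAME
      DATA B/1.,2.,3./, C/10.,20.,30./
      DATA E/4.,5.,6./, F/40.,50.,60./
! Scalar order: both statements for I=1, then both for I=2, ...
      DO 10 I = 1,3
         AS(I) = B(I) + C(I)
         DS(I) = E(I) + F(I)
   10 CONTINUE
! Vector order: all of A first, then all of D
      DO 20 I = 1,3
         AV(I) = B(I) + C(I)
   20 CONTINUE
      DO 30 I = 1,3
         DV(I) = E(I) + F(I)
   30 CONTINUE
      SAME = .TRUE.
      DO 40 I = 1,3
         IF (AS(I) .NE. AV(I)) SAME = .FALSE.
         IF (DS(I) .NE. DV(I)) SAME = .FALSE.
   40 CONTINUE
      PRINT *, 'RESULTS IDENTICAL: ', SAME
      END
```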
EXAMPLE OF A DEPENDENCY
This type of dependency is named "recursion".
DIMENSION A(5)
DATA (A(I), I=1,3)/1.,2.,3./,X/6./
DO 10 I=1,3
10 A(I+2)=A(I)+X
SCALAR
Processing Order
A(3)=A(1)+X
A(4)=A(2)+X
A(5)=A(3)+X
Values
7=1+6
8=2+6
13=7+6   uses new value that has been redefined in a previous pass
VECTOR
Processing Order
A(3)=A(1)+X
A(4)=A(2)+X
A(5)=A(3)+X   uses preloaded old value
Values
7=1+6
8=2+6
9=3+6
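The difference between the two processing orders can be reproduced with the sketch below. It emulates vector processing by copying all operands into a temporary array (OLD, a name assumed here) before any result is stored. The program name RECUR and the separate AS and AV arrays are likewise illustrative.

```fortran
      PROGRAM RECUR
! Sketch: emulate scalar and vector evaluation of
!     DO 10 I=1,3
!  10 A(I+2)=A(I)+X
      REAL AS(5), AV(5), OLD(3), X
      INTEGER I
      DATA (AS(I), I=1,3)/1.,2.,3./
      DATA (AV(I), I=1,3)/1.,2.,3./
      DATA X/6./
! Scalar: each pass sees the stores made by earlier passes
      DO 10 I = 1,3
         AS(I+2) = AS(I) + X
   10 CONTINUE
! Vector (emulated): load all operands first, then store
      DO 20 I = 1,3
         OLD(I) = AV(I)
   20 CONTINUE
      DO 30 I = 1,3
         AV(I+2) = OLD(I) + X
   30 CONTINUE
      PRINT *, 'SCALAR A(3..5): ', AS(3), AS(4), AS(5)
      PRINT *, 'VECTOR A(3..5): ', AV(3), AV(4), AV(5)
      END
```

The scalar line should show 7, 8, 13 and the emulated vector line 7, 8, 9, matching the values listed above.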
Example: DO 10 I=1,100
A(I)=B(I)/3.1
10 C(I)=D(I)**2
SCALAR
A(1)=B(1)/3.1
C(1)=D(1)**2
A(2)=B(2)/3.1
C(2)=D(2)**2
   .
   .
A(99)=B(99)/3.1
C(99)=D(99)**2
A(100)=B(100)/3.1
C(100)=D(100)**2
VECTOR
A(1)=B(1)/3.1
   .
A(36)=B(36)/3.1
C(1)=D(1)**2
   .
C(36)=D(36)**2
A(37)=B(37)/3.1
   .
A(100)=B(100)/3.1
C(37)=D(37)**2
   .
C(100)=D(100)**2
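The split into elements 1 to 36 and 37 to 100 is consistent with a loop that is processed in segments. The sketch below emulates that with ordinary loops. The maximum segment length MAXVL = 64 is an assumption (it is not stated above), as are the program name STRIP and the variables IS and LSEG.

```fortran
      PROGRAM STRIP
! Sketch of segmented processing of a 100-pass loop.
! MAXVL = 64 is an assumed maximum vector length.
      INTEGER N, MAXVL
      PARAMETER (N = 100, MAXVL = 64)
      REAL A(N), B(N), C(N), D(N)
      INTEGER I, IS, LSEG
      DO 5 I = 1,N
         B(I) = REAL(I)
         D(I) = REAL(I)
    5 CONTINUE
! The first segment holds the remainder; later ones are full
      LSEG = MOD(N, MAXVL)
      IF (LSEG .EQ. 0) LSEG = MAXVL
      IS = 1
   10 IF (IS .GT. N) GO TO 40
! Each statement is applied to the whole segment in turn
      DO 20 I = IS, IS+LSEG-1
         A(I) = B(I)/3.1
   20 CONTINUE
      DO 30 I = IS, IS+LSEG-1
         C(I) = D(I)**2
   30 CONTINUE
      PRINT *, 'SEGMENT ', IS, ' TO ', IS+LSEG-1
      IS = IS + LSEG
      LSEG = MAXVL
      GO TO 10
   40 CONTINUE
      END
```

With these assumed values the program reports two segments, 1 to 36 and 37 to 100.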
Reduction of Instruction Count
Through Vectorization
Example: DO 10 I=1,10
10 A(I)=5.*B(I)+C
First process this in scalar mode and count
the number of instructions. Sn refers to the
scalar registers.
Instruction #   Instruction   Description
      1         IC=1          Loop control variable
      2         S1=5          Load constant 5
      3         S2=C          Load C
      4         S3=B(1)       Load B(I)
      5         S4=S3*S1      5.*B(I)
      6         S5=S4+S2      5.*B(I)+C
      7         A(1)=S5       Store A(I)
      8         IC=2          Increment loop counter
      .         .             Repeat
      .         .
     70         A(10)=S5
Example: DO 10 I=1,10
10 A(I)=5.*B(I)+C
Now process this loop in vector mode and calculate
the number of instructions. The Sn refer to scalar
registers, and the Vn refer to vector registers.
Instruction #   Instruction   Description
      1         S1=5          Load constant 5
      2         S2=C          Load C
      3         VL=10         Set vector length
      4         V0=B          Load array B
      5         V1=S1*V0      Multiply by 5
      6         V2=V1+S2      Add C
      7         A=V2          Store array A
We have reduced the instruction count from 70 to 7,
a ten-to-one difference gained by performing the
computations in vector mode. Notice that vector
processing also reduced the overhead associated with
incrementing and checking the loop control variable.
FLOWTRACE
Use FLOWTRACE to gather statistics about your program.
You enable FLOWTRACE with CFT ON=F
FLOWTRACE gives the following information:
time spent in each subroutine
% of total time spent in each subroutine
number of times the subroutine was called
average time per call spent in the subroutine
USE THE BUILT-IN FUNCTIONS
The built-in functions have optimized code.
This statement is slower:
Y = A**0.5
This statement is faster:
Y = SQRT(A)
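A quick way to confirm that the substitution does not change the answer is to evaluate both forms side by side. This is only a sketch: the program name ROOT, the test value 2.0, and the relative tolerance are assumptions, and it checks agreement, not speed.

```fortran
      PROGRAM ROOT
! Sketch: confirm that SQRT(A) can stand in for A**0.5
      REAL A, Y1, Y2
      A = 2.0
      Y1 = A**0.5
      Y2 = SQRT(A)
      PRINT *, 'A**0.5  = ', Y1
      PRINT *, 'SQRT(A) = ', Y2
! Allow for a small rounding difference between the two forms
      IF (ABS(Y1-Y2) .LE. 1.0E-5*Y2) PRINT *, 'AGREE'
      END
```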
MULTIPLY INSTEAD OF DIVIDE
Since division is performed by taking a reciprocal and
then multiplying, substituting multiplications for
divisions where possible will improve performance.
This loop is slower:
DO 100 I = 1,N
100 A(I) = B(I)/4.0
This loop is faster:
DO 100 I = 1,N
100 A(I) = B(I)*0.25
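When the divisor is a variable that does not change inside the loop, the same idea can be applied by computing its reciprocal once, outside the loop. The sketch below assumes a divisor X and a reciprocal R. These names, the program name RECIP, and the data are illustrative. For divisors that are not powers of two, the product may differ from the quotient in the last bit.

```fortran
      PROGRAM RECIP
! Sketch: divide by a loop-invariant variable using one reciprocal
      INTEGER N
      PARAMETER (N = 5)
      REAL A(N), B(N), X, R
      INTEGER I
      DATA B/2.,4.,6.,8.,10./
      X = 4.0
! One division outside the loop ...
      R = 1.0/X
! ... and only multiplications inside it
      DO 100 I = 1,N
         A(I) = B(I)*R
  100 CONTINUE
      PRINT *, (A(I), I=1,N)
      END
```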
DATA INITIALIZATION
The DATA statement is a faster way to initialize
data than a DO loop.
This is less efficient:
REAL A(100)
DO 10 I = 1,100
10 A(I) = 1.0
This is more efficient:
REAL A(100)
DATA A/100*1.0/
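Repeat counts in a DATA statement can also give different parts of an array different values, as in the sketch below (the program name INIT, the 50/50 split, and the variable TOTAL are assumptions for illustration). Keep in mind that DATA sets values once, before execution begins. An array that must be reset while the program runs still needs an assignment loop.

```fortran
      PROGRAM INIT
! Sketch: repeat counts in DATA set different parts of an array
      REAL A(100), TOTAL
      INTEGER I
      DATA A/50*1.0, 50*0.0/
! Sum the array to confirm the initial values
      TOTAL = 0.0
      DO 10 I = 1,100
         TOTAL = TOTAL + A(I)
   10 CONTINUE
      PRINT *, 'SUM OF A = ', TOTAL
      END
```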