Performance Programming Overheads




INHIBITORS OF VECTORIZATION



      dependency

      ambiguous subscript

      some IF statements

      READ or WRITE statement

      no vector array reference on the left-hand
          side of an equal sign

      non-constant stride through memory

      subroutine call

      function subprogram reference

       









          PROCESSING ORDER

          
   Example:    DO 10 I = 1,3
               A(I) = B(I) + C(I)
               D(I) = E(I) + F(I)
            10 CONTINUE
   
   S C A L A R                  V E C T O R

   A(1) = B(1) + C(1)           A(1) = B(1) + C(1)
   D(1) = E(1) + F(1)           A(2) = B(2) + C(2)
   A(2) = B(2) + C(2)           A(3) = B(3) + C(3)
   D(2) = E(2) + F(2)           D(1) = E(1) + F(1)
   A(3) = B(3) + C(3)           D(2) = E(2) + F(2)
   D(3) = E(3) + F(3)           D(3) = E(3) + F(3)
                                 
Note: In this example the results are the same.
        At the end of the loop the contents of
        A and D are the same whether you use
        scalar or vector processing.
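
The two processing orders can be imitated in ordinary
Fortran. The sketch below is illustrative only: the
routine names ORDSCA and ORDVEC are hypothetical, and
the second routine merely mimics vector order by
finishing all of A before starting D.

```fortran
! Sketch: the same two statements in scalar order (one loop) and
! in vector order (one loop per statement).
      SUBROUTINE ORDSCA(A, B, C, D, E, F)
      REAL A(3), B(3), C(3), D(3), E(3), F(3)
      INTEGER I
! Scalar order: A(I) and D(I) alternate on every pass.
      DO 10 I = 1, 3
         A(I) = B(I) + C(I)
         D(I) = E(I) + F(I)
   10 CONTINUE
      RETURN
      END

      SUBROUTINE ORDVEC(A, B, C, D, E, F)
      REAL A(3), B(3), C(3), D(3), E(3), F(3)
      INTEGER I
! Vector order: all of A is computed before any of D.
      DO 20 I = 1, 3
         A(I) = B(I) + C(I)
   20 CONTINUE
      DO 30 I = 1, 3
         D(I) = E(I) + F(I)
   30 CONTINUE
      RETURN
      END
```

Because neither statement reads a value that the loop
itself stores, both routines should leave identical
contents in A and D for the same inputs.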
        








      EXAMPLE OF A DEPENDENCY

      

         This type of dependency is named "recursion".

          DIMENSION A(5)
          DATA (A(I), I=1,3)/1.,2.,3./,X/6./
          DO 10 I=1,3
       10 A(I+2)=A(I)+X
       


       
     S C A L A R

     Processing Order       Values

     A(3)=A(1)+X            7=1+6
     A(4)=A(2)+X            8=2+6
     A(5)=A(3)+X            13=7+6

     A(3) in the last line uses the new value that
     has been redefined in a previous pass.


     V E C T O R

     Processing Order       Values

     A(3)=A(1)+X            7=1+6
     A(4)=A(2)+X            8=2+6
     A(5)=A(3)+X            9=3+6

     A(3) in the last line uses the preloaded
     old value.
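
The difference between the two results can be
reproduced without vector hardware. The following is a
minimal sketch under stated assumptions: SCALR and
VECEMU are hypothetical names, and VECEMU only imitates
vector processing by loading every operand into a
temporary array V before any element of A is stored.

```fortran
! Sketch: scalar versus vector evaluation order for the
! recurrence A(I+2) = A(I) + X.
      SUBROUTINE SCALR(A, X)
      REAL A(5), X
      INTEGER I
! Scalar order: each pass sees stores made by earlier passes.
      DO 10 I = 1, 3
         A(I+2) = A(I) + X
   10 CONTINUE
      RETURN
      END

      SUBROUTINE VECEMU(A, X)
      REAL A(5), X, V(3)
      INTEGER I
! Vector order: compute all three results from the old values.
      DO 20 I = 1, 3
         V(I) = A(I) + X
   20 CONTINUE
! Only then store them, so A(3) is read before it is redefined.
      DO 30 I = 1, 3
         A(I+2) = V(I)
   30 CONTINUE
      RETURN
      END
```

Starting from A(1), A(2), A(3) = 1., 2., 3. and X = 6.,
SCALR leaves A(5) = 13. and VECEMU leaves A(5) = 9.,
the same values as in the two Values columns.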
                                                        








      Example: DO 10 I = 1,100
                  A(I)=B(I)/3.1
               10 C(I)=D(I)**2
      
    SCALAR                     VECTOR

    A(1)=B(1)/3.1              A(1)=B(1)/3.1
    C(1)=D(1)**2                 ...
    A(2)=B(2)/3.1              A(36)=B(36)/3.1
    C(2)=D(2)**2               C(1)=D(1)**2
      ...                        ...
    A(99)=B(99)/3.1            C(36)=D(36)**2
    C(99)=D(99)**2             A(37)=B(37)/3.1
    A(100)=B(100)/3.1            ...
    C(100)=D(100)**2           A(100)=B(100)/3.1
                               C(37)=D(37)**2
                                 ...
                               C(100)=D(100)**2
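
The vector order shown (elements 1 to 36 first, then
37 to 100) is consistent with the loop being split
into segments that each fit in a vector register. The
sketch below imitates that splitting in plain Fortran.
It rests on two assumptions that the example does not
state: a maximum vector length of 64 elements (MAXVL),
and that the short leftover segment, MOD(100,64) = 36
elements, is handled first. STRIP is a hypothetical
name.

```fortran
! Sketch: process the loop in vector-length segments.
! Assumes MAXVL = 64 and that the short segment comes first.
      SUBROUTINE STRIP(A, B, C, D, N)
      INTEGER N, MAXVL, IS, VL, I
      PARAMETER (MAXVL = 64)
      REAL A(N), B(N), C(N), D(N)
      IS = 1
      VL = MOD(N, MAXVL)
      IF (VL .EQ. 0) VL = MAXVL
! Each trip through label 10 stands for one vector segment.
   10 IF (IS .GT. N) RETURN
      DO 20 I = IS, IS + VL - 1
         A(I) = B(I)/3.1
   20 CONTINUE
      DO 30 I = IS, IS + VL - 1
         C(I) = D(I)**2
   30 CONTINUE
! Every later segment uses the full vector length.
      IS = IS + VL
      VL = MAXVL
      GO TO 10
      END
```

With N = 100 the routine handles elements 1 to 36 of A
and C, then elements 37 to 100, which is the order in
the VECTOR column.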
                                    






          Reduction of Instruction Count

               Through Vectorization

          

         Example: DO 10 I=1,10
               10 A(I)=5.*B(I)+C

   First process this in scalar mode and calculate
   the number of instructions. Sn refers to the
   scalar registers.
   
Instruction #   Instruction    Description

      1         IC=1           Loop control variable
      2         S1=5           Load constant 5
      3         S2=C           Load C
      4         S3=B(1)        Load B(I)
      5         S4=S3*S1       5.*B(I)
      6         S5=S4+S2       5.*B(I)+C
      7         A(1)=S5        Store A(I)
      8         IC=2           Increment loop counter
      .           .
      .           .
     70         A(10)=S5

                                        






Repeat Example: DO 10 I=1,10
             10 A(I)=5.*B(I)+C
                      
Now process this loop in vector mode and calculate
the number of instructions. The Sn refer to scalar
registers, and the Vn refer to vector registers.


Instruction #   Instruction    Description

      1         S1=5           Load constant 5
      2         S2=C           Load C
      3         VL=10          Set vector length
      4         V0=B           Load array B
      5         V1=S1*V0       Multiply by 5
      6         V2=V1+S2       Add C
      7         A=V2           Store array A
                                       
We have reduced the instruction count from 70 to 7,
a ten-to-one reduction gained by performing the
computations in vector mode. Notice that vector
processing reduces the overhead associated with
incrementing and checking the loop control
variable.







              FLOWTRACE

        


        
Use FLOWTRACE to gather statistics about your program.

You enable FLOWTRACE with CFT ON=F

FLOWTRACE gives the following information:

    time spent in each subroutine
   
    % of total time spent in each subroutine
   
    number of times the subroutine was called
   
    average time per call spent in the subroutine

   






  USE THE BUILT-IN FUNCTIONS

  


  
The built-in functions have optimized code.


This statement is slower:

        Y = A**0.5
        

This statement is faster:

        Y = SQRT(A)
        










   MULTIPLY INSTEAD OF DIVIDE

   
Since division is performed by taking a reciprocal and
then multiplying, substituting multiplications for
divisions where possible will improve performance.


This loop is slower:

      DO 100 I = 1,N
  100 A(I) = B(I)/4.0




This loop is faster:

      DO 100 I = 1,N
  100 A(I) = B(I)*0.25
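
The same substitution can be applied when the divisor
is a variable that does not change inside the loop:
take its reciprocal once, then multiply. The sketch
below is a hedged illustration, not a prescribed
method. RSCALE and R are hypothetical names, X is
assumed to be nonzero, and the products may differ from
true quotients in the last bit.

```fortran
! Sketch: divide every element of B by a loop-invariant X using
! one reciprocal and N multiplications.
      SUBROUTINE RSCALE(A, B, N, X)
      INTEGER N, I
      REAL A(N), B(N), X, R
! Take the reciprocal once, outside the loop (X assumed nonzero).
      R = 1.0/X
      DO 100 I = 1, N
         A(I) = B(I)*R
  100 CONTINUE
      RETURN
      END
```

Calling RSCALE with X = 4.0 performs the same work as
the faster loop above, which multiplies by 0.25.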











         DATA INITIALIZATION

         


         
The DATA statement is a faster way to initialize
data than a DO loop.


This is less efficient:

      REAL A(100)

      DO 10 I = 1,100
   10 A(I) = 1.0




This is more efficient:

      REAL A(100)

      DATA A/100*1.0/