Parallel versus sequential assignments - or beware of "loop"-statements

Should I use indexed assignments or loop statements in GAMS?

Indexed assignments within GAMS are done using simultaneous or parallel assignments. In most cases you don’t need explicit loops, an assignment is an implicit loop. The loop statement is necessary when parallel assignments are not sufficient - an example is provided below. However, a loop statement is not executed in parallel, but sequentially. This may slow down the execution of your code tremendously. Below are some examples:

A simple example

set i / i1*i100000 /;
parameter u(i);
* bad!
loop(i, u(i) = uniform(0,2));
* good
u(i) = uniform(0,2);

If you run this fragment with the profile option set to one, you’ll get something like:

----      4 Loop                     7.235     7.235 SECS     14 MB 
----      6 Assignment u             0.000     7.235 SECS     14 MB  100000

The execution of the loop statement was significantly slower than the parallel assignment! Thus parallel assignments should be used wherever possible.

Note: That “parallel” assignment that use the same symbol on the left and on the right creates a copy of the symbol for the use on the right. Hence the updated value on the left is not used when referencing on the right as shown in the next example. If this is required a loop statement must be used. Here is an example, which illustrates this:

set j      /j1*j4/
parameters a(j);
a('j1')=1;
* wrong!
a(j)$(ord(j)>1)=a(j-1)+1;
display 'Wrong:',a;
* correct!
loop(j$(ord(j)>1), a(j)=a(j-1)+1);
display 'Correct:',a;



----      6 Wrong:
----      6 PARAMETER a  
j1 1.000,    j2 2.000,    j3 1.000,    j4 1.000

----      9 Correct:
----      9 PARAMETER a  
j1 1.000,    j2 2.000,    j3 3.000,    j4 4.000

Another example
Let’s define three sets, stores, products, and weeks. How does the following iteration work exactly?

found =0;
loop((stores, products, weeks) $ (condition1(weeks)
                              and condition2(stores)
                              and condition3(products)), found =1);

Assuming that we have 6000 stores, 20 products, and 100 weeks. And only one week, week1, satisfies condition1; only one store, store1, satisfies condition2; only one product, product1, satisfies condition3. Will the iteration go through ALL the stores, products, and weeks even though only one element satisfies all the conditions. Given the conditional control, will the dimensions of the sets have a big impact on run time?

GAMS tries to be efficient when it comes to restricting the domains it needs to search. There is some optimization in place, but with a general construct as you have it there, GAMS will go through all the tuples of the loop. You have a better chance when you do:

set w(weeks), s(stores), p(products);
w(weeks)    = condition1(weeks);
s(stores)   = condition2(stores);
p(products) = condition3(products);
found = card(w) and card(s) and card(p) // or card(w)*card(s)*card(p)

Even if GAMS can’t be smart about the individual conditions, it only walks them once card(weeks)+card(stores)+card(products) in the original loop you probably get * instead of +). If the conditions are simple and can be optimized by the GAMS compiler you get constant time. It is a little like code optimization in other language compilers (C, FORTRAN, etc).

A third example
Here we have an example, where we have to match elements from two different sets. We show different formulation, which are much faster than using a loop statement.

set k /k1*k90000/, t /t1*t10000/;

parameter a(k), b(t), bref(t);
a(k)=normal(0,1);

* expansive loop
loop(k, b(t)$(ord(t) eq ord(k)-card(t))=a(k));
bref(t) = b(t);
option clear=b;

* better, but not yet optimal
b(t) = sum(k$(ord(k)=ord(t)), a(k+card(t)));
abort$(smax(t,abs(b(t)-bref(t)))>1e-6) b,bref;
option clear=b;

* much faster using element matching introduced in distribution 227
set sk(k) 'shifted k', kt(k,t);
sk(k+card(t)) =  yes;
option kt(sk:t);
b(t)=sum(k$kt(k,t),a(k));
abort$(smax(t,abs(b(t)-bref(t)))>1e-6) b,bref;
option clear=b, clear=kt;

* this formulation is more compact
option kt(k:t);
b(t)=sum(kt(k,t), a(k+card(t)));
abort$(smax(t,abs(b(t)-bref(t)))>1e-6) b,bref;

Running the code using the GAMS profiler (profile 1) yields:

----      7 Loop                    29.218    29.234 SECS     14 MB 
----     12 Assignment b            20.188    49.422 SECS     16 MB  10000
----     18 Assignment sk            0.015    49.437 SECS     17 MB  80000
----     20 Assignment b             0.000    49.437 SECS     21 MB  10000
----     26 Assignment b             0.125    49.562 SECS     21 MB  10000

A fourth example
In this example, a large matrix with a small band of non-zeros off the diagonal is copied from the name space j,jj to a new parallel matching name space x,xx where ‘j’ and ‘x’ match perfectly. The band matrix (PDmatrix_jj) is created and the copied to the same matrix with different labels (PDmatrix_xx). This is done once with the combination of two j:x mappings (jx and jxx) in a loop statement and once with a sequence of parallel assignment utilizing a temporary 4-dimensional map symbol ‘j_jj_x_xx’:

set x / x1*x50000 /;
singleton set xone(x) / x1 /;
$eval xCard card(x)
set j /1*%xCard% /;
set jx(j,x) / #j:#x /;
alias (x,xx,xxx), (j,jj,jjj), (jx,jxx);

parameter
	LTmatrix_jj(j,jj)
	PDmatrix_jj(j,jj)
    
scalar LTwidth / 3 /;
$eval LTWIDTH LTwidth
set jsub(j) / 1*%LTWIDTH% /;

LTmatrix_jj(j,jj+(ord(j)-LTwidth))$[jsub(jj)] = normal(0,1);
PDmatrix_jj(j,jj+(ord(j)-LTwidth))$[jsub(jj)] = sum(jjj, LTmatrix_jj(j,jjj)*LTmatrix_jj(jj+(ord(j)-LTwidth),jjj));

Parameter
    PDmatrix_xx1(x,xx)
    PDmatrix_xx2(x,xx);

* Slow loop
loop((jx(j,x),jxx(jj,xx))$PDmatrix_jj(j,jj), PDmatrix_xx1(x,xx) = PDmatrix_jj(j,jj));

* Fast sequence of 
set j_jj_x_xx(j,jj,x,xx), x_xx(x,xx);
j_jj_x_xx(j,jj,x+(ord(j)-1),x+(ord(jj)-1))$xone(x) = PDmatrix_jj(j,jj);
option x_xx<j_jj_x_xx;
PDmatrix_xx2(x_xx(x,xx)) = SUM(j_jj_x_xx(j,jj,x,xx), PDmatrix_jj(j,jj));

abort$(smax(x_xx, abs(PDmatrix_xx1(x_xx)-PDmatrix_xx2(x_xx)))>1e-6) 'bad calculation';

The relevant part of the GAMS profiler report yields:

----     24 Loop                    14.625    18.657 SECS     32 MB 
----     28 Assignment j_jj_x_xx     3.250    21.907 SECS     45 MB  149997
----     29 Other                    0.000    21.907 SECS     49 MB 
----     30 Assignment PDmatrix_xx2  0.093    22.000 SECS     69 MB  149997

So the loop is more than 4 times slower than the sequence of parallel assignment statements.