Help Needed: Understanding Indexing in GAMS for Large Data Sets

Hi all,

I am developing a GAMS model with a big dataset, and I am having issues with efficient indexing. In particular, I have several sets and parameters, and I want to make sure that my model executes efficiently without any redundant calculations.

Below is a simplified illustration of my problem:

Sets
i /1*1000/,
j /1*500/;

Parameters
A(i, j), B(i, j);

A(i, j) = uniform(0, 1);
B(i, j) = A(i, j) * 2;

With larger datasets, these assignments become very slow, and I suspect my indexing approach is inefficient. Are there any standard techniques for working with large-scale data more efficiently in GAMS? Should I use dynamic sets, or a different way of defining parameters, to speed up computation? And are there any Power BI tutorials that help visualize GAMS output effectively?

Any tips or references would be more than welcome!

Thanks in advance.
Regards
michael

Hi Michael,

I’m going to modify your example just a bit for clarity… say we have the example:

set i / i1*i1000 /; 
set j / j1*j1500 /;

parameter A(i,j);
parameter B(i,j);

A(i,j) = uniform(0,1);
B(i,j) = A(i,j) * 2;

This assigns A and B densely, so there are 1000 * 1500 = 1.5M records for each of these parameters. This little program finishes (on my machine) in well under 1 sec.

If you actually have dense data, then this operation is about as efficient as it will get. However, if your “records” for A and B are not actually dense, then you will be better served by using filtering sets on the left-hand side of the equals sign to keep GAMS from walking a dense data structure during assignment.
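
For example, here is a minimal sketch of that idea. The set ij and the 1% sparsity pattern below are made up purely for illustration; in a real model ij would typically come straight from your data (e.g., loaded from a GDX file):

set ij(i,j) 'pairs that actually carry data';
parameter r(i,j);

* hypothetical sparsity pattern: keep roughly 1% of all (i,j) pairs
r(i,j) = uniform(0,1);
ij(i,j) = yes$(r(i,j) < 0.01);

* these assignments walk only the members of ij, not the full i x j grid
A(ij) = uniform(0,1);
B(ij) = A(ij) * 2;

The same filtering can also be written with a dollar condition on the left, e.g. A(i,j)$ij(i,j) = uniform(0,1);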

Perhaps you can share a bit more about your use case? Can you provide an example of when you see performance slow down?

best,
adam

Out of curiosity, I ran some tests with different sizes and plotted the time vs. size; see my results below:

Indeed, it looks like the GAMS execution time is superlinear, seemingly with a gradually increasing exponent, here reaching about O(N^1.4). I was glad to see that GAMS outperformed Python at sizes below 150 million records (compared against straightforward Python code doing similar assignments), but the Python execution times remained almost linear. The test thus seems to confirm that GAMS execution slows down with larger dense parameters. Why GAMS cannot maintain closer-to-linear performance here is not quite clear to me, but I suspect it is simply because GAMS is optimized for sparse data, which is what modellers usually need most.
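
For anyone who wants to run a similar test, here is a minimal sketch of the timing setup (the set sizes are arbitrary; TimeElapsed is the GAMS built-in that returns seconds since the start of the job):

set i / i1*i2000 /;
set j / j1*j2000 /;

parameter A(i,j), B(i,j);
scalar t0, tAssign;

t0 = TimeElapsed;
A(i,j) = uniform(0,1);
B(i,j) = A(i,j) * 2;
tAssign = TimeElapsed - t0;

* tAssign now holds the wall-clock seconds spent on the dense assignments
display tAssign;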