Dealing with multiple CSV file imports for parameters, how?

I am defining several dozens of parameters in GAMS Studio up to four dimensions. Let’s use three of them as examples.
abcst(abx,ghg) description1
ablim(ghg,abx,tp,rg) description2
abmlt(tp) description3

For each parameter, I have an Excel table (named the same as the parameters) with the following headings: Dim1, Dim2, Dim3, Dim4, Value
So there would be a table called abcst with the first rows:
|Dim1|Dim2|Dim3|Dim4|Value|
|ab1|co2|||10|
|ab1|ch4|||0.07|
|ab1|n2o|||0.81|

Then I created a macro to export each of the tables to CSV into a folder: /data/csvfolder
The macro also creates a file with the names of the tables/parameters called parameterlist.txt saved under /data/parameterlist.txt

I want to create a code in GAMS that reads each of the CSV files in the folder, extracts the info and stores it in a GDX file. The code I have now is this, but it is not working.


* Set up paths to load parameters
$set csvDirectory "Data\Parameters"
$set paramListPath "Data\parameterlist.txt"

* Check if the parameterlist.txt file exists
$if not exist %paramListPath% $abort "parameterlist.txt not found!"

* Define a set to hold the parameter names from parameterlist.txt
SET paramList /read %paramListPath%/;

* Loop over each parameter in the list and load its data
$onEcho > load_parameters.gms
loop(paramList(param),
    $setLocal paramName param.tl
    $setLocal csvFile "%csvDirectory%\%paramName%.csv"
    $if not exist %csvFile% $abort "%paramName%.csv not found!"

* Use ondelim to import CSV data into a table
    table tempData(*,*,*,*)  Temporary table to hold CSV data
    $ondelim
    $include %csvFile%
    $offdelim

* Transfer data from table to parameter based on the correct dimensionality
    if (card(tempData) > 0,
        if (card(Dim4) > 0,
            %paramName%(Dim1, Dim2, Dim3, Dim4) = tempData(Dim1, Dim2, Dim3, Dim4);
        else if (card(Dim3) > 0,
            %paramName%(Dim1, Dim2, Dim3) = tempData(Dim1, Dim2, Dim3);
        else if (card(Dim2) > 0,
            %paramName%(Dim1, Dim2) = tempData(Dim1, Dim2);
        else if (card(Dim1) > 0,
            %paramName%(Dim1) = tempData(Dim1);
        );
    );
);
$offEcho

* Include the generated GAMS script to load parameters
$include load_parameters.gms

* Save all parameters into a GDX file
execute_unload "referenceParamComplete_test.gdx";

When it comes to importing data there is no right or wrong. Do it in a way that works best for you. If the data is in a single Excel file and the parameter names are the sheet names with varying domain I would probably use GAMS/Connect’s ExcelReader. I would also arrange the data in Excel in a natural way without the artificial empty columns. You can still read this in a generic way.

I have prepared some examples that work with the just (beta) released GAMS 48. Here we read directly from Excel in the fixed 4 column way (data.xlsx) or the more natural way (data2.xlsx):

image

image

In GAMS you declare the symbols (and perhaps leave the @Excel in the explanatory text to indicate that the parameter comes from Excel):

set i / i1*i3 /;
set j / j1*j3 /;
set k / k1*k3 /;
set l / l1*l3 /;

parameter p1(i) "This is p1 (@Excel)";
parameter p2(i,j) "This is p2";
parameter p3(i,j,k) "This is p3 (@Excel)";
parameter p4(i,j,k,l) "This is p4 (@Excel)";

Next, you instruct GAMS/Connect to read directly from Excel via the ExcelReader agent. This can be done explicitly for the fixed 4-dim data.xlsx:

$onEmbeddedCode Connect:
- ExcelReader:
    file: data.xlsx
    columnDimension: 0
    symbols:
      - {name: p1, rowDimension: 1, ignoreColumns: [2, 3, 4]}
      - {name: p3, rowDimension: 3, ignoreColumns: [4]}
      - {name: p4, rowDimension: 4}
- GDXWriter:
    file: out1.gdx
#- GAMSWriter:
#    symbols: all
$offEmbeddedCode

and the more natural data2.xlsx:

$onEmbeddedCode Connect:
- ExcelReader:
    file: data2.xlsx
    columnDimension: 0
    symbols:
      - {name: p1, rowDimension: 1}
      - {name: p3, rowDimension: 3}
      - {name: p4, rowDimension: 4}
- GDXWriter:
    file: out2.gdx
#- GAMSWriter:
#    symbols: all
$offEmbeddedCode

If you comment the GDXWriter and uncomment the GAMSWriter, the data for p1, p3, and p4 is loaded directly into GAMS.

If the list of symbols is long you need to list each symbols twice (the declaration and the import instruction). Connect can generate the instruction list for the ExcelReader and get the dimension information directly from GAMS. Moreover, the marker @Excel provides the list of parameters that should be retrieved from Excel. The code is a little more complicated (and both data.xlsx and data2.xlsx are here in a commented form) but I guess you get the gist:

$onEmbeddedCode Connect:
- PythonCode:
    code: |
      # symbols = ['p1','p3','p4']
      symbols = [ sym.name for sym in gams.db if "@Excel" in sym.text] # get list from @Excel marker in symbol text
      instructions.append(
        {
          "ExcelReader": {
            "file": "data2.xlsx",
            # "file": "data.xlsx",
            "columnDimension": 0,
            "symbols": [
              {
                "name": sym,
                "rowDimension": gams.db[sym].dimension,
                # "ignoreColumns": list(range(gams.db[sym].dimension+1, 5))
              }
              for sym in symbols
            ]
          }
        }
      )
- GDXWriter:
    file: out3.gdx
#- GAMSWriter:
#    symbols: all
$offEmbeddedCode

I have attached both Excel files and the entire GAMS script (that at the end compares the data for the various methods).

Good luck,
-Michael

read_multiple_excel.gms (1.9 KB)
data.xlsx (13.6 KB)
data2.xlsx (13.6 KB)

Thanks a lot for the well-thought response!

I haven’t seen anywhere this kind of connectivity commands, or embedding Python code. Very useful, I will give it a try.

I made it work now by declaring the parameters in GAMS and then having a macro in Excel to export each parameter table to CSV. Then GAMS connects to each CSV but I have to declare each CSVs dimensions, then load them individually as GDX files and finally unload them.

Quite messy.

GAMS/Connect has been around for some time, but with GAMS 48 it is out of beta and ready for prime time.

Let us know if you run into any problems with Connect.

-Michael

Hi Michael,

I came back to make my code easier implementing this Connect functionality, but I am struggling a bit. I am following your advice of using the “more natural way” method.

My Excel file is for instance

  • 1 file for sets: Set.xlsm (27.2 KB)
  • multiple tabs, each tab for a type of sets (time, economic, energy, etc.)
  • Multiple Excel tables (not range) in each tab. So in time: yr, t, pp, t_10, t_20, etc. All of which are tables named exactly the same as the actual set

I am using something like this below but getting issues saying “yr” range doesn’t exist.

*  This section of the code contains the sets definition
* -------------------------------------------------------------------------

**** Period sets **** from Set.xlsm
Sets
yr           Years
t(yr)       Time periods (starting year)
t_5(t)      time periods from 2005
t_10(t)     time periods from 2010
t_15(t)     time periods from 2015
t_20(t)     time periods from 2020
t_30(t)     time periods from 2030
t_40(t)     time periods from 2040
t_50(t)     time periods from 2050
t_60(t)     time periods from 2060
t_70(t)     time periods from 2070
t_100(t)    time periods from 2100
t_110(t)    time periods from 2110
t_120(t)    time periods from 2120
lk(t)       Time periods for which activities are locked (starting year)
tb(t)       Base years
pp(t)       Projection periods (all periods but the base period)
tearly(t)   Early years
tbound(t)   time bound
tfirst(t)   First time period
tlast(t)    Last time period
ylast(yr)   Last year
;


$onEmbeddedCode Connect:
- ExcelReader:
    file: Data/Input/Set.xlsm
    columnDimension: 0
    symbols:
      - {name: yr, range: yr, rowDimension: 1}
- GDXWriter:
    file: Data/Input/set.gdx
#- GAMSWriter:
#    symbols: all
$offEmbeddedCode
`type or paste code here`

I have tried playing around with different options, but the best I can do is to use the range of the table, but it returns empty.

Any way to make connect reference the Excel tables directly without needing to specify spreadsheet names or cell ranges? It seems to me that if not, this would be a great addition to Connect.

Btw, the same applies to my parameters.

I don’t know how this workbook was created, but the trouble is that the package openpyxl does not recognize the named ranges that are contained in the workbook. I can see that Excel recognizes these, but openpyxl does not:

>>> from openpyxl import load_workbook
>>> wb = load_workbook(filename = 'Set.xlsm')
>>> print(wb.defined_names)
{}

If I add a new sheet (Sheet1) to your workbook and use the Excel Name Manager to create a named range called xyz:

and I redo the loading of the spreadsheet with openpyxl and print the named ranges I get:

>>> wb = load_workbook(filename = 'Set.xlsm')
>>> wb.defined_names
{'xyz': <openpyxl.workbook.defined_name.DefinedName object>
Parameters:
name='xyz', comment=None, customMenu=None, description=None, help=None, statusBar=None, localSheetId=None, hidden=None, function=None, vbProcedure=None, xlm=None, functionGroupId=None, shortcutKey=None, publishToServer=None, workbookParameter=None, attr_text='Sheet1!$A$2:$A$14'}

So the trouble is with the openpyxl module and the way you have created the named ranges.

You can experiment with that even if you are not a skilled Python user. GAMS comes with a Python interpreter in the GMSPython directory. From a shell prompt start the program “python” that resides in the GMSPython directory and copy and paste the lines from above and see if you get openpyxl to recognize your named ranges.

Good luck,
-Michael

Ok thanks Michael,

I will look into that.

For reference: the tables, they are just typical Excel tables. I like using these for data since they are recognised names in Excel and hold data dynamically, so there is no need to play with ranges and we can just reference data in the table directly. This is especially useful with PowerBI or PowerPivot when using calculated measures.

Since most of the data comes from an Excel data model that can easily update these tables dynamically, I was thinking that the best solution would be to use these tables as the inputs for GAMS.