Hi again, GAMS fanatics.
I posted a while ago about data input. I managed to solve some of my issues and I have a working process. First, I want to share it here with you. Second, I would like your advice to make it faster.
Context
I run a work-in-progress (and therefore changing all the time) model that requires multiple sets and parameters, sometimes with thousands of data each. I wanted a system that would allow me to add/remove/rename sets and parameters in the model, and then easily go back to the data, then re-import.
I checked GAMS Connect, but it seems to require constant keep-up and attention to naming declarations, locations, sizes of sets/parameters. I thus looked for a way to import any set/parameter, regardless of the size.
Data structure
- I download the original datasets from the sources.
- I connect all the datasets to a master data model in Excel, through Power Query and Power Pivot
- I recompose the data in the way I need for each set/parameter. Since all the datasets are connected, it is easy to make structural changes on the fly.
- I have two other master files. One for sets (Model/Data/Sets/Sets.xlsm). Another for parameters (Model/Data/Parameters/Parameters.xlsm)
- Once I am happy with the data model’s results, I copy-paste the data in the sets or parameters data files.
- In these sets and parameters files, I have different tabs for different parts of the model. Each tab has a combination of Excel tables (not ranges) and Pivot Tables. I add the name of the set/parameter on top of the table.
- I built a macro in each of the Excel files to go through all the pivot tables and tables, then extract them as CSV files in Model/Data/Sets/SetCSV and Model/Data/Parameters/ParametersCSV
- Once I am happy with the data for my model, I press a button and the macros create the CSV files. Now we have two folders full of CSVs for each set or parameter.
Modelling
- I define all my sets in Model/parameters/sets.set
- I run Data/LoadData.gms with this code this is slow:
* Declaring the sets
$include Model/parameters/sets.set
* Declaring the paths
$setLocal Set_CSV Data/Sets/SetCSV/
$setLocal Set_GDX Data/Sets/SetGDX
* Embedded Batch File: CSV-to-GDX Conversion for sets
$onEcho > Data\Sets\set_csv2gdx.bat
@echo off
setlocal enabledelayedexpansion
rem -- Jump to the folder where this script lives (Data/Sets)
cd /d %~dp0
rem -- Define paths relative to this script
set "CSV_PATH=%~dp0SetCSV"
set "GDX_PATH=%~dp0SetGDX"
echo Converting CSV files from: !CSV_PATH!
echo Output GDX files to: !GDX_PATH!
rem -- Delete any existing GDX files in the destination folder
del /f /q "!GDX_PATH!\*.*" 2>nul
if exist "!GDX_PATH!\sets.gdx" del /f /q "!GDX_PATH!\sets.gdx"
rem -- Create a temporary index file in the working directory
echo #FileName,HeaderCount,DataCount,SkipExtra > csv_index.txt
for %%f in ("!CSV_PATH!\*.csv") do (
set "filename=%%~nf"
rem –– count total lines (including header)
for /f %%L in ('find /v /c "" ^< "%%f"') do set totalLines=%%L
rem –– read header & data counts as before…
set "header="
set /p header=<"%%f"
for /f "tokens=1 delims=," %%H in ("!header!") do set "firstHeader=%%H"
if /I "!firstHeader!"=="!filename!" (set "skipExtra=1") else (set "skipExtra=0")
call :CountCommas "!header!" & set /a headerCount=!COMMA_COUNT!+1
set "dataLine="
for /f "usebackq skip=1 delims=" %%D in ("%%f") do if not defined dataLine set "dataLine=%%D"
call :CountCommas "!dataLine!" & set /a dataCount=!COMMA_COUNT!+1
rem –– now compute the Excel range using totalLines
if "!skipExtra!"=="1" (
set /a effdims=dataCount - 1
call :NumToCol !dataCount!
set "range=B2:!col!!totalLines!"
) else (
set /a effdims=dataCount
call :NumToCol !dataCount!
set "range=A2:!col!!totalLines!"
)
echo Importing !filename!.csv as a set with !effdims! dimensions using range !range!...
gdxxrw "!CSV_PATH!\!filename!.csv" ^
output="!GDX_PATH!\!filename!.gdx" ^
set=!filename! rng=!range! rdim=!effdims! maxdupeerrors=999999
)
rem -- Move the temporary index file into the GDX folder (cleanup)
move /Y csv_index.txt "!GDX_PATH!\"
endlocal
exit /b
:CountCommas
setlocal enabledelayedexpansion
set "line=%~1"
set "count=0"
:CCLoop
if "!line!"=="" goto CCDone
set "firstChar=!line:~0,1!"
set "line=!line:~1!"
if "!firstChar!"=="," set /a count+=1
goto CCLoop
:CCDone
endlocal & set COMMA_COUNT=%count%
exit /b
:NumToCol
setlocal enabledelayedexpansion
set "alphabet=ABCDEFGHIJKLMNOPQRSTUVWXYZ"
set /a idx=%1 - 1
set "col=!alphabet:~%idx%,1!"
endlocal & set col=%col%
exit /b
$offEcho
* Embedded Batch File: GDX Loader for sets
$onEcho > Data\Sets\set_gdxload.bat
@echo off
setlocal enabledelayedexpansion
rem -- Define the folder where GDX files are stored using the GAMS macro.
set "GDX_PATH=%cd%\%Set_GDX%"
rem -- Delete any existing load_sets.inc file in the GDX folder.
if exist "!GDX_PATH!\load_sets.inc" del /f /q "!GDX_PATH!\load_sets.inc"
rem -- Generate the include file from the parameter list (assumed to be in Data\Input\parameterslist.txt).
(
for /f "usebackq delims=" %%L in ("Data\Sets\setslist.txt") do (
echo $gdxin "%Set_GDX%/%%L.gdx"
echo $load %%L
echo $gdxin
)
) > "!GDX_PATH!\load_sets.inc"
rem Optionally, move the sets list file into the GDX folder:
rem move /Y "Data\Sets\setslist.txt" "!GDX_PATH!\"
endlocal
exit /b
$offEcho
* Calling the batch files
$if %import_sets% == Y
$call Data\Sets\set_csv2gdx.bat
$call Data\Sets\set_gdxload.bat
* Now include the generated load file so that all individual GDX sets are loaded.
$include "%Set_GDX%/load_sets.inc"
* Declaring the parameters
$include Model/parameters/parameters.par
* Declaring the paths
$setLocal Par_CSV Data/Parameters/ParametersCSV/
$setLocal Par_GDX Data/Parameters/ParametersGDX
* Embedded Batch File: CSV-to-GDX Conversion for parameters
$onEcho > Data/Parameters/par_csv2gdx.bat
@echo off
setlocal enabledelayedexpansion
rem -- Jump to the folder where this script lives (Data/Parameters)
cd /d %~dp0
rem -- Define paths relative to this script
set "CSV_PATH=%~dp0ParametersCSV"
set "GDX_PATH=%~dp0ParametersGDX"
echo Converting parameter CSV files from: !CSV_PATH!
echo Output GDX files to: !GDX_PATH!
rem -- Delete any existing GDX files in the destination folder
del /f /q "!GDX_PATH!\*.*" 2>nul
rem -- Create a temporary index file with headers
echo FileName,HeaderCount,DataCount,SkipExtra,TotalLines>csv_index.txt
rem -- 1) Build the index and capture TotalLines
for %%F in ("!CSV_PATH!\*.csv") do (
set "filename=%%~nF"
rem -- Count total lines (including header)
for /f %%L in ('find /v /c "" ^< "%%F"') do set "totalLines=%%L"
rem -- Read header and detect redundant first column
set /p "header="<"%%F"
for /f "tokens=1 delims=," %%H in ("!header!") do set "firstHeader=%%H"
if /I "!firstHeader!"=="!filename!" (set "skipExtra=1") else (set "skipExtra=0")
rem -- Count commas in header → headerCount
call :CountCommas "!header!"
set /a headerCount=COMMA_COUNT+1
rem -- Grab first data line and count commas → dataCount
set "dataLine="
for /f "usebackq skip=1 delims=" %%L in ("%%F") do if not defined dataLine set "dataLine=%%L"
call :CountCommas "!dataLine!"
set /a dataCount=COMMA_COUNT+1
rem -- Append everything (including totalLines) to index
echo !filename!,!headerCount!,!dataCount!,!skipExtra!,!totalLines!>>csv_index.txt
)
rem -- 2) Convert each CSV using the index
for /f "skip=1 tokens=1-5 delims=," %%a in (csv_index.txt) do (
set "fname=%%a"
set "headerCount=%%b"
set "dataCount=%%c"
set "skipExtra=%%d"
set "rowCount=%%e"
rem -- Trim any stray spaces
set "skipExtra=!skipExtra: =!"
set "rowCount=!rowCount: =!"
if "!skipExtra!"=="1" (
rem -- drop redundant first+value columns
set /a effdims=dataCount-2, colIndex=dataCount-1
call :NumToCol !colIndex!
set "range=B2:!col!!rowCount!"
) else (
rem -- use A through (dataCount-1)
set /a effdims=dataCount-1, colIndex=dataCount-1
call :NumToCol !colIndex!
set "range=A2:!col!!rowCount!"
)
echo Importing !fname!.csv as a parameter with !effdims! dimensions using range !range!...
gdxxrw "!CSV_PATH!\!fname!.csv" ^
output="!GDX_PATH!\!fname!.gdx" ^
par=!fname! rng=!range! rdim=!effdims! maxdupeerrors=999999
)
rem -- Move the index file for record-keeping
move /Y csv_index.txt "!GDX_PATH!\"
endlocal
exit /b
:::::::::::::::::::::::::::::::::::::::::
:CountCommas
setlocal enabledelayedexpansion
set "line=%~1"
set "count=0"
:CCLoop
if "!line!"=="" goto CCDone
if "!line:~0,1!"=="," set /a count+=1
set "line=!line:~1!"
goto CCLoop
:CCDone
endlocal & set COMMA_COUNT=%count%
exit /b
:::::::::::::::::::::::::::::::::::::::::
:NumToCol
setlocal enabledelayedexpansion
set "alphabet=ABCDEFGHIJKLMNOPQRSTUVWXYZ"
set /a idx=%1 - 1
set "col=!alphabet:~%idx%,1!"
endlocal & set col=%col%
exit /b
$offEcho
* Embedded Batch File: GDX Loader for parameters
$onEcho > Data/Parameters/par_gdxload.bat
@echo off
setlocal enabledelayedexpansion
rem -- Define the folder where GDX files are stored using the GAMS macro.
set "GDX_PATH=%cd%\%Par_GDX%"
rem -- Delete any existing load_params.inc file in the GDX folder.
if exist "!GDX_PATH!\load_params.inc" del /f /q "!GDX_PATH!\load_params.inc"
rem -- Generate the include file from the parameter list (assumed to be in Data\Input\parameterslist.txt).
(
for /f "usebackq delims=" %%L in ("Data\Parameters\parameterslist.txt") do (
echo $gdxin "%Par_GDX%/%%L.gdx"
echo $load %%L
echo $gdxin
)
) > "!GDX_PATH!\load_params.inc"
rem Optionally, move the parameters list file into the GDX folder:
rem move /Y "Data\Parameters\parameterslist.txt" "!GDX_PATH!\"
endlocal
exit /b
$offEcho
* Calling the batch files
$if %import_parameters% == Y
$call Data\Parameters\par_csv2gdx.bat
$call Data\Parameters\par_gdxload.bat
* Now include the generated load file so that all individual GDX sets are loaded.
$include "%Par_GDX%/load_params.inc"
This creates 2 batch files for sets, and another 2 for parameters.
The first batch file (set_csv2gdx.bat and par_csv2gdx.bat):
- converts the CSV files I just created to individual GDX files with the same name
e.g. my_parameter.csv → my_parameter.gdx
- puts them in the folder Data/Sets/SetGDX or Data/Parameters/ParameterGDX
- Creates a txt file with the full list of sets and parameters (setslist.txt and parameterslist.txt)
my_parameter
my_other_parameter
...
The second batch file (set_gdxload.bat and par_gdxload.bat):
- Creates an include file load_sets.inc and load_params.inc that looks like this:
$gdxin "Data/Parameters/ParametersGDX/my_parameter.gdx"
$load my_parameter
$gdxin
$gdxin "Data/Parameters/ParametersGDX/my_other_parameter.gdx"
$load my_other_parameter
$gdxin
...
- I run the model with tagged flags when I want to import sets and parameters or to skip them.
Thoughts
This is slow because I see that GAMS is running individual instances of conversion from CSV to GDX for each of the sets and parameters, taking around 1-3 seconds each! This isn’t a problem if I run the code and forget about it but sometimes I need quick runs and then this becomes a few minutes of waiting JUST to import the datasets.
The good part of the code is that it automatically does everything I need. The bat files are automatically detecting the CSVs in the determined folders, determining how many rows of data they have and how many dimensions so I don’t need to code that elsewhere since it is in the loop of the bat file.
I tried code to run the CSV to GDX conversion in parallel many times without success, thinking that maybe the computer can use the cores to process the conversion in parallel rather than doing one at a time.
I think that at this point I am extremely overcomplicating things. However, bottomline, is that this code does what I need beautifully, but sloooowly.