Tara got some help from GAMS support but I want to share some of our findings. First of all csv2gdx processes a single CSV file to create a single GDX file with a single parameter. So there is no batch processing in csv2gdx. Here are some other considerations when processing large CSV files. Python was an obvious choice to process CSV files. Here is the reply from support about this:
I don’t know about the speed Python in the best possible way would process the CSV files but my guess it that it will be hard to beat csv2gdx. So I would just call csv2gdx 20 times and get 20 different GDX file that you load into GAMS. From the initial mail I extracted that you have one key index in column 1 and get the value from column 11. But you will process all records in the csv file. The only issue I can see with csv2gdx if that if you have to read the same csv file multiple times for different value columns.
I experimented with a 200MB CSV file with 11 columns. I read the first three as index and the eleventh as value. With gdx2csv reading CSV and writing GDX takes about 3 secs on my machine. If I read this using pandas csv import routine (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) just the reading takes about 4 secs.
If you want to experiment yourself:
a) Here is the GAMS program that creates the 200 MB CSV file:
set i /1*180/;
alias (i,j,k);
file fx /x.csv/; put fx;
loop((i,j,k), put i.tl:0 ‘,’ j.tl:0 ‘,’ k.tl:0 ‘,4,5,6,7,8,9,10,’ pi:8:6 /);
b) Here is the csv2gdx call:
csv2gdx x.csv id=b output=b.gdx index=(1,2,3) value=11 colcount=11
Output (with timing info) is:
No errors, CSV2GDX time = 2932ms
c) Here is what I did in Python:
import pandas as pd
from datetime import datetime
t = datetime.now()
x = pd.read_csv(‘x.csv’)
print datetime.now()-t
Output is
0:00:03.762000