Problem with round-tripping variables using rgdx / wgdx

alodi · September 20, 2018, 10:08am

Hello,

I am trying to read a variable from a gdx file, write it to another gdx file, and read it back in using the R bindings (gdxrrw).

> var <- rgdx("fulldata.gdx", list(name="v35_shEsPeT"))
> str(var)
List of 11
 $ name       : chr "v35_shEsPeT"
 $ type       : chr "variable"
 $ dim        : int 3
 $ val        : num [1:256, 1:4] 22 22 22 22 22 22 22 22 22 22 ...
 $ form       : chr "sparse"
 $ uels       :List of 3
  ..$ : chr [1:40] "1900" "1905" "1910" "1915" ...
  ..$ : chr [1:11] "AFR" "CHN" "EUR" "IND" ...
  ..$ : chr [1:114] "ngcc" "ngccc" "ngt" "gastr" ...
 $ domains    : chr [1:3] "ttot" "all_regi" "all_te"
 $ domInfo    : chr "full"
 $ field      : chr "l"
 $ varTypeText: chr "positive"
 $ typeCode   : int 3
> wgdx("test.gdx", var)
Error in wgdx("test.gdx", var) : 
  Inconsistent dimension found: 'dim'=3  doesn't match implied 'uels' dimension=2.

After some research, I figured the reason might be that only the level field of the variable is read, so I tried with the full variable.

> var <- rgdx("fulldata.gdx", list(name="v35_shEsPeT", field="all"))
> str(var)
List of 11
 $ name       : chr "v35_shEsPeT"
 $ type       : chr "variable"
 $ dim        : int 3
 $ val        : num [1:3135, 1:5] 22 22 22 22 22 22 22 22 22 22 ...
 $ form       : chr "sparse"
 $ uels       :List of 4
  ..$ : chr [1:40] "1900" "1905" "1910" "1915" ...
  ..$ : chr [1:11] "AFR" "CHN" "EUR" "IND" ...
  ..$ : chr [1:114] "ngcc" "ngccc" "ngt" "gastr" ...
  ..$ : chr [1:5] "l" "m" "lo" "up" ...
 $ domains    : chr [1:4] "ttot" "all_regi" "all_te" "_field"
 $ domInfo    : chr "full"
 $ field      : chr "all"
 $ varTypeText: chr "positive"
 $ typeCode   : int 3
> wgdx("test.gdx", var)
> str(rgdx("test.gdx"))
List of 6
 $ name: chr "*"
 $ type: chr "set"
 $ dim : int 1
 $ val : NULL
 $ form: NULL
 $ uels: chr [1:165] "1900" "1905" "1910" "1915" ...
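
Comparing the two write attempts so far suggests a simple check to run before calling wgdx. This is only a sketch inferred from the error message and the two str() outputs above, not from the wgdx documentation: for a variable, wgdx seems to treat the last entry of uels as the field axis, so the implied dimension is length(uels) - 1. The helper name implied_dim and the two mock lists are made up for illustration.

```r
# Dimension that wgdx appears to derive from a variable's uels.
# Assumption (from the error message only): the last uels entry is the field axis.
implied_dim <- function(sym) {
  length(sym$uels) - 1L
}

# Mock symbol lists with the same shape as the two reads above (UEL lists shortened).
level_only <- list(dim = 3L, uels = list("1900", "AFR", "ngcc"))
all_fields <- list(dim = 3L, uels = list("1900", "AFR", "ngcc", c("l", "m", "lo", "up")))

implied_dim(level_only) == level_only$dim  # FALSE: the shape that triggered the error
implied_dim(all_fields) == all_fields$dim  # TRUE: the shape that wgdx accepted
```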

Writing now seems to succeed. However, reading the file back without a request list returns only a set named "*" that holds all 165 UELs, so the per-dimension UEL lists are gone. As a next step I passed the name and field information to rgdx when re-reading the variable:

> str(rgdx("test.gdx", list(name="v35_shEsPeT", field="all")))
List of 11
 $ name       : chr "v35_shEsPeT"
 $ type       : chr "variable"
 $ dim        : int 3
 $ val        : num [1:3135, 1:5] 22 22 22 22 22 22 22 22 22 22 ...
 $ form       : chr "sparse"
 $ uels       :List of 4
  ..$ : chr [1:165] "1900" "1905" "1910" "1915" ...
  ..$ : chr [1:165] "1900" "1905" "1910" "1915" ...
  ..$ : chr [1:165] "1900" "1905" "1910" "1915" ...
  ..$ : chr [1:5] "l" "m" "lo" "up" ...
 $ domains    : chr [1:4] "ttot" "all_regi" "all_te" "_field"
 $ domInfo    : chr "relaxed"
 $ field      : chr "all"
 $ varTypeText: chr "positive"
 $ typeCode   : int 3

Now every index dimension lists all 165 UELs instead of only its own, and domInfo has changed from "full" to "relaxed". What am I missing here?

I’m using GAMS 25.1 on Ubuntu 18.04.1 LTS (gdxrrw 1.0.2).
Thanks for your help!
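
A quick way to spot this kind of change is to compare the per-axis UEL counts and the domInfo entry of the two rgdx results. The following is a minimal sketch: domain_summary and mock are hypothetical helpers, and the two lists are hand-made stand-ins with the shape of the outputs above, not real GDX reads.

```r
# Summarise what rgdx reports about the domains of a symbol list.
domain_summary <- function(sym) {
  list(domInfo = sym$domInfo,
       counts  = vapply(sym$uels, length, integer(1)))
}

# Hand-made mocks: only the UEL counts matter here, the labels are dummies.
mock <- function(counts, domInfo) {
  list(domInfo = domInfo,
       uels = lapply(counts, function(n) paste0("u", seq_len(n))))
}
before <- mock(c(40, 11, 114, 5), "full")       # shape of the first read
after  <- mock(c(165, 165, 165, 5), "relaxed")  # shape of the re-read

domain_summary(before)$counts
domain_summary(after)$counts
identical(domain_summary(before), domain_summary(after))  # FALSE
```

With real data, the two mocks would simply be replaced by the lists returned from rgdx.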

Renger · September 26, 2018, 7:16am

Hi
Perhaps you should try the option compress = TRUE. From the documentation:

If TRUE, compress the factors in the data frame so they only include required levels. For the default compress=FALSE, each factor includes levels for the entire universe of UELs in the GDX file.

Hope this helps

Renger

alodi · September 26, 2018, 8:02am

Hello,

thanks for your help. According to the documentation, rgdx in gdxrrw 1.0.2 (part of GAMS 25.1) has no compress flag. This seems to be true:

Error in rgdx(gdx, list(name = "v35_shEsPeT"), compress = T) : 
  unused argument (compress = T)

The flag seems to exist only for rgdx.set and rgdx.param, which cannot be used for variables.
Best,

Alois

Renger · September 26, 2018, 12:29pm

Hi Alois
There is a newer version 1.0.4. Perhaps updating might resolve this problem: https://support.gams.com/doku.php?id=gdxrrw:interfacing_gams_and_r
Cheers
Renger

alodi · September 26, 2018, 12:49pm

Hi Renger,

thanks again for taking the time. It looks like version 1.0.4 does not change the observed behavior. I could reproduce all steps in this issue: the layout of the domains still changes when re-reading data from the gdx, and rgdx still has no compress flag.

Furthermore, since the first execution of rgdx produces the correct layout (even without a compress flag), I strongly expect the difference with respect to the second execution to stem from wgdx.
Best,

Alois

Renger · September 27, 2018, 7:42am

Hi Alois
Could you send code that reproduces this problem? Here is a small model you could use for generating variables.

set s /a,b,c/, t /1,2,3/;

variables
    Y(s)    Sectors,
    X(t)    Technologies,
    dummy   Dummy variable;

equations
    eqY(s)
    eqX(t)
    dummyeq;

eqY(s)..
    Y(s) =E= 1;

eqX(t)..
    X(t) =E= 1;

dummyeq..
    dummy =E= 1;

model test /all/;

solve test minimizing dummy using NLP;

execute_unload 'test1.gdx', y

alodi · September 27, 2018, 12:15pm

Hi,

I modified your script slightly to have a variable (X) with two dimensions. The results are quite interesting.

Using execute_unload without specifying the variables that are to be saved to the gdx, a subsequent call to rgdx yields the correct domains:

GAMS:

execute_unload 'test1.gdx'

R:

> data <- rgdx("test1.gdx", list(name="x", field="all"))
> str(data)
List of 11
 $ name       : chr "X"
 $ type       : chr "variable"
 $ dim        : int 2
 $ val        : num [1:45, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
 $ form       : chr "sparse"
 $ uels       :List of 3
  ..$ : chr [1:3] "a" "b" "c"
  ..$ : chr [1:3] "1" "2" "3"
  ..$ : chr [1:5] "l" "m" "lo" "up" ...
 $ domains    : chr [1:3] "s" "t" "_field"
 $ domInfo    : chr "full"
 $ field      : chr "all"
 $ varTypeText: chr "free"
 $ typeCode   : int 5

Saving this to a gdx file using wgdx and re-loading the file yields again the problems mentioned above:

> wgdx("test2.gdx", data)
> data2 <- rgdx("test2.gdx", list(name="x", field="all"))
> str(data2)
List of 11
 $ name       : chr "X"
 $ type       : chr "variable"
 $ dim        : int 2
 $ val        : num [1:45, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
 $ form       : chr "sparse"
 $ uels       :List of 3
  ..$ : chr [1:6] "a" "b" "c" "1" ...
  ..$ : chr [1:6] "a" "b" "c" "1" ...
  ..$ : chr [1:5] "l" "m" "lo" "up" ...
 $ domains    : chr [1:3] "s" "t" "_field"
 $ domInfo    : chr "relaxed"
 $ field      : chr "all"
 $ varTypeText: chr "free"
 $ typeCode   : int 5

If one names the variable to be saved in the call to execute_unload, the merged structure appears as soon as the file is read with R:

GAMS:

execute_unload 'test1.gdx', x

R:

> data <- rgdx("test1.gdx", list(name="x", field="all"))
> str(data)
List of 11
 $ name       : chr "X"
 $ type       : chr "variable"
 $ dim        : int 2
 $ val        : num [1:45, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
 $ form       : chr "sparse"
 $ uels       :List of 3
  ..$ : chr [1:6] "a" "b" "c" "1" ...
  ..$ : chr [1:6] "a" "b" "c" "1" ...
  ..$ : chr [1:5] "l" "m" "lo" "up" ...
 $ domains    : chr [1:3] "s" "t" "_field"
 $ domInfo    : chr "relaxed"
 $ field      : chr "all"
 $ varTypeText: chr "free"
 $ typeCode   : int 5

It looks like wgdx saves the variable to gdx in much the same way as execute_unload does when called with a variable name: the domains are merged and the merged list is copied identically onto all axes.
Please find the slightly modified script attached.
Best, Alois.
test.gms (333 Bytes)

dirkse · September 27, 2018, 5:02pm

Alois,

I can appreciate the struggle you have been engaged in with R, gdxrrw, and GDX files. Please find below some observations about the topic that I hope you find helpful. I will also try to attach some GAMS and R source that illustrates a way forward for you.

1. GDX data is more than just the data itself: there may also be metadata. For example, the data for a variable X are levels, marginals, etc. But the GDX can also contain metadata: what set(s) is the symbol X declared over, and what are the elements in these sets? With all of this, you get the nice output you got initially. Without it, you get something else. You can use the gdxdump utility to get the details for a particular GDX file. It’s quite helpful in this case:

https://www.gams.com/latest/docs/T_GDXDUMP.html?search=gdxdump

2. To address your specific question about round-trip behavior: the initial GDX file (from your first post) was dumped with no symbol list, so it contains the entire list of symbols and data from the GAMS run. This implies there is lots of metadata: variable X is declared over sets s and t, and the data for these sets is also part of the GDX file, but not part of the symbol X itself. When you read X and write it back to GDX, most of this metadata is lost. You will see this if you dump your GDX files like this:

gdxdump xxx.gdx domainInfo

I could not attach an R script, so I copied it to a .txt file. If you run the attached data.gms, it will produce some interesting GDX files. Rename the attached example.txt to example.R and run that as well to see some interesting output and comments.

HTH,

-Steve
example.txt (1.13 KB)
data.gms (438 Bytes)

dirkse · September 27, 2018, 7:40pm

Hi,

I should have mentioned a couple of things in my last post on this. First, you cannot do a compressed read for variables or equations. For parameters, we squeeze out index positions that don’t have any data, i.e. that are all zero. But for some fields of a variable, the default is not zero. For example, for a positive variable, the default upper bound is +INF and the default scale is 1. Should we filter out an index position if the upper bounds are all zero? Or if the upper bounds are all +INF? This is not an impossible problem to solve, but it’s not so clear-cut what to do.

Second, you can get the domain information of a GDX file directly in R via the gdxInfo() function in gdxrrw. Using my example from earlier in this thread, I can apply it to the allData.gdx file (the one with full domain info):

> dmpAll <- gdxInfo ('allData.gdx',dump=F,returnDF=T)
> dmpAll$variables
   name index dim card           text doms domnames
1     Y     3   1    3        Sectors    1        s
2     X     4   2    9   Technologies 1, 2     s, t
3 dummy     5   0    1 Dummy variable              
> dmpAll$sets
  name index dim card text doms domnames
1    s     1   1    3         0        *
2    t     2   1    3         0        *

From this I conclude that variable X is indexed by (s,t) and that these sets are part of the GDX file: they are the first and second set, respectively. Essentially, I have full domain info here. But if I look at the justX.gdx file, I see a different story:

> dmpX <- gdxInfo ('justX.gdx',dump=F,returnDF=T)
> dmpX$variables
  name index dim card         text doms domnames
1    X     1   2    9 Technologies 0, 0     s, t
> dmpX$sets
[1] name     index    dim      card     text     doms     domnames
<0 rows> (or 0-length row.names)

The variable X has 2 dimensions with the set names (s,t) associated with them, but the doms is listed as 0,0, which means these sets are not part of the GDX file.

-Steve
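
The same check can be automated. The sketch below flags variables whose domain sets are absent from the file. It assumes (based on the printed output, not verified against the gdxrrw source) that the doms column is a list column of integer vectors in which 0 means "set not stored in this GDX file". The name missing_domain_sets and the two mock data frames are illustrative only.

```r
# TRUE for each variable that has at least one index position whose
# domain set is not stored in the GDX file (doms entry equal to 0).
missing_domain_sets <- function(vars) {
  flags <- vapply(vars$doms, function(d) any(d == 0L), logical(1))
  setNames(flags, vars$name)
}

# Mocks mirroring dmpAll$variables and dmpX$variables from above.
all_vars <- data.frame(name = c("Y", "X", "dummy"), stringsAsFactors = FALSE)
all_vars$doms <- list(1L, c(1L, 2L), integer(0))
x_vars <- data.frame(name = "X", stringsAsFactors = FALSE)
x_vars$doms <- list(c(0L, 0L))

missing_domain_sets(all_vars)  # nothing flagged for allData.gdx
missing_domain_sets(x_vars)    # X flagged for justX.gdx
```

If the assumption about the column type holds, dmpAll$variables and dmpX$variables could be passed in place of the mocks.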

Renger · September 28, 2018, 9:43am

Hi Steve

Thanks for the many insights.
As I understand it, the only way to read the variable back in correctly is either to take the domain information from the full gdx file or to explicitly write the sets s and t to the justX.gdx file together with the variable X.

s <- rgdx('allData.gdx',list(name='s',compress=T))
suels <- s$uels[[1]]
t <- rgdx('allData.gdx',list(name='t',compress=T))
tuels <- t$uels[[1]]
filter3 <- list(suels,tuels)  
data3 <- rgdx("justX.gdx", list(name="x", field="all", uels=filter3))

Cheers
Renger

alodi · September 28, 2018, 1:57pm

Hi Renger, hi Steve,

thanks for your help. Providing the sets for the dimensions when writing to gdx does indeed solve the problem!
Here is the solution based on the example, where I included the call to wgdx:

> x=rgdx("test1.gdx", list(name="X", field="all"))
> s=rgdx("test1.gdx", list(name="s"))
> t=rgdx("test1.gdx", list(name="t"))
> str(x)
List of 11
 $ name       : chr "X"
 $ type       : chr "variable"
 $ dim        : int 2
 $ val        : num [1:45, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
 $ form       : chr "sparse"
 $ uels       :List of 3
  ..$ : chr [1:3] "a" "b" "c"
  ..$ : chr [1:3] "1" "2" "3"
  ..$ : chr [1:5] "l" "m" "lo" "up" ...
 $ domains    : chr [1:3] "s" "t" "_field"
 $ domInfo    : chr "full"
 $ field      : chr "all"
 $ varTypeText: chr "free"
 $ typeCode   : int 5
> wgdx("test2.gdx", x, s, t)
> str(rgdx("test2.gdx", list(name="X", field="all")))
List of 11
 $ name       : chr "X"
 $ type       : chr "variable"
 $ dim        : int 2
 $ val        : num [1:45, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
 $ form       : chr "sparse"
 $ uels       :List of 3
  ..$ : chr [1:3] "a" "b" "c"
  ..$ : chr [1:3] "1" "2" "3"
  ..$ : chr [1:5] "l" "m" "lo" "up" ...
 $ domains    : chr [1:3] "s" "t" "_field"
 $ domInfo    : chr "full"
 $ field      : chr "all"
 $ varTypeText: chr "free"
 $ typeCode   : int 5

Thanks again for your help and the swift response,

Alois

Edit: It might be worthwhile to point out in the documentation of wgdx that one has to save the UEL sets explicitly. To me this was not obvious, since the information is clearly contained in the object passed to wgdx.
Edit2: Here is a wrapper (not tested)

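var2gdx <- function(gdx, var){
    # one set per index dimension; var$dim does not count the "_field" axis
    sets <- lapply(seq_len(var$dim), function(n){
        list(name=var$domains[[n]], type="set", uels=list(var$uels[[n]]))
    })
    # wgdx takes each symbol as a separate argument, so splice the sets in
    do.call(wgdx, c(list(gdx, var), sets))
}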
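
To see what such a wrapper hands to wgdx without touching a GDX file, the set-building step can be tried on its own. This sketch assumes, as the wrapper above does, that a list with name, type and uels is enough for wgdx to write a set. The helper domain_sets and the mock variable are illustrative only.

```r
# Build one set list per index dimension of a variable read with rgdx.
# var$dim counts only the index axes, so a trailing "_field" axis is skipped.
domain_sets <- function(var) {
  lapply(seq_len(var$dim), function(n) {
    list(name = var$domains[[n]], type = "set", uels = list(var$uels[[n]]))
  })
}

# Mock with the shape of the X variable above (field labels shortened).
x <- list(name = "X", dim = 2L,
          uels = list(c("a", "b", "c"), c("1", "2", "3"), c("l", "m", "lo", "up")),
          domains = c("s", "t", "_field"))

sets <- domain_sets(x)
vapply(sets, function(s) s$name, character(1))  # "s" "t"
```

With a real rgdx result in place of the mock, the sets could then be written along with the variable via do.call(wgdx, c(list("test2.gdx", x), sets)).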