Hi grecht, Great questions… thanks.
First, some quick background; you already know much of this, but I'm adding it for completeness. There are 3 GAMS special values: EPS, NA and UNDF.
EPS is used to explicitly represent a zero in GAMS and is mathematically zero (GAMS, and more specifically CMEX, our execution system, will not store zeros, so we needed a way to get around this). NA can be used in GAMS to initialize a symbol, but it is not assigned to a numerical value at all; it is more of a placeholder. If a model contains an NA value, the user will get an execution error. Many people use NA to initialize a symbol and then put their data into that symbol. If their data doesn't cover their model's use case, they will then know there is either missing data or an error in how the model is constructed. UNDF is a special value that is returned when a function evaluation goes sideways (like 1/0).
But now we are working in the world of Python, so we need two things: 1) to be able to represent all these special values from GAMS, but also 2) to maintain a float column datatype in order for pandas to be performant. Thus, we represent:
EPS as -0.0 (a negative zero, which is still mathematically zero)
UNDF as a nan
NA also as a nan
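As a quick sanity check on the EPS mapping, here is a sketch using only the Python standard library (nothing GAMSPy-specific):

```python
import math
import struct

eps = -0.0  # how GAMS EPS is represented in a float column

# -0.0 is still mathematically zero...
print(eps == 0.0)  # True
print(eps + 1.0)   # 1.0

# ...but it carries a sign bit, so it is distinguishable from +0.0
print(math.copysign(1.0, eps))       # -1.0
print(struct.pack(">d", eps).hex())  # '8000000000000000'
print(struct.pack(">d", 0.0).hex())  # '0000000000000000'
```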
There are many, many nans available to use for UNDF and NA… so we specifically use:
UNDF is float("nan"), which has a byte pattern of (struct here is Python's built-in struct module):
In [1]: struct.pack('>d', float("nan")).hex()
Out[1]: '7ff8000000000000'
np.nan also has the same byte pattern:
In [1]: struct.pack('>d', np.nan).hex()
Out[1]: '7ff8000000000000'
np.nan is, therefore, only interpreted as UNDF, which is why you are not able to count NA in your p1.
NA is represented as a nan with a byte pattern of fffffffffffffffe:
In [1]: struct.unpack(">d", bytes.fromhex("fffffffffffffffe"))[0]
Out[1]: nan
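Since both special values map to a nan, math.isnan (or pd.isna) cannot tell them apart; only the underlying byte pattern can. Here is a minimal sketch of how you could distinguish them yourself (the helper names is_gams_na and is_gams_undef are mine, not a GAMSPy API):

```python
import math
import struct

NA_BYTES = bytes.fromhex("fffffffffffffffe")  # GAMS NA bit pattern
GAMS_NA = struct.unpack(">d", NA_BYTES)[0]

def is_gams_na(x: float) -> bool:
    # compare raw IEEE 754 bytes, since nan != nan numerically
    return struct.pack(">d", x) == NA_BYTES

def is_gams_undef(x: float) -> bool:
    # any other nan is treated as UNDF
    return math.isnan(x) and not is_gams_na(x)

print(math.isnan(GAMS_NA))          # True: NA still looks like a plain nan
print(is_gams_na(GAMS_NA))          # True
print(is_gams_na(float("nan")))     # False: the "plain" nan is UNDF
print(is_gams_undef(float("nan")))  # True
```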
The logic of choosing float("nan") / np.nan for UNDF instead of NA follows other function returns like:
In [1]: np.sqrt(-1)
<ipython-input-11-597592b72a04>:1: RuntimeWarning: invalid value encountered in sqrt
np.sqrt(-1)
Out[11]: np.float64(nan)
np.float64("nan") also has the same byte pattern as np.nan and float("nan").
A “special” nan is used for NA, which in GAMS means “initialized, but no numerical value assigned”, aka “missing”.
Hopefully that helps untangle the nan behavior you are seeing.
Now on to the drop* methods.
dropUndef really means drop all nans that are GAMS UNDF special values,
dropNA really means drop all nans that are GAMS NA special values, and
dropMissing really means drop all nans.
The “missing” naming follows pandas behavior for dropna… but you can see the obvious naming problem when compared to the pandas method, so we adopted the (hopefully clearer) dropMissing naming convention for a native GAMSPy method that will just get rid of all nans (and not rely on native pandas functionality, although mixing and matching pandas and GAMSPy methods is very common and powerful).
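To illustrate the three behaviors side by side, here is a plain-Python sketch of the semantics (not the actual GAMSPy implementation, which operates on pandas columns):

```python
import math
import struct

NA_BYTES = bytes.fromhex("fffffffffffffffe")
GAMS_NA = struct.unpack(">d", NA_BYTES)[0]  # the NA nan
GAMS_UNDF = float("nan")                    # the UNDF nan

values = [1.0, GAMS_UNDF, 2.0, GAMS_NA, -0.0]  # -0.0 is EPS, not missing

def is_na(x):
    return struct.pack(">d", x) == NA_BYTES

def is_undef(x):
    return math.isnan(x) and not is_na(x)

drop_undef = [v for v in values if not is_undef(v)]      # keeps NA, drops UNDF
drop_na = [v for v in values if not is_na(v)]            # keeps UNDF, drops NA
drop_missing = [v for v in values if not math.isnan(v)]  # drops both

print(len(drop_undef), len(drop_na), len(drop_missing))  # 4 4 3
```

Note that EPS (-0.0) survives all three, since it is a real number, not a missing value.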
You also state:
My use case is the following: I have a parameter based on which I want to define variable limits. This parameter is not defined for all set elements, and naturally I do not want to set any limit in the undefined cases. However, since GAMS assumes Parameters to be zero where they are not defined, it would simply set the limit to zero.
My suggestion is to simply define the sparse data for the parameter rather than using a numpy array. At this time, numpy arrays are assumed to be dense data structures which means that you must define all values for all domain tuples. This might get relaxed in a future release.
Something like this:
p1 = ct.addParameter("p1", domain=S, records=[("b", 1)])
Then you could set your variable bounds:
v.lo[S].where[p1[S]] = p1[S]
hope this is helpful,
adam