Running MIP models bigger than 250 GB

Hi Team,

We are trying to solve an MIP model with GAMS/CPLEX. The model returns a solution when we run it with small sets (fewer variables). In reality, though, those sets can be very large, with up to 3000-4000 elements. When we ran the model with the larger sets (the real-case scenario), model generation time increased substantially, which is to be expected.

By improving the code we have been able to reduce the model size a bit, but it is still considerable. When we run it, the model starts generating, reaches a total size of 250 GB, and ends with “Error: Out of Memory”. We have tried options such as memoryemphasis, workmem, solvelink = 0, etc.

My question is: suppose we increase the RAM by a large amount and give the model the processing power it needs (GPUs, TPUs, etc.), i.e. run it on better hardware. Is it then reasonable to expect a solution, given that the model returned one when run with fewer variables? Or does increasing the size of the sets (and hence the number of variables) increase not only the computational requirement but also the difficulty of the problem, which may render the model unsolvable?

In short, will investing in better hardware pay off? Is there any information on the largest model GAMS/CPLEX has solved to date? (I understand that the complexity of the formulation matters and not just the model size, but knowing the maximum solved model size would help.)

Thank you.
Regards,
-Utkarsh

Hi,

Expressing the model size in GB is not very helpful. Estimate manually how many variables and constraints your model will have. The number of discrete (e.g. binary) variables is also good to know. Also try to estimate how many non-zeros you will eventually have. This will also allow you to estimate the memory you will need.
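
As a rough illustration of such a manual estimate, here is a minimal Python sketch. Everything in it is an assumption for illustration: the symbol dimensions (a variable indexed over two sets of 4000 elements), the three non-zeros per column, and the storage layout (one compressed-column copy of the matrix with 8-byte values, 4-byte row indices and 8-byte column pointers). It is not how GAMS or CPLEX actually store the model; it only gives a floor for a single copy of the constraint matrix, and real usage will be a multiple of that.

```python
def count_entries(*set_sizes):
    """Number of index tuples for a symbol declared over sets of these sizes."""
    total = 1
    for size in set_sizes:
        total *= size
    return total


def sparse_matrix_bytes(nonzeros, columns, value_bytes=8, index_bytes=4, pointer_bytes=8):
    """Bytes for ONE compressed-column copy of the matrix (assumed layout).

    Each non-zero stores a value and a row index; each column stores a
    start pointer, plus one extra pointer marking the end.
    """
    return nonzeros * (value_bytes + index_bytes) + (columns + 1) * pointer_bytes


# Hypothetical example: a variable x(i,t) with 4000 elements in each set,
# and an assumed average of 3 non-zeros per column.
columns = count_entries(4000, 4000)
nonzeros = 3 * columns
print(f"{columns:,} columns, {nonzeros:,} non-zeros")
print(f"about {sparse_matrix_bytes(nonzeros, columns) / 1024**3:.2f} GiB for one copy of the matrix")
```

Comparing a figure like this with the 250 GB actually observed can hint at whether the model has far more non-zeros than expected or whether the memory goes elsewhere.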

Chances are that you won’t be able to solve a MIP model of this size with off-the-shelf solvers, but this is not certain. The MIP solver performs a step called preprocessing (presolve). I have seen models with millions of variables and constraints that presolve reduced to tiny models. Often something wasn’t quite right in such cases, but the effect of presolve should not be underestimated. The performance of MIP solvers is notoriously difficult to predict: there are tiny MIP models that run forever, while large, well-behaved MIPs can be solved efficiently.

If I were in your shoes, I would reduce the model instance to a size that just fits on the existing machine and see what happens with this instance. Working with large models can be a challenge. GAMS defaults are often designed for small and medium-sized models, with lots of debug output (e.g. equation listing) turned on. The GAMS experts at support have plenty of experience with large models, so you might want to share your model source with them and get some concrete advice.

-Michael

Hi Michael,

Thanks for the insight. We will definitely try running a somewhat smaller version of the model, as you suggested.

However, I do need some information on calculating the memory required from the number of variables and constraints. Is there a defined formula or technique where we input the number of variables, constraints, etc. and get the memory required? Or do we just have to extrapolate from the memory GAMS consumes for a smaller model to get an approximate idea of what a larger model will need if it has n times more variables (constraints, etc.)?

Any suggested reads about these topics?


Thanks & Regards,
-Utkarsh

Hi, the numbers of variables and constraints count somewhat in the memory requirement, but they are usually dominated by the number of non-zero elements (i.e. where a variable and a constraint intersect with a non-zero coefficient). GAMS prints these statistics after generating the model:

--- indus89.gms(3618) 6 Mb
--- Generating LP model wsisn
--- indus89.gms(3621) 8 Mb
---   2,726 rows  6,570 columns  39,489 non-zeroes
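
If you collect many such logs, the figures can be pulled out mechanically. The following Python sketch parses the excerpt above; the regular expressions assume exactly the line shapes shown here, so real logs may need adjusting. The function name `summarize` and the returned keys are made up for this example. It also applies the heap-delta rule of thumb discussed in the next paragraph (solver memory is at least roughly three times what GAMS needed for generation).

```python
import re

LOG = """\
--- indus89.gms(3618) 6 Mb
--- Generating LP model wsisn
--- indus89.gms(3621) 8 Mb
---   2,726 rows  6,570 columns  39,489 non-zeroes
"""

# Assumed line shapes, taken from the excerpt above.
HEAP = re.compile(r"^--- \S+\(\d+\) (\d+) Mb")
STATS = re.compile(r"^---\s+([\d,]+) rows\s+([\d,]+) columns\s+([\d,]+) non-zeroes")


def summarize(log_text):
    """Extract heap sizes around model generation and the model statistics."""
    heap_before = heap_after = None
    rows = cols = nonzeros = None
    generating = False
    for line in log_text.splitlines():
        heap = HEAP.match(line)
        stats = STATS.match(line)
        if line.startswith("--- Generating"):
            generating = True
        elif heap and not generating:
            heap_before = int(heap.group(1))   # last heap line before generation
        elif heap and heap_after is None:
            heap_after = int(heap.group(1))    # first heap line after generation
        elif stats:
            rows, cols, nonzeros = (int(g.replace(",", "")) for g in stats.groups())
    if heap_before is None or heap_after is None:
        raise ValueError("heap lines around model generation not found")
    generation_mb = heap_after - heap_before
    return {
        "rows": rows,
        "columns": cols,
        "nonzeros": nonzeros,
        "heap_before_mb": heap_before,
        "heap_after_mb": heap_after,
        "generation_mb": generation_mb,
        # Rule of thumb from this thread, not an exact figure.
        "solver_mb_rule_of_thumb": 3 * generation_mb,
    }


print(summarize(LOG))
```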

You might also want to check GAMS’ internal memory usage (before model generation). You can read this from the last line before “--- Generating XXX model YYYY”, which lists the size of the GAMS heap (here 6 Mb). Moreover, the memory GAMS consumes to generate a model (the delta between the GAMS internal heap before and after model generation, here 8 Mb - 6 Mb = 2 Mb) is usually at least tripled by what the solver requires to solve it. If it is convenient (and GAMS dynamic subsets usually help a lot with this), you can run a series of experiments: load all the data, but generate the model with only 5, 10, 15, … percent of it. For example, if you have two big sets i (e.g. location) and t (e.g. time) in your model, use dynamic sets ii(i) and tt(t) in your equation algebra and fill them with a fraction of the real i and t:

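* Dynamic subsets that hold the first 10 percent of i and t
set ii(i), tt(t);
* <= keeps the boundary element, so small test sets do not end up empty
ii(i) = ord(i) <= 0.1*card(i);
tt(t) = ord(t) <= 0.1*card(t);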

Now run the series of models and record the memory requirement curve of GAMS and the solver. I would use the Task Manager (Windows) or other tools to record the memory consumption of the processes gamscmex.exe and gmsgennx.exe (solver). This will give you the best extrapolation for the memory requirement of your big model. If the model is a MIP, the solver’s memory requirement might not be as easy to predict, because it works with a branch-and-bound tree that can grow if the model is difficult to solve. At least you will get a pretty good idea of what it takes to get the model started.
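
Once the measurements are recorded, the extrapolation itself can be done in a few lines. The Python sketch below assumes the memory curve follows a power law, memory = a * fraction^b, and fits it by least squares on the logarithms. The power-law shape is an assumption for illustration, and the measurements in the example are invented. Check the fit against your own data before trusting the extrapolated value, and remember that it does not cover growth of the branch-and-bound tree.

```python
import math


def fit_power_law(fractions, memory_mb):
    """Least-squares fit of memory = a * fraction**b in log-log space."""
    xs = [math.log(f) for f in fractions]
    ys = [math.log(m) for m in memory_mb]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # exponent: how fast memory grows
    a = math.exp(mean_y - b * mean_x)    # predicted memory at fraction 1.0
    return a, b


def predict(a, b, fraction):
    """Memory predicted by the fitted curve at the given data fraction."""
    return a * fraction ** b


# Invented measurements: share of the data used and peak memory (MB)
# of GAMS plus solver observed at that share.
fractions = [0.05, 0.10, 0.15, 0.20]
peak_mb = [310.0, 1150.0, 2600.0, 4500.0]

a, b = fit_power_law(fractions, peak_mb)
print(f"growth exponent b = {b:.2f}")
print(f"extrapolated memory for the full model: {predict(a, b, 1.0) / 1024:.1f} GB")
```

An exponent close to 2 would suggest that memory grows with the product of the two scaled sets, which is what you would expect when most variables are indexed over both i and t.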

-Michael