Hi,
For my master’s thesis I have to run fairly large GAMS simulations that are generated by another tool. I assume the models are correctly formed, given that this other tool works as expected.
I run them on the cluster I have access to, with the following command in the SLURM submission script:
srun $GAMSPATH/gams UCM_h.gms threads=16 workSpace=6300 > some-log_file.log
The job script specifies the resources to be allocated: I request 16 CPUs and 400 MB per CPU, hence 6400 MB in total. It’s worth mentioning that the UCM_h.gms file also specifies “Option threads=16;”, which takes precedence over the command-line value.
When I launch these, I get the following in the log (full log attached):
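To make the memory budget explicit, here is a quick check in Python. The numbers are the ones quoted above; the variable names are only illustrative, and it assumes that workSpace is expressed in MB, like the SLURM request.

```python
# Figures from the post; names are illustrative, not GAMS or SLURM identifiers.
cpus = 16
mem_per_cpu_mb = 400   # per-CPU memory requested in the job script
workspace_mb = 6300    # value passed as workSpace= (assumed to be in MB)

# Total memory SLURM allocates to the job.
allocation_mb = cpus * mem_per_cpu_mb

# What is left in the allocation once workSpace is subtracted.
headroom_mb = allocation_mb - workspace_mb

print(f"SLURM allocation: {allocation_mb} MB ({allocation_mb / 1024:.2f} GB)")
print(f"Headroom above workSpace: {headroom_mb} MB")
```

So the allocation is 6400 MB (6.25 GB), and only 100 MB separates the workSpace value from the job's total memory.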
# ... lots of lines
Dual simplex solved model.
Root relaxation solution time = 12.91 sec. (9781.40 ticks)
Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap
* 0+ 0 3.76641e+13 4.06277e+11 98.92%
Found incumbent of value 3.7664120e+13 after 55.04 sec. (50034.23 ticks)
0 0 2.18079e+12 11426 3.76641e+13 2.18079e+12 70267 94.21%
* 0+ 0 3.76538e+13 2.18079e+12 94.21%
Found incumbent of value 3.7653757e+13 after 68.03 sec. (62585.06 ticks)
0 0 2.18079e+12 8136 3.76538e+13 Cuts: 9899 83209 94.21%
0 0 2.18080e+12 7895 3.76538e+13 Cuts: 7452 98828 94.21%
0 0 2.18080e+12 6865 3.76538e+13 Cuts: 5977 110708 94.21%
* 0+ 0 3.76448e+13 2.18080e+12 94.21%
Found incumbent of value 3.7644847e+13 after 92.52 sec. (82006.49 ticks)
0 0 -1.00000e+75 0 3.76448e+13 2.18080e+12 110708 94.21%
* 0+ 0 3.54979e+12 2.18080e+12 38.57%
Found incumbent of value 3.5497876e+12 after 100.50 sec. (86565.08 ticks)
0 0 2.18080e+12 6612 3.54979e+12 Cuts: 3885 118320 38.57%
0 0 2.18080e+12 6469 3.54979e+12 Cuts: 2720 123469 38.57%
Detecting symmetries...
0 0 2.18080e+12 6306 3.54979e+12 Cuts: 1590 125968 38.57%
* 0+ 0 3.48532e+12 2.18080e+12 37.43%
Found incumbent of value 3.4853207e+12 after 130.46 sec. (107363.89 ticks)
0 0 2.18080e+12 6400 3.48532e+12 Cuts: 1014 128020 37.43%
0 0 2.18080e+12 6424 3.48532e+12 Cuts: 745 129478 37.43%
0 0 2.18080e+12 6329 3.48532e+12 Cuts: 461 130489 37.43%
0 0 2.18080e+12 6242 3.48532e+12 Cuts: 364 131104 37.43%
* 0+ 0 3.48333e+12 2.18080e+12 37.39%
Found incumbent of value 3.4833338e+12 after 139.51 sec. (113121.00 ticks)
0 0 2.18080e+12 6519 3.48333e+12 Cuts: 192 131452 37.39%
Heuristic still looking.
Heuristic still looking.
--- Reading solution for model UCM_SIMPLE
--- Executing after solve: elapsed 0:24:50.581
--- UCM_h.gms(1161) 1866 Mb
--- GDX File (execute_unload) /home/ulg/thermlab/fstraet/work/data-generation/simulations/ic-2000/sim-0_1.61-0.99-0.20-0.24-0.48-0.35/debug.gdx
--- Generating MIP model UCM_SIMPLE*** Error: Could not spawn gamscmex, rc = 4
Cmex executable : /home/users/f/s/fstraet/gams37.1_linux_x64_64_sfx/gamscmex.out
System directory: /home/users/f/s/fstraet/gams37.1_linux_x64_64_sfx
From https://www.gams.com/latest/docs/UG_GAMSReturnCodes.html, I see this error code corresponds to “the system limits were reached”.
I can check the job memory usage, and I see “Memory Utilized: 6.06 GB Memory Efficiency: 96.98% of 6.25 GB”. However, I don’t think it is a memory issue, mainly because there is a dedicated return code for that (10).
When checking the job afterwards, I also observe “CPU Efficiency: 10.76% of 07:04:32 core-walltime”. I have pre- and post-processing scripts in Python, so these waste a bit of performance, but they take around 2 minutes to run, so the efficiency should still be able to reach around 95%.
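To put the CPU efficiency figure in perspective, here is a rough conversion in Python. It assumes that “core-walltime” means wall-clock time multiplied by the number of allocated cores (16 here), and the helper name hms_to_seconds is only illustrative.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Figures quoted from the job report in the post.
core_walltime_s = hms_to_seconds("07:04:32")
cpu_efficiency = 0.1076
cpus = 16

# Assumption: core-walltime = wall-clock time x allocated cores.
wall_s = core_walltime_s / cpus

# CPU time actually consumed, and the average number of busy cores.
cpu_time_s = core_walltime_s * cpu_efficiency
avg_busy_cores = cpu_time_s / wall_s

print(f"Wall-clock time: {wall_s / 60:.1f} min")
print(f"CPU time used:   {cpu_time_s / 60:.1f} min")
print(f"Average busy cores: {avg_busy_cores:.2f} of {cpus}")
```

Under that assumption, the job ran for about 26.5 minutes of wall-clock time and kept fewer than 2 of the 16 cores busy on average.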
Therefore, my questions are:
- What does “system limits” actually refer to in this context?
- Did I do something wrong when calling GAMS that makes it fail?
- Is it normal to get such a low CPU efficiency on the cluster?
If I forgot some important piece of information, please tell me.
Any help is really appreciated.
François
gamsrun_0-0.log (75.3 KB)