Larger quadratic programs build very slowly due to "Q extraction"

Hi everyone,

I have a rather large QCP model and noticed that the Q extraction algorithms take quite long. I have already switched from ThreePass to DoubleForward, which for my model lead to a ~40% decrease in the Q extraction time. However, it still takes a long time - in a smaller test run roughly as long as the actual solving time (Edit: it can be much worse, see reply). See the relevant output below.

First of all, I am wondering what exactly these algorithms even do. I have a strong suspicion that they determine the Q matrix in the quadratic form x^TQx of the QCP. But what’s not clear to me is why, from where, and how it is extracted. A few quick google searches did not lead anywhere.
Secondly, I am wondering if it may be possible to reformulate constraints in a way which would help the algorithm to extract the quadratic parts.

Cheers,
Gereon

Processor information: 1 socket(s), 14 core(s), and 20 thread(s) available
--- Starting compilation
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(67) 53 Mb
--- Starting execution: elapsed 0:00:00.014
--- Generating QCP model esma
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(806) 135 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1022) 156 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1044) 189 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1044) 197 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1077) 227 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1077) 234 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1099) 271 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1110) 280 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1121) 293 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1132) 293 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1168) 305 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1197) 306 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1197) 320 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1197) 309 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1197) 312 Mb
---   911,524 rows  813,029 columns  3,031,608 non-zeroes
---   730,080 nl-code  216,000 nl-non-zeroes
--- Range statistics (absolute non-zero finite values)
--- RHS       [min, max] : [ 1.908E-03, 2.909E+04] - Zero values observed as well
--- Bound     [min, max] : [ 1.103E-04, 8.386E+04] - Zero values observed as well
--- Matrix    [min, max] : [ 1.001E-03, 2.336E+02] - Zero values observed as well
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1197) 312 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1197) 307 Mb
--- Executing CPLEX (Solvelink=5): elapsed 0:00:02.285
--- Warning: SolveOpt is set to REPLACE (0).
---          Empty equations are not passed to the solver
---          and will be cleared when reading the solution.

IBM ILOG CPLEX   47.4.1 4b675771 Aug 13, 2024          WEI x86 64bit/MS Window

--- GAMS/CPLEX licensed for continuous and discrete problems.

Reading parameter(s) from "C:\Users\rech_ge\AppData\Local\Temp\tmpuq05hh_o\cplex.123"
>>  threads 6
>>  lpmethod 4
>>  predual -1
>>  solutiontype 2
>>  barepcomp 1e-06
>>  barqcpepcomp 1e-06
>>  names no
>>  qextractalg 2
Finished reading from "C:\Users\rech_ge\AppData\Local\Temp\tmpuq05hh_o\cplex.123"

--- GMO Q Extraction (DoubleForward): 84.77s
--- GMO setup time: 84.77s
--- GMO memory 442.28 Mb (peak 455.44 Mb)
--- Dictionary memory 0.00 Mb
--- Cplex 22.1.1.0 link memory 25.94 Mb (peak 77.49 Mb)
--- Starting Cplex

Version identifier: 22.1.1.0 | 2022-11-27 | 9160aff4d
CPXPARAM_Advance                                 0
CPXPARAM_Preprocessing_Dual                      -1
CPXPARAM_LPMethod                                4
CPXPARAM_Threads                                 6
CPXPARAM_Parallel                                1
CPXPARAM_SolutionType                            2
CPXPARAM_MIP_Display                             4
CPXPARAM_MIP_Pool_Capacity                       0
CPXPARAM_MIP_Tolerances_AbsMIPGap                0
CPXPARAM_Barrier_ConvergeTol                     9.9999999999999995e-07
CPXPARAM_Barrier_QCPConvergeTol                  9.9999999999999995e-07
Tried aggregator 1 time.
QCP Presolve eliminated 203887 rows and 134023 columns.
Aggregator did 64079 substitutions.
Reduced QCP has 1352909 rows, 876287 columns, and 3264288 nonzeros.
Reduced QCP has 187920 quadratic constraints.
Presolve time = 2.27 sec. (2506.09 ticks)
Parallel mode: using up to 6 threads for barrier.

***NOTE: Found 145 dense columns.

Number of nonzeros in lower triangle of A*A' = 11598928
Using Nested Dissection ordering
Total time for automatic ordering = 3.30 sec. (4828.48 ticks)
Summary statistics for Cholesky factor:
  Threads                   = 6
  Rows in Factor            = 1353054
  Integer space required    = 4884361
  Total non-zeros in factor = 47038393
  Total FP ops to factor    = 12116570487
 Itn      Primal Obj        Dual Obj  Prim Inf Upper Inf  Dual Inf Inf Ratio
   0   1.9973228e+04   7.3185902e-12  1.55e+08  0.00e+00  1.74e+06  1.00e+00
[removed intermediate iterations]
  72   8.9960489e+03   8.9960409e+03  8.27e+01  0.00e+00  9.26e-01  2.53e+04

--- QCP status (1): optimal.
--- Cplex Time: 89.62sec (det. 111177.80 ticks)


Optimal solution found
Objective:         8996.048872

--- Reading solution for model esma
***
*** Reading with solveopt=REPLACE (0)
***
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1197) 307 Mb  265 secs
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1197) 309 Mb  265 secs
--- Executing after solve: elapsed 0:04:27.203
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1201) 309 Mb
--- _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms(1259) 309 Mb
--- GDX File C:\Users\rech_ge\AppData\Local\Temp\tmpuq05hh_o\_6b21d2db-d34d-4fef-9729-4182830ad6dcout.gdx
*** Status: Normal completion
--- Job _6b21d2db-d34d-4fef-9729-4182830ad6dc.gms Stop 09/02/24 12:04:02 elapsed 0:04:27.470

I have some further results. In order to see whether other languages have this problem, I set up a toy model in GAMSPy and JuMP, both targeting Gurobi.
The GAMS model took ~1550s, while the JuMP model took ~6s. The GAMS model spends 1547.14s doing the Q extraction. There seems to be no such thing in JuMP, but it is still able to communicate the problem to Gurobi as a quadratic program. I am wondering where this big difference comes from?

The JuMP model

using JuMP
using Gurobi 
using Profile

function solve_model()
    m = Model(Gurobi.Optimizer)

    dim = floor(Int, 1e6)
    
    @variable(m, x[1:dim] >= 0)
    @constraint(m, e, x.^2 .>= 4)
    @objective(m, Min, ones(dim)'x)
    
    optimize!(m) 
end

solve_model()
@time solve_model()

Output

Gurobi Optimizer version 11.0.2 build v11.0.2rc0 (win64 - Windows 10.0 (19045.2))

CPU model: 13th Gen Intel(R) Core(TM) i7-1370P, instruction set [SSE2|AVX|AVX2]
Thread count: 14 physical cores, 20 logical processors, using up to 20 threads

Optimize a model with 0 rows, 1000000 columns and 0 nonzeros
Model fingerprint: 0xebe0e142
Model has 1000000 quadratic constraints
Coefficient statistics:
  Matrix range     [0e+00, 0e+00]
  QMatrix range    [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [0e+00, 0e+00]
  QRHS range       [4e+00, 4e+00]
Presolve removed 0 rows and 1000000 columns
Presolve time: 0.14s
Presolve: All rows and columns removed

Barrier solved model in 0 iterations and 0.14 seconds (0.10 work units)
Optimal objective 2.00000000e+06

User-callback calls 24, time in user-callback 0.00 sec
  6.224805 seconds (63.00 M allocations: 4.588 GiB, 28.90% gc time)

The GAMSPy model

import sys
import gamspy as gp

ct = gp.Container()

dim = int(1e6)

S = ct.addSet("S", records=range(dim))

x = ct.addVariable("x", type="positive", domain=S)
e = ct.addEquation("e", domain=S)
e[S] = x[S] ** 2 >= 4
obj = gp.Sum(S, x[S])

m = ct.addModel(
    "m",
    problem=gp.Problem.QCP,
    equations=ct.getEquations(),
    sense=gp.Sense.MIN,
    objective=obj,
)
m.solve(solver="Gurobi", output=sys.stdout)

Output

Processor information: 1 socket(s), 14 core(s), and 20 thread(s) available
--- Starting compilation
--- _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms(67) 83 Mb
--- Starting execution: elapsed 0:00:00.016
--- Generating QCP model m
--- _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms(24) 280 Mb
--- _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms(24) 281 Mb
--- _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms(35) 345 Mb
--- _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms(35) 357 Mb
--- _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms(35) 353 Mb
---   1,000,001 rows  1,000,001 columns  2,000,001 non-zeroes
---   4,000,000 nl-code  1,000,000 nl-non-zeroes
--- Range statistics (absolute non-zero finite values)
--- RHS       [min, max] : [ 4.000E+00, 4.000E+00] - Zero values observed as well
--- Bound     [min, max] : [        NA,        NA] - Zero values observed as well
--- Matrix    [min, max] : [ 1.000E+00, 1.000E+00] - Zero values observed as well
--- _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms(35) 331 Mb
--- Executing GUROBI (Solvelink=2): elapsed 0:00:00.826

Gurobi           47.5.0 d6b95593 Aug 29, 2024          WEI x86 64bit/MS Window

Gurobi library version 11.0.2
GRB_LICENSE_FILE = C:\gurobi911\gurobi.lic
GAMS/Gurobi Link license.
Set parameter OutputFlag to value 1
Set parameter MIPGapAbs to value 0
--- GMO Q Extraction (ThreePass): 1547.14s
--- GMO setup time: 1547.14
Set parameter QCPDual to value 1      
Set parameter FuncNonlinear to value 1
Space for names approximately 17.06 Mb
Use option 'names no' to turn use of names off
Starting Gurobi...
Gurobi Optimizer version 11.0.2 build v11.0.2rc0 (win64 - Windows 10.0 (19045.2))

CPU model: 13th Gen Intel(R) Core(TM) i7-1370P, instruction set [SSE2|AVX|AVX2]  
Thread count: 14 physical cores, 20 logical processors, using up to 20 threads   

Optimize a model with 0 rows, 1000000 columns and 0 nonzeros
Model fingerprint: 0xebe0e142
Model has 1000000 quadratic constraints
Coefficient statistics:
  Matrix range     [0e+00, 0e+00]
  QMatrix range    [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [0e+00, 0e+00]
  QRHS range       [4e+00, 4e+00]
Presolve removed 0 rows and 1000000 columns
Presolve time: 0.14s
Presolve: All rows and columns removed

Barrier solved model in 0 iterations and 0.14 seconds (0.10 work units)
Optimal objective 2.00000000e+06
Solving KKT system to obtain QCP duals...

Optimize a model with 1000000 rows, 1000000 columns and 1000000 nonzeros
Model fingerprint: 0x08773976
Coefficient statistics:
  Matrix range     [4e+00, 4e+00]
  Objective range  [3e+00, 3e+00]
  QObjective range [2e+00, 2e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [8e+00, 8e+00]
Presolve removed 1000000 rows and 1000000 columns
Presolve time: 0.71s
Presolve: All rows and columns removed

Barrier solved model in 0 iterations and 0.71 seconds (0.56 work units)
Optimal objective 2.00000000e+06

User-callback calls 26, time in user-callback 0.00 sec
QCP status(2): Model was solved to optimality (subject to tolerances).
--- Reading solution for model m
--- Executing after solve: elapsed 0:25:50.424
--- _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms(39) 331 Mb
--- _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms(97) 331 Mb
--- GDX File C:\Users\rech_ge\AppData\Local\Temp\tmprz0lgh6v\_1cfe1ac4-4737-4f05-afe6-8ca40aee045aout.gdx
*** Status: Normal completion
--- Job _1cfe1ac4-4737-4f05-afe6-8ca40aee045a.gms Stop 09/02/24 16:29:32 elapsed 0:25:50.649
(esma) 

Traditionally, GAMS deals with non-linear models and provides (point) derivatives to the solver. This is how solvers like MINOS and CONOPT solve quadratic problems, for these solvers a quadratic problem is no different from a general non-linear problem.

For solvers that need the explicit quadratic parts (and have requirements like PSD for such matrices) of the model, e.g. Cplex, Copt, Gurobi, Mosek, Xpress, … GAMS extracts the Q matrices from the non-linear instructions. This extraction algorithms use different strategies and storage classes (e.g. sparse/dense) depending on your particular instance. Your post triggered some fresh look at these functions. They needed some tuning and clean up. Moreover, we added a new “diagonal” only extractor, which is blazingly fast. You will be able to enjoy all this with GAMS 48. The beta release is scheduled for October 1st, 2024. Here is some preview of running times with the current and the new version:

dim GAMS 47 GAMS 48
10 000 0.02 0.00
50 000 0.36 0.01
100 000 1.32 0.01
500 000 33.58 0.07
1 000 000 153.80 0.13
3 Likes