I recently introduced a code change in my NLP model (solved using IPOPTH) in which I moved a variable from the bottom to the top of the declaration list. In an overly simplified example:
To my surprise, that tiny change caused a significant slowdown in the Hessian computation step, which on average increased the solve time of my NLP test problems by 40%. My problems range from 1 to 3 million variables. I believe the Hessian computation is performed by GAMS' automatic differentiation (AD) algorithm. Some questions:
(1) Do you have any explanation for such behaviour?
(2) How much does variable ordering (and perhaps equation ordering too) affect GAMS' computation of the Jacobian and Hessian?
(3) Is there any rule of thumb I should follow when ordering variables and equations?
You’ve asked some interesting questions. I’ll try to provide some answers that are helpful but not too long.
First, some background. To keep it simple, let's consider just the computation of the structural Hessian, i.e. the nonzero pattern of the Hessian matrix. For example, the Hessian of x*x*x is structurally nonzero: it doesn't matter that it's numerically zero at x == 0. Call this matrix S. We store the nonzero columns of S, and for each such column we store the nonzero elements in that column (below the diagonal). The data structures and algorithms we use for this are a combination of dense and sparse matrix techniques, and we switch adaptively between them. For example, suppose only the first K columns are nonlinear and the Hessian is dense (or nearly so) in these columns.
In this case, we will choose to work with a dense storage scheme for the K-x-K Hessian, and this will be a big win over any sparse storage scheme.
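To make the dense-vs-sparse decision concrete, here is a minimal Python sketch. It is not GAMS' actual rule: the function name, the 0.5 density threshold, and the pair-set representation of the structural Hessian are all illustrative assumptions.

```python
def choose_storage(nonzeros, density_threshold=0.5):
    """Pick a storage scheme for the structural Hessian.

    nonzeros: set of (i, j) lower-triangular structural nonzeros, i >= j.
    Returns ("dense", K) when the leading K-x-K block is dense enough to
    justify a dense array, else ("sparse", K). Threshold is illustrative.
    """
    if not nonzeros:
        return ("dense", 0)
    K = max(i for (i, j) in nonzeros) + 1   # span of the nonlinear block
    capacity = K * (K + 1) // 2             # entries in the lower triangle
    density = len(nonzeros) / capacity
    return ("dense", K) if density >= density_threshold else ("sparse", K)

# Nearly dense 3-variable nonlinear block: dense storage wins,
# no matter how many (linear) variables the full model has.
print(choose_storage({(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)}))
# ("dense", 3)

# Same nonzero count, but the second nonlinear variable sits at
# index 999: the dense block would need 500500 slots for 2 entries.
print(choose_storage({(0, 0), (999, 999)}))
# ("sparse", 1000)
```

The point of the sketch is only that the decision depends on the *span* of the nonlinear variables, not just how many there are.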
In your case, the variable order may result in different storage schemes being used for the structural Hessian. Or, if both orderings lead to a dense scheme, one ordering may result in a larger dense block (because the nonlinear variables are more spread out). This larger block will slow things down, even if it is still faster than switching to a sparse implementation. This could explain the different behavior you see.
For the Hessian computation, it is best if you can keep the block of nonlinear variables as small as possible. It doesn’t have to start at the first variable, but if you can confine all the nonlinear variables to a small range this allows a more efficient storage scheme.
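A tiny sketch of why confinement matters (illustrative only, not GAMS internals): the same set of structural nonzeros, reordered, can require a far wider contiguous range of variables.

```python
def nonlinear_span(nonzeros):
    """Width of the smallest contiguous variable range covering every
    structural Hessian nonzero (lower-triangular (i, j) pairs)."""
    idx = {k for pair in nonzeros for k in pair}
    return max(idx) - min(idx) + 1

# Three nonzeros confined to variables 10..12: span of 3.
contiguous = {(10, 10), (11, 10), (12, 11)}
# Three nonzeros scattered across a million-variable model.
spread_out = {(10, 10), (500_000, 10), (999_999, 500_000)}

print(nonlinear_span(contiguous))   # 3
print(nonlinear_span(spread_out))   # 999990
```

Both patterns have three entries, but any scheme that works on a contiguous block must cover a range roughly 330,000 times wider in the second case.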
The Lagrangian Hessian is accumulated row-by-row. The behavior of this shouldn’t be too sensitive to the row order. And to say anything about the optimal way to order rows I would have to know something about the Hessians of each row and how GAMS is choosing to store them.
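The row-by-row accumulation can be pictured as a multiplier-weighted merge of per-equation Hessians. This is a conceptual sketch under assumed data structures (dicts of lower-triangular entries); the real implementation is of course different.

```python
from collections import defaultdict

def accumulate_lagrangian_hessian(row_hessians, multipliers):
    """Merge per-equation Hessians into the Lagrangian Hessian.

    row_hessians: one dict {(i, j): value} per equation, i >= j.
    multipliers:  one Lagrange multiplier per equation.
    The pattern is the union of the row patterns; the values are
    the multiplier-weighted sums.
    """
    H = defaultdict(float)
    for lam, Hr in zip(multipliers, row_hessians):
        for (i, j), v in Hr.items():
            H[(i, j)] += lam * v
    return dict(H)

rows = [{(0, 0): 2.0}, {(0, 0): 1.0, (1, 0): 3.0}]
print(accumulate_lagrangian_hessian(rows, [1.0, 0.5]))
# {(0, 0): 2.5, (1, 0): 1.5}
```

Because the merge is keyed on (i, j) pairs, the order in which rows arrive changes only the accumulation order, not the result, which is consistent with the claim that row order shouldn't matter much.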
I look forward to any follow-up discussion on this topic!
Is there anything in the lst file or solver stdout that might indicate one ordering is better than another? The first thing I look at when Hessian computations are slower is the NL-code size, but it doesn't change with reordering.