MDP Modeling

Hi,

I am trying to model conflict scenarios with response solutions (for electricity access) across 11 regions in a country using an MDP.

I am stuck on two compilation errors:

(1) Domain violation for elements L, M & H, and (2) Row dimension inconsistency for the regions.

The GAMS file is attached (it is part of a larger model).

Kindly assist.
Conflict Model.gms (2.85 KB)

This syntax does not work in GAMS (even if you change the column labels L M H to low medium high, or the elements of set Conflicts from low medium high to L M H):

table ConflictToSolution(Regions, Conflicts, Solutions) 'Mapping conflict level to solutions in each region'
                 L   M   H      A   B   C
region1          0   0   1      1   0   0
region2          0   0   1      1   0   0
region3          0   0   1      1   0   0
region4          0   0   1      1   0   0
region5          0   0   1      1   0   0
region6          0   0   1      1   0   0
region7          0   0   1      1   0   0
region8          0   0   1      1   0   0
region9          0   0   1      1   0   0
region10         0   0   1      1   0   0
region11         0   0   1      1   0   0;

What are you trying to say? If you want to “map”, then use a map:

set ConflictToSolution(Regions, Conflicts, Solutions) / 
region1.high.A
region2.high.A
region3.high.A
region4.high.A
...
/;
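A mapping set like that is then used to control indices in assignments or sums, rather than being treated as numeric data. A minimal sketch (SolutionCost and RegionScore are hypothetical symbols I made up, not anything from your file):

parameter SolutionCost(Regions, Solutions) 'hypothetical score per region and solution';
parameter RegionScore(Regions);

* sum only over the solution(s) the mapping allows for each region's conflict level
RegionScore(Regions) = sum((Conflicts, Solutions)$ConflictToSolution(Regions, Conflicts, Solutions),
                           SolutionCost(Regions, Solutions));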

Moreover, you can only optimize a free scalar variable, so “solve MDP using mip maximizing V(Regions, Time);” does not work. I changed your objective to use a scalar variable.

Your BellmanEq is a quadratic constraint, so you can't solve this as a MIP; I changed the model type to MIQCP. It now compiles and solves, but is probably not what you want. These are a lot of rookie mistakes. I suggest you go back to square one and invest in learning the system by studying the GAMS tutorials.
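For reference, the textbook LP formulation of a discounted MDP has exactly that structure: a scalar objective variable and one linear Bellman inequality per state and action. A generic sketch with placeholder sets and zero data, not your model:

Set s 'states'  / low, medium, high /
    a 'actions' / A, B, C /;
Alias (s, sp);

Parameter R(s,a)    'immediate reward'
          P(s,sp,a) 'transition probability';
* fill R and P with real data here; they default to zero

Scalar disc 'discount factor' / 0.9 /;

Variable V(s) 'value of state s'
         z    'scalar objective';

Equation objDef    'objective definition'
         bell(s,a) 'Bellman inequality';

objDef..    z =e= sum(s, V(s));
bell(s,a).. V(s) =g= R(s,a) + disc*sum(sp, P(s,sp,a)*V(sp));

Model mdpLP / objDef, bell /;
Solve mdpLP using lp minimizing z;

The point is the structure: z is the only thing being optimized, and nothing on the right-hand side multiplies two variables, so the model stays an LP.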

-Michael
Conflict Model.gms (2.98 KB)

Thanks Michael.

Great to see the model work.

However, I've tried something else on the same subject (I attempted the mapping hint but couldn't get it to work) and now have difficulty resolving some of the errors.

For one, "missing suffix" on lines 47 and 48.

Kindly assist.
Conflict Model_2.gms (1.7 KB)

I started looking at this and it is full of compilation errors; it shows a fundamental lack of understanding of the GAMS language on your side. My initial advice, to invest in learning the language by going through some tutorial material, still holds. Good luck.

-Michael

Understood.

Just pressed for time and a deliverable. I will invest some time in it later.

In the meantime, can you provide some guidance?

Hi Michael,

I have made some adjustments to the model.

Although I have one syntax error left to resolve, can you look through and see if it makes sense?

I ultimately want to generate the best possible solution for each region, indicated by the solution with the maximum reward.
Conflict Model_4.gms (4.24 KB)

can you look through and see if it makes sense

No, I can't review the model; that takes insight into the problem and time. If this is student work, check with your advisor. If this is professional work, get professional help for such tasks.

-Michael

Hi Michael,

I keep getting an "Uncontrolled set entered as constant" error for the Bellman and Policy equations, which then affects the Solve statement below.

Sets
    Regions /r1*r11/,
    RewardMap /r1hA, r1hB, r1hC, r2mA , r2mB, r2mC, r3mA , r3mB, r3mC, r4mA , r4mB, r4mC, r5lA , r5lB, r5lC, r6lA , r6lB, r6lC, r7lA , r7lB, r7lC,r8lA , r8lB, r8lC,r9lA , r9lB, r9lC,r10lA , r10lB, r10lC,r11lA , r11lB, r11lC/,
    Conflicts /low, medium, high/,
    Solutions /A, B, C/;

alias (Regions, r);
alias (Regions, rr);
alias (Conflicts, c);
alias (Solutions, s);

Table Prob(r, *)  "probability of each conflict in each region"
       CPr
r1     0.54
r2     0.15
r3     0.10
r4     0.06
r5     0.04
r6     0.03
r7     0.03
r8     0.02
r9     0.02
r10    0.01
r11    0.01;

*Options A=330/132kV, B=33/11kV, C=415V to Solution Low=A, Medium=B or High=C
*The higher the conflict, the closer the grid solution to the point of use and vise versa

Table Reward(RewardMap, *)   "reward for taking an option in a conflict in a region"
       RewardV
r1hA   -5
r1hB    5
r1hC   10
r2mA    3
r2mB   10
r2mC   -3
r3mA    3
r3mB   10
r3mC   -3
r4mA    3
r4mB   10
r4mC   -3
r5lA   10
r5lB    5
r5lC   -2
r6lA   10
r6lB    5
r6lC   -2
r7lA   10
r7lB    5
r7lC   -2
r8lA   10
r8lB    5
r8lC   -2
r9lA   10
r9lB    5
r9lC   -2
r10lA  10
r10lB   5
r10lC  -2
r11lA  10
r11lB   5
r11lC  -2;

Parameter discount "discount factor" /0.9/;

Variables
    V(r)        "expected cumulative discounted reward"
    policy(r)   "optimal policy";

Equations
    Bellman(Regions)  "Bellman equation";

    V.up(r) =  1000;
    V.lo(r) = -1000;

    Bellman(Regions).. V.l(Regions) =e= sum(c, prob(Regions, 'CPr') * (Reward(RewardMap, 'RewardV') + discount * sum(r$(ord(r) <> ord(r)), prob(Regions, 'CPr') * V.l(r))));

*   calculate the optimal policy for each region  (smax instead of argmax)
    policy.l(r) = smax(s, sum(c, prob(r, 'CPr')*(Reward(RewardMap, 'RewardV') +  discount * sum(rr$(ord(rr) <> ord(r)), prob(r, 'CPr') * V.l(r)))));

Model MDP /all/;

Solve MDP using LP maximizing V;

*   print the optimal policy and expected cumulative discounted reward for each region
    Loop(Regions,
                 Display "Region ", r, " Optimal policy: ", policy.l, " Expected cumulative discounted reward: ", V.l
         );

I have tried different declaration types (sets, parameters, tables) and different arrangements of the data under these declarations, with no luck.

I ultimately want to print optimised solutions/results for regions r1 to r11 based on the highest-reward path, and then the worst, medium and best cases/paths based on rewards.

Can you assist?

If you look at your failing equation "Bellman(Regions).. V.l(Regions) =e= sum(c, prob(Regions, 'CPr') * (Reward(RewardMap, 'RewardV') …", ask yourself what element of RewardMap GAMS should take when instantiating this equation. You don't give it any hint. That is what GAMS is telling you when it says there is an uncontrolled set. You need to tell GAMS which of the many RewardMap elements it should take for a given Regions and c element (these are your controlled sets). You control a set either by adding it to the equation's index set (as with Regions) or by summing over it (as with set c).
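Concretely, one way to get everything controlled (a sketch only: I show just two regions' reward entries, and I am guessing that the inner sum was meant to run over the other regions rr rather than r against itself) is to drop the flat RewardMap labels and index the reward over the sets you actually control:

Parameter Reward(Regions, Conflicts, Solutions) 'reward for a solution under a conflict level in a region' /
    r1.high.A  -5,  r1.high.B   5,  r1.high.C  10
    r5.low.A   10,  r5.low.B    5,  r5.low.C   -2 /;

Bellman(Regions)..
    V(Regions) =e= sum((c, s), Prob(Regions,'CPr')*(Reward(Regions, c, s)
                 + discount*sum(rr$(ord(rr) <> ord(Regions)), Prob(rr,'CPr')*V(rr))));

Every index on the right-hand side is now either the equation index (Regions) or summed over (c, s, rr). Two more things: inside an equation definition you reference the variable V, not V.l (V.l is just a number at compile time), and whether this recursion is really the Bellman equation you want is a separate modeling question.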

-Michael

Thank you, Michael, for the help so far. I went back and tweaked the initial code that had worked.

I'm now trying something else and can't understand the errors:

I keep getting "767 Unexpected symbol will terminate the loop - symbol replaced by )" and "409 Unrecognizable item - skip to find a new statement looking for a ';' or a key word to get started again" errors at the final display; adding more parentheses generates further errors.

sets
    regions /r1*r11/
    actions /A, B, C/
    states /low, medium, high/
    states_next /low, medium, high/
    time /2023*2050/;

alias (regions, r);
alias (actions, a);
alias (states, s);
alias (states_next, sn);


parameters
    reward(states,actions)  /low.A    -5,
                             low.B    1,
                             low.C    5,
                             medium.A -5,
                             medium.B 1,
                             medium.C 5,
                             high.A   -5,
                             high.B   1,
                             high.C   5/
    discount_rate /0.95/
    value(states, regions)
    value_new(states, regions)
    epsilon /0.01/
    max_iter /1000/
    iter;

    value(states, regions)=0;
    value_new(states, regions)=0;
    iter=0;

* Transition probability
* Probability of transitioning from one state to another when an action is taken
* Format: (region, current state, action, next state)
* Actions A=415V, B=33/11kV, C=330/132kV

Set transition_prob(regions, states, actions, states_next);

transition_prob('r1', 'high', 'A', 'low')    = 0.54;
transition_prob('r1', 'high', 'B', 'low')    = 0.54;
transition_prob('r1', 'high', 'C', 'low')    = 0.54;
transition_prob('r2', 'medium', 'A', 'low')  = 0.15;
transition_prob('r2', 'medium', 'B', 'low')  = 0.15;
transition_prob('r2', 'medium', 'C', 'low')  = 0.15;
transition_prob('r3', 'medium', 'A', 'low')  = 0.10;
transition_prob('r3', 'medium', 'B', 'low')  = 0.10;
transition_prob('r3', 'medium', 'C', 'low')  = 0.10;
transition_prob('r4', 'medium', 'A', 'low')  = 0.06;
transition_prob('r4', 'medium', 'B', 'low')  = 0.06;
transition_prob('r4', 'medium', 'C', 'low')  = 0.06;
transition_prob('r5', 'low', 'A', 'low')     = 0.04;
transition_prob('r5', 'low', 'B', 'low')     = 0.04;
transition_prob('r5', 'low', 'C', 'low')     = 0.04;
transition_prob('r6', 'low', 'A', 'low')     = 0.03;
transition_prob('r6', 'low', 'B', 'low')     = 0.03;
transition_prob('r6', 'low', 'C', 'low')     = 0.03;
transition_prob('r7', 'low', 'A', 'low')     = 0.03;
transition_prob('r7', 'low', 'B', 'low')     = 0.03;
transition_prob('r7', 'low', 'C', 'low')     = 0.03;
transition_prob('r8', 'low', 'A', 'low')     = 0.02;
transition_prob('r8', 'low', 'B', 'low')     = 0.02;
transition_prob('r8', 'low', 'C', 'low')     = 0.02;
transition_prob('r9', 'low', 'A', 'low')     = 0.02;
transition_prob('r9', 'low', 'B', 'low')     = 0.02;
transition_prob('r9', 'low', 'C', 'low')     = 0.02;
transition_prob('r10', 'low', 'A', 'low')    = 0.01;
transition_prob('r10', 'low', 'B', 'low')    = 0.01;
transition_prob('r10', 'low', 'C', 'low')    = 0.01;
transition_prob('r11', 'low', 'A', 'low')    = 0.01;
transition_prob('r11', 'low', 'B', 'low')    = 0.01;
transition_prob('r11', 'low', 'C', 'low')    = 0.01;

* Value iteration to convergence
* V(s) = max_a (R(s,a) + gamma * sum_s' P(s'|s,a) * V(s'))

*while((iter = max_iter) or ((value_new - value) < epsilon),
while(iter = max_iter,
     iter = iter + 1;
*     value = value_new;
     loop(regions,
          loop(states,
               loop(actions,
                    value_new(states, regions) = max(reward(states, actions) +
                         discount_rate * sum(transition_prob(regions, states, actions, states_next), value_new(states, regions)), value_new(states, regions));

                   );
              );
          );
     );


* Print the optimal policy
* Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')

display "Optimal policy for each region:", value_new;

loop(regions,
     display regions;
     loop(states, display states, " action: ", actions(smax(reward(states, actions) +
                  discount_rate * sum(transition_prob(regions, states, actions, states_next), value(states_next, regions)),
                                                      value(states_next, regions)))
          );
*     );

Please kindly assist.