Thank you, Michael, for the help so far. I went back to tweak the initial code that had worked.
I'm now trying something else and can't understand the errors:
I keep getting error 767 ("Unexpected symbol will terminate the loop - symbol replaced by )") and error 409 ("Unrecognizable item - skip to find a new statement looking for a ';' or a key word to get started again") at the final display statement. Adding more parentheses only generates more errors.
sets
regions /r1*r11/
actions /A, B, C/
states /low, medium, high/
states_next /low, medium, high/
time /2023*2050/;
alias (regions, r);
alias (actions, a);
alias (states, s);
alias (states_next, sn);
parameters
reward(states,actions) /low.A -5,
low.B 1,
low.C 5,
medium.A -5,
medium.B 1,
medium.C 5,
high.A -5,
high.B 1,
high.C 5/
discount_rate /0.95/
value(states, regions)
value_new(states, regions)
epsilon /0.01/
max_iter /1000/
iter;
value(states, regions)=0;
value_new(states, regions)=0;
iter=0;
* Transition probability
* Probability of transitioning from one state to another when an action is taken
* Format: (region, current state, action, next state)
* Actions A=415V, B=33/11kV, C=330/132kV
Set transition_prob(regions, states, actions, states_next);
transition_prob('r1', 'high', 'A', 'low') = 0.54;
transition_prob('r1', 'high', 'B', 'low') = 0.54;
transition_prob('r1', 'high', 'C', 'low') = 0.54;
transition_prob('r2', 'medium', 'A', 'low') = 0.15;
transition_prob('r2', 'medium', 'B', 'low') = 0.15;
transition_prob('r2', 'medium', 'C', 'low') = 0.15;
transition_prob('r3', 'medium', 'A', 'low') = 0.10;
transition_prob('r3', 'medium', 'B', 'low') = 0.10;
transition_prob('r3', 'medium', 'C', 'low') = 0.10;
transition_prob('r4', 'medium', 'A', 'low') = 0.06;
transition_prob('r4', 'medium', 'B', 'low') = 0.06;
transition_prob('r4', 'medium', 'C', 'low') = 0.06;
transition_prob('r5', 'low', 'A', 'low') = 0.04;
transition_prob('r5', 'low', 'B', 'low') = 0.04;
transition_prob('r5', 'low', 'C', 'low') = 0.04;
transition_prob('r6', 'low', 'A', 'low') = 0.03;
transition_prob('r6', 'low', 'B', 'low') = 0.03;
transition_prob('r6', 'low', 'C', 'low') = 0.03;
transition_prob('r7', 'low', 'A', 'low') = 0.03;
transition_prob('r7', 'low', 'B', 'low') = 0.03;
transition_prob('r7', 'low', 'C', 'low') = 0.03;
transition_prob('r8', 'low', 'A', 'low') = 0.02;
transition_prob('r8', 'low', 'B', 'low') = 0.02;
transition_prob('r8', 'low', 'C', 'low') = 0.02;
transition_prob('r9', 'low', 'A', 'low') = 0.02;
transition_prob('r9', 'low', 'B', 'low') = 0.02;
transition_prob('r9', 'low', 'C', 'low') = 0.02;
transition_prob('r10', 'low', 'A', 'low') = 0.01;
transition_prob('r10', 'low', 'B', 'low') = 0.01;
transition_prob('r10', 'low', 'C', 'low') = 0.01;
transition_prob('r11', 'low', 'A', 'low') = 0.01;
transition_prob('r11', 'low', 'B', 'low') = 0.01;
transition_prob('r11', 'low', 'C', 'low') = 0.01;
* Value iteration to convergence
* V(s) = max_a (R(s,a) + gamma * sum_s' P(s'|s,a) * V(s'))
*while((iter = max_iter) or ((value_new - value) < epsilon),
while(iter < max_iter,
   iter = iter + 1;
* value = value_new;
   loop(regions,
      loop(states,
         loop(actions,
            value_new(states, regions) = max(reward(states, actions) +
               discount_rate * sum(transition_prob(regions, states, actions, states_next), value_new(states, regions)), value_new(states, regions));
         );
      );
   );
);
* Print the optimal policy
* Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')
display "Optimal policy for each region:", value_new;
loop(regions,
display regions;
loop(states, display states, " action: ", actions(smax(reward(states, actions) +
discount_rate * sum(transition_prob(regions, states, actions, states_next), value(states_next, regions)),
value(states_next, regions)))
);
* );
Please kindly assist.
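
For comparison, here is a minimal, self-contained sketch of the value-iteration pattern described in the comments above, written for a single region. It rests on assumptions that are not in the code above: the names s, sp, aa, R, P, V, V_new, Q, disc, delta and policy are hypothetical, the uniform transition probabilities are placeholder data, and the next-state index is an alias of the state set rather than a separate set. The sketch stores Q in a parameter first and displays only identifiers, on the understanding that a GAMS display statement lists identifiers and quoted text rather than expressions.

```gams
* Sketch only: hypothetical names and placeholder data, single region.
Set s 'states'  / low, medium, high /;
Set a 'actions' / A, B, C /;
Alias (s, sp);
Alias (a, aa);

Parameter R(s,a)    'reward for taking action a in state s';
Parameter P(s,a,sp) 'probability of moving from s to sp under a';
Parameter V(s)      'current value estimate';
Parameter V_new(s)  'updated value estimate';
Parameter Q(s,a)    'action value';
Set policy(s,a)     'greedy action(s) per state';

Scalar disc     'discount rate'         / 0.95 /;
Scalar epsilon  'convergence tolerance' / 0.01 /;
Scalar max_iter 'iteration cap'         / 1000 /;
Scalar iter     'iteration counter'     / 0 /;
Scalar delta    'largest value change'  / 1 /;

* Same rewards as in the post: they depend on the action only.
R(s,'A') = -5;
R(s,'B') =  1;
R(s,'C') =  5;

* Placeholder assumption: every next state equally likely (rows sum to 1).
P(s,a,sp) = 1 / card(s);

V(s) = 0;

* V(s) = max_a ( R(s,a) + disc * sum_s' P(s'|s,a) * V(s') )
while((iter < max_iter) and (delta > epsilon),
   iter     = iter + 1;
   Q(s,a)   = R(s,a) + disc * sum(sp, P(s,a,sp) * V(sp));
   V_new(s) = smax(a, Q(s,a));
   delta    = smax(s, abs(V_new(s) - V(s)));
   V(s)     = V_new(s);
);

* Mark the action(s) that attain the maximum Q in each state.
policy(s,a) = yes$(Q(s,a) = smax(aa, Q(s,aa)));

display iter, V, Q, policy;
```

With this placeholder data the rewards and probabilities do not depend on the state, so the policy set should mark the same action for every state. Adding a region index to P, V, Q and policy would be the natural next step towards the per-region version.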