Quick start guide
This guide shows how to get started quickly with MDPSolver.
The example below is a simple MDP with three states and two actions in each state. Note, however, that MDPSolver does not require all states to have the same number of actions.
#Import packages
import mdpsolver
#Rewards (3 states x 2 actions)
#e.g. choosing second action in first state gives reward=-1
rewards = [[5,-1],
[1,-2],
[50,0]]
#Transition probabilities (3 from_states x 2 actions x 3 to_states),
#e.g. choosing first action in third state gives a probability of 0.6 of staying in the third state.
#Note that we also provide support for sparse matrices.
tranMatWithZeros = [[[0.9,0.1,0.0],[0.1,0.9,0.0]],
[[0.4,0.5,0.1],[0.3,0.5,0.2]],
[[0.2,0.2,0.6],[0.5,0.5,0.0]]]

Now, create the model object and insert the problem parameters.
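Before inserting the parameters, it can be worth confirming that the data is well formed: each state should have as many reward entries as action rows, and each action row should be a probability distribution over the states. The following is a minimal plain-Python sketch of such a check. The helper `check_mdp_inputs` and its tolerance are hypothetical and not part of mdpsolver, which may or may not perform similar validation itself.

```python
import math

# Problem data copied from the example above.
rewards = [[5, -1],
           [1, -2],
           [50, 0]]
tranMatWithZeros = [[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
                    [[0.4, 0.5, 0.1], [0.3, 0.5, 0.2]],
                    [[0.2, 0.2, 0.6], [0.5, 0.5, 0.0]]]


def check_mdp_inputs(rewards, tran_mat, tol=1e-9):
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    n_states = len(rewards)
    if len(tran_mat) != n_states:
        problems.append("rewards and transitions disagree on the number of states")
        return problems
    for s, (reward_row, action_rows) in enumerate(zip(rewards, tran_mat)):
        # States may have different numbers of actions, so compare per state.
        if len(reward_row) != len(action_rows):
            problems.append(f"state {s}: {len(reward_row)} rewards, "
                            f"{len(action_rows)} transition rows")
            continue
        for a, probs in enumerate(action_rows):
            if len(probs) != n_states:
                problems.append(f"state {s}, action {a}: expected "
                                f"{n_states} probabilities, got {len(probs)}")
            if any(p < 0 for p in probs):
                problems.append(f"state {s}, action {a}: negative probability")
            if not math.isclose(sum(probs), 1.0, abs_tol=tol):
                problems.append(f"state {s}, action {a}: probabilities sum "
                                f"to {sum(probs)}, not 1")
    return problems


print(check_mdp_inputs(rewards, tranMatWithZeros))  # prints []
```

An empty list means the dense matrix above passed every check, so it can be handed to the model as shown next.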
#Create model object
mdl = mdpsolver.model()
#Insert the problem parameters
mdl.mdp(discount=0.8,
rewards=rewards,
tranMatWithZeros=tranMatWithZeros)

We can now optimize the policy. MDPSolver uses modified policy iteration as the default optimization algorithm. The optimization algorithm can, however, be easily changed to policy iteration or value iteration.
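To make the optimization step less of a black box, here is a minimal plain-Python sketch of value iteration, one of the algorithms named above. It is not MDPSolver's implementation. The function name `value_iteration`, the stopping tolerance, and the assumption that rewards are maximized are illustrative choices.

```python
# Problem data copied from the example above.
rewards = [[5, -1],
           [1, -2],
           [50, 0]]
tranMatWithZeros = [[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
                    [[0.4, 0.5, 0.1], [0.3, 0.5, 0.2]],
                    [[0.2, 0.2, 0.6], [0.5, 0.5, 0.0]]]
discount = 0.8


def value_iteration(rewards, tran_mat, discount, tol=1e-10, max_iter=10_000):
    """Plain value iteration (illustrative); returns (values, greedy policy)."""
    n_states = len(rewards)
    values = [0.0] * n_states

    def q_value(s, a, v):
        # Immediate reward plus discounted expected value of the next state.
        return rewards[s][a] + discount * sum(
            p * v[t] for t, p in enumerate(tran_mat[s][a]))

    for _ in range(max_iter):
        # Bellman update: best achievable value in each state.
        new_values = [max(q_value(s, a, values)
                          for a in range(len(rewards[s])))
                      for s in range(n_states)]
        converged = max(abs(n - o) for n, o in zip(new_values, values)) < tol
        values = new_values
        if converged:
            break

    # Greedy policy with respect to the converged values.
    policy = [max(range(len(rewards[s])),
                  key=lambda a: q_value(s, a, values))
              for s in range(n_states)]
    return values, policy


values, policy = value_iteration(rewards, tranMatWithZeros, discount)
print(policy)  # [1, 1, 0]
```

Under the maximization assumption this sketch arrives at the same policy that this guide reports for the example. In practice, the single library call that follows does this work.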
mdl.solve()

The optimized policy can be returned in a variety of ways. Here, we return the policy as a list and print it directly in the terminal.
print(mdl.getPolicy())
#[1, 1, 0]

mdpsolver has three alternative formats for large and sparse transition probability matrices.
Recall the complete matrix from the example above:
tranMatWithZeros = [[[0.9,0.1,0.0],[0.1,0.9,0.0]],
[[0.4,0.5,0.1],[0.3,0.5,0.2]],
[[0.2,0.2,0.6],[0.5,0.5,0.0]]]

(1) Elementwise representation:
#[from_state,action,to_state,probability]
tranMatElementwise = [[0,0,0,0.9],
[0,0,1,0.1],
[0,1,0,0.1],
[0,1,1,0.9],
[1,0,0,0.4],
[1,0,1,0.5],
[1,0,2,0.1],
[1,1,0,0.3],
[1,1,1,0.5],
[1,1,2,0.2],
[2,0,0,0.2],
[2,0,1,0.2],
[2,0,2,0.6],
[2,1,0,0.5],
[2,1,1,0.5]]
mdl.mdp(discount=0.8,
rewards=rewards,
tranMatElementwise=tranMatElementwise)

(2) Probabilities and column indices in separate lists:
tranMatProbs = [[[0.9,0.1],[0.1,0.9]],
[[0.4,0.5,0.1],[0.3,0.5,0.2]],
[[0.2,0.2,0.6],[0.5,0.5]]]
tranMatColumns = [[[0,1],[0,1]],
[[0,1,2],[0,1,2]],
[[0,1,2],[0,1]]]
mdl.mdp(discount=0.8,
rewards=rewards,
tranMatProbs=tranMatProbs,
tranMatColumns=tranMatColumns)

(3) Load the elementwise representation from a file:
transitions.csv
stateFrom,action,stateTo,probability
0,0,0,0.9
0,0,1,0.1
0,1,0,0.1
0,1,1,0.9
1,0,0,0.4
1,0,1,0.5
1,0,2,0.1
1,1,0,0.3
1,1,1,0.5
1,1,2,0.2
2,0,0,0.2
2,0,1,0.2
2,0,2,0.6
2,1,0,0.5
2,1,1,0.5
mdl.mdp(discount=0.8,
rewards=rewards,
tranMatFromFile="transitions.csv")
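The three sparse formats above hold the same information as the dense matrix, so they can be derived from it mechanically. The sketch below shows one possible way to do that in plain Python. The helper names `dense_to_elementwise`, `dense_to_probs_and_columns`, and `write_elementwise_csv` are hypothetical and not part of mdpsolver. The CSV header is copied from the file shown above.

```python
import csv
import io

# Dense matrix copied from the example above.
tranMatWithZeros = [[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
                    [[0.4, 0.5, 0.1], [0.3, 0.5, 0.2]],
                    [[0.2, 0.2, 0.6], [0.5, 0.5, 0.0]]]


def dense_to_elementwise(tran_mat):
    """Rows [from_state, action, to_state, probability] for nonzero entries."""
    return [[s, a, t, p]
            for s, actions in enumerate(tran_mat)
            for a, probs in enumerate(actions)
            for t, p in enumerate(probs)
            if p != 0.0]


def dense_to_probs_and_columns(tran_mat):
    """Nonzero probabilities and their to_state indices as parallel lists."""
    probs_out = [[[p for p in probs if p != 0.0] for probs in actions]
                 for actions in tran_mat]
    cols_out = [[[t for t, p in enumerate(probs) if p != 0.0]
                 for probs in actions]
                for actions in tran_mat]
    return probs_out, cols_out


def write_elementwise_csv(rows, fileobj):
    """Write the elementwise rows, with a header line, to an open text file."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["stateFrom", "action", "stateTo", "probability"])
    writer.writerows(rows)


tranMatElementwise = dense_to_elementwise(tranMatWithZeros)
tranMatProbs, tranMatColumns = dense_to_probs_and_columns(tranMatWithZeros)

# Write to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_elementwise_csv(tranMatElementwise, buffer)
print(buffer.getvalue().splitlines()[:3])
# ['stateFrom,action,stateTo,probability', '0,0,0,0.9', '0,0,1,0.1']
```

To produce the file itself, the same function can be called inside `with open("transitions.csv", "w", newline="") as f:`. For genuinely large problems the dense matrix would not be built in the first place; the sparse lists would be generated directly from the model.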