Quick start guide

The following shows how to get started quickly with MDPSolver.

A small example

The following example is a simple MDP with three states and two actions in each state. Note, however, that MDPSolver does not require every state to have the same number of actions.

#Import packages
import mdpsolver

#Rewards (3 states x 2 actions)
#e.g. choosing second action in first state gives reward=-1
rewards = [[5,-1],
           [1,-2],
           [50,0]]

#Transition probabilities (3 from_states x 2 actions x 3 to_states),
#e.g. choosing first action in third state gives a probability of 0.6 of staying in the third state.
#Note that we also provide support for sparse matrices.
tranMatWithZeros = [[[0.9,0.1,0.0],[0.1,0.9,0.0]],
                    [[0.4,0.5,0.1],[0.3,0.5,0.2]],
                    [[0.2,0.2,0.6],[0.5,0.5,0.0]]]
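Before handing the parameters to the solver, it can help to check that they are consistent. The following is a minimal sketch in plain Python; `check_mdp_inputs` is a hypothetical helper and not part of the mdpsolver package, and whether MDPSolver performs similar checks internally is not stated here.

```python
import math

# Same data as in the example above.
rewards = [[5, -1],
           [1, -2],
           [50, 0]]
tranMatWithZeros = [[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
                    [[0.4, 0.5, 0.1], [0.3, 0.5, 0.2]],
                    [[0.2, 0.2, 0.6], [0.5, 0.5, 0.0]]]

def check_mdp_inputs(rewards, tran_mat):
    """Hypothetical helper: return a list of problems found in the inputs."""
    problems = []
    n_states = len(tran_mat)
    if len(rewards) != n_states:
        problems.append("rewards and transitions disagree on the number of states")
        return problems
    for s, actions in enumerate(tran_mat):
        # One reward per action in each state.
        if len(rewards[s]) != len(actions):
            problems.append(f"state {s}: number of rewards != number of actions")
        for a, row in enumerate(actions):
            # One probability per destination state.
            if len(row) != n_states:
                problems.append(f"state {s}, action {a}: expected {n_states} probabilities")
            if any(p < 0 for p in row):
                problems.append(f"state {s}, action {a}: negative probability")
            # Each (state, action) row must be a probability distribution.
            if not math.isclose(sum(row), 1.0, abs_tol=1e-9):
                problems.append(f"state {s}, action {a}: probabilities sum to {sum(row)}")
    return problems

print(check_mdp_inputs(rewards, tranMatWithZeros))
# []
```

An empty list means that every (state, action) row has the expected length and sums to one.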

Now, create the model object and insert the problem parameters.

#Create model object
mdl = mdpsolver.model()

#Insert the problem parameters
mdl.mdp(discount=0.8,
        rewards=rewards,
        tranMatWithZeros=tranMatWithZeros)

We can now optimize the policy. MDPSolver uses modified policy iteration as the default optimization algorithm. The optimization algorithm can, however, be easily changed to policy iteration or value iteration.

mdl.solve()

The optimized policy can be returned in a variety of ways. Here, we return the policy as a list and print it directly in the terminal.

print(mdl.getPolicy())
#[1, 1, 0]
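As a cross-check that does not depend on the library, the same problem can be solved with a few lines of plain value iteration. This is only an illustrative sketch, not MDPSolver's implementation: the function names are hypothetical, and it assumes that rewards are maximized, which is consistent with the policy printed above.

```python
# Same data as in the example above.
rewards = [[5, -1],
           [1, -2],
           [50, 0]]
tranMatWithZeros = [[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
                    [[0.4, 0.5, 0.1], [0.3, 0.5, 0.2]],
                    [[0.2, 0.2, 0.6], [0.5, 0.5, 0.0]]]

def q_value(rewards, tran_mat, discount, values, s, a):
    """Immediate reward plus discounted expected value of the next state."""
    expected_next = sum(p * v for p, v in zip(tran_mat[s][a], values))
    return rewards[s][a] + discount * expected_next

def value_iteration(rewards, tran_mat, discount, tol=1e-10, max_iter=10_000):
    """Hypothetical helper: plain value iteration for a small dense MDP."""
    n_states = len(rewards)
    values = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman update: best achievable value in every state.
        new_values = [
            max(q_value(rewards, tran_mat, discount, values, s, a)
                for a in range(len(rewards[s])))
            for s in range(n_states)
        ]
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(n_states):
        scores = [q_value(rewards, tran_mat, discount, values, s, a)
                  for a in range(len(rewards[s]))]
        policy.append(scores.index(max(scores)))
    return policy, values

policy, values = value_iteration(rewards, tranMatWithZeros, discount=0.8)
print(policy)
# [1, 1, 0]
```

The second action wins in the first two states despite its negative reward because it moves the process toward the third state, where the first action pays 50.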

Sparse transition matrix

MDPSolver supports three alternative formats for large, sparse transition probability matrices.

Recall the complete matrix from the example above:

tranMatWithZeros = [[[0.9,0.1,0.0],[0.1,0.9,0.0]],
                    [[0.4,0.5,0.1],[0.3,0.5,0.2]],
                    [[0.2,0.2,0.6],[0.5,0.5,0.0]]]

(1) Elementwise representation:

#[from_state,action,to_state,probability]
tranMatElementwise = [[0,0,0,0.9],
                      [0,0,1,0.1],
                      [0,1,0,0.1],
                      [0,1,1,0.9],
                      [1,0,0,0.4],
                      [1,0,1,0.5],
                      [1,0,2,0.1],
                      [1,1,0,0.3],
                      [1,1,1,0.5],
                      [1,1,2,0.2],
                      [2,0,0,0.2],
                      [2,0,1,0.2],
                      [2,0,2,0.6],
                      [2,1,0,0.5],
                      [2,1,1,0.5]]

mdl.mdp(discount=0.8,
        rewards=rewards,
        tranMatElementwise=tranMatElementwise)
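For small problems, the elementwise list can be derived from the complete matrix instead of being typed by hand. The sketch below uses a hypothetical helper, `dense_to_elementwise`, which is not part of mdpsolver; it simply skips zero-probability entries.

```python
# Complete matrix from the example above.
tranMatWithZeros = [[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
                    [[0.4, 0.5, 0.1], [0.3, 0.5, 0.2]],
                    [[0.2, 0.2, 0.6], [0.5, 0.5, 0.0]]]

def dense_to_elementwise(tran_mat):
    """Hypothetical helper: list [from_state, action, to_state, probability]
    for every nonzero entry of a dense transition matrix."""
    return [[s, a, t, p]
            for s, actions in enumerate(tran_mat)
            for a, row in enumerate(actions)
            for t, p in enumerate(row)
            if p != 0.0]

tranMatElementwise = dense_to_elementwise(tranMatWithZeros)
print(len(tranMatElementwise))
# 15
print(tranMatElementwise[0], tranMatElementwise[-1])
# [0, 0, 0, 0.9] [2, 1, 1, 0.5]
```

The result has 15 rows because 3 of the 18 entries in the complete matrix are zero.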

(2) Probabilities and column indices in separate lists:

tranMatProbs = [[[0.9,0.1],[0.1,0.9]],
                [[0.4,0.5,0.1],[0.3,0.5,0.2]],
                [[0.2,0.2,0.6],[0.5,0.5]]]

tranMatColumns = [[[0,1],[0,1]],
                  [[0,1,2],[0,1,2]],
                  [[0,1,2],[0,1]]]

mdl.mdp(discount=0.8,
        rewards=rewards,
        tranMatProbs=tranMatProbs,
        tranMatColumns=tranMatColumns)
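The two lists must stay aligned: the k-th probability for a given state and action belongs to the k-th column index. One way to guarantee this is to build both from the complete matrix in a single pass. The helper below, `dense_to_probs_and_columns`, is a hypothetical illustration and not part of mdpsolver.

```python
# Complete matrix from the example above.
tranMatWithZeros = [[[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
                    [[0.4, 0.5, 0.1], [0.3, 0.5, 0.2]],
                    [[0.2, 0.2, 0.6], [0.5, 0.5, 0.0]]]

def dense_to_probs_and_columns(tran_mat):
    """Hypothetical helper: split a dense matrix into nonzero probabilities
    and the matching to_state (column) indices."""
    probs, columns = [], []
    for actions in tran_mat:
        state_probs, state_columns = [], []
        for row in actions:
            # Keep only nonzero entries, remembering where each one came from.
            nonzero = [(t, p) for t, p in enumerate(row) if p != 0.0]
            state_columns.append([t for t, _ in nonzero])
            state_probs.append([p for _, p in nonzero])
        probs.append(state_probs)
        columns.append(state_columns)
    return probs, columns

tranMatProbs, tranMatColumns = dense_to_probs_and_columns(tranMatWithZeros)
print(tranMatProbs[2])
# [[0.2, 0.2, 0.6], [0.5, 0.5]]
print(tranMatColumns[2])
# [[0, 1, 2], [0, 1]]
```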

(3) Load the elementwise representation from a file:

transitions.csv

stateFrom,action,stateTo,probability
0,0,0,0.9
0,0,1,0.1
0,1,0,0.1
0,1,1,0.9
1,0,0,0.4
1,0,1,0.5
1,0,2,0.1
1,1,0,0.3
1,1,1,0.5
1,1,2,0.2
2,0,0,0.2
2,0,1,0.2
2,0,2,0.6
2,1,0,0.5
2,1,1,0.5

mdl.mdp(discount=0.8,
        rewards=rewards,
        tranMatFromFile="transitions.csv")
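A file in this layout can be produced with Python's standard `csv` module. The sketch below assumes the header row and column order shown above; it writes to a temporary directory so that no existing file is overwritten, and the resulting path is what would be passed as `tranMatFromFile`.

```python
import csv
import os
import tempfile

# Elementwise representation from format (1) above.
tranMatElementwise = [
    [0, 0, 0, 0.9], [0, 0, 1, 0.1], [0, 1, 0, 0.1], [0, 1, 1, 0.9],
    [1, 0, 0, 0.4], [1, 0, 1, 0.5], [1, 0, 2, 0.1],
    [1, 1, 0, 0.3], [1, 1, 1, 0.5], [1, 1, 2, 0.2],
    [2, 0, 0, 0.2], [2, 0, 1, 0.2], [2, 0, 2, 0.6],
    [2, 1, 0, 0.5], [2, 1, 1, 0.5],
]

# Write to a temporary directory; use any path you like in practice.
csv_path = os.path.join(tempfile.mkdtemp(), "transitions.csv")

with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f, lineterminator="\n")
    # Header row as shown in the example file above.
    writer.writerow(["stateFrom", "action", "stateTo", "probability"])
    writer.writerows(tranMatElementwise)

# Read the file back to confirm its layout.
with open(csv_path) as f:
    lines = f.read().splitlines()

print(lines[0])
# stateFrom,action,stateTo,probability
print(lines[1])
# 0,0,0,0.9
```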
