
CMU-CS-05-127
Computer Science Department
School of Computer Science, Carnegie Mellon University
H. Brendan McMahan, Geoffrey J. Gordon
May 2005
Keywords: Planning under uncertainty, Markov Decision Processes,
Prioritized Sweeping, Dijkstra's Algorithm, Gaussian Elimination,
Value Iteration, Improved Prioritized Sweeping (IPS), Prioritized
Policy Iteration (PPI), Gauss-Dijkstra Elimination (GDE)
We study the problem of computing the optimal value function for a
Markov decision process with positive costs. Computing this function
quickly and accurately is a basic step in many schemes for deciding
how to act in stochastic environments. There are efficient algorithms
which compute value functions for special types of MDPs: for
deterministic MDPs with S states and A actions, Dijkstra's
algorithm runs in time O(AS log S). And, in single-action MDPs
(Markov chains), standard linear-algebraic algorithms find the value
function in time O(S^3), or faster by taking
advantage of sparsity or good conditioning. Algorithms for solving
general MDPs can take much longer: we are not aware of any speed
guarantees better than those for comparably-sized linear programs.
We present a family of algorithms which reduce to Dijkstra's algorithm
when applied to deterministic MDPs, and to standard techniques for
solving linear equations when applied to Markov chains. More
importantly, we demonstrate experimentally that these algorithms
perform well when applied to MDPs which "almost" have the required
special structure.
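
To make the Markov-chain special case concrete, here is a minimal sketch (not the paper's algorithm, and using an illustrative hand-picked chain): for a single-action MDP with positive costs and an absorbing zero-cost goal state, the value function on the non-goal states satisfies v = c + Pv, so v = (I - P)^{-1} c can be found with a standard O(S^3) dense linear solve.

```python
# Hedged sketch: value function of a single-action MDP (Markov chain)
# with positive per-step costs via a standard linear solve.
import numpy as np

# Illustrative 3-state chain plus an implicit absorbing goal state.
# P holds transition probabilities among the non-goal states only;
# the remaining probability mass in each row flows to the goal,
# whose value is 0, so it drops out of the equations.
P = np.array([[0.5, 0.3, 0.0],
              [0.1, 0.6, 0.2],
              [0.0, 0.2, 0.5]])
c = np.array([1.0, 2.0, 1.0])  # positive per-step costs

# v = c + P v  =>  (I - P) v = c
v = np.linalg.solve(np.eye(3) - P, c)  # O(S^3) dense solve
print(v)
```

Exploiting sparsity or good conditioning, as the abstract notes, amounts to replacing this dense solve with a sparse or iterative one.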
24 pages
