Low-Discrepancy Action Selection in Markov Decision Processes
Abstract or Description
Presentation given at the MAA Southeast Section.
In a Markov Decision Process, an agent must learn to choose actions in order to optimally navigate a Markovian environment. When the system dynamics are unknown and the agent's behavior is learned from data, the problem is known as Reinforcement Learning. In theory, for the learned behavior to converge to the optimal behavior, data must be collected from every state-action combination infinitely often. In practice, therefore, the method the agent uses to explore the environment is critical to learning approximately optimal behavior from a reasonable amount of data. This paper discusses the benefits of augmenting existing exploration strategies by choosing actions in a low-discrepancy manner. When the state and action spaces are discrete, actions are selected uniformly from those that have been tried the fewest times. When the state and action spaces are continuous, quasi-random sequences are used to select actions. The superiority of this strategy over purely random action selection is demonstrated by proof for a simple discrete MDP, and empirically for more complex processes.
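The two selection rules described in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the class `LeastTriedSelector` and the functions `van_der_corput` and `quasi_random_action` are hypothetical names introduced here, and the base-2 van der Corput sequence is assumed as one common example of a quasi-random sequence, since the abstract does not say which sequence is used.

```python
import random
from collections import defaultdict


class LeastTriedSelector:
    """Discrete case: pick uniformly among the actions tried least often in a state.

    Hypothetical helper for illustration; not taken from the presentation.
    """

    def __init__(self, actions, seed=None):
        self.actions = list(actions)
        # Per-state list of how many times each action has been selected.
        self.counts = defaultdict(lambda: [0] * len(self.actions))
        self.rng = random.Random(seed)

    def select(self, state):
        counts = self.counts[state]
        fewest = min(counts)
        # Ties among least-tried actions are broken uniformly at random.
        candidates = [i for i, c in enumerate(counts) if c == fewest]
        choice = self.rng.choice(candidates)
        counts[choice] += 1
        return self.actions[choice]


def van_der_corput(n, base=2):
    """Return the n-th term (n >= 1) of the van der Corput sequence in [0, 1)."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value


def quasi_random_action(n, low, high):
    """Continuous case: map the n-th quasi-random point onto the interval [low, high)."""
    return low + (high - low) * van_der_corput(n)


if __name__ == "__main__":
    selector = LeastTriedSelector(["left", "right", "stay"], seed=0)
    print([selector.select("s0") for _ in range(6)])
    print([quasi_random_action(n, -1.0, 1.0) for n in range(1, 5)])
```

In this sketch, every block of three selections in a given state covers each of the three actions exactly once, whereas purely random selection can leave an action untried for many steps. To augment an existing strategy such as epsilon-greedy, the selector would stand in for the uniform random draw in the exploration step only.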
MAA Southeast Section
Carden, Stephen W.
"Low-Discrepancy Action Selection in Markov Decision Processes."
Department of Mathematical Sciences Faculty Presentations.