Low-Discrepancy Action Selection in Markov Decision Processes

Document Type

Presentation

Presentation Date

3-2018

Abstract or Description

Presentation given at the MAA Southeast Section.

Abstract

In a Markov Decision Process, an agent must learn to choose actions in order to optimally navigate a Markovian environment. When the system dynamics are unknown and the agent's behavior is learned from data, the problem is known as Reinforcement Learning. In theory, for the learned behavior to converge to the optimal behavior, data must be collected from every state-action combination infinitely often. Therefore, in practice, the methodology the agent uses to explore the environment is critical to learning approximately optimal behavior from a reasonable amount of data. This paper discusses the benefits of augmenting existing exploration strategies by choosing from actions in a low-discrepancy manner. When the state and action spaces are discrete, actions are selected uniformly from those that have been tried the fewest times. When the state and action spaces are continuous, quasi-random sequences are used to select actions. The superiority of this strategy over purely random action selection is demonstrated by proof for a simple discrete MDP, and empirically for more complex processes.
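The sketch below is a minimal illustration of the two selection schemes described in the abstract; it is not taken from the presentation, and all class and function names (LowDiscrepancyExplorer, QuasiRandomActionSampler, halton) are hypothetical. The discrete case selects uniformly among the least-tried actions for a given state, and the continuous case maps a standard Halton (quasi-random) sequence onto an action interval.

```python
import random
from collections import defaultdict


class LowDiscrepancyExplorer:
    """Discrete case (illustrative): prefer the actions tried fewest times in each state."""

    def __init__(self, actions):
        self.actions = list(actions)
        # counts[state][action] = number of times `action` has been tried in `state`
        self.counts = defaultdict(lambda: defaultdict(int))

    def select(self, state):
        counts = self.counts[state]
        least = min(counts[a] for a in self.actions)
        # Uniform choice among actions tied for the minimum visit count
        action = random.choice([a for a in self.actions if counts[a] == least])
        counts[action] += 1
        return action


def halton(index, base=2):
    """index-th element of the base-`base` Halton sequence in [0, 1),
    a standard low-discrepancy (quasi-random) sequence."""
    f, result, i = 1.0, 0.0, index
    while i > 0:
        f /= base
        result += f * (i % base)
        i //= base
    return result


class QuasiRandomActionSampler:
    """Continuous case (illustrative): spread actions evenly over [low, high) via a Halton sequence."""

    def __init__(self, low, high, base=2):
        self.low, self.high, self.base = low, high, base
        self.n = 0

    def select(self):
        self.n += 1
        return self.low + (self.high - self.low) * halton(self.n, self.base)


if __name__ == "__main__":
    explorer = LowDiscrepancyExplorer(actions=["left", "right", "stay"])
    print([explorer.select(state=0) for _ in range(6)])   # each action tried roughly equally often

    sampler = QuasiRandomActionSampler(low=-1.0, high=1.0)
    print([round(sampler.select(), 3) for _ in range(5)])  # evenly spread over [-1, 1)
```

In practice such a selector would be combined with an existing exploration strategy (e.g., used in place of the uniform draw in an epsilon-greedy policy), which is the augmentation the abstract refers to.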

Sponsorship/Conference/Institution

MAA Southeast Section

Location

Clemson, SC
