In this PhD research, we aim to bridge the gap between the two approaches
by developing a theory of informed RL, where information—also called advice—
is generated and used in a way that limits performance losses. In our
view, an informed-RL algorithm can obtain advice in various ways: it could be derived
from trajectories run on a simulator, from knowledge shared by a trained
agent, from expert input, from learning to solve a related task, or from any other
available source of knowledge.
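To make the notion of "using advice" concrete, the following is a minimal sketch, not part of the proposal itself: a tabular Q-learning agent on a toy chain MDP that, with some probability, follows an external advice function instead of its own policy. The environment, the `advice` function (here, a hand-coded optimal hint), and the probability `p_advice` are all illustrative assumptions.

```python
import random

random.seed(0)

# Toy chain MDP: states 0..4, actions 0 (left) / 1 (right); reward 1 on reaching state 4.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def advice(s):
    # Hypothetical external advice source: always suggests moving right
    # (optimal in this toy chain). In general, advice may be limited or inexact.
    return 1

def q_learning(episodes=200, alpha=0.5, gamma=0.9, eps=0.2, p_advice=0.5):
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < p_advice:
                a = advice(s)            # follow the external advice
            elif random.random() < eps:
                a = random.randrange(2)  # explore on the agent's own
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1  # greedy w.r.t. Q
            s2, r, done = step(s, a)
            # Standard Q-learning update; advice only biases action selection.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = [0 if q[0] >= q[1] else 1 for q in Q]
```

Note that in this sketch the advice only biases exploration; the update rule is unchanged, so inexact or adversarial advice would slow learning but not corrupt the fixed point, which is exactly the kind of performance-loss question the first task addresses.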
The first task is to investigate the impact of limited, inexact, and adversarial
advice. The second task focuses on generating the advice itself and on
understanding the interplay between its generation and its use. Finally,
the third task applies the resulting theory to real industrial data.