# Rigorous Systems Research Group (RSRG) Seminar

Tuesday, June 28, 2022
3:00pm to 4:00pm
In this work we study decision-making for safety-critical systems under uncertainty. To that end, we formulate Reinforcement Learning with Almost Sure Constraints, in which one seeks a policy that allows no more than $\Delta\in\mathbb{N}$ unsafe events in any trajectory, with probability one. We argue that this type of constraint may be better suited to safety-critical systems than the average constraint usually employed in Constrained Markov Decision Processes, and, moreover, that constraints of this kind make feasible policies much easier to find.
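In symbols, the almost-sure constrained problem can be written as follows (a sketch based on the abstract; the reward $r$, discount $\gamma$, and the unsafe-event indicator are assumed notation, not taken from the talk):

```latex
\max_{\pi} \;\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{P}^{\pi}\!\left(\sum_{t=0}^{\infty} \mathbf{1}\{(s_t, a_t)\ \text{unsafe}\} \le \Delta\right) = 1 .
```

In contrast, the average-constraint formulation in Constrained MDPs would only bound the *expected* number of unsafe events, which still permits trajectories with many violations.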
The talk is didactically split into two parts: first the $\Delta=0$ case, then the general $\Delta\geq 0$ case. At the core of our theory is a barrier-based decomposition of the Q-function that decouples the problems of optimality and feasibility and allows them to be learned either independently or in conjunction. We develop an algorithm for characterizing the set of all feasible policies that provably converges in expected finite time, and we further develop sample-complexity bounds for learning this set with high probability. Simulations corroborate our theoretical findings and showcase how our algorithm can be wrapped around other learning algorithms to hasten the search for first feasible, and then optimal, policies.
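To give a concrete feel for the feasibility side of the problem, here is a toy sketch (entirely our own construction, not code from the talk): on a small deterministic chain MDP we compute, for each state, the minimum number of unsafe events any policy starting there must incur, and then keep only the actions compatible with a budget $\Delta$. All state/action/cost definitions below are illustrative assumptions.

```python
# Toy chain MDP: states 0..4, action 0 = left, 1 = right, state 4 is an
# absorbing goal. Taking any action from state 3 counts as one unsafe event.
N_STATES = 5
ACTIONS = (0, 1)

def step(s, a):
    """Deterministic transition: move left/right, clipped to [0, N_STATES-1]."""
    if s == N_STATES - 1:                      # goal is absorbing
        return s
    return max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))

def unsafe(s, a):
    """1 if the transition triggers an unsafe event (here: acting in state 3)."""
    return 1 if s == 3 else 0

def feasibility_values(iters=50):
    """Fixed-point iteration for F(s) = min_a [ unsafe(s,a) + F(step(s,a)) ],
    the fewest unsafe events any policy starting at s must incur."""
    F = [0] * N_STATES
    for _ in range(iters):
        newF = list(F)
        for s in range(N_STATES):
            if s == N_STATES - 1:
                newF[s] = 0                    # nothing unsafe at the goal
            else:
                newF[s] = min(unsafe(s, a) + F[step(s, a)] for a in ACTIONS)
        F = newF
    return F

def feasible_actions(F, delta):
    """Actions at each state (with a full budget delta) that keep the total
    number of unsafe events at or below delta."""
    return {s: [a for a in ACTIONS
                if unsafe(s, a) + F[step(s, a)] <= delta]
            for s in range(N_STATES)}
```

In this toy example, for $\Delta = 0$ state 3 has no feasible action at all, while state 2 must move left; raising the budget to $\Delta = 1$ makes both actions at state 3 feasible again. A learner wrapped around such a filter would only ever explore the surviving actions, which is one way to read the abstract's claim that feasibility and optimality can be handled separately.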