A sojourn-based approach to discrete-time semi-Markov decision processes
DSS Seminar (blended mode). Wednesday 11 May 2022, 2:30 PM, Room 34 (fourth floor), Dipartimento di Scienze Statistiche, Sapienza Università di Roma, and online at the following link: meet.google.com/dwz-gvib-uoj

Abstract: To date, there is an extensive literature on continuous-time semi-Markov decision processes. Such models generalize classical continuous-time Markov decision processes by allowing general distributions of the inter-event times. In these models, however, a decision is taken only at a change of state, so that it influences a priori the distribution of the inter-event times; as a consequence, decisions cannot depend on intermediate observations of the system. In this talk we address the problem of intermediate observations by modelling the system as a finite-horizon discrete-time semi-Markov process, thus assuming discrete observations of the system and allowing a possible decision at each time step. To do this, we first provide an alternative characterization of discrete-time semi-Markov processes in terms of a bivariate Markov chain involving both the state and the sojourn time of the process. Once this is done, the dynamic programming principle can be applied to this bivariate Markov formulation to derive optimal policies for the discrete-time semi-Markov decision process. With this approach we are able to exhibit the Bellman equation for both the value function and the quality (Q) function. The latter is then used to develop a model-free reinforcement learning algorithm based on Watkins and Dayan's Q-learning algorithm, together with a first naive attempt towards a deep reinforcement learning counterpart. Two exploratory toy examples are provided. This talk is based on joint work with Salvatore Cuomo from Università degli Studi di Napoli Federico II.
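To make the sojourn-based idea concrete, the following minimal sketch (all dynamics, state spaces, and parameters are illustrative assumptions, not the speakers' model) simulates a toy two-state discrete-time semi-Markov process through its bivariate Markov chain (state x, sojourn time v), and runs tabular Watkins-Dayan Q-learning on that augmented state:

```python
import random
from collections import defaultdict

# Illustrative toy only: two semi-Markov states and two actions. The chance
# of leaving the current state depends on (state, sojourn time, action), so
# the sojourn-time distribution need not be geometric, yet the pair (x, v)
# evolves as an ordinary Markov chain.
STATES, ACTIONS, HORIZON = (0, 1), (0, 1), 20

def hazard(x, v, a):
    """Jump probability at sojourn time v (assumed form: increasing in v)."""
    base = 0.2 if a == 0 else 0.4
    return min(1.0, base + 0.05 * v)

def step(x, v, a, rng):
    """One step of the bivariate chain: the sojourn grows or the state jumps."""
    if rng.random() < hazard(x, v, a):
        reward = 1.0 if x == 1 else 0.0   # reward paid on leaving state 1
        return 1 - x, 1, reward           # jump and reset the sojourn clock
    return x, v + 1, 0.0                  # stay; sojourn time increases

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning on the augmented state (x, v), epsilon-greedy."""
    rng = random.Random(seed)
    Q = defaultdict(float)                # keyed by ((x, v), action)
    for _ in range(episodes):
        x, v = rng.choice(STATES), 1
        for _ in range(HORIZON):
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[((x, v), b)])
            nx, nv, r = step(x, v, a, rng)
            best_next = max(Q[((nx, nv), b)] for b in ACTIONS)
            Q[((x, v), a)] += alpha * (r + gamma * best_next - Q[((x, v), a)])
            x, v = nx, nv
    return Q

Q = q_learning()
```

Because the sojourn time is part of the state, the learned policy may prescribe different actions at different stages of the same visit to a state, which is exactly the kind of intermediate-observation dependence the talk addresses.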