Hierarchical Reinforcement Learning

Title: Deep RL With Hierarchical Action Exploration for Dialogue Generation
Author: Itsugun Cho et al.
Publish Year: 22 Mar 2023
Review Date: Thu, Mar 30, 2023
url: https://arxiv.org/pdf/2303.13465v1.pdf

Summary of paper

Motivation
Approximate dynamic programming applied to dialogue generation involves policy improvement with action sampling. This practice is inefficient for reinforcement learning because eligible responses (those with high action value) are very sparse, so the greedy policy sustained by random sampling is "flabby", that is, weak.

Contribution
The paper shows, both theoretically and experimentally, that the performance of a dialogue policy is positively correlated with the sampling size. It introduces a novel dual-granularity Q-function that alleviates this limitation by exploring the most promising response category to intervene in the sampling.

Some key terms
limitation of the maximum likelihood estimation (MLE) objective for the probability distribution of responses ...
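The two claims in the Contribution summary (a greedy policy improves as more candidate responses are sampled, and steering the sampling toward a promising response category helps at a fixed sample size) can be illustrated with a toy sketch. This is not the paper's method: the action values are synthetic, the category score is simply the mean value of a category standing in for a learned coarse-grained Q-function, and every name below (`make_toy_q`, `flat_greedy_value`, `guided_greedy_value`, `mean_value`) is hypothetical.

```python
import random


def make_toy_q(num_categories=10, per_category=100, seed=0):
    """Synthetic action values: category c holds responses with Q near 0.1 * c."""
    rng = random.Random(seed)
    return {
        c: [rng.gauss(0.1 * c, 0.05) for _ in range(per_category)]
        for c in range(num_categories)
    }


def flat_greedy_value(q_by_category, sample_size, rng):
    """Sample responses uniformly over all categories, then act greedily."""
    all_q = [q for values in q_by_category.values() for q in values]
    return max(rng.sample(all_q, sample_size))


def guided_greedy_value(q_by_category, sample_size, rng):
    """Pick the best category first, then sample and act greedily inside it."""

    def coarse_score(category):
        # Assumption: the mean fine-grained value stands in for a learned
        # category-level Q-function.
        values = q_by_category[category]
        return sum(values) / len(values)

    best_category = max(q_by_category, key=coarse_score)
    pool = q_by_category[best_category]
    return max(rng.sample(pool, min(sample_size, len(pool))))


def mean_value(policy, q_by_category, sample_size, trials=500, seed=1):
    """Average value of the chosen response over repeated sampling rounds."""
    rng = random.Random(seed)
    total = sum(policy(q_by_category, sample_size, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    q = make_toy_q()
    for n in (5, 20, 50):
        print(n, mean_value(flat_greedy_value, q, n),
              mean_value(guided_greedy_value, q, n))
```

In this toy setting the flat sampler needs a large sample to stumble on a high-value response, while the guided sampler reaches comparable values with a handful of samples because the coarse step has already narrowed the search. How closely this mirrors the paper's dual-granularity Q-function depends on details the excerpt above does not give.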