# When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute (Tao Lei, 2021)

[TOC]

Title: When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute

Author: Tao Lei

Publish Year: Sep 2021

Review Date: Jan 2022

## Summary of paper

As the author mentions, the inspiration for SRU++ comes from two lines of research:

- the parallelization / speed problem of the original RNN
- leveraging recurrence in conjunction with self-attention

## Structure of SRU++

New discovery: little attention is needed given recurrence. Similar to the observation of Merity (2019), they found that using a couple of attention layers is sufficient to obtain SOTA results...
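To make the structure concrete, below is a minimal PyTorch sketch of an SRU++-style layer: a self-attention sub-layer (computed in a reduced dimension) produces the inputs to SRU's fast element-wise recurrence, and attention can be switched off per layer. This is an illustrative assumption, not the authors' code; the class name `SimpleSRUpp`, the single-head attention, the chosen dimensions, and the omission of causal masking and normalization are all my own simplifications (the authors' released `sru` package should be used for real experiments).

```python
import torch
import torch.nn as nn


class SimpleSRUpp(nn.Module):
    """Minimal SRU++-style layer: self-attention feeds the inputs of a
    fast element-wise recurrence (a sketch, not the official code)."""

    def __init__(self, d_model: int, d_attn: int = 64, use_attention: bool = True):
        super().__init__()
        self.use_attention = use_attention
        if use_attention:
            # SRU++ idea: replace SRU's linear map with attention computed
            # in a reduced dimension d_attn, then expand to the gate inputs.
            self.down = nn.Linear(d_model, d_attn)
            self.attn = nn.MultiheadAttention(d_attn, num_heads=1, batch_first=True)
            self.up = nn.Linear(d_attn, 3 * d_model)
        else:
            # Plain SRU-style layer: a linear map produces the gate inputs.
            self.up = nn.Linear(d_model, 3 * d_model)
        # Per-dimension recurrence parameters for the forget/reset gates.
        self.v_f = nn.Parameter(torch.zeros(d_model))
        self.v_r = nn.Parameter(torch.zeros(d_model))
        self.b_f = nn.Parameter(torch.zeros(d_model))
        self.b_r = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        if self.use_attention:
            z = self.down(x)
            z, _ = self.attn(z, z, z)  # causal mask omitted for brevity
            u = self.up(z)
        else:
            u = self.up(x)
        w, w_f, w_r = u.chunk(3, dim=-1)

        # Element-wise recurrence: no matrix multiply inside the loop,
        # which is what keeps SRU fast despite being sequential in time.
        c = x.new_zeros(x.size(0), x.size(2))
        outputs = []
        for t in range(x.size(1)):
            f = torch.sigmoid(w_f[:, t] + self.v_f * c + self.b_f)
            c = f * c + (1 - f) * w[:, t]
            r = torch.sigmoid(w_r[:, t] + self.v_r * c + self.b_r)
            outputs.append(r * c + (1 - r) * x[:, t])  # highway connection
        return torch.stack(outputs, dim=1)


# Usage: stack several layers and enable attention in only a couple of
# them, mirroring the "little attention is needed" observation.
layers = nn.ModuleList(
    [SimpleSRUpp(128, use_attention=(i >= 8)) for i in range(10)]
)
h = torch.randn(2, 16, 128)
for layer in layers:
    h = layer(h)
print(h.shape)  # torch.Size([2, 16, 128])
```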

January 14, 2022 · 1 min · Sukai Huang