01 Oct -- 31 Oct, 2022
Last Week’s Work Review

We finally completed both the language reward shaping module and the language reward shaping RL agent. This month we are going to upgrade and refine the reward shaping approach. There are some issues with the current approach:

- The RL environment config is not set up in the standard way (standard -> DeepMind way).
- The whole training is quite heavy (60 it/sec -> ~46 hours to train 10M steps).
- It takes too much RAM (25.1 GB for 1 gym env).

You need a password to access the content; go to Slack *#phdsukai to find more. ...
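
As a quick sanity check on the training-time figure quoted above, the estimate can be reproduced with a few lines of Python. This is only a sketch: it assumes one iteration ("it") corresponds to one environment step, and the helper name `training_hours` is hypothetical, not part of the project code.

```python
def training_hours(total_steps: int, steps_per_sec: float) -> float:
    """Estimate wall-clock training time in hours.

    Assumption: one iteration equals one environment step.
    """
    seconds = total_steps / steps_per_sec
    return seconds / 3600.0


# Figures taken from the review: 10M steps at 60 it/sec.
hours = training_hours(10_000_000, 60)
print(f"{hours:.1f} hours")  # prints "46.3 hours"
```

The same helper shows how much a throughput improvement would buy: at a fixed step budget, doubling the iterations per second halves the estimate.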