23 January -- 29 January, 2022

Last Week’s Work Review: Recall our Phase 1 visual module training scheme and our Phase 2 language module training scheme. The testing accuracy reached $99.0\%$, but I doubt the true performance of this model: it can achieve high accuracy simply by memorising the current 3D grid values as long as the incremental change is small, which is true for this dataset because one instruction is usually paired with a small change to the environment. I should change the current training scheme to prevent the model from taking shortcuts, e.g., local addition in one step (no intermediate states); a rough check of this shortcut is sketched below. After completing the language module, our next step is to build a PDDL model to bridge the gap between the current state and the target state described by the instruction. You need a password to access the content; go to Slack *#phdsukai to find more. ...
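The shortcut concern can be made concrete with a small check. The sketch below is my own illustration, not code from the post; the `dataset` iterable, the grid arrays, and `model.predict` are hypothetical placeholders. The idea: if a trivial baseline that simply copies the current grid already scores close to the model's $99.0\%$, the reported accuracy says little about instruction understanding.

```python
# Minimal sketch of a "copy the current grid" baseline check (assumed data layout:
# each sample is (current_grid, instruction, target_grid) with integer voxel grids).
import numpy as np

def copy_baseline_accuracy(dataset):
    """Per-voxel accuracy of predicting target_grid == current_grid."""
    correct, total = 0, 0
    for current_grid, instruction, target_grid in dataset:
        correct += np.sum(current_grid == target_grid)
        total += current_grid.size
    return correct / total

def model_accuracy(model, dataset):
    """Per-voxel accuracy of the language module's predicted grid."""
    correct, total = 0, 0
    for current_grid, instruction, target_grid in dataset:
        pred_grid = model.predict(current_grid, instruction)  # hypothetical API
        correct += np.sum(pred_grid == target_grid)
        total += pred_grid.size
    return correct / total

# If copy_baseline_accuracy(dataset) is already close to model_accuracy(model, dataset),
# the model may be exploiting the "small incremental change" shortcut.
```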

January 25, 2022 · 4 min · Sukai Huang

16 January -- 22 January, 2022

Last Week’s Work Review: Key points from last week’s meeting: the sequence of instructions matters, and we need to capture this feature, e.g., the agent must build pillars before it can build the bell. To generate a training dataset, we can use Monte Carlo methods to randomly extract movement segments from trajectories of human players; compared with creating random agents from scratch, this Monte Carlo imitation approach produces more human-like trajectories (see the sketch below). I finished the visual module training code on Wednesday this week (a little late), but much of the code is reusable, and so far the accuracy of the visual module is desirable ($\approx 98.9\%$). This week we are going to focus on “text to grids” conversion. Originally the second stage was to label the intents of conversation sentences for the “Modular RL model”; however, Trevor and Nir suggested that I start with simpler work first. One possible task is to rebuild the human Builder’s (intermediate) structure from the Architect agent’s (partial) instructions. Are we assuming that the human Builder’s actions are a perfect solution to the Architect’s instructions, and how can we break this assumption? We can imitate human players first and then improve the agent further with a good reward signal. Besides the Builder task, the Architect task is also very interesting, e.g., how to convey instructions to the agent and what the best workload is for each conversation step (baseline: one block at a time) … You need a password to access the content; go to Slack *#phdsukai to find more. ...
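A minimal sketch of the Monte Carlo trajectory-extraction idea, assuming we already have a list of human Builder trajectories where each trajectory is a list of atomic actions (e.g., block placements/removals); the name `human_trajectories` and the segment-length bounds are illustrative, not from the post.

```python
import random

def sample_human_segments(human_trajectories, n_samples, min_len=3, max_len=10, seed=0):
    """Monte Carlo sampling of contiguous action segments from human trajectories.

    Compared with rolling out a random agent from scratch, segments drawn from
    real players preserve human-like ordering (e.g., pillars before the bell).
    """
    rng = random.Random(seed)
    segments = []
    for _ in range(n_samples):
        traj = rng.choice(human_trajectories)
        if len(traj) < min_len:
            continue  # skip trajectories too short to yield a segment
        seg_len = rng.randint(min_len, min(max_len, len(traj)))
        start = rng.randint(0, len(traj) - seg_len)
        segments.append(traj[start:start + seg_len])
    return segments
```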

January 19, 2022 · 2 min · Sukai Huang

09 January -- 15 January, 2022

Last Week’s Work Review: Last week I checked the environment code for the IGLU competition, so this week we should start building our Modular RL model. You need a password to access the content; go to Slack *#phdsukai to find more. ...

January 7, 2022 · 3 min · Sukai Huang

02 January -- 08 January, 2022

Last Week’s Work Review: We continue reading the code of the winning model of the IGLU competition. We also need to check the IGLU Slack channel, download the training dataset, and understand how to use the dataloader tools. You need a password to access the content; go to Slack *#phdsukai to find more. ...

January 1, 2022 · 5 min · Sukai Huang