05 Jul -- 31 Jul, 2023

Previous Work Review: we have the briefing; now we continue the experiments and the writing. You need a password to access the content; go to Slack *#phdsukai to find out more. ...

<span title='2023-07-05 00:35:22 +1000 AEST'>July 5, 2023</span>&nbsp;·&nbsp;7 min&nbsp;·&nbsp;Sukai Huang

26 Jun -- 30 Jun, 2023

Previous Work Review: continue investigating. ...

<span title='2023-06-26 13:14:16 +1000 AEST'>June 26, 2023</span>&nbsp;·&nbsp;20 min&nbsp;·&nbsp;Sukai Huang

07 Jun -- 14 Jun, 2023

Previous Work Review: we continue in the two directions, and further preliminary tests need to be conducted. ...

<span title='2023-06-07 22:49:20 +1000 AEST'>June 7, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

01 Jun -- 06 Jun, 2023

Previous Work Review: we focus on how LLMs can assist planning (using their reasoning ability). ...

<span title='2023-05-29 17:07:51 +1000 AEST'>May 29, 2023</span>&nbsp;·&nbsp;10 min&nbsp;·&nbsp;Sukai Huang

15 May -- 21 May, 2023

Previous Work Review: we have three things: the pitfall of the LRS paper, the ALFRED project, and the "Visualising language instructions as lines" project. ...

<span title='2023-05-15 10:56:05 +1000 AEST'>May 15, 2023</span>&nbsp;·&nbsp;14 min&nbsp;·&nbsp;Sukai Huang

10 Apr -- 16 Apr, 2023

Previous Work Review: we have decided to improve the exploration strategy of the ServiceNow model. ...

<span title='2023-04-12 10:24:37 +0800 +0800'>April 12, 2023</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;Sukai Huang

12 Mar -- 18 Mar, 2023

Previous Work Review: we continue to study the code of the ServiceNow model. Last week we gained a brief understanding of the model's whole pipeline; this week we need to find out how the model generates the PDDL domain and problem files. EMNLP 2023 call for papers: the direct paper submission deadline is June 23, 2023; maybe we can try to get this work done and come up with one paper for EMNLP 2023. Revise and resubmit the Language Reward Shaping paper: shall we submit it to NeurIPS 2023? ...

<span title='2023-03-10 18:12:42 +1100 AEDT'>March 10, 2023</span>&nbsp;·&nbsp;6 min&nbsp;·&nbsp;Sukai Huang

01 Mar -- 11 Mar, 2023

Previous Work Review: we got the source code from Mr Xiaotian Liu @ ServiceNow Co. However, the code is not executable because he hasn’t shared some necessary datasets with me. I have sent him an email asking for the missing datasets. ...

<span title='2023-03-06 14:43:14 +1100 AEDT'>March 6, 2023</span>&nbsp;·&nbsp;8 min&nbsp;·&nbsp;Sukai Huang

12 Feb -- 28 Feb, 2023

Previous Work Review: we will work on two projects: the ALFRED environment, and visualising the planned path as a line. ...

<span title='2023-02-12 09:53:46 +1100 AEDT'>February 12, 2023</span>&nbsp;·&nbsp;4 min&nbsp;·&nbsp;Sukai Huang

01 Feb -- 11 Feb, 2023

Previous Work Review: we finished investigating the Language Reward Shaping model and found that it is slower than a vanilla PPO+RND learning agent. We found that rewarding partially matched trajectories significantly slows down learning. Now we should move on to the next research questions. ...

<span title='2023-01-29 23:15:54 +1100 AEDT'>January 29, 2023</span>&nbsp;·&nbsp;7 min&nbsp;·&nbsp;Sukai Huang

1 Dec -- 31 Dec, 2022

Work Review: start writing the negative-result paper. ...

<span title='2022-12-16 19:25:59 +1100 AEDT'>December 16, 2022</span>&nbsp;·&nbsp;7 min&nbsp;·&nbsp;Sukai Huang

01 Nov -- 30 Nov, 2022

Last Week’s Work Review: over the last month we found that the agent had the following issues. (1) It gives more weight to object detection than to action detection, so it gave wrong rewards when the agent was close to the target object but did the wrong action; e.g., it gave high rewards for the sentence “climb down the ladder” when the agent was staying on the ladder. (2) It tried to analyse each single word instead of recognising phrases and qualifiers; e.g., it gave rewards for the sentence “go to the ladder on the right” when the agent was at the ladder in the middle, and for the sentence “climb down the ladder on the right” when the agent was jumping to the right platform. The module cannot correctly handle the phrase “the ladder on the right”; instead, it treats it as two things, “ladder” and “right”. Our solution is to generate hard negative examples by replacing the noun phrase and the verb phrase in the original sentence with random phrases picked from the phrase set; I call it “phrase polluted hard negative example generation”. ...

<span title='2022-11-02 20:08:39 +1100 AEDT'>November 2, 2022</span>&nbsp;·&nbsp;12 min&nbsp;·&nbsp;Sukai Huang
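
The "phrase polluted hard negative example generation" idea from the 01 Nov -- 30 Nov, 2022 entry could be sketched roughly as below. This is only an illustration under stated assumptions: the phrase sets, the function name `make_hard_negatives`, and the choice to swap one phrase at a time (so each negative stays close to the original sentence) are all hypothetical and not taken from the actual project code.

```python
import random

# Hypothetical phrase sets; the real phrase set is not given in the report.
VERB_PHRASES = ["climb down", "climb up", "go to", "jump to"]
NOUN_PHRASES = [
    "the ladder on the right",
    "the ladder in the middle",
    "the rope",
    "the right platform",
]


def make_hard_negatives(verb_phrase, noun_phrase, rng, n=3):
    """Return up to n sentences that differ from the original in exactly one phrase."""
    other_verbs = [v for v in VERB_PHRASES if v != verb_phrase]
    other_nouns = [p for p in NOUN_PHRASES if p != noun_phrase]
    # Swap the verb phrase OR the noun phrase (an assumption), never keep both.
    candidates = [f"{v} {noun_phrase}" for v in other_verbs]
    candidates += [f"{verb_phrase} {p}" for p in other_nouns]
    rng.shuffle(candidates)
    return candidates[:n]


rng = random.Random(0)
negatives = make_hard_negatives("climb down", "the ladder on the right", rng)
for sentence in negatives:
    print(sentence)
```

Because every negative shares either the verb phrase or the noun phrase with the original sentence, a reward module that only matches single words such as "ladder" or "right" would be expected to score these negatives too highly, which is the failure the entry describes.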

01 Oct -- 31 Oct, 2022

Last Week’s Work Review: we finally completed both the language reward shaping module and the language reward shaping RL agent. This month we are going to upgrade and refine the reward shaping approach. There are some issues with the current approach: the RL environment config is not set up in the standard way (standard -> DeepMind way); the whole training is quite heavy (60 it/sec -> ~46 hours to train 10M steps); and it takes too much RAM (25.1 GB for 1 gym env). ...

<span title='2022-10-05 19:25:25 +1100 AEDT'>October 5, 2022</span>&nbsp;·&nbsp;8 min&nbsp;·&nbsp;Sukai Huang
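
The training-time figure in the 01 Oct -- 31 Oct, 2022 entry can be checked with a quick calculation. The sketch below assumes that one "it" corresponds to one environment step; the function name `training_hours` is made up for illustration.

```python
def training_hours(total_steps, steps_per_second):
    """Convert a step budget and a throughput into wall-clock hours."""
    return total_steps / steps_per_second / 3600


# 10M steps at 60 it/sec, assuming 1 iteration == 1 environment step.
hours = training_hours(10_000_000, 60)
print(f"{hours:.1f} hours")
```

Under that assumption the result is a little over 46 hours, consistent with the "~46 hours" quoted in the entry.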

01 Sep -- 30 Sep, 2022

Last Week’s Work Review: we have Goyal’s code, so we can directly test their model and our baseline. ...

<span title='2022-09-03 16:26:36 +1000 AEST'>September 3, 2022</span>&nbsp;·&nbsp;6 min&nbsp;·&nbsp;Sukai Huang

07 Aug -- 31 Aug, 2022

Last Week’s Work Review: most importantly, we need to pass the oral presentation in the confirmation review. ...

<span title='2022-08-08 17:40:55 +1000 AEST'>August 8, 2022</span>&nbsp;·&nbsp;5 min&nbsp;·&nbsp;Sukai Huang

19 Jun -- 25 Jun, 2022

Last Week’s Work Review: for vision modules, start with the simple ones: ConvNeXt; Go-Explore object recognition (Montezuma’s Revenge-specific); B-PRO (https://github.com/mcmachado/b-pro). Implement the code based on the notes in https://sino-huang.github.io/weekly-report/12-jun-18-jun-2022/ ...

<span title='2022-06-19 23:28:00 +1000 AEST'>June 19, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

12 Jun -- 18 Jun, 2022

Last Week’s Work Review: we decided that we should reproduce Goyal’s work first, with slight modifications. ...

<span title='2022-06-09 12:39:31 +1000 AEST'>June 9, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

05 Jun -- 11 Jun, 2022

Last Week’s Work Review: we decided on our concrete next steps. ...

<span title='2022-06-02 13:41:59 +1000 AEST'>June 2, 2022</span>&nbsp;·&nbsp;13 min&nbsp;·&nbsp;Sukai Huang

29 May -- 04 Jun, 2022

Last Week’s Work Review. Option recognition: choosing actions and policies based on a natural language walkthrough is highly similar to “option recognition” in traditional AI research, where the walkthrough helps agents choose options for subpolicy selection. When to start/end: when we want the agent to utilise walkthrough data, the agent needs to know when to execute the walkthrough. It needs to know what the current situation is, and then what we have already done, where we are, what to execute next, and what to do next if the previous execution failed; currently these are not explicitly handled. Reuse object recognition module: we can use Go-Explore’s object recognition module. ...

<span title='2022-05-30 14:47:30 +1000 AEST'>May 30, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

22 May -- 28 May, 2022

<span title='2022-05-25 20:15:27 +1000 AEST'>May 25, 2022</span>&nbsp;·&nbsp;6 min&nbsp;·&nbsp;Sukai Huang

15 May -- 21 May, 2022

<span title='2022-05-18 15:52:30 +1000 AEST'>May 18, 2022</span>&nbsp;·&nbsp;8 min&nbsp;·&nbsp;Sukai Huang

17 April -- 23 April, 2022

Last Week’s Work Review: besides the TODOs, we can think about how to utilise the walkthrough info in the following ways. Treat actions as queries and use KNN to find relevant sentences from the WIKI walkthrough. Run the AI policy to judge human players’ trajectories; the AI policy would find surprising moves in the human trajectories, and the AI can then find content from the WIKI that supports the human players’ moves. ...

<span title='2022-04-18 18:05:46 +1000 AEST'>April 18, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;Sukai Huang
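
The "treat actions as queries, KNN to find relevant sentences" idea from the 17 April -- 23 April, 2022 entry might look something like the sketch below. It is a toy illustration only: it uses bag-of-words cosine similarity instead of learned embeddings, and the walkthrough sentences, the query string, and the helper names (`bow`, `cosine`, `knn_sentences`) are all invented for the example.

```python
import math
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words vector for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_sentences(action_query, sentences, k=2):
    """Return the k walkthrough sentences most similar to the action query."""
    query = bow(action_query)
    ranked = sorted(sentences, key=lambda s: cosine(query, bow(s)), reverse=True)
    return ranked[:k]


# Made-up walkthrough sentences standing in for the WIKI text.
walkthrough = [
    "Eat the corpse only if it is safe.",
    "Pray when your hit points are low.",
    "Kick the door open if it is locked.",
]
top = knn_sentences("kick door", walkthrough, k=1)
print(top)
```

In practice the similarity function would presumably be replaced by a sentence-embedding model, but the retrieval step itself stays the same: embed the action, embed each sentence, and keep the nearest neighbours.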

03 April -- 09 April, 2022

Last Week’s Work Review: continue to work on the baseline models for the NetHack Challenge. ...

<span title='2022-04-05 11:43:54 +1000 AEST'>April 5, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

20 March -- 2 April, 2022

Last Week’s Work Review: our first step should be writing code for our baseline RL model; after that we can try to add a language interpreter on top of it and see if we can improve performance by interpreting the guidebook. We now have two things to do. First, build a baseline RL model for both the NetHack and MiniHack environments, then try to feed language data into the model; the decision transformer seems a future-proof model for embedding language information. Second, build a user-friendly and useful annotation tool for annotators that can record the gameplay, annotate the objects, and add instructions. ...

<span title='2022-03-21 14:29:31 +1100 AEDT'>March 21, 2022</span>&nbsp;·&nbsp;8 min&nbsp;·&nbsp;Sukai Huang

13 March -- 19 March, 2022

Last Week’s Work Review: do not restrict what people annotate, and do not limit the vocabulary… We can use a modern BERT model to interpret natural language utterances. Before we dive into converting natural language utterances into logical forms, we can first try general NLP models for an end-to-end trial… ...

<span title='2022-03-14 14:29:03 +1100 AEDT'>March 14, 2022</span>&nbsp;·&nbsp;4 min&nbsp;·&nbsp;Sukai Huang