05 Jul -- 31 Jul, 2023
Previous Work Review: We have the briefing; now we continue the experiments and keep writing. You need a password to access the content; go to Slack *#phdsukai to find more. ...
Previous Work Review: Continue investigating. ...
Previous Work Review: We continue the two research directions; further preliminary tests need to be conducted. ...
Previous Work Review: We focus on how LLMs can assist planning (using their reasoning ability). ...
Previous Work Review: We have three things: the pitfall of the LRS paper, the ALFRED project, and the "visualising language instructions as lines" project. ...
Previous Work Review: We have decided to improve the exploration strategy of the ServiceNow model. ...
Previous Work Review: We continue to study the code of the ServiceNow model. Last week we got a rough understanding of the whole pipeline; this week we need to find out how the model generates the PDDL domain and problem files. EMNLP 2023 call for papers: the direct paper submission deadline is June 23, 2023, so maybe we can get this work done and produce one paper for EMNLP 2023. Revise and resubmit the Language Reward Shaping paper: shall we submit it to NeurIPS 2023? ...
Previous Work Review: We got the source code from Mr Xiaotian Liu @ ServiceNow Co. However, the code is not executable because he hasn't shared some necessary datasets with me. I have emailed him to ask for the missing datasets. ...
Previous Work Review: We will work on two projects: the ALFRED environment, and visualising the planned path as a line. ...
Previous Work Review: We have finished investigating the Language Reward Shaping model and found that it learns more slowly than a vanilla PPO+RND agent. We found that giving rewards to partially matched trajectories significantly slows down learning. Now we should move on to the next research questions. ...
Work Review: Start writing a negative-result paper. ...
Last Week’s Work Review: Over the last month we found that the agent had the following issues. First, it gives more weight to object detection than to action detection, so it gave wrong rewards when the agent was close to the target object but performed the wrong action; e.g., it gave high rewards for the sentence “climb down the ladder” while the agent was staying on the ladder. Second, it tried to analyse each single word instead of recognising phrases and qualifiers; e.g., it gave rewards for the sentence “go to the ladder on the right” when the agent was at the ladder in the middle, and for the sentence “climb down the ladder on the right” when the agent was jumping to the right platform. The module cannot correctly handle the phrase “the ladder on the right”; instead, it treats it as two things: “ladder” and “right”. Our solution is to generate hard negative examples by replacing the noun phrase and the verb phrase in the original sentence with random phrases picked from the phrase set. I call this “phrase polluted hard negative example generation”. ...
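A minimal sketch of phrase polluted hard negative generation, assuming each instruction has already been chunked into one verb phrase and one noun phrase, and that the phrase set is a dict with hypothetical keys "verb" and "noun". The class ParsedInstruction and the function phrase_polluted_negatives are illustrative names, not the actual module. As an assumption, the sketch also keeps single-phrase swaps (only the verb phrase or only the noun phrase replaced), since those are the closest to the failure cases listed above.

```python
import random
from dataclasses import dataclass


@dataclass
class ParsedInstruction:
    """Hypothetical pre-chunked instruction: an upstream chunker is assumed
    to have split the sentence into its verb phrase and noun phrase."""
    verb_phrase: str
    noun_phrase: str

    def text(self) -> str:
        return f"{self.verb_phrase} {self.noun_phrase}"


def phrase_polluted_negatives(instr, phrase_set, n, seed=0):
    """Return up to n hard negatives for `instr`.

    Each negative replaces the verb phrase, the noun phrase, or both with
    phrases from `phrase_set`; the original (positive) sentence is excluded.
    """
    verbs = {instr.verb_phrase, *phrase_set["verb"]}
    nouns = {instr.noun_phrase, *phrase_set["noun"]}
    # Enumerate every VP/NP combination, then drop the positive sentence.
    candidates = {f"{v} {np_}" for v in verbs for np_ in nouns}
    candidates.discard(instr.text())
    # Sort before sampling so the output is reproducible for a fixed seed.
    rng = random.Random(seed)
    return rng.sample(sorted(candidates), min(n, len(candidates)))


# Usage with made-up phrases (not taken from the real phrase set):
instr = ParsedInstruction("climb down", "the ladder on the right")
phrases = {"verb": ["climb up", "jump to"],
           "noun": ["the ladder", "the middle platform"]}
print(phrase_polluted_negatives(instr, phrases, n=3))
```

Enumerating all combinations and then sampling without replacement avoids duplicate negatives and cannot loop forever when the phrase set is small.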
Last Week’s Work Review: We have finally completed both the language reward shaping module and the language reward shaping RL agent. This month we are going to upgrade and refine the reward shaping approach. The current approach has some issues: the RL environment config setting does not follow the standard convention (standard here meaning the DeepMind way); the whole training is quite heavy (60 it/sec, i.e. ~46 hours to train 10M steps); and it takes too much RAM (25.1 GB for one gym env). ...
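A quick check of the training-time estimate, assuming one iteration corresponds to one environment step:

```latex
t = \frac{10^{7}\ \text{steps}}{60\ \text{steps/s}} \approx 166{,}667\ \text{s} = \frac{166{,}667}{3600}\ \text{h} \approx 46.3\ \text{h}
```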
Last Week’s Work Review: We have Goyal’s code, so we can directly test their model and our baseline. ...
Last Week’s Work Review: Most importantly, we need to pass the oral presentation in the confirmation review. ...
Last Week’s Work Review: For vision modules, start with the simple ones: ConvNeXt; Go-Explore object recognition (Montezuma’s Revenge-specific); B-PRO (https://github.com/mcmachado/b-pro). Implement the code based on the notes in https://sino-huang.github.io/weekly-report/12-jun-18-jun-2022/ ...
Last Week’s Work Review: We decided that we should reproduce Goyal’s work first, with slight modifications. ...
Last Week’s Work Review: We decided on our concrete next steps. ...
Last Week’s Work Review: Option recognition: The choice of actions and policies based on a natural language walkthrough is highly similar to “option recognition” in traditional AI research, where the natural language walkthrough helps the agent choose options for sub-policy selection. When to start/end: When we want the agent to utilise walkthrough data, the agent needs to know when to execute the walkthrough. It needs to know the current situation, what has already been done (where we are), what to execute next, and what to do next if the previous execution failed. Currently these are not explicitly handled. Reuse the object recognition module: We can use Go-Explore’s object recognition module. ...
Last Week’s Work Review: Besides the TODOs, we can think about how to utilise the walkthrough info in the following ways. Treat actions as queries and use KNN to find relevant sentences from the wiki walkthrough. Run the AI policy to judge human players’ trajectories, so that the AI policy finds surprising moves in those trajectories; the AI can then find content in the wiki that supports the human players’ moves. ...
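A minimal sketch of the “actions as queries, KNN over walkthrough sentences” idea. As a loud assumption, the action is described as a short text string and a bag-of-words cosine similarity stands in for a learned sentence encoder; the walkthrough sentences below are made up, not from the actual wiki, and knn_walkthrough is a hypothetical name.

```python
import math
import re
from collections import Counter


def bow(text):
    # Bag-of-words vector: a simple stand-in for a learned sentence encoder.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_walkthrough(action_query, walkthrough, k=2):
    """Return the k walkthrough sentences most similar to the action query."""
    q = bow(action_query)
    scored = [(cosine(q, bow(s)), i) for i, s in enumerate(walkthrough)]
    # Highest similarity first; ties broken by original sentence order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [walkthrough[i] for _, i in scored[:k]]


# Usage with made-up walkthrough sentences:
walkthrough = [
    "Jump over the skull and climb up the ladder.",
    "Climb down the ladder on the right to reach the key.",
    "Avoid the laser gates by waiting for them to switch off.",
]
print(knn_walkthrough("climb down ladder", walkthrough, k=2))
```

Swapping bow for a pretrained sentence encoder would keep the same KNN structure; only the vector representation changes.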
Last Week’s Work Review: Continue to work on the baseline models for the NetHack Challenge. ...
Last Week’s Work Review: Our first step should be writing code for our baseline RL model; after that we can try adding a language interpreter on top of it and see whether interpreting the guidebook improves performance. We now have two things to do. (1) Build a baseline RL model for both the NetHack and MiniHack environments, then try to feed language data into the model; the decision transformer seems a future-proof model for embedding language information. (2) Build a user-friendly and useful annotation tool for annotators that can record gameplay, annotate objects, and add instructions. ...
Last Week’s Work Review: Do not restrict what people annotate and do not limit the vocabulary; we can use a modern BERT model to interpret natural language utterances. Before we dive into converting natural language utterances into logical forms, we can try general NLP models for an end-to-end trial first. ...