
  1. Title: How to Design Your Research Project Structure
  2. Review Date: Fri, Feb 2, 2024

Basic Components

  • Policy model
  • Environment
  • Reward model
  • Evaluation
  • Logging and result postprocessing
  • Training
    • train the reward model
    • train the policy
  • Dataloader
  • Utils
    • Logging Setup
    • Multiprocessing Helper
    • etc

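It's better to expose each component's interface class in its package's __init__.py, so that callers import from the package rather than from its internal modules.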

Separate subprojects for data collection and pretraining

  • Running the environment simulation
    • how to save labels and images so that they meet the requirements of DALI
    • how to generate tasks and how to generate text
  • Pretrained CLIP/XLIP model
    • Dataloader
    • Model
  • Finetuning reward model
    • Training
    • Evaluation
    • Dataloader
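For the "how to save labels and images" item above, one possible layout is sketched below. It assumes the consumer is DALI's file reader (fn.readers.file with its file_list argument), which, as far as I know, reads one whitespace-separated "filename label" pair per line, with paths taken relative to file_root when that is given. The function name save_samples and the directory names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Tuple


def save_samples(samples: Iterable[Tuple[bytes, int]], out_dir: Path) -> Path:
    """Write encoded images to out_dir/images and an index to out_dir/file_list.txt.

    Each sample is (encoded image bytes, integer label). Paths in the index are
    relative to out_dir, which is assumed to be passed to the reader as file_root.
    """
    image_dir = out_dir / "images"
    image_dir.mkdir(parents=True, exist_ok=True)

    lines = []
    for index, (image_bytes, label) in enumerate(samples):
        # Generated names contain no spaces, so the whitespace-separated
        # index stays unambiguous.
        rel_path = Path("images") / f"{index:06d}.jpg"
        (out_dir / rel_path).write_bytes(image_bytes)
        lines.append(f"{rel_path.as_posix()} {label}")

    file_list = out_dir / "file_list.txt"
    file_list.write_text("".join(f"{line}\n" for line in lines))
    return file_list
```

Writing the index at collection time keeps labels and images aligned by construction, instead of reconstructing the pairing later from file names.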

How to git-ignore files larger than 1M

  1. Open your terminal.
  2. Use the find command to list all files over 1M in size in your repository directory and append them to .gitignore:
find . -type f -size +1M -not -path './.git/*' | sed 's|^\./||' >> .gitignore

This command works as follows:

  • find . -type f -size +1M finds all regular files over 1M in size in the current directory and its subdirectories.
  • -not -path './.git/*' skips Git's own files under .git, which can also exceed 1M.
  • sed 's|^\./||' removes the leading ./ from the file paths.
  • >> .gitignore appends the output to .gitignore.

After running this command, you might want to review and edit .gitignore to ensure it only contains the entries you want to ignore.
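Because >> appends every time, running the command twice leaves duplicate entries. The sketch below is a Python variant of the same idea that skips entries already present. The function names large_files and append_to_gitignore are hypothetical, and "1M" is taken here to mean 1 MiB (1024 * 1024 bytes).

```python
import os
from pathlib import Path
from typing import List

ONE_MIB = 1024 * 1024


def large_files(repo: Path, limit: int = ONE_MIB) -> List[str]:
    """Return repo-relative POSIX paths of regular files larger than limit bytes."""
    found = []
    for root, dirs, files in os.walk(repo):
        # Never descend into Git's own object store.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            path = Path(root) / name
            if path.is_symlink() or not path.is_file():
                continue
            if path.stat().st_size > limit:
                found.append(path.relative_to(repo).as_posix())
    return sorted(found)


def append_to_gitignore(repo: Path, limit: int = ONE_MIB) -> List[str]:
    """Append large files missing from .gitignore and return the entries added."""
    gitignore = repo / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = set(text.splitlines())
    new_entries = [p for p in large_files(repo, limit) if p not in existing]
    if new_entries:
        with gitignore.open("a") as handle:
            # Make sure the first new entry starts on its own line.
            if text and not text.endswith("\n"):
                handle.write("\n")
            handle.writelines(f"{entry}\n" for entry in new_entries)
    return new_entries
```

Entries are written verbatim, so file names containing glob characters such as * or [ would still need escaping by hand.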

Extra: updating .gitignore

  • Files that are already tracked stay tracked after .gitignore changes, so when we update .gitignore we often need to remove everything from the index (the cache) and then add it back again:
git rm -r --cached .
git add .
git commit -m '.gitignore update'
git push origin master
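Clearing the whole index works, but it touches every file. A narrower alternative, sketched below, asks Git only for files that are tracked and also match an ignore rule (git ls-files --cached --ignored --exclude-standard) and untracks just those. The helper names tracked_but_ignored and untrack are hypothetical, and the run parameter exists only so the command runner can be replaced in tests.

```python
import subprocess
from typing import Callable, List


def tracked_but_ignored(repo: str = ".", run: Callable = subprocess.run) -> List[str]:
    """List files that are in the index but match an ignore rule."""
    result = run(
        ["git", "-C", repo, "ls-files", "-z",
         "--cached", "--ignored", "--exclude-standard"],
        check=True, capture_output=True, text=True,
    )
    # -z separates paths with NUL bytes, which is safe for unusual file names.
    return [path for path in result.stdout.split("\0") if path]


def untrack(paths: List[str], repo: str = ".", run: Callable = subprocess.run) -> None:
    """Remove paths from the index only; files on disk are left untouched."""
    if paths:
        run(["git", "-C", repo, "rm", "--cached", "--quiet", "--", *paths], check=True)


if __name__ == "__main__":
    stale = tracked_but_ignored()
    untrack(stale)
    print(f"untracked {len(stale)} file(s); commit to record the change")
```

The commit and push steps stay the same as above.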