
  1. Title: Python Module and Package Management 2023
  2. url: https://www.youtube.com/watch?v=v6tALyc4C10

Python module

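Common issue 1: python -m dir vs python dir

  • python dir puts dir itself at the front of sys.path and runs dir/__main__.py, so files inside dir can import each other by bare name.
  • python -m dir instead puts the current working directory at the front of sys.path, imports dir as a package from there, and then runs dir/__main__.py.
    • thus a bare import of a sibling file (e.g., import xxx inside dir) may raise ModuleNotFoundError, because dir itself is not on sys.path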
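
The difference can be observed directly. The sketch below is only an illustration: it builds a throwaway package (the name demo_app is hypothetical) whose __main__.py prints sys.path[0], then launches it both ways from the same working directory.

```python
import os
import subprocess
import sys
import tempfile


def first_path_entry(workdir, pkg="demo_app"):
    """Run a tiny package both ways; return the sys.path[0] seen by each run."""
    pkg_dir = os.path.join(workdir, pkg)
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, "__main__.py"), "w") as f:
        # realpath() turns a relative or empty entry into a full path
        f.write("import os, sys\nprint(os.path.realpath(sys.path[0]))\n")

    # drop PYTHONSAFEPATH so the default sys.path behaviour is what we observe
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    seen = {}
    for label, args in (("python dir", [pkg]), ("python -m dir", ["-m", pkg])):
        run = subprocess.run([sys.executable, *args], cwd=workdir, env=env,
                             capture_output=True, text=True, check=True)
        seen[label] = run.stdout.strip()
    return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for label, path in first_path_entry(tmp).items():
            print(f"{label:15} -> sys.path[0] = {path}")
```

The first run should report the package folder itself, the second the folder that contains it.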

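Standard way of importing when you want to create a module

  • use absolute imports written with dots, not slashes -> from packagename.subpackage import modulename
  • avoid relative imports
    • e.g., from .xxx import yyy
  • a bare from xxx import yyy between files in the same folder only works when that folder is on sys.path (e.g., when run as python dir), so prefer the absolute form inside a package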
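
As a rough check of the import styles above, the following sketch writes a temporary package and runs it with python -m. The names demo_pkg and demo_helper are made up for the demonstration; the same __main__.py is tried once with an absolute import and once with a bare one.

```python
import os
import subprocess
import sys
import tempfile


def run_as_module(main_source):
    """Write demo_pkg/{__init__,demo_helper,__main__}.py and run `python -m demo_pkg`."""
    files = {
        "__init__.py": "",
        "demo_helper.py": "VALUE = 42\n",
        "__main__.py": main_source,
    }
    # drop PYTHONSAFEPATH so the cwd is put on sys.path as usual
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "demo_pkg")
        os.makedirs(pkg_dir)
        for name, source in files.items():
            with open(os.path.join(pkg_dir, name), "w") as f:
                f.write(source)
        return subprocess.run([sys.executable, "-m", "demo_pkg"], cwd=root,
                              env=env, capture_output=True, text=True)


# absolute import: resolved from the working directory that contains demo_pkg
absolute = run_as_module("from demo_pkg.demo_helper import VALUE\nprint(VALUE)\n")
# bare import: demo_pkg itself is not on sys.path, so demo_helper is not found
bare = run_as_module("from demo_helper import VALUE\nprint(VALUE)\n")

print("absolute import exit code:", absolute.returncode)
print("bare import exit code:    ", bare.returncode)
print(bare.stderr.strip().splitlines()[-1])
```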

However, a complicated project may have too many modules to reorganize at once.

  • one workaround is to keep it as a module but append the folder path to sys.path at the top of the entry script.
import os, sys
sys.path.append(os.path.dirname(os.path.realpath(__file__)))
  • use this only as a last resort.
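
If the workaround is unavoidable, a small guard keeps repeated imports from stacking duplicate entries on sys.path. This is a minimal sketch; the helper name add_to_sys_path is hypothetical and not part of any library.

```python
import os
import sys


def add_to_sys_path(folder):
    """Append `folder` to sys.path once; return True only if it was added."""
    folder = os.path.realpath(folder)
    if folder in sys.path:
        return False  # already importable from here, nothing to do
    sys.path.append(folder)
    return True
```

In a script it would typically be called as add_to_sys_path(os.path.dirname(__file__)) before the project imports.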

Solution: set up pyproject.toml and install the project as a proper package

Sample code

  • hydra default.yaml config
defaults:
  - _self_

hydra:
  job:
    chdir: True
  run:
    dir: "outputs/${now:%Y-%m-%d}/${monitor.name}_${now:%H-%M-%S}"

monitor:
  use_wandb: False
  project_name: "My Test Program"
  name: "default"
  taglist: []
  notes: null  # YAML null; a bare None would be read as the string "None"

data:
  if_test: True
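
To see what the hydra.run.dir pattern in this config expands to, the sketch below imitates it with plain datetime formatting. It assumes that the now resolver takes a strftime-style pattern; preview_run_dir is a hypothetical helper, not a Hydra function.

```python
from datetime import datetime


def preview_run_dir(monitor_name, now=None):
    """Imitate outputs/${now:%Y-%m-%d}/${monitor.name}_${now:%H-%M-%S}."""
    now = now or datetime.now()
    return f"outputs/{now:%Y-%m-%d}/{monitor_name}_{now:%H-%M-%S}"


print(preview_run_dir("default", datetime(2023, 5, 28, 18, 12, 50)))
# -> outputs/2023-05-28/default_18-12-50
```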
  • pyproject.toml
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
name = "my_test"
authors = [
    {name = "Sukai Huang", email = "sukaih@student.unimelb.edu.au"},
]
description = "My test package description"
readme = "readme.md"
dependencies = [
    "hydra-core",
    "tqdm"
]
version = "1.0.0"

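[tool.setuptools]
include-package-data = true
# explicit package list: the import name "my_test" lives in the folder "my_test_folder"
packages = ["my_test"]
package-dir = {"my_test" = "my_test_folder"}
package-data = {"my_test" = ["**/*"]} # include all of them

# Alternative: automatic discovery. `packages` cannot be both a list and a table,
# so use this section INSTEAD of `packages = [...]` above, never together with it.
# [tool.setuptools.packages.find]
# exclude = ["config", "outputs", "multirun", "logs"]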
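
One TOML detail is worth checking before installing: a list-valued packages key and a [tool.setuptools.packages.find] table cannot coexist, because the same key would have to be both an array and a table. The sketch below parses two small fragments to show this. It assumes Python 3.11+ for tomllib, or the third-party tomli package on older interpreters.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # assumption: `pip install tomli` on older versions
    import tomli as tomllib

LIST_AND_FIND = """
[tool.setuptools]
packages = ["my_test"]

[tool.setuptools.packages.find]
exclude = ["config", "outputs"]
"""

FIND_ONLY = """
[tool.setuptools.packages.find]
exclude = ["config", "outputs"]
"""


def parses(toml_text):
    """Return True if the text is valid TOML, False otherwise."""
    try:
        tomllib.loads(toml_text)
        return True
    except tomllib.TOMLDecodeError:
        return False


print("list + find table:", parses(LIST_AND_FIND))
print("find table only:  ", parses(FIND_ONLY))
```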

Hydra module

import logging
import random

import hydra
import numpy as np
import torch
import wandb
from hydra.core.hydra_config import HydraConfig
from hydra.utils import get_original_cwd
from omegaconf import OmegaConf


def init_hydra(cfg, log):
    # log the fully resolved config
    wandb_cfg = OmegaConf.to_container(cfg, resolve=True)
    log.info("--------Model Config---------")
    log.info(OmegaConf.to_yaml(cfg))
    log.info("-----End of Model Config-----")
    projectname = cfg.monitor.project_name
    jobname = cfg.monitor.name
    tagslist = cfg.monitor.taglist
    notes = cfg.monitor.notes
    # NOTE: cfg.config.mode and cfg.CONSTANT.RANDOM_SEED are not defined in the
    # sample default.yaml above; add them to your own config before running.
    if cfg.monitor.use_wandb and cfg.config.mode == "train":
        wandb.init(
            project=projectname,
            name=jobname,
            notes=notes,
            tags=tagslist,
            config=wandb_cfg
        )
    # set up random seeds for reproducibility
    np.random.seed(cfg.CONSTANT.RANDOM_SEED)
    torch.manual_seed(cfg.CONSTANT.RANDOM_SEED)
    random.seed(cfg.CONSTANT.RANDOM_SEED)

    return cfg


log = logging.getLogger(__name__)


@hydra.main(version_base=None, config_path="config", config_name="default")
def main(cfg):
    # print config
    cfg = init_hydra(cfg, log=log)
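  • for debugging, you can bypass @hydra.main and compose the config manually with the Compose API
if __name__ == "__main__":
    import os
    from hydra import compose, initialize

    # get_original_cwd() is unavailable outside @hydra.main, so stub it first
    get_original_cwd = lambda: os.environ.get("PYTHONPATH", os.getcwd())
    with initialize(version_base=None, config_path="../../config"):
        cfg = compose(config_name="default")
        cfg = init_hydra(cfg, log=log)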
  • sample VS Code workspace settings (.vscode/settings.json)
{
    "python.languageServer": "Pylance",
    "python.analysis.extraPaths": [ ".venv/lib/python3.9/site-packages"],
    "terminal.integrated.env.linux": {
        "PYTHONPATH": "${workspaceFolder}"
    },
    "python.envFile": "${workspaceFolder}/.env",
    
    "psi-header.variables": [
        ["projectCreationYear", "2023"],
        ["authoremail", "sukaih@student.unimelb.edu.au"],
        ["projectname", "Modular RL from LLM"],
        ["company", "Sukai@Unimelb NLP Group"]
    ],
    "psi-header.templates": [
        {
            "language": "*",
            "template": [
                "File Created: <<filecreated('dddd, Do MMMM YYYY h:mm:ss a')>>",
                "Author: <<author>> (<<authoremail>>)",
                "-----",
                "Last Modified: <<dateformat('dddd, Do MMMM YYYY h:mm:ss a')>>",
                "Modified By: <<author>> (<<authoremail>>)",
                "-----",
                "Copyright <<projectCreationYear>> - <<year>> by <<company>>",
                "All rights reserved.",
                "This file is part of The <<projectname>> Project,",
                "and is released under The MIT License. Please see the LICENSE",
                "file that should have been included as part of this project."
            ]
        }
    ],
    "[python]": {
        "editor.defaultFormatter": "ms-python.autopep8"
    }
}
  • a sample structure can be
root
|   pyproject.toml
|   README.md
|   .venv
|   .vscode
|   .env
|
+---your_project_name
|   |   __main__.py
|   |
|   +---config
|   |       default.yaml
|   |
|   +---pkg1
|   |       t1.py
|   |       __init__.py
|   |
|   +---pkg2
|   |   |   t2.py
|   |   |   __init__.py
  • once pyproject.toml is set up, run pip install -e . from the project root to install the package in editable mode
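
After installing, one way to confirm that the package is registered in the current environment is to query its metadata. This is a minimal sketch using the standard library; installed_version is a hypothetical helper and my_test is the project name from the sample pyproject.toml.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# prints the version string if the install worked, otherwise None
print("my_test:", installed_version("my_test"))
```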

Extra: what do the square brackets mean in pip install

  • we often see
pip install "splinter[django]"
  • this installs the splinter package together with its django extra, i.e., the optional dependencies that splinter declares for Django support.
  • in pyproject.toml, we have the following config to control the extra dependencies
[project.optional-dependencies]
pdf = ["ReportLab>=1.2", "RXP"]
rest = ["docutils>=0.3", "pack ==1.1, ==1.3"]
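
To find out which extras an already installed package offers, its metadata can be inspected. The sketch below is one possible way, using the standard library; declared_extras is a hypothetical helper, and it returns an empty list when the package is missing or declares no extras.

```python
from importlib import metadata


def declared_extras(dist_name):
    """List the extras an installed distribution declares (empty if none or absent)."""
    try:
        # each optional-dependencies group appears as a Provides-Extra field
        return metadata.metadata(dist_name).get_all("Provides-Extra") or []
    except metadata.PackageNotFoundError:
        return []


print("splinter extras:", declared_extras("splinter"))
```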

Extra: what is [tool.setuptools.packages.find]

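  • [tool.setuptools.packages.find] configures setuptools' automatic package discovery: where sets the directory to search, and include / exclude filter the discovered package names by pattern.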
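
The idea behind package discovery can be imitated in a few lines. The sketch below is a simplified stand-in, not setuptools' actual implementation: discover_packages is a hypothetical function that treats any folder with an __init__.py as a package and filters names with exclude patterns.

```python
import os
import tempfile
from fnmatch import fnmatchcase


def discover_packages(where, exclude=(), prefix=""):
    """Return dotted names of folders under `where` that contain __init__.py."""
    found = []
    for entry in sorted(os.listdir(where)):
        path = os.path.join(where, entry)
        if not os.path.isfile(os.path.join(path, "__init__.py")):
            continue  # plain files and folders without __init__.py are skipped
        name = prefix + entry
        if not any(fnmatchcase(name, pattern) for pattern in exclude):
            found.append(name)
        # sub-packages are matched against the patterns on their own
        found.extend(discover_packages(path, exclude, name + "."))
    return found


with tempfile.TemporaryDirectory() as root:
    for pkg in ("pkg1", os.path.join("pkg1", "sub"), "outputs"):
        os.makedirs(os.path.join(root, pkg))
        open(os.path.join(root, pkg, "__init__.py"), "w").close()
    os.makedirs(os.path.join(root, "config"))  # no __init__.py: never a package
    found = discover_packages(root, exclude=("outputs",))

print(found)  # ['pkg1', 'pkg1.sub']
```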

Extra: useful tools that can integrate within pyproject.toml

[tool.black]   # format code
line-length = 88
target-version = ["py38"]
# 'extend-exclude' excludes files or directories in addition to the defaults
extend-exclude = """
# A regex preceded with ^/ will apply only to files and directories
# in the root of the project.
(
  llm_genplan/third_party
)
"""

[tool.isort]  # format import libs
py_version = 38
profile = "black"
multi_line_output = 2
skip_glob = ["venv/*", "llm_genplan/third_party/*"]
split_on_trailing_comma = true

[tool.mypy]  # static type checking
strict_equality = true
disallow_untyped_calls = true
warn_unreachable = true
exclude = ["venv/*", "llm_genplan/third_party/*"]
follow_imports = "skip"

[[tool.mypy.overrides]]  # modules whose missing type information mypy should ignore
module = [
    "matplotlib.*",
    "pddlgym.*",
    "pyperclip.*",
    "pandas.*",
]
ignore_missing_imports = true

Extra: update gitignore

  • after updating .gitignore, files that are already tracked stay tracked, so we often need to remove everything from the index (cache) and then add it back
git rm -r --cached .
git add .
git commit -m ".gitignore update"
git push origin master

Useful VS Code extensions for Python

  1. Python Extension Pack
  2. Git Graph
  3. Remote - SSH
  4. Jupyter
  5. file-size
  6. psioniq File Header
  7. Better Comments
  8. Choose a License
  9. LaTeX Workshop
  10. Code Spell Checker
  11. Black Formatter