Description
Context
This is a tracking issue to recognise that the lack of a site-packages layout causes friction when making use of third-party distribution packages (wheels and sdists) from indexes such as PyPI.
Outside Bazel and rules_python, it is common for distribution packages to assume that they will be installed into a single site-packages folder, either in a "virtual environment" or directly into a Python user or global site installation.
Notable examples are the libraries in the AI / ML ecosystem that make use of the NVIDIA CUDA shared libraries. These shared libraries contain relative rpath entries in their ELF/Mach-O/DLL binaries, and those entries fail to resolve when the libraries are not installed as siblings in a site-packages layout.
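The rpath failure can be illustrated with a small path calculation. This is only a sketch: the library file names, the runfiles directory names, and the rpath value `$ORIGIN/../../nvidia/cublas/lib` are illustrative assumptions, not values read from any particular wheel.

```python
import posixpath


def resolve_rpath(loading_lib: str, rpath: str) -> str:
    """Expand $ORIGIN against the directory of the library carrying the rpath."""
    origin = posixpath.dirname(loading_lib)
    return posixpath.normpath(rpath.replace("$ORIGIN", origin))


# Illustrative relative rpath: "go up to site-packages, then into a sibling package".
RPATH = "$ORIGIN/../../nvidia/cublas/lib"

# Single site-packages layout: both distributions share one parent directory.
venv_lib = "/venv/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so"
venv_cublas_dir = "/venv/lib/python3.11/site-packages/nvidia/cublas/lib"

# Hypothetical per-distribution layout: each distribution has its own site-packages.
split_lib = "/runfiles/pypi_torch/site-packages/torch/lib/libtorch_cuda.so"
split_cublas_dir = "/runfiles/pypi_nvidia_cublas/site-packages/nvidia/cublas/lib"

venv_search_dir = resolve_rpath(venv_lib, RPATH)
split_search_dir = resolve_rpath(split_lib, RPATH)

# In the shared layout the rpath lands on the sibling; in the split layout it does not.
venv_resolves = venv_search_dir == venv_cublas_dir
split_resolves = split_search_dir == split_cublas_dir
```

In the split layout the loader ends up searching inside the torch repository's own site-packages, where no nvidia directory exists.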
There is also a complication introduced into the rules by the lack of a single site-packages folder. Namespace packages in rules_python are all processed into pkgutil-style namespace packages. This seems to work, but wouldn't be necessary if site-packages was used.
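For reference, a pkgutil-style namespace package is one whose `__init__.py` extends `__path__` with `pkgutil.extend_path`. The sketch below builds two throwaway "distributions" in a temporary directory to show how such a package spans several `sys.path` entries. The names `demo_nspkg`, `dist_a`, and `dist_b` are made up for the example, and this is not the code rules_python generates.

```python
import importlib
import os
import sys
import tempfile

# The one-line __init__.py that makes a package pkgutil-style.
PKGUTIL_INIT = '__path__ = __import__("pkgutil").extend_path(__path__, __name__)\n'


def make_portion(root: str, dist: str, module: str, value: str) -> str:
    """Create <root>/<dist>/demo_nspkg/{__init__.py,<module>.py}; return the import root."""
    pkg_dir = os.path.join(root, dist, "demo_nspkg")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(PKGUTIL_INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(f"VALUE = {value!r}\n")
    return os.path.join(root, dist)


root = tempfile.mkdtemp()
# Two separate import roots, as when each distribution lives in its own directory.
sys.path[:0] = [
    make_portion(root, "dist_a", "a", "from a"),
    make_portion(root, "dist_b", "b", "from b"),
]
importlib.invalidate_caches()

# Both submodules import even though they live under different roots.
mod_a = importlib.import_module("demo_nspkg.a")
mod_b = importlib.import_module("demo_nspkg.b")
```

With a single site-packages directory, both portions would sit in the same `demo_nspkg` folder and no `__path__` manipulation would be needed.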
Another, rarer issue is failure to load *.pth files. Python provides site-specific configuration hooks that can customize sys.path at startup. rules_python could perhaps work around this issue, but if a site-packages layout was used and discovered by the interpreter at startup, no workarounds would be necessary.
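The difference between a plain `sys.path` entry and a site directory can be seen with the standard library's `site.addsitedir`. This is a minimal sketch using a temporary directory; the names `demo.pth` and `vendored` are arbitrary.

```python
import os
import site
import sys
import tempfile

# A directory containing a .pth file that points at a sibling directory.
site_dir = tempfile.mkdtemp()
extra_dir = os.path.join(site_dir, "vendored")
os.makedirs(extra_dir)
with open(os.path.join(site_dir, "demo.pth"), "w") as f:
    f.write("vendored\n")

# Adding the directory as an ordinary sys.path entry does not process .pth files.
sys.path.append(site_dir)
seen_after_plain_append = extra_dir in sys.path

# Treating it as a site directory does: each line of demo.pth is appended to sys.path.
site.addsitedir(site_dir)
seen_after_addsitedir = extra_dir in sys.path
```

An interpreter that discovers a real site-packages directory at startup performs the second step on its own.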
Distribution packages on PyPI known to have issues:
- torch
- onnxruntime-gpu
- rerun-sdk
Known workarounds
- Patch the third-party dependencies using rules_python patching support
- Use an alternative set of rules such as rules_py
- Patch the third-party dependencies outside rules_python and push the patched dependencies to a private index
Related
- CUDA deps cannot be preloaded under Bazel pytorch/pytorch#117350
- Shared library loading logic breaks when CUDA packages are installed in a non-standard location pytorch/pytorch#101314
Proposed design to solve
The basic proposed solution is to create a per-binary virtual env whose site-packages contains symlinks to other locations in runfiles, e.g. `$runfiles/mybin.venv/site-packages/foo` would be a symlink to `$runfiles/_pypi_foo/site-packages/foo`.
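The layout described above can be mocked up outside Bazel to make the idea concrete. The sketch below is plain Python, not the rules_python implementation. The helper name `build_venv_site_packages` and the `links` mapping are hypothetical, and the symlinks are created at run time purely for illustration.

```python
import os
import tempfile


def build_venv_site_packages(runfiles: str, venv_name: str, links: dict) -> str:
    """Create <runfiles>/<venv_name>/site-packages with one symlink per entry.

    `links` maps a site-packages relative path to a runfiles-relative target.
    """
    site_packages = os.path.join(runfiles, venv_name, "site-packages")
    os.makedirs(site_packages, exist_ok=True)
    for rel_path, target in sorted(links.items()):
        link = os.path.join(site_packages, rel_path)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(os.path.join(runfiles, target), link, target_is_directory=True)
    return site_packages


# Fake runfiles tree with one extracted distribution.
runfiles = tempfile.mkdtemp()
foo_dir = os.path.join(runfiles, "_pypi_foo", "site-packages", "foo")
os.makedirs(foo_dir)
with open(os.path.join(foo_dir, "__init__.py"), "w") as f:
    f.write("NAME = 'foo'\n")

site_packages = build_venv_site_packages(
    runfiles, "mybin.venv", {"foo": "_pypi_foo/site-packages/foo"}
)
foo_link = os.path.join(site_packages, "foo")
```

Every distribution linked this way appears as a sibling under one directory, which is what relative rpaths, namespace packages, and .pth processing expect.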
TODO list
- Add PyInfo.site_packages_symlinks. A depset of site-packages relative paths and runfiles paths to symlink to.
- Make pypi-generated targets use this site-packages solution by default
  - Disable pkgutil-style `__init__.py` generation in pypi repo phase
  - Maybe refactor the pypi generation to use a custom rule instead of plain py_library.
- Add a flag to allow experimentation and testing
- Edge cases
  - Two distributions that install into the same directory and/or have overlapping files
  - Handling pkgutil-style packages
  - Interaction of bootstrap=script vs bootstrap=system with this new layout
  - Handling platforms/cases where symlinks can't be created at build time (Windows, using rules_pkg)
  - Handling multiple versions of a distribution in the deps and ensuring only one is used, while still respecting merge/conflict logic.
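The overlap and multiple-version edge cases both come down to two entries claiming the same site-packages path. The sketch below shows one way to detect that when merging (site-packages path, runfiles path) pairs. The first-entry-wins policy and the function name `merge_site_packages_symlinks` are assumptions made for the example, not the behaviour rules_python has settled on.

```python
def merge_site_packages_symlinks(entries):
    """Merge (site_packages_rel_path, runfiles_path) pairs.

    Returns (merged, conflicts). `merged` keeps the first target seen for each
    site-packages path; `conflicts` lists every competing target per path.
    """
    merged = {}
    conflicts = {}
    for rel_path, runfiles_path in entries:
        existing = merged.setdefault(rel_path, runfiles_path)
        if existing != runfiles_path:
            conflicts.setdefault(rel_path, [existing]).append(runfiles_path)
    return merged, conflicts


# Hypothetical input: two versions of "foo" reach the same binary through its deps.
entries = [
    ("foo", "_pypi_foo_v1/site-packages/foo"),
    ("bar", "_pypi_bar/site-packages/bar"),
    ("foo", "_pypi_foo_v2/site-packages/foo"),
]
merged, conflicts = merge_site_packages_symlinks(entries)
```

Reporting the conflicts separately leaves it to the caller to fail the build, warn, or apply a different precedence rule.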