Optimize package footprint by removing unnecessary deps #1077

Open
@vdusek

Description

A current installation of Crawlee, together with all of its direct and transitive dependencies, occupies ~75 MB (an 82 MB venv minus the 6.6 MB empty-venv baseline). For comparison, Scrapy occupies ~88.4 MB.

(venv) $ du -h venv/ --max-depth 0
6,6M	venv/
(venv) $ pip install crawlee
...
(venv) $ du -h venv/ --max-depth 0
82M	venv/

The large size is driven primarily by a few dependencies that may not be strictly necessary for the core functionality. Below are the dependency tree, the size of each installed package, and the total size of each direct dependency including its transitive dependencies.

Dependency tree

crawlee v0.6.4
    ├── apify-fingerprint-datapoints v0.0.2
    ├── browserforge v1.2.3
    │   └── click v8.1.8
    ├── cachetools v5.5.2
    ├── colorama v0.4.6
    ├── docutils v0.21.2
    ├── eval-type-backport v0.2.2
    ├── httpx[brotli, http2, zstd] v0.28.1
    │   ├── anyio v4.8.0
    │   │   ├── idna v3.10
    │   │   └── sniffio v1.3.1
    │   ├── certifi v2025.1.31
    │   ├── httpcore v1.0.7
    │   │   ├── certifi v2025.1.31
    │   │   └── h11 v0.14.0
    │   ├── idna v3.10
    │   ├── brotli v1.1.0 (extra: brotli)
    │   ├── h2 v4.2.0 (extra: http2)
    │   │   ├── hpack v4.1.0
    │   │   └── hyperframe v6.1.0
    │   └── zstandard v0.23.0 (extra: zstd)
    ├── more-itertools v10.6.0
    ├── psutil v7.0.0
    ├── pydantic v2.10.6
    │   ├── annotated-types v0.7.0
    │   ├── pydantic-core v2.27.2
    │   │   └── typing-extensions v4.12.2
    │   └── typing-extensions v4.12.2
    ├── pydantic-settings v2.6.1
    │   ├── pydantic v2.10.6 (*)
    │   └── python-dotenv v1.0.1
    ├── pyee v12.1.1
    │   └── typing-extensions v4.12.2
    ├── rich v13.9.4
    │   ├── markdown-it-py v3.0.0
    │   │   └── mdurl v0.1.2
    │   └── pygments v2.19.1
    ├── sortedcollections v2.1.0
    │   └── sortedcontainers v2.4.0
    ├── tldextract v5.1.3
    │   ├── filelock v3.17.0
    │   ├── idna v3.10
    │   ├── requests v2.32.3
    │   │   ├── certifi v2025.1.31
    │   │   ├── charset-normalizer v3.4.1
    │   │   ├── idna v3.10
    │   │   └── urllib3 v2.3.0
    │   └── requests-file v2.1.0
    │       └── requests v2.32.3 (*)
    ├── typing-extensions v4.12.2
    └── yarl v1.18.3
        ├── idna v3.10
        ├── multidict v6.1.0
        └── propcache v0.3.0

Package sizes

23M     .venv/lib/python3.13/site-packages/zstandard
7,2M    .venv/lib/python3.13/site-packages/_brotli.cpython-313-x86_64-linux-gnu.so
4,9M    .venv/lib/python3.13/site-packages/pygments
4,7M    .venv/lib/python3.13/site-packages/pydantic_core
2,4M    .venv/lib/python3.13/site-packages/docutils
1,9M    .venv/lib/python3.13/site-packages/pydantic
1,1M    .venv/lib/python3.13/site-packages/yarl
1,1M    .venv/lib/python3.13/site-packages/rich
1,1M    .venv/lib/python3.13/site-packages/crawlee
1,0M    .venv/lib/python3.13/site-packages/psutil
836K    .venv/lib/python3.13/site-packages/apify_fingerprint_datapoints
784K    .venv/lib/python3.13/site-packages/propcache
484K    .venv/lib/python3.13/site-packages/urllib3
456K    .venv/lib/python3.13/site-packages/multidict
452K    .venv/lib/python3.13/site-packages/charset_normalizer
436K    .venv/lib/python3.13/site-packages/anyio
376K    .venv/lib/python3.13/site-packages/markdown_it
368K    .venv/lib/python3.13/site-packages/tldextract
368K    .venv/lib/python3.13/site-packages/click
352K    .venv/lib/python3.13/site-packages/idna
328K    .venv/lib/python3.13/site-packages/httpx
324K    .venv/lib/python3.13/site-packages/httpcore
308K    .venv/lib/python3.13/site-packages/certifi
260K    .venv/lib/python3.13/site-packages/h2
236K    .venv/lib/python3.13/site-packages/h11
228K    .venv/lib/python3.13/site-packages/requests
228K    .venv/lib/python3.13/site-packages/more_itertools
228K    .venv/lib/python3.13/site-packages/hpack
136K    .venv/lib/python3.13/site-packages/pydantic_settings
132K    .venv/lib/python3.13/site-packages/typing_extensions.py
124K    .venv/lib/python3.13/site-packages/sortedcontainers
120K    .venv/lib/python3.13/site-packages/browserforge
80K     .venv/lib/python3.13/site-packages/colorama
60K     .venv/lib/python3.13/site-packages/filelock
52K     .venv/lib/python3.13/site-packages/pyee
52K     .venv/lib/python3.13/site-packages/dotenv
44K     .venv/lib/python3.13/site-packages/hyperframe
36K     .venv/lib/python3.13/site-packages/mdurl
36K     .venv/lib/python3.13/site-packages/cachetools
28K     .venv/lib/python3.13/site-packages/sortedcollections
24K     .venv/lib/python3.13/site-packages/annotated_types
16K     .venv/lib/python3.13/site-packages/sniffio
16K     .venv/lib/python3.13/site-packages/eval_type_backport
8,0K    .venv/lib/python3.13/site-packages/_virtualenv.py
8,0K    .venv/lib/python3.13/site-packages/requests_file.py
4,0K    .venv/lib/python3.13/site-packages/_virtualenv.pth
4,0K    .venv/lib/python3.13/site-packages/brotli.py

Extracted via:

du -sh .venv/lib/python*/site-packages/* | sort -hr
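For environments without GNU `du` (e.g. Windows), the listing above can be approximated with the standard library alone. This is a minimal sketch, assuming the active interpreter's `sysconfig.get_paths()["purelib"]` points at the site-packages directory of interest. Note that it sums apparent file sizes (`os.path.getsize`), whereas `du` reports disk usage in blocks, so the numbers will differ slightly; the helper names `entry_size` and `largest_entries` are hypothetical.

```python
import os
import sysconfig


def entry_size(path: str) -> int:
    """Apparent size in bytes of a file or directory tree (symlinks are skipped)."""
    if os.path.islink(path):
        return 0
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_entries(site_packages: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` largest top-level entries, largest first (like `du -s * | sort -hr`)."""
    sizes = [
        (name, entry_size(os.path.join(site_packages, name)))
        for name in os.listdir(site_packages)
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Assumption: "purelib" is the site-packages directory of the active environment.
    site_packages = sysconfig.get_paths()["purelib"]
    for name, size in largest_entries(site_packages):
        print(f"{size / 1024:10.0f}K  {name}")
```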

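Total size per direct dependency

Each figure covers the direct dependency plus all of its transitive dependencies. Shared packages (e.g. idna, certifi, typing-extensions, pydantic) are counted under every direct dependency that pulls them in, so the figures overlap and do not add up to the total footprint.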

  • httpx[brotli, http2, zstd]: 32.732M
  • pydantic-settings: 6.944M
  • pydantic: 6.756M
  • rich: 6.412M
  • yarl: 2.692M
  • docutils: 2.4M
  • tldextract: 2.260M
  • psutil: 1.0M
  • apify-fingerprint-datapoints: 836K
  • browserforge: 488K
  • more-itertools: 228K
  • pyee: 184K
  • sortedcollections: 152K
  • typing-extensions: 132K
  • colorama: 80K
  • cachetools: 36K
  • eval-type-backport: 16K
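The per-dependency totals above amount to summing package sizes over each direct dependency's transitive closure, with every package counted once per closure. A minimal sketch of that calculation, assuming a hand-written dependency graph and sizes in K taken from the listings above (treating 1M as 1000K, which is what the sums above imply); `closure`, `total_size`, `GRAPH`, and `SIZES_K` are hypothetical names, and only the pydantic subtree is included.

```python
def closure(package: str, graph: dict[str, list[str]]) -> set[str]:
    """All packages reachable from `package`, including itself, each counted once."""
    seen: set[str] = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


def total_size(package: str, graph: dict[str, list[str]], sizes: dict[str, int]) -> int:
    """Sum of sizes over the package's transitive closure."""
    return sum(sizes[name] for name in closure(package, graph))


# Hypothetical subset of the dependency tree above (pydantic subtree only).
GRAPH = {
    "pydantic": ["annotated-types", "pydantic-core", "typing-extensions"],
    "pydantic-core": ["typing-extensions"],
    "pydantic-settings": ["pydantic", "python-dotenv"],
}

# Sizes in K from the du listing above (1.9M -> 1900K, 4.7M -> 4700K).
SIZES_K = {
    "pydantic": 1900,
    "pydantic-core": 4700,
    "annotated-types": 24,
    "typing-extensions": 132,
    "pydantic-settings": 136,
    "python-dotenv": 52,
}

if __name__ == "__main__":
    for dep in ("pydantic", "pydantic-settings"):
        print(dep, total_size(dep, GRAPH, SIZES_K))  # 6756, then 6944
```

typing-extensions appears twice in the pydantic subtree but is counted once, which is how the 6.756M and 6.944M figures above come out.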

Goal

The goal is to identify and potentially remove or replace dependencies that contribute significantly to the overall package size without compromising its functionality.
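Because the totals overlap, the space actually freed by dropping one direct dependency is only the size of the packages that no other direct dependency still needs. A minimal sketch of that "exclusive size" calculation, assuming a hand-written graph restricted to the httpx (without its extras) and tldextract subtrees shown above; `exclusive_packages`, `DIRECT`, `GRAPH`, and `SIZES_K` are hypothetical names.

```python
def closure(package: str, graph: dict[str, list[str]]) -> set[str]:
    """All packages reachable from `package`, including itself, each counted once."""
    seen: set[str] = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


def exclusive_packages(dep: str, direct_deps: list[str], graph: dict[str, list[str]]) -> set[str]:
    """Packages reachable from `dep` but from no other direct dependency."""
    shared: set[str] = set()
    for other in direct_deps:
        if other != dep:
            shared |= closure(other, graph)
    return closure(dep, graph) - shared


# Hypothetical subset of the tree above: httpx (extras omitted) and tldextract.
DIRECT = ["httpx", "tldextract"]
GRAPH = {
    "httpx": ["anyio", "certifi", "httpcore", "idna"],
    "anyio": ["idna", "sniffio"],
    "httpcore": ["certifi", "h11"],
    "tldextract": ["filelock", "idna", "requests", "requests-file"],
    "requests": ["certifi", "charset-normalizer", "idna", "urllib3"],
    "requests-file": ["requests"],
}
# Sizes in K from the du listing above.
SIZES_K = {
    "httpx": 328, "anyio": 436, "sniffio": 16, "httpcore": 324, "h11": 236,
    "certifi": 308, "idna": 352, "tldextract": 368, "filelock": 60,
    "requests": 228, "requests-file": 8, "charset-normalizer": 452, "urllib3": 484,
}

if __name__ == "__main__":
    only_tld = exclusive_packages("tldextract", DIRECT, GRAPH)
    print(sorted(only_tld), sum(SIZES_K[p] for p in only_tld))
```

Under this accounting, removing tldextract would free roughly 1.6M rather than the 2.26M listed, since idna and certifi remain required by httpx.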

Metadata

Labels

t-tooling: Issues with this label are in the ownership of the tooling team.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests