
Commit 470f9e2

Merge branch 'main' into pre/beta
2 parents: 006a2aa + e721a49

File tree

149 files changed, +1071 −683 lines


.github/FUNDING.yml

Lines changed: 1 addition & 1 deletion

@@ -12,4 +12,4 @@ lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cl
 polar: # Replace with a single Polar username
 buy_me_a_coffee: # Replace with a single Buy Me a Coffee username
 thanks_dev: # Replace with a single thanks.dev username
-custom:
+custom:

.github/ISSUE_TEMPLATE/custom.md

Lines changed: 0 additions & 2 deletions

@@ -6,5 +6,3 @@ labels: ''
 assignees: ''
 
 ---
-
-

.github/workflows/release.yml

Lines changed: 13 additions & 13 deletions

@@ -19,21 +19,21 @@ jobs:
         uses: actions/setup-python@v5
         with:
           python-version: '3.10'
-
+
       - name: Install uv
         uses: astral-sh/setup-uv@v3
-
+
       - name: Install Node Env
         uses: actions/setup-node@v4
         with:
           node-version: 20
-
+
       - name: Checkout
         uses: actions/checkout@v4.1.1
         with:
           fetch-depth: 0
           persist-credentials: false
-
+
       - name: Build and validate package
         run: |
           uv venv
@@ -44,10 +44,10 @@ jobs:
           uv build
           uv pip install --upgrade pkginfo==1.12.0 twine==6.0.1 # Upgrade pkginfo and install twine
           python -m twine check dist/*
-
+
       - name: Debug Dist Directory
         run: ls -al dist
-
+
       - name: Cache build
         uses: actions/cache@v3
         with:
@@ -59,7 +59,7 @@ jobs:
     runs-on: ubuntu-latest
     needs: build
     environment: development
-    if: >
+    if: >
       github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/pre/beta') ||
       (github.event_name == 'pull_request' && github.event.action == 'closed' && github.event.pull_request.merged &&
       (github.event.pull_request.base.ref == 'main' || github.event.pull_request.base.ref == 'pre/beta'))
@@ -74,23 +74,23 @@ jobs:
         with:
           fetch-depth: 0
           persist-credentials: false
-
+
       - name: Restore build artifacts
         uses: actions/cache@v3
         with:
           path: ./dist
           key: ${{ runner.os }}-build-${{ github.sha }}
-
+
       - name: Semantic Release
         uses: cycjimmy/semantic-release-action@v4.1.0
         with:
           semantic_version: 23
           extra_plugins: |
             semantic-release-pypi@3
-            @semantic-release/git
-            @semantic-release/commit-analyzer@12
-            @semantic-release/release-notes-generator@13
-            @semantic-release/github@10
+            @semantic-release/git
+            @semantic-release/commit-analyzer@12
+            @semantic-release/release-notes-generator@13
+            @semantic-release/github@10
             @semantic-release/changelog@6
             conventional-changelog-conventionalcommits@7
         env:
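The `if:` expression above gates the release job. As a clarifying sketch, the same boolean logic can be restated in Python (the function name and keyword arguments are illustrative, not part of the workflow):

```python
def should_release(event_name, ref=None, action=None, merged=False, base_ref=None):
    """Restatement of the workflow's `if:` expression: release on pushes to
    main or pre/beta, or on merged pull requests targeting either branch."""
    push_ok = event_name == "push" and ref in (
        "refs/heads/main",
        "refs/heads/pre/beta",
    )
    pr_ok = (
        event_name == "pull_request"
        and action == "closed"
        and merged
        and base_ref in ("main", "pre/beta")
    )
    return push_ok or pr_ok
```

Note that in GitHub Actions expressions `&&` binds tighter than `||`, which is why the push clause and the pull-request clause are independent alternatives.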

.readthedocs.yaml

Lines changed: 36 additions & 0 deletions

@@ -0,0 +1,36 @@
+
+# Read the Docs configuration file for Sphinx projects
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.12"
+  # You can also specify other tool versions:
+  # nodejs: "20"
+  # rust: "1.70"
+  # golang: "1.20"
+
+# Build documentation in the "docs/" directory with Sphinx
+sphinx:
+  configuration: docs/conf.py
+  # You can configure Sphinx to use a different builder, for instance use the dirhtml builder for simpler URLs
+  # builder: "dirhtml"
+  # Fail on all warnings to avoid broken references
+  # fail_on_warning: true
+
+# Optionally build your docs in additional formats such as PDF and ePub
+# formats:
+#   - pdf
+#   - epub
+
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+# python:
+#   install:
+#     - requirements: docs/requirements.txt

.releaserc.yml

Lines changed: 0 additions & 1 deletion

@@ -53,4 +53,3 @@ branches:
 channel: "dev"
 prerelease: "beta"
 debug: true
-

CHANGELOG.md

Lines changed: 5 additions & 0 deletions

@@ -1,13 +1,18 @@
 ## [1.36.1-beta.1](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.36.0...v1.36.1-beta.1) (2025-01-21)
 
 
+
 ### Bug Fixes
 
 * Schema parameter type ([2b5bd80](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/2b5bd80a945a24072e578133eacc751feeec6188))
+* search ([ce25b6a](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/ce25b6a4b0e1ea15edf14a5867f6336bb27590cb))
+
 
 
 ### Docs
 
+
+* add requirements.dev ([6e12981](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/6e12981e637d078a6d3b3ce83f0d4901e9dd9996))
 * added first ollama example ([aa6a76e](https://github.com/ScrapeGraphAI/Scrapegraph-ai/commit/aa6a76e5bdf63544f62786b0d17effa205aab3d8))
 
 ## [1.36.0](https://github.com/ScrapeGraphAI/Scrapegraph-ai/compare/v1.35.0...v1.36.0) (2025-01-12)

Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -6,4 +6,4 @@ RUN pip install --no-cache-dir scrapegraphai
 RUN pip install --no-cache-dir scrapegraphai[burr]
 
 RUN python3 -m playwright install-deps
-RUN python3 -m playwright install
+RUN python3 -m playwright install

LICENSE

Lines changed: 1 addition & 1 deletion

@@ -4,4 +4,4 @@ Permission is hereby granted, free of charge, to any person obtaining a copy of
 
 The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
 
-THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md

Lines changed: 1 addition & 1 deletion

@@ -182,7 +182,7 @@ The Official API Documentation can be found [here](https://docs.scrapegraphai.co
 </a>
 </div>
 
-## 📈 Telemetry
+## 📈 Telemetry
 We collect anonymous usage metrics to enhance our package's quality and user experience. The data helps us prioritize improvements and ensure compatibility. If you wish to opt-out, set the environment variable SCRAPEGRAPHAI_TELEMETRY_ENABLED=false. For more information, please refer to the documentation [here](https://scrapegraph-ai.readthedocs.io/en/latest/scrapers/telemetry.html).
 
 

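The README hunk above mentions opting out of telemetry via an environment variable. A minimal sketch of setting that flag from Python before importing the library — the variable name comes from the README, while the helper function itself is a hypothetical illustration, not part of the package:

```python
import os

# Hypothetical helper: SCRAPEGRAPHAI_TELEMETRY_ENABLED must be set to
# "false" before the library is imported for the opt-out to take effect.
def opt_out_of_telemetry(env=None):
    env = os.environ if env is None else env
    env["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"
    return env["SCRAPEGRAPHAI_TELEMETRY_ENABLED"]
```

Alternatively, the variable can simply be exported in the shell before the script runs.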
SECURITY.md

Lines changed: 0 additions & 1 deletion

@@ -3,4 +3,3 @@
 ## Reporting a Vulnerability
 
 For reporting a vulnerability contact directly mvincig11@gmail.com
-

docs/README.md

Lines changed: 1 addition & 1 deletion

@@ -55,7 +55,7 @@ markmap:
 - Use Selenium or Playwright to take screenshots
 - Use LLM to asses if it is a block-like page, paragraph-like page, etc.
 - [Issue #88](https://github.com/VinciGit00/Scrapegraph-ai/issues/88)
-
+
 ## **Long-Term Goals**
 
 - Automatic generation of scraping pipelines from a given prompt

docs/requirements-dev.txt

Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
+sphinx>=7.1.2
+sphinx-rtd-theme>=1.3.0
+myst-parser>=2.0.0
+sphinx-copybutton>=0.5.2
+sphinx-design>=0.5.0
+sphinx-autodoc-typehints>=1.25.2
+sphinx-autoapi>=3.0.0

docs/requirements.txt

Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+sphinx>=7.1.2
+
+sphinx-rtd-theme>=1.3.0
+myst-parser>=2.0.0
+sphinx-copybutton>=0.5.2
+sphinx-design>=0.5.0
+sphinx-autodoc-typehints>=1.25.2
+sphinx-autoapi>=3.0.0
+furo>=2024.1.29

docs/russian.md

Lines changed: 1 addition & 1 deletion

@@ -228,4 +228,4 @@ ScrapeGraphAI лицензирован под MIT License. Подробнее с
 ## Благодарности
 
 - Мы хотели бы поблагодарить всех участников проекта и сообщество с открытым исходным кодом за их поддержку.
-- ScrapeGraphAI предназначен только для исследования данных и научных целей. Мы не несем ответственности за неправильное использование библиотеки.
+- ScrapeGraphAI предназначен только для исследования данных и научных целей. Мы не несем ответственности за неправильное использование библиотеки.

docs/source/conf.py

Lines changed: 9 additions & 10 deletions

@@ -12,31 +12,30 @@
 import sys
 
 # import all the modules
-sys.path.insert(0, os.path.abspath('../../'))
+sys.path.insert(0, os.path.abspath("../../"))
 
-project = 'ScrapeGraphAI'
-copyright = '2024, ScrapeGraphAI'
-author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'
+project = "ScrapeGraphAI"
+copyright = "2024, ScrapeGraphAI"
+author = "Marco Vinciguerra, Marco Perini, Lorenzo Padoan"
 
 html_last_updated_fmt = "%b %d, %Y"
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
 
-extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
+extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon"]
 
-templates_path = ['_templates']
+templates_path = ["_templates"]
 exclude_patterns = []
 
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
 
-html_theme = 'furo'
+html_theme = "furo"
 html_theme_options = {
     "source_repository": "https://github.com/VinciGit00/Scrapegraph-ai/",
     "source_branch": "main",
     "source_directory": "docs/source/",
-    'navigation_with_keys': True,
-    'sidebar_hide_name': False,
+    "navigation_with_keys": True,
+    "sidebar_hide_name": False,
 }
-

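The conf.py hunk above only normalizes quoting, so the post-change configuration boils down to the following Sphinx settings. This is a sketch assembled from the diff's visible values, not a verbatim copy of the file:

```python
import os
import sys

# Make the package importable so sphinx.ext.autodoc can resolve modules.
sys.path.insert(0, os.path.abspath("../../"))

project = "ScrapeGraphAI"
copyright = "2024, ScrapeGraphAI"
author = "Marco Vinciguerra, Marco Perini, Lorenzo Padoan"
html_last_updated_fmt = "%b %d, %Y"

extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon"]
templates_path = ["_templates"]
exclude_patterns = []

html_theme = "furo"
html_theme_options = {
    "source_repository": "https://github.com/VinciGit00/Scrapegraph-ai/",
    "source_branch": "main",
    "source_directory": "docs/source/",
    "navigation_with_keys": True,
    "sidebar_hide_name": False,
}
```

The consistent double-quoted strings match the formatting style applied across the repository in this merge.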
docs/source/getting_started/examples.rst

Lines changed: 1 addition & 1 deletion

@@ -84,4 +84,4 @@ After that, you can run the following code, using only your machine resources br
    result = smart_scraper_graph.run()
    print(result)
 
-To find out how you can customize the `graph_config` dictionary, by using different LLM and adding new parameters, check the `Scrapers` section!
+To find out how you can customize the `graph_config` dictionary, by using different LLM and adding new parameters, check the `Scrapers` section!

docs/source/getting_started/installation.rst

Lines changed: 2 additions & 4 deletions

@@ -22,7 +22,7 @@ The library is available on PyPI, so it can be installed using the following com
    pip install scrapegraphai
 
 .. important::
-
+
    It is higly recommended to install the library in a virtual environment (conda, venv, etc.)
 
 If your clone the repository, it is recommended to use a package manager like `uv <https://github.com/astral-sh/uv>`_.
@@ -35,7 +35,7 @@ To install the library using uv, you can run the following command:
    uv build
 
 .. caution::
-
+
    **Rye** must be installed first by following the instructions on the `official website <https://github.com/astral-sh/uv>`_.
 
 Additionally on Windows when using WSL
@@ -46,5 +46,3 @@ If you are using Windows Subsystem for Linux (WSL) and you are facing issues wit
 .. code-block:: bash
 
    sudo apt-get -y install libnss3 libnspr4 libgbm1 libasound2
-
-

docs/source/index.rst

Lines changed: 1 addition & 1 deletion

@@ -43,4 +43,4 @@ Indices and tables
 
 * :ref:`genindex`
 * :ref:`modindex`
-* :ref:`search`
+* :ref:`search`

docs/source/introduction/overview.rst

Lines changed: 13 additions & 5 deletions

@@ -3,20 +3,23 @@
    :width: 50%
    :alt: ScrapegraphAI
 
-Overview
+Overview
 ========
 
 ScrapeGraphAI is an **open-source** Python library designed to revolutionize **scraping** tools.
-In today's data-intensive digital landscape, this library stands out by integrating **Large Language Models** (LLMs)
+In today's data-intensive digital landscape, this library stands out by integrating **Large Language Models** (LLMs)
 and modular **graph-based** pipelines to automate the scraping of data from various sources (e.g., websites, local files etc.).
 
 Simply specify the information you need to extract, and ScrapeGraphAI handles the rest, providing a more **flexible** and **low-maintenance** solution compared to traditional scraping tools.
 
+For comprehensive documentation and updates, visit our `website <https://scrapegraphai.com>`_.
+
+
 Why ScrapegraphAI?
 ==================
 
 Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages.
-ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
+ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
 This flexibility ensures that scrapers remain functional even when website layouts change.
 
 We support many LLMs including **GPT, Gemini, Groq, Azure, Hugging Face** etc.
@@ -161,13 +164,13 @@ FAQ
 - Check your internet connection. Low speed or unstable connection can cause the HTML to not load properly.
 
 - Try using a proxy server to mask your IP address. Check out the :ref:`Proxy` section for more information on how to configure proxy settings.
-
+
 - Use a different LLM model. Some models might perform better on certain websites than others.
 
 - Set the `verbose` parameter to `True` in the graph_config to see more detailed logs.
 
 - Visualize the pipeline graphically using :ref:`Burr`.
-
+
 If the issue persists, please report it on the GitHub repository.
 
 6. **How does ScrapeGraphAI handle the context window limit of LLMs?**
@@ -200,3 +203,8 @@ Sponsors
    :width: 11%
    :alt: Scrapedo
    :target: https://scrape.do
+
+.. image:: ../../assets/scrapegraph_logo.png
+   :width: 11%
+   :alt: ScrapegraphAI
+   :target: https://scrapegraphai.com

docs/source/modules/modules.rst

Lines changed: 0 additions & 1 deletion

@@ -7,4 +7,3 @@ scrapegraphai
    scrapegraphai
 
    scrapegraphai.helpers.models_tokens
-

docs/source/modules/scrapegraphai.helpers.models_tokens.rst

Lines changed: 1 addition & 1 deletion

@@ -25,4 +25,4 @@ Example usage:
    else:
        print(f"{model_name} not found in the models list")
 
-This information is crucial for users to understand the capabilities and limitations of different AI models when designing their scraping pipelines.
+This information is crucial for users to understand the capabilities and limitations of different AI models when designing their scraping pipelines.
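The example the page quotes checks a model name against the token table. A self-contained sketch of that lookup pattern — the two entries and their token counts below are illustrative assumptions, not the library's actual table:

```python
# Illustrative stand-in for scrapegraphai.helpers.models_tokens: the real
# module maps model names to context-window sizes; these names and numbers
# are assumptions for the example only.
models_tokens = {
    "gpt-4o": 128000,
    "gemini-pro": 32768,
}

def max_tokens_for(model_name, table=None):
    """Return the token limit for a model, or None if it is not listed."""
    table = models_tokens if table is None else table
    return table.get(model_name)
```

Returning None for unknown models lets a pipeline fall back gracefully instead of raising a KeyError, mirroring the "not found in the models list" branch in the quoted snippet.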
