Skip to content

fix: Fix the order in which cookies are saved to the SessionCookies and the handler is executed for PlaywrightCrawler #1163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Mantisus
Copy link
Collaborator

@Mantisus Mantisus commented Apr 18, 2025

Description

  • For PlaywrightCrawler, cookies should only be saved to the session store when the handler is fully executed. This is because the browser may continue to set cookies while the handler is being executed

Testing

  • Add a test simulating the installation of a cookie in the browser during the default_handler execution process
  • Update the test_isolation_cookies test

@Mantisus Mantisus changed the title fix: Fix the order in which cookies are saved to the session store and the handler is executed for PlaywrightCrawler fix: Fix the order in which cookies are saved to the SessionCookies and the handler is executed for PlaywrightCrawler Apr 18, 2025
@Mantisus Mantisus requested a review from Copilot April 18, 2025 18:12
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the order of cookie saving in the PlaywrightCrawler to ensure cookies are saved only after the request handler has fully executed.

  • Reorders the cookie saving logic in the _navigate function so that cookies are set after the handler processing.
  • Adds a new unit test to simulate and verify the correct cookie saving behavior.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
tests/unit/crawlers/_playwright/test_playwright_crawler.py Introduces a new test ensuring cookies are saved post-handler execution.
src/crawlee/crawlers/_playwright/_playwright_crawler.py Reorders the cookie saving logic to occur after the handler finishes.

@Mantisus Mantisus self-assigned this Apr 18, 2025
@Mantisus Mantisus requested review from janbuchar and Pijukatel April 18, 2025 18:13
@Mantisus Mantisus requested a review from Copilot April 18, 2025 18:36
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the order in which cookies are saved for PlaywrightCrawler by ensuring that session cookies are only set after the handler has fully executed.

  • Updated cookie saving logic in _playwright_crawler.py to run after handler processing.
  • Enhanced tests in test_playwright_crawler.py to verify that cookies are correctly stored post-handler execution.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
tests/unit/crawlers/_playwright/test_playwright_crawler.py Updated test code to record sessions and verify correct cookie saving order.
src/crawlee/crawlers/_playwright/_playwright_crawler.py Reordered the cookie saving logic to execute after the response is processed.
Comments suppressed due to low confidence (2)

tests/unit/crawlers/_playwright/test_playwright_crawler.py:360

  • The code assumes sessions_ids has at least three elements, but earlier assertions indicate only two unique sessions. Please verify that this indexing is intentional and update the test to accurately reflect the expected session count.
clean_session_id = sessions_ids[2]

tests/unit/crawlers/_playwright/test_playwright_crawler.py:360

  • [nitpick] The identifier 'clean_session_id' may be ambiguous; consider renaming it to a more descriptive name that clarifies its role in indicating the session state after handler processing.
clean_session_id = sessions_ids[2]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants