Enhance exception description when CRAWLERA_URL is missing the scheme #77

Closed
@raphapassini

Description

If you inadvertently set your CRAWLERA_URL setting without the URL scheme, like:

CRAWLERA_URL = "proxy.crawlera.com:8010"

You'll receive a non-descriptive Twisted exception when trying to crawl http://

Traceback (most recent call last):
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/bin/scrapy", line 10, in <module>
    sys.exit(execute())
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/commands/shell.py", line 74, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/shell.py", line 47, in start
    self.fetch(url, spider, redirect=redirect)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/shell.py", line 120, in fetch
    reactor, self._schedule, request, spider
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/twisted/python/failure.py", line 488, in raiseException
    raise self.value.with_traceback(self.tb)
twisted.web.error.SchemeNotSupported: Unsupported scheme: b''

I think a good approach would be to detect the missing scheme in CRAWLERA_URL and raise a descriptive exception. This could be done in the handler for the spider_opened signal that CrawleraMiddleware listens to.
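
A minimal sketch of the kind of check I have in mind. The function name `validate_crawlera_url` and the use of `ValueError` are only assumptions for illustration, not existing middleware code; the real fix might raise a Scrapy-specific exception instead. It tests the prefix directly rather than relying on `urlparse`, since a bare `host:port` string is not parsed the same way across Python versions.

```python
def validate_crawlera_url(url):
    """Return url unchanged, or raise ValueError if it lacks an http(s) scheme.

    Hypothetical helper: the name and the exception type are illustrative only.
    """
    if not url.lower().startswith(("http://", "https://")):
        raise ValueError(
            "CRAWLERA_URL must include the URL scheme, for example "
            "'http://proxy.crawlera.com:8010' (got %r)" % (url,)
        )
    return url


if __name__ == "__main__":
    # The value from this report is rejected with a readable message.
    try:
        validate_crawlera_url("proxy.crawlera.com:8010")
    except ValueError as exc:
        print(exc)
```

Called from the spider-opened handler, this would fail early with a message that names the setting, instead of surfacing later as SchemeNotSupported.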
