If you inadvertently set your CRAWLERA_URL setting without the URL scheme, like:

```python
CRAWLERA_URL = "proxy.crawlera.com:8010"
```
you'll receive a non-descriptive Twisted exception when trying to crawl an http:// URL:
```
Traceback (most recent call last):
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/bin/scrapy", line 10, in <module>
    sys.exit(execute())
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/commands/shell.py", line 74, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/shell.py", line 47, in start
    self.fetch(url, spider, redirect=redirect)
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/scrapy/shell.py", line 120, in fetch
    reactor, self._schedule, request, spider
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "/home/raphael/.virtualenvs/myproject-jQmK5Pxo/lib/python3.6/site-packages/twisted/python/failure.py", line 488, in raiseException
    raise self.value.with_traceback(self.tb)
twisted.web.error.SchemeNotSupported: Unsupported scheme: b''
```
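
For reference, the crawl works when the scheme is included, which is presumably the intended value here:

```python
CRAWLERA_URL = "http://proxy.crawlera.com:8010"
```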
I think a good approach would be to detect the missing scheme in CRAWLERA_URL and raise a descriptive exception. This could be done in the spider_opened signal handler we already listen to in CrawleraMiddleware.
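
A minimal sketch of what such a check might look like; the helper name `validate_crawlera_url` and the choice of `ValueError` are illustrative, not the middleware's actual API:

```python
def validate_crawlera_url(url):
    """Hypothetical helper: fail fast with a clear message when the configured
    CRAWLERA_URL lacks a scheme, instead of surfacing Twisted's opaque
    'Unsupported scheme' error at crawl time."""
    if url and '://' not in url:
        # Any descriptive exception type would do here; ValueError keeps the
        # sketch dependency-free.
        raise ValueError(
            "CRAWLERA_URL must include a URL scheme, e.g. "
            "'http://proxy.crawlera.com:8010'; got %r" % url
        )


# Illustrative call site, e.g. from the middleware's spider_opened handler:
# validate_crawlera_url(crawler.settings.get('CRAWLERA_URL'))
```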