Scrapy Debugging Tips

This post covers the most common techniques for debugging spiders. Consider a spider that crawls two pages: a listing page and a detail page. Because the detail-page callback needs data from the first page, the spider passes a parameter along to it.

The parse Command

The most basic way to inspect the output of your spider is the parse command. It lets you check the behavior of individual spider methods at the method level, and it is both flexible and simple to use, but it cannot debug the internals of a method.

To view the content scraped from a specific URL, pass that URL to the parse command together with the spider and callback to use. Add the --depth (or -d) option together with --verbose (or -v) to follow links and see the status at each depth level. Crawling from a single URL is also straightforward.

Scrapy Shell

While the parse command is very useful for checking the behavior of a spider, it cannot inspect what happens inside a callback beyond viewing the received response and the produced output. What if you need to debug why a callback sometimes fails to retrieve the scraped content? In that case, the Scrapy shell is your friend. See Invoking the shell from spiders to inspect responses for details.

Open in Browser

Sometimes you just want to see how a specific response looks in a browser. In that case, use the open_in_browser function: it opens a browser displaying the response your spider received at that point, and it also adjusts the base tag so that images and styles display correctly.

Logging

Logging is another useful way to get information about what is happening inside your spider. While not as convenient as interactive inspection, logs have the advantage that they can be saved and consulted later. For more information, see the Logging section.