When running a package, Web Scraper downloads only the pages you request and nothing else. Unlike your browser, which may make dozens of requests for a single page to load images, stylesheets, and external JavaScript, Web Scraper downloads only the source code. Most of the time this is exactly what you want: it reduces the number of requests you make to a site and improves performance. But sometimes you want specific images as well. Here is how to get them.
- Create a datapage that scrapes the image URL from one of the pages you're interested in
- Create the datapage and dataset as you normally would
- When adding the field for the image URL, note whether the URL is relative (starts with a /) or absolute (starts with http). If it is absolute, add the field normally. If it is relative, then after adding the field go back to the database tab under the field properties and switch to the URL column filter.
- Perform a test extraction and ensure that the entire (absolute) URL is picked up. If you are using the URL column filter, it should add the http://... prefix automatically.
- Create a package that uses the datapage created above to navigate to all the pages containing the images you are after
- Add an additional step to download the images after the task that scrapes the image URLs
- Under the Steps tab of the package properties add a new task
- Name the task and select the task type "Navigate to a list of webpages". In this case we're navigating to images rather than HTML pages, but that doesn't matter. Uncheck the box at the bottom to indicate that you will not use a datapage in this task
- For the file list type select "Field in another datapage" and select the URL field in the datapage of the previous task.
- Finish adding the task.
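If you're curious what the URL column filter is doing behind the scenes, the relative-to-absolute conversion works like `urljoin` from Python's standard library. A minimal sketch (the page and image URLs below are hypothetical examples, not values from the tool):

```python
from urllib.parse import urljoin

# Hypothetical page the image URL was scraped from.
page_url = "http://example.com/gallery/page1.html"

# A relative path (starts with /) is resolved against the page's
# scheme and host; an absolute URL passes through unchanged.
print(urljoin(page_url, "/images/photo.jpg"))
# http://example.com/images/photo.jpg
print(urljoin(page_url, "http://cdn.example.com/a.jpg"))
# http://cdn.example.com/a.jpg
```

This is why absolute URLs can be added as normal fields while relative ones need the filter: only the relative form depends on knowing which page it came from.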
That's it. You may want to adjust bandwidth throttling or other options afterwards.
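If you ever need to reproduce that final download step outside the tool, the "Navigate to a list of webpages" task boils down to fetching each scraped URL, saving the bytes to disk, and pausing between requests. A hedged sketch, assuming you have the URL field values in a Python list (the `dest_dir` and `delay` names are my own, not settings from the tool):

```python
import time
import urllib.request
from pathlib import Path

def download_images(urls, dest_dir="images", delay=1.0):
    """Fetch each image URL and save it under dest_dir.

    The delay between requests plays the role of the package's
    bandwidth throttling options.
    """
    out = Path(dest_dir)
    out.mkdir(exist_ok=True)
    for url in urls:
        # Derive a filename from the last path segment of the URL.
        name = url.rstrip("/").rsplit("/", 1)[-1] or "unnamed"
        try:
            with urllib.request.urlopen(url) as resp:
                (out / name).write_bytes(resp.read())
        except OSError as err:
            print(f"skipped {url}: {err}")
        time.sleep(delay)  # be polite to the server
```

The same politeness applies either way: whether you throttle in the package options or with a sleep in a script, pacing your requests keeps the site owner happy.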