Web Scraper and Web Macros FAQs

Page History: How to Build a Package




Page Revision: 2009/05/13 19:08


This method of building a package was developed from the experience of building hundreds of packages. It uses a template that can be found here:

Navigation:

The idea is to test navigation to each type of page on the site before worrying about extracting the data, because if you can't get to it, you can't extract it.
  • Insert the URL or POST that navigates to the first page you need to extract into the template at Package->Steps tab->Listings 1st Page->File/Form List. This URL typically points to a top-level page, such as the first page of listings or a top-level category page. Try to get as close as you can to the details you need to extract. If you can get to the detail pages directly by figuring out a pattern in the URLs, do it.
  • Run the package and double-click the URL in the window that pops up. Make sure the downloaded file has the data you need. If not, try creating a step before this one that navigates to the home page or a blank search page so that a cookie for the site can be obtained. If that doesn't work, you may need to change some HTTP headers in Package->Steps tab->YourStepName->Advanced tab->Http Client. Use an HTTP sniffer to find out what these should be. The most important headers are, in order, Cookie, Referer (the referring URL) and User-Agent.
  • Repeat this process for sample URLs of the other types of pages you need to navigate down to, until you reach the detail pages with the information you need.
  • Once you can navigate to all these pages, you need to make the extraction templates, known as datapages, to get the data into a database, Excel, Access or another tabular format.
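
When a page refuses to download inside the package, it can help to reproduce the request outside Web Scraper to see which header matters. The following is a minimal Python sketch of the ideas above (building detail URLs from a pattern, visiting a home page first to pick up a cookie, then sending Referer and User-Agent). It is not part of Web Scraper; the function names and the example.com URLs are hypothetical and only for illustration.

```python
import urllib.request
from http.cookiejar import CookieJar


def detail_urls(pattern, ids):
    """Expand a URL pattern containing '{id}' into one URL per id."""
    return [pattern.format(id=i) for i in ids]


def make_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_request(url, referer=None, user_agent=None):
    """Build a request, setting only the headers that were supplied."""
    headers = {}
    if referer:
        headers["Referer"] = referer
    if user_agent:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers)


def fetch_after_cookie_step(opener, home_url, target_url, user_agent=None):
    """Visit home_url first (the 'cookie step'), then fetch target_url."""
    # First request: its only purpose is to let the site set its cookie.
    opener.open(build_request(home_url, user_agent=user_agent)).close()
    # Second request: the cookie is sent automatically by the opener.
    req = build_request(target_url, referer=home_url, user_agent=user_agent)
    with opener.open(req) as resp:
        return resp.read()
```

For example, `detail_urls("http://example.com/item?id={id}", [1, 2])` gives the two hypothetical detail URLs, and `fetch_after_cookie_step(opener, "http://example.com/", url)` would download one of them after the cookie step. If the page downloads correctly here but not in the package, compare these headers with the ones set in the Http Client tab.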

Extraction:

  • In Web Scraper, click the datapages icon on the left and open up the existing Counter Datapage.
  • Go back to Web Scraper and click the packages icon on the left and right click->run the package created above.
  • Click the highest URL that you need information from. (Usually the top one, unless you made a cookie step.)
  • Copy the URL from the browser that opens up into the datapage editor and hit enter.
  • In the top menu, choose datapage->New. (If the file takes a long time to load, you may need to edit the HTML to take out JavaScript or frames that are making things hang.)
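
If a downloaded file hangs the datapage editor, the JavaScript and frames can be stripped from a saved copy before opening it. The sketch below is one rough way to do that in Python with regular expressions. It is an assumption about how you might pre-clean the file, not a Web Scraper feature, and the function name is hypothetical. Regular expressions are not a full HTML parser, so check the result by eye.

```python
import re

# <script ...> ... </script>, including inline JavaScript
_SCRIPT = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
# <iframe ...> ... </iframe>
_IFRAME = re.compile(r"<iframe\b[^>]*>.*?</iframe\s*>", re.IGNORECASE | re.DOTALL)
# opening and closing <frame> and <frameset> tags
_FRAME = re.compile(r"</?frame(?:set)?\b[^>]*>", re.IGNORECASE)


def strip_scripts_and_frames(html):
    """Return html with script blocks, iframes and frame/frameset tags removed."""
    html = _SCRIPT.sub("", html)
    html = _IFRAME.sub("", html)
    return _FRAME.sub("", html)
```

Run it on the saved file's text, write the result to a new file, and load that copy in the datapage editor instead of the original.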