Web Scraper and Web Macros FAQs

How to Build a Package




Page Revision: 2009/05/13 18:58


This method of building a package was developed from the experience of building hundreds of packages. It uses a template that can be found here:

Navigation:

The idea is to test navigation to each type of page on the site before worrying about extracting the data, because if you can't get to it, you can't extract it.
  • Insert the URL or POST that navigates to the first page you need to extract into the template at Package->Steps tab->Listings 1st Page->File/Form List. This is typically a top-level page, such as the first page of listings, a top-level category page, or a home page or search page that sets the cookie needed to navigate the rest of the site. Try to get as close as you can to the details you need to extract. If you can reach the detail pages directly by figuring out a pattern in the URLs, this can often save you a lot of time.
  • Run the package and double-click the URL in the window that pops up. Make sure the downloaded file has the data you need. If not, try creating a step before this one that navigates to the home page or a blank search page so that a cookie for the site can be obtained. If that doesn't work, you may need to change some HTTP headers in Package->Steps tab->YourStepName->Advanced tab->Http Client. Use an HTTP sniffer to find out what these should be. The most important headers are, in order, Cookie, Referer (the referral URL), and User-Agent.
  • Repeat this process for sample URLs of the other types of pages you need to navigate down to, until you reach the detail pages with the information you need.
  • Once you can navigate to all these pages, you need to make extraction templates, known as datapages, to get the data into a database, Excel, Access, or another tabular format.
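
Outside the package editor, the same navigation checks can be sketched in plain Python. This is only an illustration of the idea (keep cookies between requests, send a Referer and User-Agent, and build detail URLs from a pattern), not the package format itself. The function names, the example.com URLs, and the `{id}` placeholder are all assumptions made for this sketch.

```python
import http.cookiejar
import urllib.request


def make_session(user_agent):
    """Return (opener, jar). The opener stores cookies in jar and
    sends the given User-Agent header with every request."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def build_request(url, referer=None):
    """Build a GET request, optionally with a Referer header."""
    headers = {}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def detail_urls(pattern, ids):
    """Expand a URL pattern containing {id} into concrete detail URLs."""
    return [pattern.format(id=i) for i in ids]


def fetch(opener, url, referer=None):
    """Download a page so it can be checked by eye for the needed data."""
    with opener.open(build_request(url, referer), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A typical check would call `fetch(opener, home_url)` first so the cookie is stored, then `fetch(opener, listings_url, referer=home_url)`, and inspect the returned HTML for the data you expect before moving on to the detail pages.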

Extraction:
