Web Scraper and Web Macros FAQs

Symptoms:
  • The package runs slowly
  • The status of the executing package often shows “Data Extraction Complete” for a long period of time with no apparent progress being made
  • CPU usage stays at 100% in the Task Manager

Explanation:
Web Scraper uses mshtml.dll (a DLL used by IE) to format and standardize any page that is to be scraped. This helps datapages work more consistently. However, IE will often wait for embedded JavaScript or other controls to load, and they never will, because the extraction is happening from a local copy of the file. The slowdown is therefore caused by JavaScript or some other code in the page that makes the program wait.

Solution:
Web Scraper can do some tag pre-cleansing to break or remove the JavaScript tags that may be slowing things down:
  1. Open the package properties and select the steps tab
  2. Edit the properties of the task in question
  3. Go to the advanced tab
  4. Click the Edit button for download options
  5. Select the advanced link at the bottom of this page
  6. In the first drop down, select “Replace Text”
  7. In the first drop down box, type ">script,>foo ". This will break all script tags. If this does not work, you can also try “Remove text between tags” and type ">script,>/script" in the drop down box.
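
To make the effect of the two pre-cleansing options above more concrete, here is a minimal Python sketch of what "break the tag" and "remove text between tags" do to a page. It is an illustration only, not Web Scraper's implementation: the function names replace_text and remove_between are hypothetical, and the example assumes the markers are meant to match the opening "<script" and closing "</script" of a script element.

```python
import re

def replace_text(html: str, find: str, replacement: str) -> str:
    """Replace every occurrence of `find` (case-insensitive) with `replacement`."""
    return re.sub(re.escape(find), lambda m: replacement, html, flags=re.IGNORECASE)

def remove_between(html: str, start: str, end: str) -> str:
    """Delete each `start` marker, the next `end` marker, and everything between them."""
    pattern = re.escape(start) + r".*?" + re.escape(end)
    return re.sub(pattern, "", html, flags=re.IGNORECASE | re.DOTALL)

# Hypothetical page fragment with a script that would stall the parser.
page = '<p>Price: 10</p><script src="slow.js"></script><p>Done</p>'

# Option 1: rename the opening tag so it is no longer treated as a script.
broken = replace_text(page, "<script", "<foo")

# Option 2: drop the script element's content entirely.
# Whether the real tool also removes the markers themselves is an assumption here.
stripped = remove_between(page, "<script", "</script")

print(broken)
print(stripped)
```

Breaking the tag leaves the script body in the page as inert text, while removing the text between the tags discards it, which may matter if the data you want to extract sits next to the script.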

If the problem persists, look through the source of the pages that are slowing things down and try to come up with your own pre-cleansing function to fix the problem.
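
When looking through the page source by hand is tedious, a small script can list the elements most likely to make the parser wait. The sketch below uses Python's standard html.parser module. The list of suspect tags is an assumption (script plus other embedded controls), and find_suspects is a hypothetical helper, not part of Web Scraper.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed list of tags that may trigger loading of scripts or embedded controls.
SUSPECT_TAGS = {"script", "iframe", "object", "embed", "applet"}

class SuspectTagCounter(HTMLParser):
    """Counts suspect tags and records any external resource they point to."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.external = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in SUSPECT_TAGS:
            self.counts[tag] += 1
            attr_map = dict(attrs)
            target = attr_map.get("src") or attr_map.get("data")
            if target:
                self.external.append((tag, target))

def find_suspects(html: str):
    """Return (tag counts, list of (tag, resource)) for a saved page."""
    parser = SuspectTagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts, parser.external

sample = ('<html><script src="http://example.com/a.js"></script>'
          '<iframe src="ad.html"></iframe><p>x</p></html>')
counts, external = find_suspects(sample)
print(counts, external)
```

The tags and resources it reports are candidates for a "Replace Text" or "Remove text between tags" rule.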

Another strategy is to disconnect from the internet and scrape local files that you downloaded previously. This prevents scripts in the downloaded files from making time-consuming requests to the web.