Why is my package running slowly and my CPU hitting 100%?
Symptoms:
The package runs slowly.
The status of the executing package often reads “Data Extraction Complete” for a long time with no apparent progress.
CPU usage stays at 100% in Task Manager.
Explanation:
Web Scraper uses mshtml.dll (a DLL used by IE) to format and standardize every page before it is scraped. This helps datapages work more consistently. However, IE often waits for embedded JavaScript or other controls to load, and they never will, because the extraction runs against a local copy of the file. The slowdown is therefore caused by JavaScript or other code in the page that keeps the program waiting.
Solution:
Web Scraper can do some tag pre-cleansing to break or remove the JavaScript tags that may be slowing things down:
Open the package properties and select the steps tab
Edit the properties of the task in question
Go to the advanced tab
Click the Edit button for download options
Select the advanced link at the bottom of this page
In the first drop down, select “Replace Text”
In the first drop down box, type ">script,>foo ". This breaks all script tags. If this does not work, you can also try “Remove text between tags” and type ">script,>/script" in the drop down box.
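To make the effect of the two pre-cleansing options concrete, here is a minimal Python sketch of what they do to the page source. It is not Web Scraper's own implementation. The function names are hypothetical, and it assumes the patterns above are meant to match the opening and closing script tags.

```python
import re

def break_script_tags(html: str) -> str:
    """Rename every opening <script tag to <foo so it is no longer run as a script.

    Roughly what the "Replace Text" option is assumed to achieve.
    """
    return re.sub(r"<script\b", "<foo", html, flags=re.IGNORECASE)

def remove_script_blocks(html: str) -> str:
    """Delete everything from an opening <script tag through its closing tag.

    Roughly what the "Remove text between tags" option is assumed to achieve.
    """
    pattern = r"<script\b.*?</script\s*>"
    return re.sub(pattern, "", html, flags=re.IGNORECASE | re.DOTALL)
```

The first function leaves the script body in the page as inert text, while the second removes it entirely, which is why the second option is worth trying when the first does not help.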
If the problem persists, look through the source of the pages that are slowing things down and try to come up with your own pre-cleansing function to fix the problem.
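When looking through a slow page's source, it can help to count the tags that typically load scripts or embedded controls before deciding what to pre-cleanse. The sketch below does this with Python's standard html.parser module. The tag list and the names SUSPECT_TAGS and count_suspect_tags are assumptions chosen for illustration, not part of Web Scraper.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical list of tags that commonly pull in scripts or embedded controls.
SUSPECT_TAGS = {"script", "iframe", "object", "embed", "applet"}

class SuspectTagCounter(HTMLParser):
    """Counts opening tags that are likely to make the page wait on loading."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in SUSPECT_TAGS:
            self.counts[tag] += 1

def count_suspect_tags(html: str) -> dict:
    """Return a mapping of suspect tag name to number of occurrences."""
    parser = SuspectTagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)
```

Running count_suspect_tags on the saved source of a slow page shows which tags appear most often, which suggests where a pre-cleansing rule is likely to have the most effect.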
Another strategy is to disconnect from the internet and scrape local files that you downloaded previously. This prevents scripts in the downloaded files from making time-consuming requests to the web.