This tutorial covers Data Toolbar for Internet Explorer. If you are using Chrome or Firefox, see the Data Toolbar for Chrome and Firefox tutorial instead.
Version 3.4.7367, released 2020-03-04.
Download the setup file and install it using the default settings. Restart Internet Explorer (32-bit or 64-bit) and navigate to the web site you want to extract data from. Make sure that the Internet Explorer security level is set to Medium-high or lower. For IE 11 on Windows 8.1, turn Enhanced Protected Mode off. For best results, use the latest version of Internet Explorer available for your operating system.
In this walkthrough we will use a product catalog (a list of Canon cameras) from the www.bestbuy.com web site. To start the DataTool wizard, click the DataTool button in the toolbar area of Internet Explorer.
When the wizard is open, moving your mouse pointer over the web page automatically highlights page elements that can be marked as data fields. With the Add Column radio-button selected, clicking a data field or an image automatically creates a new column. In column selection mode, the wizard controls Internet Explorer navigation, so clicking a hyperlink does not open a new page. Use right-click to select elements; this avoids pop-up windows and page updates.
Choose any record as a sample and, using this record, simply point to the data you want to collect from all of the records on the web site. As you select new fields, additional columns are created automatically.
Test your column selection by pressing the Get Data button.
Click the Add Details radio-button to add a high-resolution image or a detailed description from the Details page associated with the current item. The browser automatically opens that page using the first link found in the column list. When navigation is complete, click the fields you want to add. To return to the master page, press either the Add Columns or the Set Next Element button.
Sometimes a Details page contains all the information you need about an item. Even so, you must still add a link column from the primary list so that the program knows how to navigate from one Details page to the next. You can easily delete the extra column from the final output file.
On some web sites, Details pages show a varying amount of detail for different items. For example, in a business directory some companies may publish less information about themselves than others. This changes the position of web elements in the document tree and causes data extraction errors. The problem can usually be resolved by linking a data field to its prompt.
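The Python sketch below illustrates why linking a field to its prompt helps. It is not how Data Toolbar is implemented; the two directory records and the prompt labels (Address:, Phone:, Email:) are hypothetical. A rule based on a fixed position picks up the wrong field as soon as a record omits a line, while a rule that takes the text following the prompt keeps working.

```python
from typing import Optional

# Two hypothetical Details pages from a business directory, reduced to their
# text lines in document order. Company B does not publish an address.
company_a = ["Acme Ltd", "Address: 1 Main St", "Phone: 555-0100", "Email: info@acme.example"]
company_b = ["Beta Inc", "Phone: 555-0199", "Email: hello@beta.example"]


def by_position(lines: list[str], index: int) -> Optional[str]:
    """Positional rule: 'the phone number is always the third line'."""
    return lines[index] if index < len(lines) else None


def by_prompt(lines: list[str], prompt: str) -> Optional[str]:
    """Prompt-linked rule: return the text that follows the given label."""
    for line in lines:
        if line.startswith(prompt):
            return line[len(prompt):].strip()
    return None


# The positional rule works for A but returns the e-mail line for B.
print(by_position(company_a, 2))  # Phone: 555-0100
print(by_position(company_b, 2))  # Email: hello@beta.example

# The prompt-linked rule finds the phone number in both records.
print(by_prompt(company_a, "Phone:"))    # 555-0100
print(by_prompt(company_b, "Phone:"))    # 555-0199
print(by_prompt(company_b, "Address:"))  # None
```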
Where a web site features a Next page option, the Data Toolbar will automatically collect data from all available pages. Once you have finished selecting the data fields, select the Set Next Element radio-button, place your mouse on the Next button on the web page, and click. The Next Element is then added to the column list. Make sure that the click has not caused a web page update.
If a web site does not have a dedicated Next element but has numeric page links (1 2 3 ...), select the link for page 2 as the crawling rule. The program will automatically increment it to go through the whole range.
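Conceptually, the crawling rule turns the page 2 link into links for pages 3, 4, and so on. The Python sketch below shows that idea for a hypothetical URL that carries the page number in a `page` query parameter. The URL, the parameter name, and the `page_urls` helper are all assumptions for illustration; real sites, and Data Toolbar itself, may identify pages differently (for example by the link text).

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_urls(page2_url: str, last_page: int, param: str = "page") -> list[str]:
    """Starting from the page-2 link, build URLs for pages 2..last_page.

    Hypothetical: assumes the page number is stored in a query parameter.
    """
    parts = urlsplit(page2_url)
    query = parse_qs(parts.query)      # e.g. {"brand": ["canon"], "page": ["2"]}
    start = int(query[param][0])       # 2 when given the page-2 link
    urls = []
    for n in range(start, last_page + 1):
        query[param] = [str(n)]        # increment the page number
        new_query = urlencode(query, doseq=True)
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


for url in page_urls("https://shop.example/cameras?brand=canon&page=2", 4):
    print(url)
# https://shop.example/cameras?brand=canon&page=2
# https://shop.example/cameras?brand=canon&page=3
# https://shop.example/cameras?brand=canon&page=4
```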
If you have selected a data field you are not happy with, click the red button on the far right-hand side to remove it. You can reset the Next page element in the same way. The default column names assigned by the program can be edited: just click the cell containing the name and type a new name. Press Clear to clear the whole column list.
Use the Up and Down buttons in the top-left corner of the data grid to change the column order.
Once you have selected the data fields and set the Next Element, click the Get Data button. The program starts collecting web page data, showing you the number of processed pages and extracted data rows. You can interrupt data scraping at any time by clicking either the Show Data or the Edit Tags button.
After all pages are processed, the wizard switches to Review Data mode, where you can review the collected information before saving it on your computer. Use the search box to filter the data. Checking the Show Complete Text checkbox wraps the text and adjusts each cell's height to fit the text without trimming.
If the program collects just one record instead of the multiple records you see on the web page, press the Edit Tags button to go back and check that all of the columns you selected belong to the same record.
If you are satisfied with the collected information, press Continue to go to the Save Data screen.
The Save Data screen presents two options: saving the data and adding more data rows. Pressing the Continue button on the Save Data screen defaults to Save and Exit. The program can save data as a CSV file, an XML file, or an HTML table. These formats can easily be imported into Excel or a Google spreadsheet. If you are also collecting images, select the location on your computer where the downloaded images should be stored. Selecting Web location keeps references to the original image locations on the Web. Checking the Open File checkbox opens the generated data file as soon as it is created.
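If you would rather post-process the CSV output in a script than in a spreadsheet, Python's standard `csv` module can read it. A minimal sketch: the `Name` and `Price` column names, the sample rows, the file name, and the file encoding are hypothetical placeholders for whatever your own project produces. The sample is read from a string so the example runs as-is.

```python
import csv
import io
from typing import TextIO


def parse_prices(f: TextIO) -> list[tuple[str, float]]:
    """Read (Name, Price) pairs from a CSV export, turning '$1,099.99' into 1099.99."""
    result = []
    for row in csv.DictReader(f):  # the first row supplies the column names
        price = float(row["Price"].replace("$", "").replace(",", ""))
        result.append((row["Name"], price))
    return result


# Hypothetical sample standing in for the exported file. For a real export use
# open("output.csv", newline="", encoding="utf-8"); the encoding is an assumption.
sample = io.StringIO('Name,Price\n"Camera A","$479.99"\n"Camera B","$1,099.99"\n')
for name, price in parse_prices(sample):
    print(f"{name}: {price:.2f}")
# Camera A: 479.99
# Camera B: 1099.99
```

Passing `newline=""` to `open` is what the `csv` module documentation recommends, so that quoted fields containing line breaks are read correctly.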
The Free edition limits program output to 100 records. There are no limitations in the standard edition.
The picture on the right shows a CSV file generated by the Data Toolbar opened in Excel.
For web sites that do not offer a Next button, you can continue extracting data with the Add More Data Rows option. Select it and press Continue. Then navigate to the web page you want to collect data from and press Get Data. You can repeat this process as often as you like, adding data to the same CSV file before saving. When Edit Columns before adding rows is selected, the standard edit display is shown, allowing you to make any changes required.
To access advanced column editing options, click the icon in the first column of the data grid. Advanced options include:
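Useful regular expressions:
To extract a numeric value (e.g. a price), use either
[0-9.,]+ or
[$][0-9.,]+
To extract the text between two strings, use the expression below, replacing start-string and end-string with the literal text that surrounds the value (escape regular-expression special characters such as ( or $ with a backslash):
start-string(.*?)end-string
To extract an email address, use
(([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}))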
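To try the expressions listed above before entering them in the wizard, you can test them in Python's `re` module, as in the sketch below. This assumes the expressions behave the same way in Python as in Data Toolbar's own regular-expression engine, which may differ in edge cases. The digit class is written `[0-9]` so that zeros inside a number such as 1,099.99 are matched. The sample text and the helper names `first_match` and `between` are hypothetical.

```python
import re
from typing import Optional

# The expressions as Python raw strings (assumed equivalent to the wizard's).
PRICE_NUMBER = r"[0-9.,]+"        # digits plus decimal and thousands separators
PRICE_WITH_SIGN = r"[$][0-9.,]+"  # the same, but it must start with a dollar sign
EMAIL = r"(([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}))"


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def between(start: str, end: str, text: str) -> Optional[str]:
    """start-string(.*?)end-string, with the literal strings escaped first."""
    m = re.search(re.escape(start) + r"(.*?)" + re.escape(end), text)
    return m.group(1) if m else None


cell = "Canon EOS Rebel T7 - Sale: $1,099.99 (was $1,299.99)"
print(first_match(PRICE_WITH_SIGN, cell))  # $1,099.99
print(first_match(PRICE_NUMBER, cell))     # 7  (the digit in "T7" comes first)
print(between("Sale: ", " (", cell))       # $1,099.99
print(first_match(EMAIL, "Contact sales@example.com for bulk orders"))  # sales@example.com
```

The second result shows why the `[$]` variant is safer when the cell also contains model numbers: the plain numeric class matches the first digit it finds.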
Sometimes a web site updates only part of the web page, instead of navigating to a new URL, in response to a user action. This usually happens when a user clicks a "Next Page" element. Partial updates reduce screen flickering and …
Data Toolbar associates the column list with the web site for which it was created. The column list is saved and loaded automatically when you close or open the wizard. Besides the column list, some advanced program options can also be associated with the web site. Select Options to manage advanced project settings. The Options screen allows you to change the download rules for "Details" and "Next" web pages, and to export or import a project as a text file. Do not change the project settings unless you need to resolve a problem.
Expected site response can be set either to "New page" (default) or to "Partial Update". Web designers use partial updates to eliminate the flickering caused by full page updates. Partial updates do not generate a normal event flow, so they are processed based on timer events.
To improve program performance, you can decrease the Delay after page complete event value to 0.5 seconds. For pages that use asynchronous JavaScript (AJAX), keep it at 2.5 seconds or increase it.
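Why does the delay matter for AJAX pages? After the page-complete event, the content may still be changing, so extraction has to wait. Data Toolbar exposes this as a fixed delay setting. The Python sketch below shows a related, generic technique instead, stability polling: check a snapshot of the content on a timer and proceed once it has stayed unchanged for a quiet period. It is an illustration only, not Data Toolbar's actual logic. The `get_snapshot` callback, the default timings, and the `FakeClock` helper are hypothetical.

```python
import time
from typing import Callable


def wait_until_stable(
    get_snapshot: Callable[[], str],
    quiet_period: float = 2.5,
    interval: float = 0.5,
    timeout: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_snapshot() until it is unchanged for quiet_period seconds.

    Returns the stable snapshot, or raises TimeoutError after timeout seconds.
    """
    deadline = clock() + timeout
    last = get_snapshot()
    last_change = clock()
    while clock() < deadline:
        sleep(interval)  # timer tick
        current = get_snapshot()
        now = clock()
        if current != last:
            last, last_change = current, now  # content changed: restart the quiet timer
        elif now - last_change >= quiet_period:
            return current  # unchanged long enough: safe to extract
    raise TimeoutError("page content kept changing")


class FakeClock:
    """Simulated time so the example runs instantly instead of really sleeping."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


# Simulated AJAX page: the content changes twice, then settles.
fake = FakeClock()
snapshots = iter(["loading", "rows 1-10", "rows 1-20"] + ["rows 1-20"] * 20)
result = wait_until_stable(lambda: next(snapshots), quiet_period=1.0,
                           clock=fake.now, sleep=fake.sleep)
print(result, fake.t)  # rows 1-20 2.0
```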
Use the "Open details page in a hidden window" option to avoid reloading the master page when going back from a Details page.
The Web browser tab lets you run the wizard in "Explorer" or "Standalone" mode. Standalone mode may improve web scraping performance because it does not display downloaded content in Internet Explorer and runs the extraction task as a separate process.
Project settings can be explicitly exported into an XML file. This can be useful for sites that require multiple data scraping schemes.
Data Toolbar does not need much space. Right-click anywhere in the toolbar area of Internet Explorer and make sure that the Lock the Toolbars menu item is off. Then drag the Data Toolbar onto the same line as the Menu Bar or another toolbar you have. In the picture below, the Menu Bar, the Data Toolbar, and the Google toolbar share the same horizontal bar.
You can hide the Data Toolbar completely at any time using its close button [x]. If you disable the toolbar, you should also disable the Data Toolbar Helper. To bring the toolbar back, right-click anywhere in the toolbar area (IE8) or the Home button area (IE9) of Internet Explorer and enable both the toolbar and its helper.