Linking Web Data to a Prompt

Problem description

Sometimes Details pages show a varying amount of detail for different items. For example, in a business directory some companies may publish less information than others. In this case, the positions of the data fields inside the web page change from one item to another. That happens because web pages do not reserve space for missing data. However, web data extraction software uses position information to locate data fields, and if their positions change, the scraper may extract the wrong data or no data at all.

For example, suppose you want to extract web site URLs and email addresses and store them in separate columns on a spreadsheet. A sample Details page contains a company web site URL on the first line of the details block and an email address on the second line. The next company does not have a web site and its details block contains only an email address on the first line. When the scraper extracts web data it will mistakenly put the web site URL of the first company and the email address of the second one into the same column. How can this problem be resolved?
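The mix-up described above can be reproduced with a small Python sketch. It is only an illustration of position-based extraction in general, not the Data Toolbar's actual code; the sample data, the column names and the `extract_by_position` function are all hypothetical.

```python
# Hypothetical details blocks: each inner list holds the lines of one
# company's details block, in the order they appear on the page.
details_blocks = [
    ["http://www.example.com", "info@example.com"],  # web site and email
    ["sales@example.org"],                           # email only
]


def extract_by_position(block, columns=("website", "email")):
    """Assign lines to columns purely by their position in the block."""
    row = dict.fromkeys(columns, "")
    for column, line in zip(columns, block):
        row[column] = line
    return row


rows = [extract_by_position(block) for block in details_blocks]
for row in rows:
    print(row)
```

For the second company, the email address lands in the `website` column and the `email` column stays empty, because the only line in its block occupies the first position.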

Solution

Most of the time, a data element on the web page has a corresponding prompt element. The text of the prompt does not change from one item to another. For example, an email prompt would always be “Email Address:” regardless of its position on the web page. The actual email address will always follow the prompt.
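The idea of anchoring a value to its prompt can be sketched in Python as follows. This is a minimal illustration under assumptions, not a description of how the Data Toolbar is implemented: the sample lines, the "Web Site:" prompt text, the `PROMPTS` mapping and the `extract_by_prompt` function are hypothetical.

```python
# Hypothetical details blocks: each line is a prompt followed by its value.
details_blocks = [
    ["Web Site: http://www.example.com", "Email Address: info@example.com"],
    ["Email Address: sales@example.org"],  # this company has no web site
]

# Assumed mapping from output column to the prompt text that precedes its value.
PROMPTS = {"website": "Web Site:", "email": "Email Address:"}


def extract_by_prompt(block, prompts=PROMPTS):
    """Fill each column from the line that starts with the column's prompt."""
    row = dict.fromkeys(prompts, "")
    for line in block:
        for column, prompt in prompts.items():
            if line.startswith(prompt):
                # The value is whatever follows the prompt on the same line.
                row[column] = line[len(prompt):].strip()
    return row


rows = [extract_by_prompt(block) for block in details_blocks]
for row in rows:
    print(row)
```

Because each value is found through its prompt rather than its line number, the second company's email address stays in the `email` column and its `website` column is simply left empty.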

Linking a data item to its prompt helps collect information correctly from Details pages whose layout changes from item to item. Here is how this can be done using the Data Toolbar. Note that this functionality is only available on Details pages.

Step by step

1. Select a data field you want to collect. A new data row will appear in the data-grid. Click on the "Add link" button on the right side of the data row.

2. After you have clicked the "Add link" button, the tool will switch into the "Prompt selection mode". In that mode, elements on the web page are highlighted with a cyan background instead of a yellow one when you move the mouse cursor over them.

3. Select the element whose text you wish to use as the prompt. The program will create a link between the prompt and the data element, and the prompt text will appear in front of the data sample. You can remove the link at any time by clicking on the Remove Link icon in the data-grid.

4. Add other fields in the same way and press the Get Data button.

Get a Free Web Scraping Tool Now

Get a free version of Data Toolbar. The free version has the same functionality as the full version but its output is limited to 100 rows. There is no expiration date. No registration. No ads. See how easy it is for yourself today.

download Data Toolbar

The latest production build was released on 2020-03-04. Version 3.4 supports background data scraping that does not disrupt other applications. Update your program for free if you own any of its previous versions. Check release history here.

Version 4.0, which is to be released in 2020, will significantly improve the performance and the flexibility of the DataTool by using a new data extraction engine based on CSS selectors.