Web Data Extraction Project Structure

If you are tired of manually copying and pasting page after page of data into Excel, it is time to let your web scraping projects run while you sleep. A data extraction platform can take care of your data scraping projects for you. You no longer have to suffer from sore eyes and aching fingers, because automated web data extraction software handles screen scraping quickly and efficiently.


Data Toolbar is an intuitive tool that automates data extraction in your browser and makes your data scraping projects easy. Simply point to the data fields you want to collect, and the tool extracts the web page content for you.

Data Toolbar is designed for everyday business users and requires no technical skill. Within minutes, you will be extracting thousands of data records from your favorite free or subscription-based websites.


A web data extraction project is organized as a set of templates, each of which defines one type of web page. Each template can have one or more actions that describe how the web browser navigates to pages defined by other templates. For example, the template for a product list page has an action that tells the browser to click a product details link, which opens a product details page. The template for the product details page is then a child of the product list template.
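To make the parent/child relationship concrete, here is a minimal Python sketch. Data Toolbar's internal template format is not documented here, so the `Template` and `Action` classes and the `walk` helper are hypothetical; they only model the idea that an action on one template leads to a page described by a child template.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    """Hypothetical browser action that opens a page described by another template."""
    description: str
    target: Optional["Template"] = None  # template for the page this action opens


@dataclass
class Template:
    """Hypothetical page template: a name plus the actions available on that page."""
    name: str
    actions: List[Action] = field(default_factory=list)

    def children(self) -> List["Template"]:
        # A child template is any template reached through one of this template's actions.
        return [a.target for a in self.actions if a.target is not None]


def walk(template: Template, depth: int = 0) -> List[str]:
    """Return the template tree as indented lines, parents listed before children."""
    lines = ["  " * depth + template.name]
    for child in template.children():
        lines.extend(walk(child, depth + 1))
    return lines


details = Template("Product details page")
product_list = Template(
    "Product list page",
    actions=[Action("Click product details link", target=details)],
)

print("\n".join(walk(product_list)))
```

Running it prints the product list template with the product details template indented beneath it as its child.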

Content and Element Groups

Understanding how content on web pages is organized will make your web scraping projects easier. Some content is grouped, and other content is repeated. Let’s see how this affects your data scraping projects.

Content elements are the parts of a web page that the site generates according to a consistent pattern. The content elements defined by a template can be grouped to separate the elements that appear only once on a page from the repeatable ones. For example, in a product catalog, each product's title, price, and description form a repeatable group of elements.

A single page can also contain several repeatable groups, all of which you may want to include in your web scraping project. For example, a LinkedIn profile page may have multiple lists: skills, jobs, and educational institutions. Extracting that page's content means capturing every one of these repeatable groups.
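The sketch below shows how one non-repeatable element and several repeatable groups on the same page turn into a single record. It is an illustration only: the markup is invented and is not LinkedIn's real page structure, and Python's built-in `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup (real HTML usually calls for an HTML-aware parser such as lxml).

```python
import xml.etree.ElementTree as ET

# Invented, simplified profile markup used only for illustration.
PAGE = """
<html><body>
  <h1 class="name">Jane Doe</h1>
  <ul class="skills"><li>Python</li><li>SQL</li></ul>
  <ul class="jobs"><li>Analyst</li><li>Engineer</li></ul>
  <ul class="schools"><li>State University</li></ul>
</body></html>
"""

# Non-repeatable group: each path matches one element on the page.
SINGLE = {"name": ".//h1[@class='name']"}

# Repeatable groups: each path matches a list of items.
REPEATABLE = {
    "skills": ".//ul[@class='skills']/li",
    "jobs": ".//ul[@class='jobs']/li",
    "schools": ".//ul[@class='schools']/li",
}


def extract(page: str) -> dict:
    root = ET.fromstring(page.strip())
    record = {}
    for field_name, path in SINGLE.items():
        element = root.find(path)  # first match only
        record[field_name] = element.text if element is not None else None
    for group, path in REPEATABLE.items():
        record[group] = [item.text for item in root.findall(path)]  # every match
    return record


print(extract(PAGE))
```

The single element yields one value, while each repeatable group yields a list with one entry per item found.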

Content elements are defined by XPath expressions and filtering conditions. A data capture type specifies which part of each content element is extracted and saved: text, files, pictures, links, or raw HTML. Regular expressions can then extract a particular substring, such as a phone number, from a larger block of text.
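Here is a minimal Python sketch of three of those capture types (text, link, and raw HTML) plus a regular expression that pulls a phone number out of captured text. The markup, the `capture` helper, and the capture-type names are assumptions for illustration; they are not Data Toolbar's API, and the phone pattern assumes a US-style `555-123-4567` format.

```python
import re
import xml.etree.ElementTree as ET

# Invented product markup used only for illustration.
PAGE = """<div class="store">
  <div class="contact">Call us at 555-123-4567 for bulk orders.</div>
  <a class="details" href="/products/42">Blue Widget</a>
</div>"""

# Assumed US-style phone format; adjust the pattern for other formats.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def capture(element: ET.Element, capture_type: str) -> str:
    """Return one part of an element, depending on the (hypothetical) capture type."""
    if capture_type == "text":
        return "".join(element.itertext()).strip()
    if capture_type == "link":
        return element.get("href", "")
    if capture_type == "html":
        return ET.tostring(element, encoding="unicode")
    raise ValueError(f"unsupported capture type: {capture_type}")


root = ET.fromstring(PAGE)
contact = root.find(".//div[@class='contact']")
link = root.find(".//a[@class='details']")

match = PHONE.search(capture(contact, "text"))
print(match.group() if match else None)  # 555-123-4567
print(capture(link, "link"))             # /products/42
print(capture(link, "text"))             # Blue Widget
```

Files and pictures would typically be captured the same way as links, by reading a URL attribute such as `src` and downloading it; that step is left out here.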

The primary browser action is a click on a web element. An action can be attached to any web element that triggers a page update or navigation. Other actions, such as navigating to a URL, do not require a content element.
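The distinction between the two kinds of action can be sketched as a small record with a validation rule: a click must point at a content element, while navigation only needs a URL. The `Action` class, its `kind` values, and the `describe` helper are hypothetical names chosen for this illustration, and no browser is actually driven.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """Hypothetical action record: a click targets an element, navigation targets a URL."""
    kind: str                           # "click" or "navigate" (illustrative names)
    element_xpath: Optional[str] = None
    url: Optional[str] = None

    def validate(self) -> None:
        if self.kind not in ("click", "navigate"):
            raise ValueError(f"unknown action kind: {self.kind}")
        if self.kind == "click" and not self.element_xpath:
            raise ValueError("a click action must be attached to a content element")
        if self.kind == "navigate" and not self.url:
            raise ValueError("a navigation action needs a URL")


def describe(action: Action) -> str:
    """Validate the action and return a readable description of what it would do."""
    action.validate()
    if action.kind == "click":
        return f"click element at {action.element_xpath}"
    return f"open {action.url}"


print(describe(Action("click", element_xpath="//a[@class='next']")))
print(describe(Action("navigate", url="https://example.com/catalog")))
```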

Template Example

The picture below shows the structure of a product catalog template with two element groups. The first group is repeatable, with three elements per item. The second group is not repeatable and contains a single web element: the “Next Page” button. Understanding the difference between element groups, content (repeatable and non-repeatable), and actions is important for your web scraping projects.
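The same structure can be written out as data. This Python sketch assumes the three repeatable elements are the title, price, and description used as the example earlier; the class names and XPath strings are hypothetical placeholders, since real paths depend on the target site.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentElement:
    name: str
    xpath: str                    # hypothetical XPath; real paths depend on the site
    capture: str = "text"         # data capture type
    action: Optional[str] = None  # e.g. "click" when the element triggers navigation


@dataclass
class ElementGroup:
    name: str
    repeatable: bool
    elements: List[ContentElement] = field(default_factory=list)


@dataclass
class Template:
    name: str
    groups: List[ElementGroup] = field(default_factory=list)

    def actions(self) -> List[str]:
        # Names of all elements that carry an action, across every group.
        return [e.name for g in self.groups for e in g.elements if e.action]


catalog = Template("Product catalog", groups=[
    ElementGroup("Product", repeatable=True, elements=[
        ContentElement("Title", ".//h2[@class='title']"),
        ContentElement("Price", ".//span[@class='price']"),
        ContentElement("Description", ".//p[@class='description']"),
    ]),
    ElementGroup("Paging", repeatable=False, elements=[
        ContentElement("Next Page", ".//a[@class='next']", capture="link", action="click"),
    ]),
])

for group in catalog.groups:
    kind = "repeatable" if group.repeatable else "single"
    print(f"{group.name} ({kind}): {[e.name for e in group.elements]}")
print("Actions:", catalog.actions())
```

Keeping the “Next Page” button in its own non-repeatable group means it is matched once per page rather than once per product.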

Template Editor

Below is a screenshot of the template editor for a typical Sign In web page. It shows the main components of the web data extraction software that you will use to extract web page content in your data scraping projects: a template with one content group, a few content elements, and one action.

The content elements include the user's e-mail, the password, and the submit button. The action is associated with the submit button. The group of elements is marked as not repeatable because it appears only once on the web page.
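As a rough illustration, the Sign In template can be expressed as plain data and turned into an ordered list of browser steps. Everything here is an assumption for the sketch: the dictionary layout, the XPath strings, the `input` key that marks fields to be filled in, and the `plan_steps` helper. It only builds the step list and does not drive a browser.

```python
# Hypothetical Sign In template written as plain data.
SIGN_IN = {
    "name": "Sign In",
    "groups": [
        {
            "name": "Login form",
            "repeatable": False,  # the form appears only once on the page
            "elements": [
                {"name": "E-mail", "xpath": "//input[@type='email']", "input": "email"},
                {"name": "Password", "xpath": "//input[@type='password']", "input": "password"},
                {"name": "Submit", "xpath": "//button[@type='submit']", "action": "click"},
            ],
        }
    ],
}


def plan_steps(template: dict, credentials: dict) -> list:
    """Turn the template into ordered (step, xpath, value) tuples without running a browser."""
    steps = []
    for group in template["groups"]:
        for element in group["elements"]:
            if "input" in element:
                steps.append(("type", element["xpath"], credentials[element["input"]]))
            if element.get("action") == "click":
                steps.append(("click", element["xpath"], None))
    return steps


for step, xpath, _ in plan_steps(SIGN_IN, {"email": "user@example.com", "password": "secret"}):
    print(step, xpath)
```

The only action in the template is the click on the submit button, so it comes last, after both fields have been filled.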

