Template Editor

The template editor is the central screen of the project wizard. It allows you to:

  • Select and group content and action elements.
  • Set input text.
  • Edit and execute actions such as mouse clicks.
  • Move to other templates.
  • Access property editors for content elements, actions, groups and the project.
  • Execute the data extraction project.

When you open the wizard, the first screen is the template editor of the starting page.

The data grids in the template editor allow editing of content elements and actions. Basic properties can be edited directly in the grid. Clicking the leftmost column of a data grid opens the corresponding editor with advanced editing options.

When the template editor is open, moving the mouse pointer over the web page automatically highlights page elements that can be marked as content or action elements. With the “Add Content” radio button selected, clicking a data field or an image automatically creates a new content element. The “Selection mode” radio buttons control how the program interprets the selected element.

The program decides what information to extract based on the web element type. The default extraction type is the text or the image of the element, but you can change it in the element editor screen. Content elements can be used not only to extract data but also to submit input to the web site.

Clicking a hyperlink creates an element with an attached action (an Action element) if there is no other Action element in the group. Actions can be detached later or attached to any content element in the template. The program does not collect the content of Action elements when it runs a data extraction project. If you need to keep the action associated with an element and also collect the element's content, select the element twice. During content selection, the template editor intercepts clicks on hyperlinks, so they do not open a new page.
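
The rule that Action elements are skipped during collection can be pictured with a small sketch. It is only an illustration of the behavior described above, not the program's actual data model: the `Element` class, its fields, and the `collected` function are hypothetical names, and the reading that a second selection of the same hyperlink yields a plain content element is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical element record; the field names are illustrative only."""
    name: str
    text: str
    action: Optional[str] = None  # e.g. "click"; None means a plain content element


def collected(elements: list[Element]) -> dict[str, str]:
    """Keep only content elements; elements with an attached action are skipped."""
    return {e.name: e.text for e in elements if e.action is None}


# The same hyperlink selected twice: once as an Action element, once as content.
group = [
    Element("details_link", "Blue Kettle", action="click"),
    Element("product_title", "Blue Kettle"),
]

row = collected(group)
```

In this sketch only `product_title` reaches the output row, while `details_link` keeps its click action.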

Action buttons are used to emulate user actions and to move to child templates. The default action type is a mouse click, but other actions, such as mouse hover, are available. An action can cause browser navigation or a partial update of the current page. Partial updates include effects such as sliding menus, popup dialogs, popup images, and tooltips. The program detects partial updates most of the time, but you may need to help it in certain cases. Use the Actions grid to edit action properties.

To return from a child template to its parent template, use the Back button of the template editor. You can instruct the data extraction program to open hyperlinks in a new tab. This option avoids a page refresh when the crawler returns from a child page.

The Groups tab allows you to create and modify element groups. Groups collect elements that appear as part of a list and distinguish them from elements that appear once per page. The “Repeatable” checkbox tells the program that the group of elements appears multiple times per page. For example, if the project extracts a product catalog, the product name, price, and description belong to a repeatable group. The “Next Page” button, which appears only once per page, belongs to a separate, non-repeatable group.
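
The difference between repeatable and non-repeatable groups can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program's implementation: the `Group` class, the `extract` function, and the toy `page` dictionary are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for an element group; not the program's real data model."""
    name: str
    repeatable: bool
    elements: list[str] = field(default_factory=list)


def extract(group: Group, page: dict) -> list[dict]:
    """Return one record per list item for a repeatable group, else a single record."""
    if group.repeatable:
        # A repeatable group matches every occurrence of its elements on the page.
        return [{name: item[name] for name in group.elements} for item in page["items"]]
    # A non-repeatable group matches once per page.
    return [{name: page[name] for name in group.elements}]


# Toy page: a catalog with two products and a single "Next Page" button.
page = {
    "items": [
        {"name": "Kettle", "price": "19.99", "description": "1.7 l electric kettle"},
        {"name": "Toaster", "price": "24.50", "description": "Two-slot toaster"},
    ],
    "next_page": "/catalog?page=2",
}

products = Group("Products", repeatable=True, elements=["name", "price", "description"])
navigation = Group("Navigation", repeatable=False, elements=["next_page"])

product_rows = extract(products, page)
navigation_rows = extract(navigation, page)
```

Here the repeatable group yields one record per product, while the non-repeatable group yields a single record for the page.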

The “Active groups” dropdown sets the group that receives new elements. You can have multiple repeatable groups on the same page. For example, a product details page can contain images and user comments in addition to the product specification.

The “Get Data” button starts the crawler that executes the data extraction project. The data extraction process always begins from the project Start URL and the starting template, even if you are editing a child template.

Creating a project should be an iterative process in which you test each new template against the web site by pressing the “Get Data” button. The crawler can be stopped at any time to view the results. To get predictable results, it is important to control which content groups are repeatable and which appear only once per page.
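
The point that every run begins at the starting template, regardless of which template is open in the editor, can be pictured with a short sketch. The `Template` class and `run_project` function are hypothetical, and the depth-first visiting order is an assumption made for illustration; the source does not state the order in which the crawler visits child templates.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical template: a name plus the child templates its actions lead to."""
    name: str
    children: list["Template"] = field(default_factory=list)


def run_project(start_template: Template) -> list[str]:
    """Visit templates depth-first, always beginning at the starting template."""
    visited = []
    stack = [start_template]
    while stack:
        template = stack.pop()
        visited.append(template.name)
        # Reverse so that children are visited in their declared order.
        stack.extend(reversed(template.children))
    return visited


details = Template("Product details")
listing = Template("Product list", children=[details])
start = Template("Start page", children=[listing])

# Even if "Product details" is the template being edited,
# a run still begins at the starting template.
order = run_project(start)
```

Testing a newly added child template therefore exercises every template on the path leading to it, which is one reason to test after each addition.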