DOM blocks are handled at the DOM level: the source HTML is treated as a DOM document, and the query selects the DOM nodes to extract. If a DOM block is used in a DOM operation, its result is a single DOM node or a collection of DOM nodes. If the rule is a single block expression, or the block is used in a string operation, its result is a string.

XPATH

XPath is a language for navigating the elements and attributes of an XML document. The XPATH block accepts an XPath query and runs it against the content. For example:

{{XPATH[value=inner]://div[@class="content"]}}

This example extracts the inner HTML of the div element whose class attribute is content.
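
To illustrate what such a query selects, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not how the XPATH block is implemented; the sample document and the inner_html helper are assumptions made for illustration, and ElementTree supports only a subset of XPath, so a relative path stands in for //div[@class="content"].

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the original documentation).
SOURCE = (
    '<html><body>'
    '<div class="header">Site title</div>'
    '<div class="content"><p>Hello</p> world</div>'
    '</body></html>'
)


def inner_html(element):
    """Return the markup between an element's opening and closing tags."""
    parts = [element.text or ""]
    for child in element:
        # tostring() also emits the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(SOURCE)
# ElementTree needs a relative path, so ".//div[...]" replaces "//div[...]".
match = root.find(".//div[@class='content']")
print(inner_html(match))  # <p>Hello</p> world
```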

Settings

value: Specifies which value to extract once the element is found in the DOM document. The available value types are:

  • inner (default): Inner HTML of the selected element.
  • outter: Outer HTML of the selected element.
  • text: Inner text of the selected element.
  • [attr_name]: The name of an attribute. When the query is used to extract an attribute value, set value to the attribute's name and the block returns the value of that attribute.

For example:

Given that the source document is

Screenshot of a piece of code.

The result of {{XPATH[value=inner]://div[@class="main"]}} is

Screenshot of a piece of code.

The result of {{XPATH[value=outter]://div[@class="main"]}} is

Screenshot of a piece of code.

The result of {{XPATH[value=text]://div[@class="main"]}} is

main content

The result of {{XPATH[value=src]://div[@class="main"]/img}} is

logo.png
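
The four value types can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample document is a guess chosen to be consistent with the text and src results above (the original screenshots are not reproduced here), extract_value is a hypothetical helper, and the exact markup the real block returns may differ from ElementTree's serialization.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed sample document; not the one shown in the original screenshot.
SAMPLE = '<body><div class="main"><img src="logo.png" />main content</div></body>'


def extract_value(element, value="inner"):
    """Hypothetical helper mirroring the value setting of a DOM block."""
    if value == "inner":
        # Markup between the opening and closing tags.
        children = "".join(ET.tostring(c, encoding="unicode") for c in element)
        return (element.text or "") + children
    if value == "outter":
        # Markup including the element's own tags, without trailing text.
        clone = copy.copy(element)
        clone.tail = None
        return ET.tostring(clone, encoding="unicode")
    if value == "text":
        # Text content only, with all tags dropped.
        return "".join(element.itertext())
    # Any other value is treated as an attribute name.
    return element.get(value, "")


root = ET.fromstring(SAMPLE)
main = root.find(".//div[@class='main']")
image = root.find(".//div[@class='main']/img")

print(extract_value(main, "inner"))
print(extract_value(main, "outter"))
print(extract_value(main, "text"))
print(extract_value(image, "src"))
```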

single: Specifies whether to extract a single element or all the elements matched by the query.

  • true (default): If the expression returns multiple nodes, only the first node in the collection is returned.
  • false: All the nodes in the collection are concatenated.

For example:

Given that the source document is

Screenshot of a piece of code.

The result of {{XPATH[single=true]://div[@class="info"]}} is

Screenshot of a piece of code.

The result of {{XPATH[single=false]://div[@class="info"]}} is

Screenshot of a piece of code.
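
The difference between the two modes can be sketched in Python as follows. The sample document and the select helper are hypothetical, and joining the matches with no separator is an assumption; the documentation above only says that the nodes are concatenated.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; not the one shown in the original screenshot.
SAMPLE = (
    '<body>'
    '<div class="info">first</div>'
    '<div class="info">second</div>'
    '</body>'
)


def inner_html(element):
    """Return the markup between an element's opening and closing tags."""
    children = "".join(ET.tostring(c, encoding="unicode") for c in element)
    return (element.text or "") + children


def select(root, path, single=True):
    """Hypothetical helper mirroring the single setting."""
    results = [inner_html(node) for node in root.findall(path)]
    if single:
        # Only the first match is returned (empty string if none matched).
        return results[0] if results else ""
    # Assumption: matches are joined with no separator.
    return "".join(results)


root = ET.fromstring(SAMPLE)
print(select(root, ".//div[@class='info']", single=True))   # first
print(select(root, ".//div[@class='info']", single=False))  # firstsecond
```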

Note

  • If blocks are used in an operation, the settings of the first block in the operation will be used as the settings for the operation result.

For example, {{XPATH[value=inner]://div}} - {{XPATH[value=outter]://div/h1}} will return the inner HTML of the subtraction result even if the latter block is set to return the outer HTML.

  • The single setting can only be used in a single block expression; it can’t be used in operations.

For example, the following expression is valid because it’s a single block expression.

{{XPATH[single=false]://h2}}

The following expression is invalid.

{{XPATH[single=false]://h2}} + {{XPATH[single=true]://h3}}

Read More

This documentation does not explain XPath itself; the following websites can help you get started with it:

JQUERY

The JQUERY block uses jQuery-like CSS selectors to select DOM elements and extract the content. For example:

{{JQUERY:#main}}

This will select and return the HTML of the element whose ID is main.
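
In CSS selector syntax, #main matches on the id attribute, while .main matches on the class. The Python sketch below illustrates the distinction with rough XPath equivalents, since the standard library has no CSS selector engine. The sample document is hypothetical, and the class lookup is simplified to an exact attribute match, whereas a real .main selector also matches elements that carry several classes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with one element per kind of match.
SAMPLE = (
    '<body>'
    '<div id="main">selected by id</div>'
    '<div class="main">selected by class</div>'
    '</body>'
)

root = ET.fromstring(SAMPLE)

# Rough equivalent of the CSS selector "#main".
by_id = root.find(".//*[@id='main']")
# Simplified equivalent of the CSS selector ".main" (exact match only).
by_class = root.find(".//*[@class='main']")

print(by_id.text)     # selected by id
print(by_class.text)  # selected by class
```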

Settings

The JQUERY block has the same settings as the XPATH block. See XPATH block settings.

Read More

This documentation does not explain CSS selectors themselves; the following websites can help you get started with them.