HTML Descriptor Advanced

WinTask x64 references each HTML element on a web page through the use of HTML Descriptors. The HTML Descriptors used in a WinTask x64 automation script make extensive use of the TAG attribute to locate the various page elements. This topic explains the format of the HTML Descriptors used by WinTask x64.

NOTE: This article assumes that you are familiar with HTML Tags. See the HTML Resources section at the end of this page for resources to help you learn more about HTML.

Block Element Tags

WinTask x64 supports HTML Descriptors using the following text Block Element Tags:

 Block Element Tag  Description
 <ADDRESS>  Specifies information, such as address, signature, and authorship, of the current document.
 <BLOCKQUOTE>  Separates a quotation in text.
 <DD>  Indicates the definition in a definition list. The definition is usually indented in the definition list.
 <DIV>  Specifies a container that renders HTML.
 <DL>  Denotes a definition list.
 <DT>  Indicates a definition term within a definition list.
 <H1> through <H6>  Renders text in heading style. H1 through H6 specify different sizes and styles of headings.
 <ID>  Unique ID of the HTML element.
 <INPUT FILE>  Chooses a file to upload.
 <LI>  Denotes one item in a list.
 <OL>  Draws lines of text as a numbered (ordered) list.
 <P>  Denotes a paragraph.
 <SPAN>  Specifies an inline text container.
 <TABLE>  Specifies that the contained content is organized into a table with rows and columns.
 <TD>  Specifies a cell in a table.
 <UL>  Draws lines of text as a bulleted (unordered) list.


HTML Descriptor Format

The HTML Descriptor used by WinTask x64 to access a text Block Element on a web page is a string in the following format:
"<TAG>[CONTENT= '<Element Text>']"
Where:
<TAG> is one of the Block Element Tags listed in the above table.
<Element Text> is a text string that uniquely identifies the HTML element.

Example:
"P[CONTENT= 'the ultimate']"

Note:
The HTML Descriptor uses single quotes ( ' ) to delimit the value strings of its properties. If a value string contains a single quote, escape it with a preceding backslash, for example "P[CONTENT= 'Mc O\'Neil']".

In the above example, the HTML Descriptor specifies a paragraph element on the web page that starts with the text "the ultimate". If only one paragraph on the page starts with the word "the", the desired HTML element can be positively identified. If two or more paragraphs start with the word "the", WinTask x64 will add a word from the <Element Text> string to the search criteria (i.e. "the ultimate") until the target HTML element can be uniquely identified.

If the web page contains multiple HTML elements that match the HTML Descriptor and the target element cannot be uniquely identified, the INDEX property must be added to the HTML Descriptor. The INDEX property specifies which of the matching elements on the page is the target element. The format of the indexed text Block Element HTML Descriptor is as follows:
"<TAG>[CONTENT= '<Element Text>',INDEX= '<Number>']"
Where:
<TAG> is one of the Block Element Tags listed in the above table.
<Element Text> is a text string that uniquely identifies the HTML element.
<Number> is the position of the target element among the matching elements, counted from the beginning of the web page (1 for the first match).

Examples:
"P[CONTENT= 'the ultimate',INDEX= '1']"
"P[CONTENT= 'the ultimate',INDEX= '3']"

In the above examples, the first line specifies the first paragraph on the web page that starts with the text "the ultimate". The second line specifies the third paragraph that starts with "the ultimate".
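
As a minimal sketch, the descriptors above can be passed to the ClickHTMLElement function used in the code samples later in this topic. The page title and the paragraph text are illustrative assumptions, not taken from a real page:

```wintask
' Wait for the page to load (the title is a placeholder).
UsePage("Welcome to my page")

' Click the third paragraph that starts with "the ultimate".
ClickHTMLElement("P[CONTENT= 'the ultimate',INDEX= '3']")

' A single quote inside the value is escaped with a backslash.
ClickHTMLElement("P[CONTENT= 'Mc O\'Neil']")
```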

HTML Descriptors for the Table tag

The HTML Descriptor used to access a Table element on a web page uses the CONTENT property, as do the other text Block Element tags. The difference is that the text used to uniquely identify the HTML element can be located anywhere in the table element, not only at its start. The HTML Descriptor for Table elements has the following format:
"Table[CONTENT='<Element Text>']"
Where:
<Element Text> is a text string that uniquely identifies the Table HTML element.

HTML Descriptors for the DIV, SPAN and TD tags

The HTML Descriptor used to access these elements uses the OUTERTEXT property instead of the CONTENT property. The value of the OUTERTEXT property is used to uniquely match an element on the web page that starts with the specified text. The specified text can be truncated as long as the target element can still be uniquely identified. The format is as follows:
"<TAG>[OUTERTEXT= '<Element Text>']"
Where:
<TAG> is the DIV, SPAN, or TD text Block Element tag.
<Element Text> is a text string that uniquely identifies the HTML element.
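
The following sketch shows how such descriptors might be used with ClickHTMLElement. The element texts "Order total" and "Shipping" are hypothetical; replace them with text from your own page:

```wintask
' Click the table cell whose text starts with "Order total" (assumed text).
ClickHTMLElement("TD[OUTERTEXT= 'Order total']")

' A truncated value works as long as it still matches only one element.
ClickHTMLElement("DIV[OUTERTEXT= 'Shipping']")
```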


HTML Descriptors for the A tag

One of the more common web page elements that isn't a text Block Element is the A tag, which specifies the start or destination of a hypertext link. This tag uses the HREF property, the INNERTEXT property, or the ID property as follows:
"A[HREF= '<URL>']"
"A[INNERTEXT= '<Element Text>']"
"A[ID= '<ID>']"
Where:
<URL> is the address of a web page.
<Element Text> is a text string that uniquely identifies the HTML element.
<ID> is the HTML element ID. Recording mode does not generate the ID; you need to look at the source of the page to find it.

Example:
"A[HREF= 'http://www.wintask.com/d']"
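
The INNERTEXT and ID forms can be used in the same way. In this sketch the link text "Download" and the ID "download_link" are hypothetical values that you would read from the page or its source:

```wintask
' Follow a link identified by its visible text (assumed text).
ClickHTMLElement("A[INNERTEXT= 'Download']")

' Follow a link identified by its ID, taken from the page source (assumed ID).
ClickHTMLElement("A[ID= 'download_link']")
```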


HTML Descriptors for the Area tag

Another tag that uses the HREF property is the Area tag. The Area tag defines the shape, coordinates, and associated URL of one hyperlink region within a client-side image map. It is used as follows:
"AREA[HREF= '<URL>']"
Where:
<URL> is the address of a web page.


HTML Descriptors for the Input tag

The Input tag is used in conjunction with web page elements that accept user input. The tag has a TYPE property associated with it that defines the type of control element. A second property, dependent upon the TYPE property, is used to uniquely identify the web page element. The following table summarizes the HTML Descriptor formats:

 Element  Description  Type  Property  Format
 Button  Push button control  SUBMIT  VALUE  "INPUT SUBMIT[VALUE= '<Element Text>']"
 Checkbox  Checkbox control  CHECKBOX  NAME  "INPUT CHECKBOX[NAME= '<Element Text>']"
 Edit  Single line text entry control  TEXT  NAME  "INPUT TEXT[NAME= '<Element Text>']"
 Image  Image control that submits the form  IMAGE  NAME  "INPUT IMAGE[NAME= '<Element Text>']"
 Password  Single line text entry control that masks the user's input  PASSWORD  NAME  "INPUT PASSWORD[NAME= '<Element Text>']"
 Radio button  Radio button control  RADIO  NAME  "INPUT RADIO[NAME= '<Element Text>']"
 Reset  Push button control resetting the user entered content on form  RESET  VALUE  "INPUT RESET[VALUE= '<Element Text>']"
 Submit  Push button control that submits the form  SUBMIT  VALUE  "INPUT SUBMIT[VALUE= '<Element Text>']"

The ID property can be used too, for example "INPUT RADIO[ID= '<ID>']". Recording mode does not generate the ID for the HTML element; you need to find it in the source of the page.

The following code sample illustrates some of the HTML Descriptors using the Input tag to fill in a form on the WinTask x64 Free Downloads web page.

UsePage("WinTask Download - Free Version 30 days")
ClickHTMLElement("INPUT CHECKBOX[NAME= 'tutorial']")
WriteHTML("INPUT TEXT[NAME= 'organization']", "WinTask")
SelectHTMLItem("SELECT[NAME= 'prefix']", "Mr.")
WriteHTML("INPUT TEXT[NAME= 'last_name']", "Michael Mercer")
WriteHTML("INPUT TEXT[NAME= 'email']", "info@wintask.com")
WriteHTML("INPUT TEXT[NAME= 'country']", "USA")
WriteHTML("INPUT TEXT[NAME= 'sources']", "wintask")
ClickHTMLElement("INPUT SUBMIT[VALUE= 'Start Download']")

Using Variables to define HTML Descriptors

Many times an HTML descriptor is dependent on the value of another HTML element. You may enter a value into the text field on a web page that varies the value of an HTML element on the same or later web page. Inserting a variable string into an HTML Descriptor provides the flexibility needed to automate a variety of use cases. The following code sample demonstrates how a variable can be used with the INNERTEXT property.

Filename$ = "CB_20030402.ACT"
...
Function ClickFileNameElement()
...
    ' Wait for the specified page to load.
    UsePage("Welcome to my page")

    ' Right click the web page element that contains the text
    ' specified by the string variable Filename$.
    ClickHTMLElement("A[INNERTEXT= '"+Filename$+"']", right)
...
EndFunction

Note:
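Be sure to surround the string variable with double quotes as shown: the first fixed string is closed before the variable and the second is reopened after it, so that the value of the variable lands between the single quotes of the property value. Failure to do so may cause script compile and/or run-time errors.
See article Data Driven Automation for additional information on using variables to define HTML Descriptors.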

Iterating an HTML Descriptor

Your automation script may need to access several successive HTML elements that all start with the same text string. In this circumstance, it's usually more efficient for the script to contain a loop that iterates through the desired HTML elements. The HTML Descriptor is built by the concatenation of two fixed strings and a variable converted into a string. The variable string is used as the value for the INDEX property as follows:
"P[CONTENT= 'the ultimate',INDEX= '"+str$(index)+"']"
Where:
str$(index) generates an integer value string based on variable index.
See article Data Driven Automation for additional information on how to iterate HTML Descriptors.
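
A loop built on this descriptor could look like the following sketch. It assumes the Repeat ... Until loop construct of the WinTask language and a page with at least three matching paragraphs; the count of 3 and the use of ClickHTMLElement are illustrative choices only:

```wintask
' Click the first three paragraphs that start with "the ultimate".
' The count of 3 is an assumption; use the number of matching
' elements on your own page.
index = 1
Repeat
    ClickHTMLElement("P[CONTENT= 'the ultimate',INDEX= '" + str$(index) + "']")
    index = index + 1
Until index > 3
```

Only the INDEX value changes between iterations, so the two fixed strings stay identical and the descriptor remains easy to read.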

HTML Resources

The HTML Descriptors used by WinTask x64 follow the Microsoft DHTML model. For a comprehensive explanation of Microsoft's implementation of DHTML, please use the following link:
http://msdn2.microsoft.com/en-us/library/ms533029.aspx
For information on HTML standards please visit the World Wide Web Consortium (W3C) web site at:
http://www.w3.org

See also

Introduction
Web Synchronization
Web Advanced Synchronization
HTML Descriptor
How To Measure Response Time
Data Driven Automation