Capturing Data in a Web Page

Using Recording mode or the WinTask Capture Wizard (menu item Start/Capture Wizard), you can retrieve data from a web page. Once captured, the data can be written to a Microsoft Excel file or to a database. The following steps show how to extract names, phone numbers, and email addresses from a web page and save them to an Excel file. Internet Explorer, Mozilla Firefox, or Google Chrome can be used (Internet Explorer is used in the example below).

You can watch the video or follow the steps below.

  1. Start WinTask. If the Your First Script Wizard dialog box is displayed, click the Close button. The WinTask Editor window should now be active.

  2. From the WinTask toolbar, click the Rec button to start recording your actions.

  3. The Start Recording Mode dialog box appears, asking "What do you want to start before recording?" Select the Internet Explorer radio button and click the OK button.

  4. In the following dialog box, Launching Internet Explorer, type www.wintask.com/demos into the Web address text field and click the OK button.

  5. When the main page titled WinTask Demonstration Pages is displayed, click the Data Table link.

  6. Once the page Capture Data from a Web Table is loaded with the results, click the Capture icon on the floating WinTask toolbar. The Capture button is the third button from the left on the toolbar and has a "T and magic wand" icon.

  7. The first screen of the Capture Wizard, subtitled "Specify the window where the data to be captured are", is displayed. Click the Spy button. The mouse cursor changes to a crosshair within a circle. Move the cursor over the area of the web page labeled Name. When the selection rectangle surrounds the table, click the left mouse button to capture the data within the table.

  8. Press the Next button.

  9. The screen subtitled "Specify the HTML element where the data to be captured are" is displayed. When the script is replayed, WinTask uses the HTML descriptor of the table to locate the data to capture. A suggested HTML descriptor is displayed in the HTML Descriptor field. In our example, the suggested HTML descriptor, "TABLE[CONTENT='Name']", is correct.

 

  10. Press the Next button.

  11. The screen subtitled "Select the data you want to capture" is displayed. Click and hold the mouse button on the first cell of interest and drag to the last cell of interest, as you would select data in Microsoft Excel. The selected data is highlighted and appears in the lower portion of the dialog box. In our example, the three columns and all rows have been selected.

  12. Press the Next button.

  13. The screen subtitled "Specify where to copy the captured data" is displayed. In our example, the captured data is written to an Excel file. Leave the Excel radio button selected and click the Next button.

  14. The screen subtitled "Specify the Excel file where to copy the extracted data" is displayed. In the Excel file text field, type "C:\wttest\data.xlsx" (under Windows Vista, Windows 7, or Windows Server 2008, use a folder where you have permission to create files; in our example, C:\wttest). The default values of the other text fields can be left unchanged. Click the Paste into the script button to close the Capture Wizard.

  15. Close Internet Explorer.

  16. Stop Recording Mode by clicking the Stop button on the floating WinTask toolbar. The Stop button is the first button on the left side of the toolbar and has an "X and rectangle" icon.

  17. The WinTask Editor window is restored, and the script statements generated during Recording Mode are inserted into the current script document window.

  18. Open Excel and examine the spreadsheet file "C:\wttest\data.xlsx". The spreadsheet will be empty. Close Excel.

  19. Return to the WinTask Editor and click the run icon to execute the script. When prompted for a script name, enter one, for example capturedata. After the script runs to completion, the spreadsheet file "C:\wttest\data.xlsx" contains the extracted web page data.
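
The Capture Wizard steps above combine two ideas: a table is located by text it contains (the descriptor TABLE[CONTENT='Name']), and a rectangular block of its cells is then selected. The Python sketch below illustrates that idea outside WinTask. It is not WinTask code and does not reflect how WinTask is implemented. The class TableCollector, the function capture_column, and the sample HTML with its invented placeholder rows are all hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per table found
        self._row = None   # row currently being filled, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
            self.tables[-1].append(self._row)
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text that sits inside a cell is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            self._row = None


def capture_column(html, marker, column, first_row, last_row):
    """Return cells of one column (1-based, inclusive rows) from the first
    table that has a cell containing the marker text."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    for table in parser.tables:
        if any(marker in cell for row in table for cell in row):
            return [row[column - 1] for row in table[first_row - 1:last_row]]
    raise LookupError(f"no table contains {marker!r}")


# Invented placeholder data, used only to exercise the sketch.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Phone</th><th>Email</th></tr>
  <tr><td>Alice Example</td><td>555-0100</td><td>alice@example.com</td></tr>
  <tr><td>Bob Example</td><td>555-0101</td><td>bob@example.com</td></tr>
</table>
"""

if __name__ == "__main__":
    # Roughly analogous to selecting R1C1:R3C1 in the table that contains "Name".
    print(capture_column(SAMPLE_HTML, "Name", column=1, first_row=1, last_row=3))
```

As in the wizard, the header row counts as row 1, so a selection that starts at row 1 includes the column title.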


' The steps above were used to generate the following script statements. Comments have been added to explain each script statement.
' These statements define arrays of strings that are used to hold
' the data extracted from the web page. There is one array per
' column of data extracted. Increase the size of the array as needed
' for your application.
Dim tabcell_2$(100)
Dim tabcell_1$(100)
Dim tabcell_0$(100)
 
' Start Internet Explorer and load the web page
StartBrowser("IE", "www.wintask.com/demos",3)
 
' Wait for the demonstration main page of wintask.com to load
UsePage("WinTask Demonstration Pages")
 
' Click the Data Table link
ClickHTMLElement("A[INNERTEXT= 'Data Table']")
 
' Wait for the page to be updated with the results
UsePage("Capture Data from a Web Table")
 
' Capture the contents of the table identified by the HTML descriptor
' TABLE[CONTENT='Name']. Rows 1 through 6 (inclusive) of Column 1 are
' copied into string array "tabcell_0$".
ret = CaptureTableHTML("TABLE[CONTENT='Name']", "R1C1:R6C1", tabcell_0$())
 
' Write the contents of string array "tabcell_0$" into the Excel
' spreadsheet file, Sheet 1, Column A, Rows 1 through 6.
ret = WriteExcel("C:\wttest\data.xlsx", "Sheet1!A1:A6", tabcell_0$())

' Capture the contents of the table identified by the HTML descriptor
' TABLE[CONTENT='Name']. Rows 1 through 6 (inclusive) of Column 2 are
' copied into string array "tabcell_1$".
ret = CaptureTableHTML("TABLE[CONTENT='Name']", "R1C2:R6C2", tabcell_1$())
 
' Write the contents of string array "tabcell_1$" into the Excel
' spreadsheet file, Sheet 1, Column B, Rows 1 through 6.
ret = WriteExcel("C:\wttest\data.xlsx", "Sheet1!B1:B6", tabcell_1$())

' Capture the contents of the table identified by the HTML descriptor
' TABLE[CONTENT='Name']. Rows 1 through 6 (inclusive) of Column 3 are
' copied into string array "tabcell_2$".
ret = CaptureTableHTML("TABLE[CONTENT='Name']", "R1C3:R6C3", tabcell_2$())
 
' Write the contents of string array "tabcell_2$" into the Excel
' spreadsheet file, Sheet 1, Column C, Rows 1 through 6.
ret = WriteExcel("C:\wttest\data.xlsx", "Sheet1!C1:C6", tabcell_2$())
 
' Close Internet Explorer
CloseWindow("IEXPLORE.EXE|IEFrame|Capture Data from a Web Table - Windows Internet Explorer",1)
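
Each column in the generated script pairs a capture range such as "R1C3:R6C3" with an Excel range such as "Sheet1!C1:C6". When the script is adapted to more rows or columns, that correspondence can be worked out by hand or with a small helper. The Python sketch below, written outside WinTask, shows the mapping. The function names column_letters and rc_to_a1 are hypothetical, and the sketch assumes the captured block is written to the same row and column positions in the sheet, as in the script above.

```python
import re

# Matches one corner of a range written as R<row>C<column>, for example R6C3.
_RC_CORNER = re.compile(r"^R(\d+)C(\d+)$")


def column_letters(index):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def rc_to_a1(rc_range, sheet="Sheet1"):
    """Convert a range such as 'R1C3:R6C3' to 'Sheet1!C1:C6'."""
    corners = []
    for corner in rc_range.split(":"):
        match = _RC_CORNER.match(corner.strip())
        if match is None:
            raise ValueError(f"not an RnCn reference: {corner!r}")
        row, col = int(match.group(1)), int(match.group(2))
        corners.append(f"{column_letters(col)}{row}")
    return f"{sheet}!" + ":".join(corners)


if __name__ == "__main__":
    for rc in ("R1C1:R6C1", "R1C2:R6C2", "R1C3:R6C3"):
        print(rc, "->", rc_to_a1(rc))
```

For a table with more than six rows, both the capture range and the Excel range in the script need the larger row number, and the arrays declared with Dim must be large enough to hold the rows.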


See also

Capturing Data in a Windows Application
Capturing Data Using OCR