Capturing Web Data Items
Using the WinTask Capture Wizard (menu item Start/Capture Wizard), you can easily retrieve data from a web page. Once the data is captured, it can be written to a file in Microsoft Excel format. The following steps illustrate how to extract addresses from the white pages directory web site, www.anywho.com, and to save them in an Excel format file.
Start WinTask. If the Your First Script Wizard dialog box is displayed, click the Close button. The WinTask Editor window should now be active.
From the WinTask toolbar, click the Rec button to start recording your actions.
The Start Recording Mode dialog box appears, asking "What do you want to start before recording?" Select the Internet Explorer radio button and click the OK button.
In the following dialog box, Launching Internet Explorer, type "www.anywho.com" into the Web address text field and click the OK button.
Wait until the page has finished loading. For this example, we will search for the addresses of all persons named "Dave Morgan" listed in Texas.
In the Last Name text field, type "Morgan".
In the First Name text field, type "Dave".
From the State pull-down list, select TX and click the Search button.
Once the page reloads with the queried results, click the Capture button on the floating WinTask toolbar. The Capture button is the third button from the left on the toolbar with the "T and magic wand" icon.
The first screen of the Capture Wizard, subtitled "Specify the window where the data to be captured are", is displayed. Click the Spy button. The mouse cursor changes to a "Crosshair within a circle". Move the cursor over the area on the web page labeled Residential Listings. When you see the selection rectangle around the table, click the left mouse button to capture the data within the table.
Press the Next button.
The screen subtitled "Specify the HTML element where the data to be captured are" is displayed. When the script is replayed, WinTask uses the HTML Descriptor of the table to locate the data content for capture. A suggested HTML Descriptor is displayed in the HTML Descriptor field. In our example, the suggested HTML Descriptor, "TABLE[CONTENT='Residential']", is correct.
Press the Next button.
The screen subtitled "Select the data you want to capture" is displayed. Click and hold the mouse button on the first cell of interest and drag to the last cell of interest, in a manner similar to data selection in Microsoft Excel. The selected data is highlighted and appears in the lower portion of the dialog box. In our example, both columns and all rows have been selected.
Press the Next button.
The screen subtitled "Specify where to copy the captured data" is displayed. In our example, we will write the captured data to an Excel format file. Leave the Excel radio button selected and click the Next button.
The screen subtitled "Specify the Excel file where to copy the extracted data" is displayed. In the Excel file text field, type "C:\data.xls". The default values for the other text fields are acceptable as they are. Click the Paste into the script button to close the Capture Wizard.
Close Internet Explorer.
Stop Recording Mode by clicking the Stop button on the floating WinTask toolbar. The Stop button is the first button on the left side of the toolbar with the "X and rectangle" icon.
The WinTask Editor window is now restored and the script statements generated during Recording Mode are inserted into the current script document window.
Open Excel and examine the spreadsheet file "C:\data.xls". The spreadsheet will be empty. Close Excel.
Return to the WinTask Editor and save the script. The script can now be compiled and replayed. After the script runs to completion, the spreadsheet file "C:\data.xls" will contain the extracted web page data.
The steps above were used to generate the following script statements. Comments have been added to explain each script statement.
' These statements define arrays of strings that are used to hold
' the data extracted from the web page. There will be one array per
' column of data extracted. Increase the size of the array as needed
' for your application.
Dim tabcell_1$(100)
Dim tabcell_0$(100)
' Start Internet Explorer and load the web page
StartBrowser("IE", "www.anywho.com")
' Wait for the home page of anywho.com to load
UsePage("AnyWho: Internet Directory Assistance; Yellow Pages, White Pages, Toll-Free Numbers, Maps and Directions")
' The next two statements enter the name of the person we're
' searching for
WriteHTML("INPUT TEXT[NAME= 'lastname']", "Morgan")
WriteHTML("INPUT TEXT[NAME= 'firstname']", "Dave")
' Select the state from the pull-down list
SelectHTMLItem("SELECT[NAME= 'state']", "TX")
' Click the Search button to submit the query
ClickHTMLElement("INPUT IMAGE[NAME= 'btnsubmit',INDEX='2']")
' Wait for the page to be updated with the query results
UsePage("AnyWho: Internet Directory Assistance; Yellow Pages, White Pages, Toll-Free Numbers, Maps and Directions")
' Capture the contents of the table matched by the HTML Descriptor
' TABLE[CONTENT='Residential']. Rows 1 through 11 (inclusive) of
' Column 1 are copied into string array "tabcell_0$".
ret = CaptureTableHTML("TABLE[CONTENT='Residential']", "R1C1:R11C1", tabcell_0$())
' Write the contents of string array "tabcell_0" into the Excel
' spreadsheet file, Sheet 1, Column A, Rows 1 through 11.
ret = WriteExcel("C:\data.xls", "Sheet1!A1:A11", tabcell_0$())
' Capture the contents of the table matched by the HTML Descriptor
' TABLE[CONTENT='Residential']. Rows 1 through 11 (inclusive) of
' Column 2 are copied into string array "tabcell_1$".
ret = CaptureTableHTML("TABLE[CONTENT='Residential']", "R1C2:R11C2", tabcell_1$())
' Write the contents of string array "tabcell_1" into the Excel
' spreadsheet file, Sheet 1, Column B, Rows 1 through 11.
ret = WriteExcel("C:\data.xls", "Sheet1!B1:B11", tabcell_1$())
' Close Internet Explorer
CloseWindow("IEXPLORE.EXE|IEFrame|AnyWho: Internet Directory Assistance; Yellow Pages, White Pages, Toll-Free Numbers, Maps and D - Windows Internet Explorer",1)
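The comments at the top of the script note that the arrays can be enlarged as needed. As a minimal sketch, assuming a results table with up to 150 rows (a hypothetical figure that does not come from the recording above), the capture of the first column might be adapted as follows. Only the array size, the captured cell range, and the Excel range differ from the generated statements; no other functions are introduced. How WinTask treats a range that extends past the last row of the table is not covered here and should be checked before relying on this variation.
' Hypothetical variation: hold up to 200 captured cells per column
Dim tabcell_0$(200)
' Capture rows 1 through 150 (assumed row count) of Column 1
ret = CaptureTableHTML("TABLE[CONTENT='Residential']", "R1C1:R150C1", tabcell_0$())
' Write the captured cells into Sheet 1, Column A, Rows 1 through 150
ret = WriteExcel("C:\data.xls", "Sheet1!A1:A150", tabcell_0$())
The second column would follow the same pattern, with "R1C2:R150C2" as the cell range, "Sheet1!B1:B150" as the Excel range, and a correspondingly enlarged tabcell_1$ array.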