Classic interface tutorial
These instructions cover mostly the same topics as the Basic Instructions for Wizards, but they have the advantage of being easier to customise for variations. See also the document Introduction to LexiURL Searcher, an overview for new users.
This section gives a step-by-step example of using the classic interface to create a Web Impact Report. To give an overview: the user must first construct a text file containing a list of searches. Then the user must select a search engine and any search options and instruct LexiURL Searcher to start submitting searches and reporting the results. Finally, when the searches are complete, the user must select a processing option and instruct LexiURL Searcher to apply it to the appropriate results file. A simple example below illustrates how this process works.
The example is to compare the online impact of three books: Link Analysis: An Information Science Approach, Mining the Web: Discovering Knowledge from Hypertext Data and Information Politics on the Web.
- Installation Download LexiURL Searcher from lexiurl.wlv.ac.uk and follow the instructions on the web site to install it and get it working.
- Input data generation Create a text file containing the three book titles in quotes, one title per line. This file will be the input for LexiURL Searcher; each line is a single search. The quotes ensure that the searches are exact phrase matches. The file should be created in Windows Notepad (Start/Programs/Accessories/Notepad) or a similar text editor, but not in a word processor. The resulting file might be called test.txt and contain the following.
"Link analysis an information science approach"
"Mining the web discovering knowledge from hypertext data"
"Information politics on the web"
- Running the searches Start LexiURL Searcher and ensure that Live Search is the selected search engine by checking that it is the engine ticked in the Search Engine menu. Move the file test.txt to a new, empty folder in Windows to protect your computer from LexiURL Searcher accidentally overwriting or deleting files. Now click the “Search” button and select the file test.txt. The searches will be submitted over a period of a few minutes and LexiURL Searcher will report when they are complete. When the search is complete, three new text files will appear in the same folder as the original, with file names including “long results”, “short results” and “result counts per page”. If only the hit count estimates are needed, they can be found in the short results file, shown below. The first number is the hit count estimate from the first results page and the last number is the actual number of URLs returned by the search engine. In this case the initial hit count estimates are wildly inaccurate, but the last column seems more likely to give reasonable estimates (see later chapters, however).
450 "Link analysis an information science approach" 152
7840000 "Mining the web discovering knowledge from hypertext data" 729
90700000 "Information politics on the web" 310
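For downstream processing it can be useful to parse the short results file into (estimate, query, URL count) triples. A sketch assuming the whitespace-separated layout shown in the extract above (the real file may contain variations):

```python
import re

# One line of a "short results" file is assumed to look like:
#   <hit count estimate> "<query>" <number of URLs returned>
LINE = re.compile(r'^(\d+)\s+"(.+)"\s+(\d+)$')

def parse_short_results(lines):
    results = []
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            estimate, query, count = match.groups()
            results.append((int(estimate), query, int(count)))
    return results

sample = ['450 "Link analysis an information science approach" 152']
print(parse_short_results(sample))
```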
If more information is needed than just the number of matching URLs, it can be extracted from the “long results” file using LexiURL Searcher's features for producing reports based upon search engine results.
- Creating the reports A standard set of summary reports can be produced from the raw search engine results in the “long results” file. This file lists the matching URLs returned, some text extracted from each matching page, and the search used. The number of URLs returned is never over 1,000, the maximum returned by search engines. To create a set of standard summary reports, select “Make a Set of Standard Impact Reports from a Long Results File” from the “Reports” menu, select the long results file and follow the instructions. This generates a set of web pages summarising the searches. To view the results, double-click on the file index.html in the new folder created to load it into a web browser. This web page lists the main summary statistics and gives links to the more detailed results. Below is an extract from the overview.html page, which summarises the main results.
As shown above, the main table of a LexiURL Searcher impact report lists the number of URLs, domains, web sites, Second- or Top-Level Domains (STLDs) and TLDs matching each search, as calculated from the long results files. This page can be reached from the main index.html page by clicking on the “Overview of results” link. The most reliable impact indicator is normally the number of domains rather than the number of URLs, because text or links may be copied across multiple pages within a web site. The results suggest that Mining the Web: Discovering Knowledge from Hypertext Data has the most online impact, based upon its domain count of 560. The results also include full lists of matching URLs, domains, sites, STLDs and TLDs for each query: clicking on the appropriate links on the home page reveals these lists. The diagram below illustrates such a list: the top half of a table of domain names of URLs matching the second query.
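The kind of counting behind this table can be sketched in a few lines. The definitions below (site as the domain minus any leading "www.", STLD as the last two name components, TLD as the last component) are simplifying assumptions for illustration, not necessarily those used by LexiURL Searcher:

```python
from urllib.parse import urlsplit

def impact_counts(urls):
    # Count distinct URLs, domains, sites, STLDs and TLDs in a result list.
    domains = {urlsplit(u).hostname for u in urls}
    sites = {d[4:] if d.startswith("www.") else d for d in domains}
    stlds = {".".join(d.split(".")[-2:]) for d in domains}
    tlds = {d.split(".")[-1] for d in domains}
    return {"URLs": len(set(urls)), "domains": len(domains),
            "sites": len(sites), "STLDs": len(stlds), "TLDs": len(tlds)}

urls = ["http://www.wlv.ac.uk/page1.html",
        "http://lexiurl.wlv.ac.uk/index.html"]
print(impact_counts(urls))
```

Counting domains rather than URLs, as recommended above, damps the effect of text or links copied across many pages of one site.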
Also included in the web impact report are random lists of URLs matching each search, with a maximum of one per domain. These are intended for use in content analysis classification exercises, as discussed in the online impact analysis chapter. Finally, there is some information about the search at the top of the page, which can be edited in a web editor if necessary, and near the bottom of the home page is a link to a page with a comparative breakdown of TLDs in the STLDs returned for each of the queries.
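The "at most one random URL per domain" sampling used for those lists can be illustrated as follows (a sketch of the idea, not the tool's own sampling code; the URLs are made up):

```python
import random
from urllib.parse import urlsplit

def one_per_domain(urls, seed=None):
    # Group URLs by domain, then pick one URL per domain at random.
    rng = random.Random(seed)
    by_domain = {}
    for url in urls:
        by_domain.setdefault(urlsplit(url).hostname, []).append(url)
    return [rng.choice(group) for group in by_domain.values()]

urls = ["http://a.example/1", "http://a.example/2", "http://b.example/1"]
print(one_per_domain(urls, seed=0))
```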
It is important to note again that the results of an impact query do not give fair impact comparisons if the search engine has stopped returning results at 1,000. In fact, search engines sometimes stop after about 950 unique results, so it is safest to rely upon the results only if all the URL counts are below 925. See the advanced chapter for query splitting techniques for searches with more results than this.
The instructions in the classic example section above apply almost without change to link impact reports. A link impact report is an analysis of the web pages that link to any one of a set of URLs or web sites. To create a link impact report, a list of URLs or domain names can be fed into LexiURL Searcher; it will download a list of pages that link to them, via search engine searches, and then produce a summary report based upon the URLs of these pages. The main difference is that the searches used are not phrase searches like "Link analysis an information science approach" but link searches like linkdomain:linkanalysis.wlv.ac.uk -site:wlv.ac.uk, as described here. In addition, the searches can only be carried out with Yahoo!, since the other search engines do not support equally powerful searches.
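Building such link searches for a batch of sites is mechanical. A sketch using the query form from the example above (the helper and its default behaviour are illustrative; the exclusion site defaults to the target itself):

```python
def link_impact_search(target, exclude=None):
    # Pages anywhere on the web that link to `target`, excluding pages
    # on `exclude` (by default the target site itself).
    return "linkdomain:{} -site:{}".format(target, exclude or target)

# Reproduces the example query from the text:
print(link_impact_search("linkanalysis.wlv.ac.uk", "wlv.ac.uk"))
```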
A few less common types of link and colink diagram can be created, but this requires the classic interface rather than the Wizards. The instructions below explain how to create a standard network diagram with the classic interface; the method can be customised for alternative network diagrams.
This section illustrates the steps needed to create a network diagram via the classic interface with a small example. The same technique works for any collection of web sites, as long as each site has its own unique domain name, there are not too many sites in the list, and there are some links between the sites.
- Create a plain text file using Windows Notepad or a similar text editor containing the domain name of each web site, one per line. For example, the file might be called smalllist.txt and contain the following list of domains.
- Use LexiURL Searcher’s ability to generate a list of searches between all pairs of sites by selecting Make Set of Link Searches Between Pairs of Domains from the Make Searches menu and selecting the list of domains just created (e.g., smalllist.txt). This will create a new text file (e.g., called smalllist.searches.txt) containing the necessary searches. The searches can be seen by opening the file.
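The generated file contains one query per ordered pair of domains. A sketch of what such a generator might produce, assuming a query of the form "pages on the source site that link to the target site" built from Yahoo!'s linkdomain: and site: operators (the exact form LexiURL Searcher writes may differ):

```python
from itertools import permutations

def pairwise_link_searches(domains):
    # One search per ordered (source, target) pair: pages on `source`
    # that link to `target`.
    return ["linkdomain:{} site:{}".format(target, source)
            for source, target in permutations(domains, 2)]

domains = ["a.example", "b.example", "c.example"]  # illustrative list
for query in pairwise_link_searches(domains):
    print(query)
```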
- Select the Yahoo search engine by choosing Yahoo from the Search Engine menu. This is necessary because Yahoo is the only search engine that can run the link queries. In addition, uncheck the Get all Matching URLs option in the Search Options menu. Simple network diagrams use the link counts from the searches but not the full set of URLs returned, so unchecking this option speeds up searching by stopping the full results from being gathered.
- Click on the Run All Searches in a File button, select the file containing the searches (e.g., smalllist.searches.txt) and wait for the searches to finish. This may take several minutes, and for a large network it could take several hours.
- Once the searches are finished, the hit count estimates in the short results file can be used as the raw data for a network diagram. To convert the short results into a format that can be read by the Pajek or SocSciBot Network programs, select Convert link or colink short results file to Pajek Matrix from the Utilities menu and select the short results file (e.g., smalllist.searches.yahoo short results.txt). The new file created will be in the necessary format (e.g., smalllist.searches.yahoo short results.net). This can be loaded into Pajek, if installed on the computer, or viewed in SocSciBot Network.
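The Pajek .net format produced by this conversion is simple enough to sketch: a *Vertices section naming the sites, followed by a *Matrix section of link counts. The file name, labels and counts below are illustrative, not output from the tool:

```python
def write_pajek_matrix(path, labels, matrix):
    # Write a labelled square matrix of link counts in Pajek .net format.
    with open(path, "w") as f:
        f.write("*Vertices {}\n".format(len(labels)))
        for i, label in enumerate(labels, start=1):
            f.write('{} "{}"\n'.format(i, label))
        f.write("*Matrix\n")
        for row in matrix:
            f.write(" ".join(str(count) for count in row) + "\n")

labels = ["a.example", "b.example"]
counts = [[0, 3],   # counts[i][j]: links from site i to site j (made up)
          [1, 0]]
write_pajek_matrix("smalllist.net", labels, counts)
print(open("smalllist.net").read())
```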
- To use SocSciBot Network to display a network diagram, select SocSciBot Network from the File menu and the visualisation screen will appear. From the new SocSciBot Network screen, select Open from the File menu and select the network file (e.g., smalllist.searches.yahoo short results.net). The network will then be displayed on screen in a random layout.
- See the online documentation for SocSciBot Network to see how to arrange and print the diagram.
Colink diagrams are often more revealing than link diagrams because they present an external perspective on a collection of web sites and can reveal structure even if the web sites in question do not interlink significantly.
If a colink network diagram is needed instead of a link network diagram then the LexiURL Searcher wizard can be followed as for link networks but with the modification of checking the Show advanced options box and selecting co-inlink networks instead of link networks.
If following the non-wizard steps, follow the instructions above, but with a modification to step 2 to create a file of colink searches. In webometrics terminology the searches needed are actually for co-inlinks, so the option Make Set of Co-inlink and Co-outlink Searches from a list of URLs or Domains from the Make Searches menu should be used instead. This creates two files. In step 3, select the new file of co-inlink searches (ignoring the other new file). Note that the lines drawn in a colink network diagram should not have arrows, because colinks do not have a direction.
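The co-inlink searches themselves can be sketched for illustration. The query form below (pages linking to both sites, excluding pages on either site) is an assumption built from the linkdomain: and site: operators used earlier, not necessarily the exact form LexiURL Searcher writes. Because co-inlinks are symmetric, one search per unordered pair suffices:

```python
from itertools import combinations

def coinlink_searches(domains):
    # One search per unordered pair: pages elsewhere on the web that
    # link to both sites.
    return ["linkdomain:{} linkdomain:{} -site:{} -site:{}".format(a, b, a, b)
            for a, b in combinations(domains, 2)]

domains = ["a.example", "b.example", "c.example"]  # illustrative list
for query in coinlink_searches(domains):
    print(query)
```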