Frequently Asked Questions for LexiURL Searcher
See also: Installation instructions - Basic instructions - Standard instructions - Help manual - SocSciBot Network documentation
- What exactly is a web impact report?
- What exactly is a link impact report?
- What exactly is a network diagram?
- What exactly is a Web Environment report and how is it calculated?
- What is the difference between standard reports made using the wizards and reports made using the advanced options?
- Why does LexiURL Searcher sometimes return less than the estimated number of results, even for searches with less than 1,000 matches?
- What is the best way to print a Network Diagram?
- How do you get additional results beyond the normal maximum with the professional version of LexiURL Searcher?
- How are the random links in the LexiURL Searcher Reports generated?
- Can I export LexiURL Searcher data to UCINET for social network analysis?
- Is there any online documentation for SocSciBot Network graph drawing?
- See also the LexiURL Searcher blog for additional questions or to report problems.
What exactly is a web impact report?
A web impact report is a collection of statistics about web pages that mention a given word or phrase (or match any search query), or each item in a list of them. These statistics include the estimated number of matching web pages, and a breakdown of the web sites and Top Level Domains (including country codes) hosting those pages. The report works best if it is conducted as a comparison between different words or phrases: you can enter a list of them and the results will be automatically compared. The number of web mentions of each term is called its "web impact".
Common applications include comparing the online impact of similar ideas, similar books, news stories or similar academic publications. Here is an example for comparing the online impact of similar books (a graph and certification have been added to the report).
How is a web impact report calculated using the wizards?
Following the wizard instructions will automatically create a basic report. This is the underlying process that LexiURL Searcher conducts.
1) Each search query in your list is submitted to Live Search to get the first (up to) 1000 URLs matching the query.
2) The URLs are then sorted and filtered by factors such as full domain name, site domain name, Top-Level Domain and the results are reported separately for each query.
3) A list of random URLs is generated from the URLs matching each query, with at most one per domain name (this is for any subsequent human content analysis)
4) The results are reported in a mini-web site.
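The sorting and filtering in step 2 can be pictured with a short Python sketch. This is only an illustration, not LexiURL Searcher's own code: the function name summarise_urls and the sample URLs are made up, and the sketch counts full domain names and Top-Level Domains only (working out the "site domain name" reliably needs a list of public suffixes, which is left out here).

```python
from collections import Counter
from urllib.parse import urlsplit


def summarise_urls(urls):
    """Count URLs by full domain name and by Top-Level Domain (hypothetical helper)."""
    domains = Counter()
    tlds = Counter()
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            continue  # skip anything without a recognisable host name
        domains[host] += 1
        # The TLD is taken as the text after the last dot, e.g. "uk" or "com".
        tlds[host.rsplit(".", 1)[-1]] += 1
    return domains, tlds


# Made-up URLs standing in for the results of one search query.
sample_urls = [
    "http://www.example.com/a.html",
    "http://www.example.com/b.html",
    "http://blog.example.org/post",
    "http://www.example.ac.uk/paper",
]
domains, tlds = summarise_urls(sample_urls)
print(domains.most_common())
print(tlds.most_common())
```

In a real report these counts would be produced separately for each query in the list, so that the queries can be compared.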
What do the advanced options do for web impact reports?
They give the option to use Yahoo! instead of Live Search as the data source.
What exactly is a link impact report?
A link impact report is a collection of statistics about web pages that link to a given web site or URL, or to each item in a list of them. These statistics include the estimated number of linking web pages, and a breakdown of the web sites and Top Level Domains (including country codes) sending the links. The report works best if it is conducted as a comparison between different sites or URLs: you can enter a list of them and the results will be automatically compared. The number of links to each site or URL is called its "link impact".
Common applications include comparing the web impact of similar web sites or web pages, or evaluating the web impact of a single web site by comparing it to that of a set of similar web sites. Here is an example for comparing the web impact of similar web sites (a graph and certification have been added to the report).
How is a link impact report calculated using the wizards?
Following the wizard instructions will automatically create a basic report. This is the underlying process that LexiURL Searcher conducts.
1) Each web site domain name or URL in your list is converted into a query in the syntax of Yahoo, for all web pages outside the web site that link to the web site or URL ("site inlinks" in webometrics terminology).
2) Each link query created from your list is submitted to Yahoo to get the first (up to) 1000 URLs matching the query.
3) The URLs are then sorted and filtered by factors such as full domain name, site domain name, Top-Level Domain and the results are reported separately for each query.
4) A list of random URLs is generated from the links to each site/URL, with at most one per domain name (this is for any subsequent human content analysis)
5) The results are reported in a mini-web site.
What do the advanced options do for link impact reports?
There are no advanced options for link impact reports.
What exactly is a network diagram?
A network diagram is a network drawn to illustrate the strength of interlinking between a set of URLs and/or web sites. In the diagram, a circle is drawn to represent each URL or web site and arrows are drawn from circle A to circle B if there is a hyperlink from web site/page A to web site/page B. Overall, the diagram illustrates the pattern of interconnectivity between the collection of sites/URLs.
How is a network diagram drawn using the wizards?
Following the wizard instructions will automatically create a basic network diagram. This is the underlying process that LexiURL Searcher conducts.
1) Each pair (A, B) of web site domain names or URLs in your list is converted into a query in the syntax of Yahoo, for all web pages matching URL/site A that link to URL/site B.
2) Each link query created from your list is submitted to Yahoo to get an estimate of the number of URLs matching the query.
3) A graph matrix is constructed from the results of 2) above, with each URL/site having its own node. An arrow from URL/site A to URL/site B is included if there are some links from A to B and its width w is at least 0.1, where w = 10 x (# of links from A to B)/(maximum # of links between any pair of sites/URLs).
4) The network matrix is loaded into SocSciBot Network to be drawn.
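The arrow width rule in step 3 can be sketched in Python as follows. This is an illustration under one reading of the rule (an arrow is drawn only when there is at least one link and the scaled width reaches 0.1); the function name arrow_widths, its parameters and the link counts are all made up for the example.

```python
def arrow_widths(link_counts, scale=10.0, min_width=0.1):
    """Map {(A, B): number of links from A to B} to {(A, B): arrow width}."""
    max_count = max(link_counts.values(), default=0)
    if max_count == 0:
        return {}  # no links at all, so no arrows
    widths = {}
    for pair, count in link_counts.items():
        if count <= 0:
            continue  # no links from A to B: no arrow
        width = scale * count / max_count
        if width >= min_width:  # very thin arrows are left out
            widths[pair] = width
    return widths


# Made-up link count estimates for three sites a, b and c.
counts = {("a", "b"): 200, ("b", "a"): 50, ("a", "c"): 1, ("c", "a"): 0}
widths = arrow_widths(counts)
print(widths)  # {('a', 'b'): 10.0, ('b', 'a'): 2.5}
```

Here the single link from a to c would give a width of 0.05, which is below 0.1, so no arrow is drawn for it.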
What do the advanced options do for network diagrams?
The link network can be changed to a co-link network. This is similar to a link network except that the number counted for a pair of web sites/URLs A and B is not the number of links between them but the number of pages linking to both A and B. This is an indirect form of link counting. Pairs of web sites or URLs that are similar tend to have higher co-link counts, so co-link networks tend to cluster similar sites together.
What is the difference between standard reports made using the wizards and reports made using the advanced options?
The advanced options allow some of the parameters in the creation of the reports to be altered. If more changes are needed than those offered by the advanced options, then the standard interface can be used instead.
Why does LexiURL Searcher sometimes return less than the estimated number of results, even for searches with less than 1,000 matches?
There are two reasons why the number of URLs returned (under 1000) is often less than the number of estimated matches. The first is that duplicates can be ignored - and this also extends to near duplicates, or pages that are similar around the text that would be returned by the search engine as the page description snippet. The second reason is that a maximum of two hits per web site are normally returned unless you select the "disable host collapsing" search option in LexiURL Searcher.
What is the best way to print a Network Diagram?
A network can be printed using the Print option in the File menu. There are also various ways in which a network can be included in another document, such as a Word file. The options below are listed in increasing order of image quality.
- Paint Bitmap: press the Print Scr button on the keyboard, load Microsoft Paint (Start|All Programs|Accessories|Paint) and press Control-V. This should copy the screen into the Paint program, where it can be edited down to the correct size. Once edited, the file can be saved (as a bitmap .bmp for the highest resolution, or as a GIF .gif for the smallest file size) and then incorporated into a document (e.g., in Word, using Insert|Image|From File).
- Medium resolution TIF. Within SocSciBot Network, select File|Print and then choose the Microsoft Office Document Image Writer printer driver from the Printer dialog box (in the Printer Name section). Use this to print a .TIFF version of the network, which can be inserted into a document, as for the Paint Bitmap option.
- High resolution TIF. A high resolution TIF printer driver is normally needed to get a higher resolution file than 300dpi. Once this is installed (it will probably have to be bought), follow the instructions for the medium resolution TIF option above, except selecting the new printer driver.
I own the professional version of LexiURL Searcher and want additional results beyond the normal maximum. How can I get them?
Essentially, to get as many results as possible for a single search:
- Create a plain text file containing the search term/phrase (if there is more than one, enter one search per line).
- Start LexiURL Searcher and select the option to go to the Classic Interface.
- Look on the right for the Advanced Section and click the Extended URL lists tab.
- Check the Use Query Splitting... option. The default is the maximum level of query splitting, which can take a long time and many queries. If you just want double or quadruple the results, change the "max splits..." option to 1 or 2 and it will run more quickly.
- Click Run searches in a File and select the plain text file with the searches in.
- Once the searches have finished, the results will all be in a plain text file with a similar name to the original file. If you need the results summarised and duplicate URLs eliminated (there will be many duplicates in the results) then please use the Reports menu, "make a set of.. " option.
Please note that this uses Live Search rather than Google - if you have a Google API key then you can use it but Google has stopped giving out new ones.
Also, the maximum number of searches allowed by Live Search is 10,000 per day and query splitting can use up a lot of them: 20 for a single query and 20*2^n for the nth level of query splitting. The programme will stop when the query limit has been reached and can be restarted after 24 hours. Please be careful with this feature and only use it if really necessary because it generates a lot of searches.
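To get a feel for how quickly the daily allowance is used up, the figures above can be turned into a small Python calculation. This is only arithmetic on the numbers quoted here (20 searches per query, 20*2^n at level n, 10,000 per day); the names are made up, and it is an assumption that the cost of a level is not added to the cost of the levels before it.

```python
DAILY_LIMIT = 10_000  # searches per day, as quoted above
BASE_COST = 20        # searches used by a single query without splitting


def queries_at_level(n):
    """Searches used by one query at the nth level of query splitting."""
    return BASE_COST * 2 ** n


for n in range(4):
    cost = queries_at_level(n)
    print(f"level {n}: {cost} searches each, about {DAILY_LIMIT // cost} per day")
```

For example, with "max splits..." set to 2, each line of the input file uses 80 searches, so roughly 125 lines can be processed before the daily limit is reached.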
How are the random links in the LexiURL Searcher Reports generated?
The random URLs are generated by randomly selecting on the basis of domain names as follows:
- A complete list of domain names is assembled from the URLs (as in the domain name list pages of the reports)
- A random number generator is used to select up to 100 domain names from the list.
- For each domain name another random number generator is used to select a single URL from the URL list that has that domain name.
The list of random URLs is the set of URLs produced by this process.
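The steps above can be sketched in Python as follows. This is an illustration rather than the program's actual code: the function name random_urls and the sample URLs are made up, and a single random number generator is used for both selections for simplicity.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def random_urls(urls, max_domains=100, seed=None):
    """Pick at most max_domains URLs, with at most one URL per domain name."""
    rng = random.Random(seed)
    # Step 1: assemble the list of domain names and the URLs under each one.
    by_domain = defaultdict(list)
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host:
            by_domain[host].append(url)
    # Step 2: randomly select up to max_domains domain names.
    domains = sorted(by_domain)
    chosen = rng.sample(domains, min(max_domains, len(domains)))
    # Step 3: randomly select a single URL for each chosen domain name.
    return [rng.choice(by_domain[domain]) for domain in chosen]


# Made-up URLs: three from one domain and one from another.
sample_urls = [
    "http://www.example.com/1",
    "http://www.example.com/2",
    "http://www.example.com/3",
    "http://www.example.org/x",
]
picked = random_urls(sample_urls, seed=1)
print(picked)
```

Selecting by domain name first means that a site with thousands of matching pages is no more likely to dominate the sample than a site with a single page, which is helpful for a subsequent human content analysis.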
Can I export LexiURL Searcher data to UCINET for social network analysis?
Yes, you can export interlinking, co-inlinking and co-outlinking results to UCINET via the Pajek file format. Use the LexiURL Searcher Utilities menu to convert your data into a Pajek file. Then, in UCINET, import the Pajek file via Data|Import text file|Pajek, select the Pajek file and enter a name for the output UCINET network file.
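For readers who have not seen a Pajek file, the Python sketch below writes a minimal one: a "*Vertices" section that numbers the nodes from 1, followed by an "*Arcs" section with one "source target weight" line per directed link. It is not the output of the LexiURL Searcher utility, which may include more detail; the function name to_pajek and the sample sites are made up.

```python
def to_pajek(nodes, arcs):
    """Return minimal Pajek .net text for nodes and weighted directed arcs.

    nodes: list of node labels; arcs: {(source, target): weight}.
    """
    index = {name: i for i, name in enumerate(nodes, start=1)}
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Arcs")
    lines += [f"{index[a]} {index[b]} {w}" for (a, b), w in arcs.items()]
    return "\n".join(lines) + "\n"


# Made-up network: a.com links to b.org 3 times, b.org links back once.
pajek_text = to_pajek(
    ["a.com", "b.org"],
    {("a.com", "b.org"): 3, ("b.org", "a.com"): 1},
)
print(pajek_text)
```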
What exactly is a Web Environment Report and how is it calculated?
A Web Environment Report is an attempt to create a network to illustrate the most relevant web sites for the web site you input. These web sites are relevant in the sense of being frequently linked to in pages that also link to your web site. Here is what the program does to create this picture:
1) Identify web pages linking to your web site, via Yahoo! searches
2) Download all the pages identified in 1) and extract their links
3) Remove all site self-links from the set generated in 2) (i.e., links between different pages of the same site)
4) Take the 50 web sites attracting the most links, using the results of 3)
5) Plot a network of these 50 sites with connections between pairs of sites based upon the number of times they are both linked to from the same web site (complicated!)
This is a "co-inlink" (or colink) network diagram, rather than a direct link network diagram.
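Steps 3) to 5) can be sketched in Python as follows. This is an illustration of co-inlink counting in general and not the program's own code: the function name colink_network and the sample data are made up, and it assumes that links have already been reduced to (source site, target site) pairs and that each source site counts only once per target.

```python
from collections import Counter
from itertools import combinations


def colink_network(links, top_n=50):
    """Count co-inlinks between the top_n most linked-to sites.

    links: iterable of (source_site, target_site) pairs.
    Returns {(site_a, site_b): number of sites linking to both}.
    """
    # Step 3: remove site self-links (and repeated pairs).
    external = {(s, t) for s, t in links if s != t}
    # Step 4: keep the top_n sites attracting the most links.
    inlinks = Counter(t for _, t in external)
    top = {site for site, _ in inlinks.most_common(top_n)}
    # Step 5: each source site adds 1 to every pair of top sites it links to.
    targets_by_source = {}
    for s, t in external:
        if t in top:
            targets_by_source.setdefault(s, set()).add(t)
    colinks = Counter()
    for targets in targets_by_source.values():
        for a, b in combinations(sorted(targets), 2):
            colinks[(a, b)] += 1
    return colinks


# Made-up links: x, y and z are linking sites; a, b and c are linked-to sites.
sample_links = [
    ("x", "a"), ("x", "b"),
    ("y", "a"), ("y", "b"), ("y", "c"),
    ("z", "a"),
    ("a", "a"),  # a self-link, removed in step 3
]
colinks = colink_network(sample_links)
print(dict(colinks))
```

In this example a and b are both linked to from x and from y, so the connection between them has strength 2, whereas c is paired with each of them only through y.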