Frequently Asked Questions for Webometric Analyst
- How do I get a key to run web searches?
- What kinds of link analysis are possible with Webometric Analyst?
- What is a web impact report?
- What is a link impact report?
- What is a network diagram?
- How is a network diagram drawn using the wizards?
- What do the advanced options do for network diagrams?
- What is the maximum size of a network diagram?
- What do the line widths mean in network diagrams?
- What are the SNA metrics in the Stats network menu?
- What is a Web Environment Network and how is it calculated?
- What is an URL Citation?
- What is a title mention?
- What is a linked title mention?
- What is the format of the input file for title mentions, linked title mentions and URL citations?
- What is the difference between standard reports made using the wizards and reports made using the advanced options?
- How does the Twitter tab work?
- How does the Altmetric tab work?
- How does the YouTube tab work?
- How does the Mendeley tab work?
- How does the Google Book Search tab work?
- How do I combine searches with the OR operator?
- How do I use the Make title mention and URL citation searches option (Make Searches menu)?
- Why does Webometric Analyst sometimes return less than the estimated number of results, even for searches with less than 1,000 matches?
- What is the best way to print a Network Diagram?
- How can I get additional results beyond the normal maximum?
- How are the random links in the Webometric Analyst Reports generated?
- Can I export Webometric Analyst data to UCINET for social network analysis?
- Is there any online documentation for SocSciBot Network graph drawing?
- See also the Webometric Analyst blog for additional questions or to report problems.
- I have run several searches (e.g., for URL citations to different pages within a single site). How do I get a report that merges the results of all queries into a single list?
- What do the YouTube searches do?
- How can I create a reply network for a YouTube video?
- How is spam removed from the results?
- Why do I get strange results? - (I edit my input files with Excel)
A web impact report is a collection of statistics about web pages that mention a given word or phrase (or any search query), or each item in a list of them. These statistics include the estimated number of web pages, and a breakdown of the web sites and Top Level Domains (including country codes) mentioning the word or phrase. The report works best if it is conducted as a comparison between different words or phrases: you can enter a list of them and then the results will be automatically compared between them. The number of web mentions of each term is called its "web impact".
Common applications include comparing the online impact of similar ideas, similar books, news stories or similar academic publications. Here is an example for comparing the online impact of similar books (a graph and certification have been added to the report).
Following the wizard instructions will automatically create a basic report. This is the underlying process that Webometric Analyst conducts.
1) Each search query in your list is submitted to Bing (formerly Live Search) to get the first (up to) 1000 URLs matching the query.
2) The URLs are then sorted and filtered by factors such as full domain name, site domain name, Top-Level Domain and the results are reported separately for each query.
3) A list of random URLs is generated from the URLs matching each query, with at most one per domain name (this is for any subsequent human content analysis)
4) The results are reported in a mini-web site.
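Step 2 above can be illustrated with a minimal sketch. It is not Webometric Analyst's own code: the function name `breakdown` and the sample URLs are made up for illustration, and it only counts full domain names and Top-Level Domains (identifying the "site domain name" needs extra rules that are not described here).

```python
from collections import Counter
from urllib.parse import urlparse

def breakdown(urls):
    """Count result URLs by full domain name and by Top-Level Domain."""
    domains = Counter()
    tlds = Counter()
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            continue  # skip anything that is not a full URL
        domains[host] += 1
        tlds[host.rsplit(".", 1)[-1]] += 1  # text after the last dot
    return domains, tlds

# Hypothetical search results for one query
urls = [
    "http://www.wlv.ac.uk/lssc",
    "http://www.wlv.ac.uk/",
    "http://news.bbc.co.uk/",
]
domains, tlds = breakdown(urls)
for host, count in domains.most_common():
    print(host, count)
```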
A link impact report is a collection of statistics about web pages that mention the URL of a given web site or URL (i.e., URL citations), or each item in a list of them. These statistics include the estimated number of web pages, and a breakdown of the web sites and Top Level Domains (including country codes) sending the URL citation. The report works best if it is conducted as a comparison between different sites or URLs: you can enter a list of them and then the results will be automatically compared between them. The number of URL citations to each site or URL is called its "URL citation link impact", or just "link impact".
Common applications include comparing the web impact of similar web sites or web pages, or evaluating the web impact of a single web site by comparing it with that of a set of similar web sites. Here is an example for comparing the web impact of similar web sites (a graph and certification have been added to the report).
Following the wizard instructions will automatically create a basic report. This is the underlying process that Webometric Analyst conducts.
1) Each web site domain name or URL in your list is converted into a query in the syntax of Bing, for all web pages outside the web site that cite to the web site or URL ("URL citations" in webometrics terminology).
2) Each URL citation query created from your list is submitted to Bing to get the first (up to) 1000 URLs matching the query.
3) The URLs are then sorted and filtered by factors such as full domain name, site domain name, Top-Level Domain and the results are reported separately for each query.
4) A list of random URLs is generated from the URL citations to each site/URL, with at most one per domain name (this is for any subsequent human content analysis)
5) The results are reported in a mini-web site.
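As a rough illustration of step 1, the sketch below builds one plausible URL citation query for a domain name. The exact query text that Webometric Analyst generates is not given here, so the format (the domain as a quoted phrase plus a `-site:` exclusion) and the function name `url_citation_query` are assumptions.

```python
def url_citation_query(domain):
    """Build a query for pages that mention `domain` but lie outside it.

    Assumed format: the quoted domain name, minus pages from the site itself.
    """
    domain = domain.strip().lower()
    return f'"{domain}" -site:{domain}'

# One query per site in the input list
for site in ["wlv.ac.uk", "news.bbc.co.uk"]:
    print(url_citation_query(site))
```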
There are no advanced options for link impact reports.
A network diagram is a network drawn to illustrate the strength of interlinking between a set of URLs and/or web sites. In the diagram, a circle is drawn to represent each URL or web site and arrows are drawn from circle A to circle B if there is a URL citation in web site/page A to web site/page B. Overall, the diagram illustrates the pattern of interconnectivity between the collection of sites/URLs.
Following the wizard instructions will automatically create a basic network diagram. This is the underlying process that Webometric Analyst conducts.
1) Each pair (A, B) of web site domain names or URLs in your list is converted into a query in the syntax of Bing, for all web pages matching URL/site A that contain the URL/domain name of URL/site B.
2) Each query created from your list is submitted to Bing to get an estimate of the number of URLs matching the query.
3) A graph matrix is constructed from the results of 2) above, with each URL/site having its own node. An arrow from URL/site A to URL/site B is included, with width w = 10 x (# of URL citations in A to B)/(maximum # of URL citations between any pair of sites/URLs), if there are some URL citations in A to B and the width w is at least 0.1.
4) The network matrix is loaded into SocSciBot Network to be drawn.
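The width rule in step 3 can be written out as a short sketch. The function name `arrow_widths` and the sample counts are hypothetical; only the formula and the 0.1 cut-off come from the description above.

```python
def arrow_widths(counts, max_width=10.0, min_width=0.1):
    """Scale URL citation counts to arrow widths.

    counts maps (source, target) -> number of URL citations in source to target.
    """
    largest = max(counts.values(), default=0)
    widths = {}
    if largest == 0:
        return widths  # no URL citations at all, so no arrows
    for pair, n in counts.items():
        w = max_width * n / largest
        if n > 0 and w >= min_width:  # very thin arrows are left out
            widths[pair] = w
    return widths

# Hypothetical counts for three sites
counts = {("A", "B"): 200, ("B", "A"): 50, ("A", "C"): 1, ("C", "A"): 0}
widths = arrow_widths(counts)
print(widths)
```

In this example the arrow from A to C is dropped because its width would be 10 x 1/200 = 0.05, which is below 0.1.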
The link network can be changed to a co-link network. This is similar to a link network except that the links (URL citations) counted are not the total URL citations between a pair of web sites/URLs A and B but instead the total number of pages with an URL citation to both A and B is counted. This is an indirect form of link counting. Pairs of web sites or URLs that are similar tend to have higher URL citation counts, so colink networks tend to cluster similar sites together.
The theoretical maximum size of a network diagram is only limited by your computer's memory. Since large network diagrams are messy and consume huge numbers of queries to create, you are strongly recommended to avoid creating networks with more than 50 nodes. If you have more than 50 nodes, then first create a web impact report for your websites (using URL citations or title mentions - whichever you are using for your network), select the 50 nodes with the highest scores and draw your network with these. This represents the heart of your network. It should be practical to create, with about 2,500 queries, and should not look too messy when drawn. The number of queries needed to create a network with N nodes is N*N - more than this if you have multiple queries for any sites.
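The arithmetic above can be checked with a small sketch. The function names and the sample scores are made up for illustration; the scores stand in for the impact report results, and the query count is the N*N lower bound for sites with a single query each.

```python
def top_nodes(scores, limit=50):
    """Return the `limit` sites with the highest impact scores."""
    return sorted(scores, key=scores.get, reverse=True)[:limit]

def min_queries(n_nodes):
    """Lower bound on queries for a network diagram: N*N."""
    return n_nodes * n_nodes

# Hypothetical impact scores from a web impact report
scores = {"a.example": 3, "b.example": 10, "c.example": 1}
print(top_nodes(scores, limit=2))
print(min_queries(50))  # 2500, matching the figure quoted above
```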
What is the difference between standard reports made using the wizards and reports made using the advanced options?
The advanced options allow some of the parameters in the creation of the reports to be altered. If more changes are needed than those offered by the advanced options, then the standard interface can be used instead.
Why does Webometric Analyst sometimes return less than the estimated number of results, even for searches with less than 1,000 matches?
There are two reasons why the number of URLs returned (under 1000) is often less than the number of estimated matches. The first is that duplicates can be ignored - and this also extends to near duplicates, or pages that are similar around the text that would be returned by the search engine as the page description snippet. The second reason is that a maximum of two hits per web site are normally returned unless you select the "disable host collapsing" search option in Webometric Analyst.
A network can be printed using the Print option in the File menu. There are various ways in which a network can be included in another document, such as a Word file. The options below are listed in increasing order of image quality.
- Paint Bitmap: press the Print Scr button on the keyboard, load Microsoft Paint (Start|All Programs|Accessories|Paint) and press Control-V. This should copy the screen into the Paint program, where it can be edited down to the correct size. Once edited, the file can be saved (as a bitmap .bmp for the highest resolution, or as a GIF .gif for the smallest file size) and then incorporated into a document (e.g., in Word, using Insert|Image|From File).
- Medium resolution TIFF. Within SocSciBot Network, select File|Print and then choose the Microsoft Office Document Image Writer printer driver from the Printer dialog box (in the Printer Name section). Use this to print a .TIFF version of the network, which can be inserted into a document, as for the Paint Bitmap option.
- High resolution TIFF. A high resolution TIFF printer driver is normally needed to get a higher resolution file than 300dpi. Once this is installed (it will probably have to be bought), follow the instructions for the medium resolution TIFF option above, except selecting the new printer driver.
If there are more than 1,000 results for a search then not all of them can be retrieved directly, because the search engine returns a maximum of 1,000 result URLs per query. It is possible to gain extra matches using the “query splitting” technique.
The random URLs are generated by randomly selecting on the basis of domain names as follows:
- A complete list of domain names is assembled from the URLs (as in the domain name list pages of the reports)
- A random number generator is used to select up to 100 domain names from the list.
- For each domain name another random number generator is used to select a single URL from the URL list that has that domain name.
The list of random URLs is the set of URLs produced by this process.
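The selection process can be sketched as follows. This is an illustration rather than the program's own code: the function name `random_urls` and the `seed` parameter are assumptions, and a single random number generator is used for both steps for simplicity.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse

def random_urls(urls, max_domains=100, seed=None):
    """Pick up to `max_domains` domains at random, then one URL from each."""
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname
        if host:
            by_domain[host].append(url)
    domains = sorted(by_domain)  # the complete list of domain names
    chosen = rng.sample(domains, min(max_domains, len(domains)))
    return [rng.choice(by_domain[d]) for d in chosen]

# Hypothetical result URLs
urls = ["http://a.com/1", "http://a.com/2", "http://b.org/x"]
picked = random_urls(urls, seed=1)
print(picked)
```

Sampling domains first means that a site with thousands of matching pages is no more likely to appear in the list than a site with one.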
Yes, you can export interlinking, co-inlinking and co-outlinking results to UCINET via the Pajek file format. Use the Webometric Analyst Utilities menu to convert your data into a Pajek file and then in UCINET import your Pajek file via Data|Import text file|Pajek and then selecting the Pajek file, [and entering a name for the Output UCINET network file]. The network drawing part of Webometric Analyst also has a stats menu that reports some social network analysis statistics.
A Web Environment Network is an attempt to create a network to illustrate the most relevant web sites for the web site you input. These web sites are relevant in the sense of being frequently linked to by pages that have URL citation links to your web site. Lines between nodes in the diagram represent similarity between web sites, as measured by co-inlink counts. Here is what the program does to create this picture:
1) Identifies web pages mentioning the URL of your web site, via Bing or Google searches (i.e., URL citation inlinks to the site; you can also request title mention searches instead). This uses up to 20 Bing/Google queries.
2) Downloads all the pages identified in 1) and extracts their links
3) Removes all site self-links from the set generated in 2) (i.e., links between different pages of the same site)
4) Takes the 50 web sites attracting the most links (using the results of 3))
5) Plots a network of these 50 sites with connections between pairs of sites based upon the number of times they are both linked to from the same web site (complicated!)
This is a "co-inlink" (or colink) network diagram, rather than a direct link network diagram.
The Web Environment Network is saved in a Pajek file - when running this command you will be asked for a name and location for this file, which will then be created.
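Steps 3 to 5 can be sketched in a few lines. This is a simplified illustration, not the program's code: the function name `coinlink_network` and the input format (each source site mapped to the sites it links to) are assumptions, and links are counted once per source site.

```python
from collections import Counter
from itertools import combinations

def coinlink_network(outlinks, top_n=50):
    """Count, for each pair of top sites, how many source sites link to both.

    outlinks maps a source site to the collection of sites it links to.
    """
    inlinks = Counter()
    for source, targets in outlinks.items():
        for target in set(targets):
            if target != source:  # step 3: drop site self-links
                inlinks[target] += 1
    top = {site for site, _ in inlinks.most_common(top_n)}  # step 4
    pairs = Counter()
    for source, targets in outlinks.items():
        kept = sorted(t for t in set(targets) if t in top and t != source)
        for a, b in combinations(kept, 2):  # step 5: co-inlinked pairs
            pairs[(a, b)] += 1
    return pairs

# Hypothetical links extracted from three source sites
outlinks = {"s1": {"a", "b", "s1"}, "s2": {"a", "b", "c"}, "s3": {"a"}}
pairs = coinlink_network(outlinks)
print(pairs)
```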
I have run several searches (e.g., for URL citations to different pages within a single site). How do I get a report that merges the results of all queries into a single list?
This can be done by making a new version of the "long results" file that changes all the different queries into a single dummy query and then making a new report from the new long results file. To do this, follow the instructions below:
1) From the Utilities menu select either
- a) Make long results file with all the queries replaced by "merged queries" OR - if you have any queries that you do NOT want merged in the file then select
- b) Make long results file with all the queries matching one or more strings of text added as a merged pseudo-query
Then select the long results file with all the search results in.
2) From the Reports menu, select Make a Set of Standard Impact Reports.... and select the new long results file. The new report will process all the merged queries as if they were a single query.
When creating text files of searches for use in the classic interface, separate the different searches with the pipe | character. Searches either side of the pipe character will be submitted and processed separately but any reports prepared with the data will automatically combine the different searches and eliminate duplicates (e.g., University|College would result in separate University and College searches, with the results combined). If you need the short results then these can be combined using the appropriate menu command in the Utilities menu (Consolidate short results...).
This OR operator is not supported by search engines but is internally processed by Webometric Analyst. It allows you to have long Boolean queries that are too long for the search engines to process.
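The behaviour described above, separate searches followed by merging and duplicate removal, can be sketched like this. The function `combine_or_search` and the stand-in search function `run_search` are hypothetical; in practice the searches are submitted by Webometric Analyst itself.

```python
def combine_or_search(line, run_search):
    """Run each pipe-separated search separately and merge the URLs."""
    seen = set()
    merged = []
    for query in line.split("|"):
        query = query.strip()
        if not query:
            continue
        for url in run_search(query):  # one search per part
            if url not in seen:        # eliminate duplicates
                seen.add(url)
                merged.append(url)
    return merged

# Stand-in for a search engine, with made-up results
fake_results = {
    "University": ["u1", "shared"],
    "College": ["shared", "c1"],
}
combined = combine_or_search("University|College", lambda q: fake_results.get(q, []))
print(combined)
```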
This takes a single input file with information about URLs and titles (typically of organisations) and makes two sets of searches: title mentions, and URL Citations.
The input file must contain one line per web site.
The line must start with the name of the web site, followed by a tab character, followed by the domain name of the web site. If there are multiple names then these should be separated by the pipe character |. If there are multiple domains then these should be separated by spaces. The searches generated will always exclude all sites listed. SPECIAL: don't include commas in titles as all text after each comma is ignored.
It is the inclusion of an URL (or URL without the http://) in a web page, with or without a hyperlink. For example, "I like news.bbc.co.uk" is an URL citation in this page for the BBC news web site. URL Citations are a replacement for hyperlinks in Webometric Analyst because Bing does not allow hyperlink searches. In the picture below is an URL citation for the web site www.wlv.ac.uk (and also for the web page http://www.wlv.ac.uk/lssc). Note that in this case the URL citation is also a link but it does not have to be.
In contrast, the picture below shows a link that is to the University of Wolverhampton: You can't tell, but if you clicked on the link then you would arrive at a page in the University of Wolverhampton web site. This is a link that is not an URL citation because the URL is invisible - available only in the hidden (HTML) code of the page.
It is the inclusion of a title in a web page, with or without a hyperlink. For example, "I like the BBC News" is a title mention in this page for BBC News. Title mentions can be used as a replacement for hyperlinks in Webometric Analyst instead of URL citations, if you like. For these, you have to identify the names of the web sites and enter them as quotes, e.g., "BBC News". Note that you also have to enter a web site URL in addition to the title. Since some organisations have ambiguous names, you might have to add extra text to identify the organisation more exactly. For instance, "orange mobile phone network" would be more precise than "orange" and result in fewer false matches (it would also miss many correct matches, but this is a different problem).
The picture below contains a title mention for the University of Wolverhampton. In this case it is not associated with a hyperlink.
It is the inclusion of a title in a web page, AND an associated hyperlink. For example, "I like the BBC News" is not a linked title mention in this page for BBC News because there is no hyperlink in this page to the BBC News web site. Linked title mentions are found in two stages: (a) search engine queries for the title and then (b) checking matching pages for associated hyperlinks. This can be done automatically in Webometric Analyst. This is useful when title mentions return many false matches and these can be removed by the check for associated hyperlinks. The extracts below are examples of linked title mentions for the University of Wolverhampton - both contain the phrase "University of Wolverhampton" in the text and a hyperlink to a page with the wlv.ac.uk web domain (the blue text in both cases is a clickable link). In the first example the link is not associated with the university name but this is OK.
The input file must contain one line per web site.
Format 1 (for URL citations only): Each line must contain domain name or URL of one web site. If there are multiple domains or URLs then these should be separated by the pipe character |.
Format 2: (for URL citations, title mentions or linked title mentions): The line must start with the name of the web site, followed by a tab character, followed by the domain name or URL of the web site. If there are multiple names then these should be separated by the pipe character |. If there are multiple domains or URLs then these should be separated by the pipe character |. The searches generated will always exclude all sites listed. SPECIAL: don't include commas in titles as all text after each comma is ignored. Simple example; Complex example 1 with multiple domains and URLs (US maths schools 2013); Complex example 2 with multiple domains and URLs (US LIS schools 2013).
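The two formats can be told apart by the tab character, as in the sketch below. This is an illustrative parser, not part of Webometric Analyst: the function name `parse_line` is made up, and the special treatment of commas in titles is not modelled.

```python
def parse_line(line):
    """Split one input line into (names, sites).

    Format 1: site1|site2
    Format 2: name1|name2<TAB>site1|site2
    """
    line = line.rstrip("\r\n")
    if "\t" in line:
        names_part, sites_part = line.split("\t", 1)
        names = [n.strip() for n in names_part.split("|") if n.strip()]
    else:
        names, sites_part = [], line  # Format 1 has no names
    sites = [s.strip() for s in sites_part.split("|") if s.strip()]
    return names, sites

print(parse_line("BBC News|BBC\tnews.bbc.co.uk|bbc.co.uk\n"))
print(parse_line("wlv.ac.uk"))
```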
If you edit your input files with Excel then it will add extra quotes to the file that will be invisible to you but visible to Webometric Analyst and Bing and Google Books etc. and will therefore give incorrect results. To avoid this problem, always edit in Notepad instead of Excel. You might be able to copy and paste from Excel into Notepad as a way around this problem if you need to use Excel.
Line widths are proportional to the direct link count or co-inlink count (depending on the diagram), but are scaled so that the largest value is 10 (this helps view them in Pajek without the arrow widths being too big to see the diagram). The log version uses the natural log of the link counts - this is only to help draw a diagram if there are a few really high link counts and the others are small.
Indegree (Weighted) Sum of the weights of all arrows pointing to the node
Outdegree (Weighted) Sum of the weights of all arrows starting from the node
DegreeMin (min of non-zero weights) This is the sum of all weights of arrows to or from the node except that if there are arrows to and from the same other node, then the minimum of the two weights is used.
DegreeMax This is the sum of all weights of arrows to or from the node except that if there are arrows to and from the same other node, then the maximum of the two weights is used.
DegreeTotal This is the sum of all weights of arrows to or from the node. If there are arrows to and from the same other node, then both weights are used.
IndegreeBinary Number of arrows pointing to the node
OutdegreeBinary Number of arrows starting from the node
DegreeBinary Number of nodes that are connected by an arrow to or from them
BetweennessBinary (for underlying undirected network) Betweenness centrality of the underlying undirected network, treating all connections as having weight 1
BetweennessWeighted (for underlying undirected network) Ignore this one
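The degree metrics above can be sketched for a single node as follows. This is an illustration, not the program's code: the function name `degree_metrics` and the sample network are made up, self-loops are assumed not to occur, and DegreeMax is assumed to take the larger of the two weights when arrows run in both directions, as its name suggests.

```python
def degree_metrics(weights, node):
    """Degree metrics for `node`; weights maps (source, target) -> arrow weight."""
    incoming = {s: w for (s, t), w in weights.items() if t == node and w > 0}
    outgoing = {t: w for (s, t), w in weights.items() if s == node and w > 0}
    deg_min = deg_max = 0
    for other in set(incoming) | set(outgoing):
        in_w = incoming.get(other, 0)
        out_w = outgoing.get(other, 0)
        if in_w and out_w:  # arrows in both directions
            deg_min += min(in_w, out_w)
            deg_max += max(in_w, out_w)  # assumption: larger weight
        else:               # arrow in one direction only
            deg_min += in_w + out_w
            deg_max += in_w + out_w
    return {
        "Indegree": sum(incoming.values()),
        "Outdegree": sum(outgoing.values()),
        "DegreeMin": deg_min,
        "DegreeMax": deg_max,
        "DegreeTotal": sum(incoming.values()) + sum(outgoing.values()),
        "IndegreeBinary": len(incoming),
        "OutdegreeBinary": len(outgoing),
        "DegreeBinary": len(set(incoming) | set(outgoing)),
    }

# Hypothetical network: A<->B in both directions, C->A in one direction
weights = {("A", "B"): 3, ("B", "A"): 1, ("C", "A"): 2}
metrics = degree_metrics(weights, "A")
print(metrics)
```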