Summarising methods for lists of links - Example
This page uses an example list of link URLs to illustrate a range of counting methods that can be used for such lists. The two standard (normally recommended) methods are unique sites for TLDs and SLDs, and unique URLs for all of the others. In some cases (e.g. link target lists, web server referrer logs) counting all URLs is more appropriate than counting unique URLs or sites.
List 1 below is the URL list used to illustrate the counting methods.
List 1: Original list: All links
The list below is an artificial list of link URLs. Every link has a source (the page containing the link) and a target (the URL of the page referred to in the link). In the list below, the first URL is of the link source and the second is of the link target. Note that one link appears twice: http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/.
http://cybermetrics.wlv.ac.uk/ http://www.scit.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/ http://www.netscape.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/about.html
http://cybermetrics.wlv.ac.uk/ http://www.google.co.uk/
http://cybermetrics.wlv.ac.uk/people.html http://www.google.es/
http://cybermetrics.wlv.ac.uk/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.bham.ac.uk/home.html
http://www.wlv.ac.uk/ http://www.yahoo.com/
http://www.bham.ac.uk/ http://www.yahoo.com/
http://www.google.com/others/ http://www.google.de/about.html
http://www.google.com/others/ http://www.google.co.uk/
http://www.google.com/others/ http://www.google.de/
http://www.google.com/others/ http://www.google.es/
This list can be split into two separate lists: a list of link sources and a list of link targets. Each of these can be analysed separately using the URL list techniques. This page discusses the counting methods that are specific to link lists.
Link source list
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/people.html
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/database/
http://cybermetrics.wlv.ac.uk/database/
http://cybermetrics.wlv.ac.uk/database/
http://www.wlv.ac.uk/
http://www.bham.ac.uk/
http://www.google.com/others/
http://www.google.com/others/
http://www.google.com/others/
http://www.google.com/others/
Link target list
http://www.scit.wlv.ac.uk/
http://www.netscape.com/
http://www.google.com/
http://www.google.com/about.html
http://www.google.co.uk/
http://www.google.es/
http://www.yahoo.com/
http://www.yahoo.com/
http://www.yahoo.com/
http://www.bham.ac.uk/home.html
http://www.yahoo.com/
http://www.yahoo.com/
http://www.google.de/about.html
http://www.google.co.uk/
http://www.google.de/
http://www.google.es/
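The split into a source list and a target list can be sketched in a few lines of Python. This is only an illustration using three of the links from List 1; the variable names are chosen for this example and are not part of LexiURL.

```python
# Three links from List 1, stored as (source, target) pairs.
links = [
    ("http://cybermetrics.wlv.ac.uk/", "http://www.scit.wlv.ac.uk/"),
    ("http://cybermetrics.wlv.ac.uk/", "http://www.netscape.com/"),
    ("http://www.google.com/others/", "http://www.google.es/"),
]

# The source list keeps the first URL of each link, the target list the second.
sources = [source for source, _ in links]
targets = [target for _, target in links]
```

Both lists keep the original order and keep repeated URLs, so either one can then be analysed as an ordinary URL list.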
List 2: List of all links that are not site self-links
A site self-link is a link where the source and target URL are part of the same site. In link analysis, links between separate pages of the same site are normally ignored because they are often used for navigational purposes.
The site of a URL is the end of its domain name, including one segment more than the standard domain name ending. For some domains, including .com, .net and .edu, the top-level domain (TLD) is the standard ending. For other domains, such as .uk and .au, there is an additional naming standard and the standard ending includes the second-level domain name portion (e.g., .ac.uk, .edu.au). For example, the site of http://cybermetrics.wlv.ac.uk/ is wlv.ac.uk and the site of http://www.google.com/ is google.com. The list of links above contains one site self-link, http://cybermetrics.wlv.ac.uk/ http://www.scit.wlv.ac.uk/, and below is the original list with that site self-link removed.
http://cybermetrics.wlv.ac.uk/ http://www.netscape.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/about.html
http://cybermetrics.wlv.ac.uk/ http://www.google.co.uk/
http://cybermetrics.wlv.ac.uk/people.html http://www.google.es/
http://cybermetrics.wlv.ac.uk/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.bham.ac.uk/home.html
http://www.wlv.ac.uk/ http://www.yahoo.com/
http://www.bham.ac.uk/ http://www.yahoo.com/
http://www.google.com/others/ http://www.google.de/about.html
http://www.google.com/others/ http://www.google.co.uk/
http://www.google.com/others/ http://www.google.de/
http://www.google.com/others/ http://www.google.es/
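The site rule and the self-link test can be sketched in Python as follows. This is a minimal illustration, not LexiURL's own code: the function names and the SECOND_LEVEL_ENDINGS set are assumptions made for this example, and the set lists only a few endings, so real data would need a much fuller list of second-level naming schemes.

```python
from urllib.parse import urlsplit

# Assumption: a tiny hand-picked set of second-level endings,
# just enough for the URLs used on this page.
SECOND_LEVEL_ENDINGS = {"ac.uk", "co.uk", "edu.au"}

def site_of(url):
    """Return the standard ending of the domain name plus one more segment."""
    labels = urlsplit(url).hostname.split(".")
    # Keep three segments for endings such as .ac.uk, two for endings such as .com.
    keep = 3 if ".".join(labels[-2:]) in SECOND_LEVEL_ENDINGS else 2
    return ".".join(labels[-keep:])

def is_site_self_link(source, target):
    """A link is a site self-link when both URLs belong to the same site."""
    return site_of(source) == site_of(target)

links = [
    ("http://cybermetrics.wlv.ac.uk/", "http://www.scit.wlv.ac.uk/"),
    ("http://cybermetrics.wlv.ac.uk/", "http://www.netscape.com/"),
    ("http://www.google.com/others/", "http://www.google.co.uk/"),
]

# Keep only the links that cross site boundaries.
without_self_links = [link for link in links if not is_site_self_link(*link)]
```

In this excerpt only the first link is dropped, because both of its URLs belong to the site wlv.ac.uk. The link from google.com to google.co.uk is kept, since these count as different sites.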
List 3: List of unique links
Below is a list of the unique links in the second list. This is the same as the second list except that the second occurrence of the link http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/ has been removed.
http://cybermetrics.wlv.ac.uk/ http://www.netscape.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/about.html
http://cybermetrics.wlv.ac.uk/ http://www.google.co.uk/
http://cybermetrics.wlv.ac.uk/people.html http://www.google.es/
http://cybermetrics.wlv.ac.uk/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.bham.ac.uk/home.html
http://www.wlv.ac.uk/ http://www.yahoo.com/
http://www.bham.ac.uk/ http://www.yahoo.com/
http://www.google.com/others/ http://www.google.de/about.html
http://www.google.com/others/ http://www.google.co.uk/
http://www.google.com/others/ http://www.google.de/
http://www.google.com/others/ http://www.google.es/
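Removing repeated links while keeping the original order is straightforward to sketch in Python. The excerpt below uses four links from List 2; the variable names are illustrative only.

```python
# Four links from List 2, including the repeated database -> yahoo link.
links = [
    ("http://cybermetrics.wlv.ac.uk/", "http://www.yahoo.com/"),
    ("http://cybermetrics.wlv.ac.uk/database/", "http://www.yahoo.com/"),
    ("http://cybermetrics.wlv.ac.uk/database/", "http://www.yahoo.com/"),
    ("http://cybermetrics.wlv.ac.uk/database/", "http://www.bham.ac.uk/home.html"),
]

# dict.fromkeys keeps the first occurrence of each (source, target) pair
# and preserves the order in which the pairs were first seen.
unique_links = list(dict.fromkeys(links))
```

Two links count as identical only when both the source URL and the target URL match, so the first two links in the excerpt stay distinct even though they share a target.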
List 4: List of links by unique domains
For each of the URLs in the link lists above, its domain is the portion after the initial http:// and before the next slash. When counting link URLs, it is sometimes useful to treat two links as duplicates if they have the same source domain and target URL. To convert list 3 into a list of domain source – URL target links, two stages are needed. First, each link source URL is converted into its domain, and second, duplicate links are removed.
Stage 1: List of unique links converted into a list of links with domain sources
This list is list 3 except that each link source URL has been replaced by its domain.
cybermetrics.wlv.ac.uk http://www.netscape.com/
cybermetrics.wlv.ac.uk http://www.google.com/
cybermetrics.wlv.ac.uk http://www.google.com/about.html
cybermetrics.wlv.ac.uk http://www.google.co.uk/
cybermetrics.wlv.ac.uk http://www.google.es/
cybermetrics.wlv.ac.uk http://www.yahoo.com/
cybermetrics.wlv.ac.uk http://www.yahoo.com/
cybermetrics.wlv.ac.uk http://www.bham.ac.uk/home.html
www.wlv.ac.uk http://www.yahoo.com/
www.bham.ac.uk http://www.yahoo.com/
www.google.com http://www.google.de/about.html
www.google.com http://www.google.co.uk/
www.google.com http://www.google.de/
www.google.com http://www.google.es/
Stage 2: Unique domain source link list
This list is the stage 1 list except that identical links have been removed. These are links with the same source domain and target URL. One duplicate link was identified and removed (cybermetrics.wlv.ac.uk http://www.yahoo.com/).
cybermetrics.wlv.ac.uk http://www.netscape.com/
cybermetrics.wlv.ac.uk http://www.google.com/
cybermetrics.wlv.ac.uk http://www.google.com/about.html
cybermetrics.wlv.ac.uk http://www.google.co.uk/
cybermetrics.wlv.ac.uk http://www.google.es/
cybermetrics.wlv.ac.uk http://www.yahoo.com/
cybermetrics.wlv.ac.uk http://www.bham.ac.uk/home.html
www.wlv.ac.uk http://www.yahoo.com/
www.bham.ac.uk http://www.yahoo.com/
www.google.com http://www.google.de/about.html
www.google.com http://www.google.co.uk/
www.google.com http://www.google.de/
www.google.com http://www.google.es/
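The two stages for List 4 can be sketched in Python as below. This is an illustration on four links taken from List 3, not LexiURL's implementation; domain_of is a helper name assumed for this example.

```python
from urllib.parse import urlsplit

def domain_of(url):
    """Return the part of the URL after http:// and before the next slash."""
    return urlsplit(url).hostname

# Four links from List 3, as (source URL, target URL) pairs.
list3_excerpt = [
    ("http://cybermetrics.wlv.ac.uk/", "http://www.yahoo.com/"),
    ("http://cybermetrics.wlv.ac.uk/database/", "http://www.yahoo.com/"),
    ("http://cybermetrics.wlv.ac.uk/database/", "http://www.bham.ac.uk/home.html"),
    ("http://www.wlv.ac.uk/", "http://www.yahoo.com/"),
]

# Stage 1: replace each source URL by its domain; targets stay as full URLs.
stage1 = [(domain_of(source), target) for source, target in list3_excerpt]

# Stage 2: drop links that now share the same source domain and target URL.
stage2 = list(dict.fromkeys(stage1))
```

After stage 1 the first two links both read cybermetrics.wlv.ac.uk http://www.yahoo.com/, so stage 2 keeps only one of them. The link from www.wlv.ac.uk survives because its source domain is different.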
List 5: List of links by unique sites
To convert list 4 (stage 2) into a list of links from sites, two stages are needed. First, each link source domain is converted into its site, and second, duplicate links are removed.
Stage 1: List of domain source links converted into a list of site source links
This list is list 4 (stage 2) except that each link source domain has been replaced by its site.
wlv.ac.uk http://www.netscape.com/
wlv.ac.uk http://www.google.com/
wlv.ac.uk http://www.google.com/about.html
wlv.ac.uk http://www.google.co.uk/
wlv.ac.uk http://www.google.es/
wlv.ac.uk http://www.yahoo.com/
wlv.ac.uk http://www.bham.ac.uk/home.html
wlv.ac.uk http://www.yahoo.com/
bham.ac.uk http://www.yahoo.com/
google.com http://www.google.de/about.html
google.com http://www.google.co.uk/
google.com http://www.google.de/
google.com http://www.google.es/
Stage 2: Unique site source link list
This list is the stage 1 list except that duplicate site source links have been removed. One of these was found (wlv.ac.uk http://www.yahoo.com/).
wlv.ac.uk http://www.netscape.com/
wlv.ac.uk http://www.google.com/
wlv.ac.uk http://www.google.com/about.html
wlv.ac.uk http://www.google.co.uk/
wlv.ac.uk http://www.google.es/
wlv.ac.uk http://www.yahoo.com/
wlv.ac.uk http://www.bham.ac.uk/home.html
bham.ac.uk http://www.yahoo.com/
google.com http://www.google.de/about.html
google.com http://www.google.co.uk/
google.com http://www.google.de/
google.com http://www.google.es/
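The same two-stage pattern applies when moving from domain sources to site sources. The Python sketch below works on four links from List 4 (stage 2). The helper site_of_domain and the SECOND_LEVEL_ENDINGS set are assumptions made for this example, and the set covers only the endings that occur on this page.

```python
# Assumption: only the second-level endings needed for this page's examples.
SECOND_LEVEL_ENDINGS = {"ac.uk", "co.uk"}

def site_of_domain(domain):
    """Return the standard ending of a domain name plus one more segment."""
    labels = domain.split(".")
    keep = 3 if ".".join(labels[-2:]) in SECOND_LEVEL_ENDINGS else 2
    return ".".join(labels[-keep:])

# Four links from List 4 (stage 2), as (source domain, target URL) pairs.
list4_excerpt = [
    ("cybermetrics.wlv.ac.uk", "http://www.yahoo.com/"),
    ("cybermetrics.wlv.ac.uk", "http://www.bham.ac.uk/home.html"),
    ("www.wlv.ac.uk", "http://www.yahoo.com/"),
    ("www.bham.ac.uk", "http://www.yahoo.com/"),
]

# Stage 1: replace each source domain by its site.
stage1 = [(site_of_domain(domain), target) for domain, target in list4_excerpt]

# Stage 2: drop links that now share the same source site and target URL.
stage2 = list(dict.fromkeys(stage1))
```

Here cybermetrics.wlv.ac.uk and www.wlv.ac.uk both collapse to wlv.ac.uk, so their two links to http://www.yahoo.com/ become one, which is the duplicate reported for stage 2.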
Counting link targets in the list
The number of link targets counted for each TLD, second-level domain (SLD), site, domain and URL depends upon the method of counting. In particular, it depends upon which of the lists above is chosen as the basic data for counting. The examples below give the alternative possible answers and counting methods.
Link target TLDs
The table below summarises the frequency of the four link target TLDs in the data set, using three different methods of counting. For example, there are 7 (unique) com URL-URL link targets in the data set, 6 (unique) com domain-URL link targets, and 5 (unique) com site-URL link targets.
TLD | #Unique links excluding site self-links (List 3) | #Unique domains (List 4 stage 2) | #Unique sites (List 5 stage 2)
uk | 3 | 3 | 3
com | 7 | 6 | 5
de | 2 | 2 | 2
es | 2 | 2 | 2
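The TLD counts can be reproduced from List 3 with a short Python script. This is a sketch for illustration only, not LexiURL's implementation: the helper names (domain_of, site_of, unique, target_tld_counts) and the small SECOND_LEVEL_ENDINGS set are assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumption: only the second-level endings that occur in this data set.
SECOND_LEVEL_ENDINGS = {"ac.uk", "co.uk"}

C = "http://cybermetrics.wlv.ac.uk/"
G = "http://www.google.com/others/"

# List 3: unique links with the site self-link removed.
list3 = [
    (C, "http://www.netscape.com/"),
    (C, "http://www.google.com/"),
    (C, "http://www.google.com/about.html"),
    (C, "http://www.google.co.uk/"),
    (C + "people.html", "http://www.google.es/"),
    (C, "http://www.yahoo.com/"),
    (C + "database/", "http://www.yahoo.com/"),
    (C + "database/", "http://www.bham.ac.uk/home.html"),
    ("http://www.wlv.ac.uk/", "http://www.yahoo.com/"),
    ("http://www.bham.ac.uk/", "http://www.yahoo.com/"),
    (G, "http://www.google.de/about.html"),
    (G, "http://www.google.co.uk/"),
    (G, "http://www.google.de/"),
    (G, "http://www.google.es/"),
]

def domain_of(url):
    """Return the part of the URL after http:// and before the next slash."""
    return urlsplit(url).hostname

def site_of(domain):
    """Return the standard ending of a domain name plus one more segment."""
    labels = domain.split(".")
    keep = 3 if ".".join(labels[-2:]) in SECOND_LEVEL_ENDINGS else 2
    return ".".join(labels[-keep:])

def unique(links):
    """Remove repeated links, keeping the first occurrence of each."""
    return list(dict.fromkeys(links))

# List 4 (stage 2): source URLs replaced by domains, duplicates removed.
list4 = unique((domain_of(source), target) for source, target in list3)
# List 5 (stage 2): source domains replaced by sites, duplicates removed.
list5 = unique((site_of(domain), target) for domain, target in list4)

def target_tld_counts(links):
    """Count links by the TLD (last segment of the domain) of their target."""
    return Counter(domain_of(target).rsplit(".", 1)[-1] for _, target in links)

for name, links in [("List 3", list3), ("List 4", list4), ("List 5", list5)]:
    print(name, dict(target_tld_counts(links)))
```

Under these assumptions the three lists contain 14, 13 and 12 links, and the com counts come out as 7, 6 and 5, while the uk, de and es counts are the same for all three lists.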
SLDs
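The table below summarises the frequency of the five link target SLDs in the data set, using three different methods of counting. Targets in domains without a second-level naming standard, such as .com, are listed under their TLD. The uk targets are split between co.uk (the two links to http://www.google.co.uk/) and ac.uk (the link to http://www.bham.ac.uk/home.html). For example, there are 7 (unique) com URL-URL link targets in the data set, 6 (unique) com domain-URL link targets, and 5 (unique) com site-URL link targets.
SLD | #Unique links excluding site self-links (List 3) | #Unique domains (List 4 stage 2) | #Unique sites (List 5 stage 2)
co.uk | 2 | 2 | 2
ac.uk | 1 | 1 | 1
com | 7 | 6 | 5
de | 2 | 2 | 2
es | 2 | 2 | 2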
Sites
The table below summarises the frequency of the seven link target sites in the data set, using three different methods of counting. For example, there are 4 (unique) yahoo.com URL-URL link targets in the data set, 3 (unique) yahoo.com domain-URL link targets, and 2 (unique) yahoo.com site-URL link targets.
Site | #Unique links excluding site self-links (List 3) | #Unique domains (List 4 stage 2) | #Unique sites (List 5 stage 2)
google.co.uk | 2 | 2 | 2
bham.ac.uk | 1 | 1 | 1
yahoo.com | 4 | 3 | 2
google.com | 2 | 2 | 2
netscape.com | 1 | 1 | 1
google.de | 2 | 2 | 2
google.es | 2 | 2 | 2
Domains
The table below summarises the frequency of the seven link target domains in the data set, using three different methods of counting. For example, there are 4 (unique) www.yahoo.com URL-URL link targets in the data set, 3 (unique) www.yahoo.com domain-URL link targets, and 2 (unique) www.yahoo.com site-URL link targets.
Domain | #Unique links excluding site self-links (List 3) | #Unique domains (List 4 stage 2) | #Unique sites (List 5 stage 2)
www.google.co.uk | 2 | 2 | 2
www.bham.ac.uk | 1 | 1 | 1
www.yahoo.com | 4 | 3 | 2
www.google.com | 2 | 2 | 2
www.netscape.com | 1 | 1 | 1
www.google.de | 2 | 2 | 2
www.google.es | 2 | 2 | 2
URLs
The table below summarises the frequency of the nine link target URLs in the data set, using three different methods of counting. For example, there are 4 (unique) http://www.yahoo.com/ URL-URL link targets in the data set, 3 (unique) http://www.yahoo.com/ domain-URL link targets, and 2 (unique) http://www.yahoo.com/ site-URL link targets.
URL | #Unique links excluding site self-links (List 3) | #Unique domains (List 4 stage 2) | #Unique sites (List 5 stage 2)
http://www.google.co.uk/ | 2 | 2 | 2
http://www.bham.ac.uk/home.html | 1 | 1 | 1
http://www.yahoo.com/ | 4 | 3 | 2
http://www.google.com/about.html | 1 | 1 | 1
http://www.google.com/ | 1 | 1 | 1
http://www.netscape.com/ | 1 | 1 | 1
http://www.google.de/ | 1 | 1 | 1
http://www.google.de/about.html | 1 | 1 | 1
http://www.google.es/ | 2 | 2 | 2
Counting link sources in the list
The number of link sources counted for each TLD, second-level domain, site, domain and URL again depends upon the method of counting. In particular, it depends upon which of the lists above is chosen as the basic data for counting. The methods shown above for link targets can be applied to the same link lists to give the alternative possible answers for link sources.
