Summarising methods for lists of links - Example

The list of link URLs below illustrates a range of counting methods that can be used for lists of link URLs. The two standard (normally recommended) methods are unique sites for TLDs and SLDs, and unique URLs for all of the others. In some cases (e.g. link target lists, web server referrer logs) all URLs are more appropriate than unique URLs or sites.

List 1 below is the URL list used to illustrate the counting methods.

List 1: Original list: All links

The list below is an artificial list of link URLs. Every link has a source (the page containing the link) and a target (the URL of the page referred to in the link). On each line, the first URL is the link source and the second is the link target. Note that one link is repeated: the link from http://cybermetrics.wlv.ac.uk/database/ to http://www.yahoo.com/ appears twice.

http://cybermetrics.wlv.ac.uk/ http://www.scit.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/ http://www.netscape.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/about.html
http://cybermetrics.wlv.ac.uk/ http://www.google.co.uk/
http://cybermetrics.wlv.ac.uk/people.html http://www.google.es/
http://cybermetrics.wlv.ac.uk/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.bham.ac.uk/home.html
http://www.wlv.ac.uk/ http://www.yahoo.com/
http://www.bham.ac.uk/ http://www.yahoo.com/
http://www.google.com/others/ http://www.google.de/about.html
http://www.google.com/others/ http://www.google.co.uk/
http://www.google.com/others/ http://www.google.de/
http://www.google.com/others/ http://www.google.es/ 

This list can be split into two separate lists: a list of link sources and a list of link targets. Each of these can be analysed separately using the URL list techniques. This page discusses counting methods that are specific to link lists.

Link source list

http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/people.html
http://cybermetrics.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/database/
http://cybermetrics.wlv.ac.uk/database/
http://cybermetrics.wlv.ac.uk/database/
http://www.wlv.ac.uk/
http://www.bham.ac.uk/
http://www.google.com/others/
http://www.google.com/others/
http://www.google.com/others/
http://www.google.com/others/

Link target list

http://www.scit.wlv.ac.uk/
http://www.netscape.com/
http://www.google.com/
http://www.google.com/about.html
http://www.google.co.uk/
http://www.google.es/
http://www.yahoo.com/
http://www.yahoo.com/
http://www.yahoo.com/
http://www.bham.ac.uk/home.html
http://www.yahoo.com/
http://www.yahoo.com/
http://www.google.de/about.html
http://www.google.co.uk/
http://www.google.de/
http://www.google.es/
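
As a rough illustration of the split described above, the following Python sketch parses a few lines from List 1 and separates them into a source list and a target list. The names `raw_links` and `parse_links` are hypothetical and are not part of LexiURL; the sketch assumes one link per line with the source and target separated by whitespace.

```python
# A few links from List 1, one "source target" pair per line.
raw_links = """\
http://cybermetrics.wlv.ac.uk/ http://www.scit.wlv.ac.uk/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://www.google.com/others/ http://www.google.de/about.html
"""


def parse_links(text):
    """Return a list of (source, target) pairs, skipping malformed lines."""
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs


links = parse_links(raw_links)
sources = [source for source, _ in links]  # link source list
targets = [target for _, target in links]  # link target list
print(sources)
print(targets)
```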

List 2: List of all links that are not site self-links

A site self-link is a link where the source and target URL are part of the same site. In link analysis, links between separate pages of the same site are normally ignored because they are often used for navigational purposes.

The site of a URL is the end of its domain name: the standard domain name ending plus one more segment. For some domains, including .com, .net and .edu, the top-level domain (TLD) is the standard ending. For other domains, such as .uk and .au, there is an additional naming standard and the standard ending includes the second-level domain name portion (e.g., .ac.uk, .edu.au). For example, the site of http://cybermetrics.wlv.ac.uk/ is wlv.ac.uk and the site of http://www.google.com/ is google.com. The list of links above contains one site self-link, http://cybermetrics.wlv.ac.uk/ http://www.scit.wlv.ac.uk/, and the list below is the original list with that site self-link removed.

http://cybermetrics.wlv.ac.uk/ http://www.netscape.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/about.html
http://cybermetrics.wlv.ac.uk/ http://www.google.co.uk/
http://cybermetrics.wlv.ac.uk/people.html http://www.google.es/
http://cybermetrics.wlv.ac.uk/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.bham.ac.uk/home.html
http://www.wlv.ac.uk/ http://www.yahoo.com/
http://www.bham.ac.uk/ http://www.yahoo.com/
http://www.google.com/others/ http://www.google.de/about.html
http://www.google.com/others/ http://www.google.co.uk/
http://www.google.com/others/ http://www.google.de/
http://www.google.com/others/ http://www.google.es/ 
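
The site rule and the removal of site self-links could be sketched in Python roughly as follows. This is only an illustration: the set `SECOND_LEVEL_ENDINGS` is an assumed, deliberately incomplete list of endings that follow a second-level naming standard, and the function names are hypothetical rather than taken from LexiURL.

```python
from urllib.parse import urlsplit

# Assumption: a small hand-picked set of second-level endings. A real
# analysis would need a much fuller list of national naming standards.
SECOND_LEVEL_ENDINGS = {"ac.uk", "co.uk", "edu.au"}


def domain_of(url):
    """Domain name: the part after http:// and before the next slash."""
    return urlsplit(url).hostname


def site_of(url):
    """Standard ending plus one more segment, e.g. wlv.ac.uk or google.com."""
    segments = domain_of(url).split(".")
    ending_length = 2 if ".".join(segments[-2:]) in SECOND_LEVEL_ENDINGS else 1
    return ".".join(segments[-(ending_length + 1):])


def is_site_self_link(source, target):
    """True when source and target URL belong to the same site."""
    return site_of(source) == site_of(target)


links = [
    ("http://cybermetrics.wlv.ac.uk/", "http://www.scit.wlv.ac.uk/"),
    ("http://cybermetrics.wlv.ac.uk/", "http://www.netscape.com/"),
    ("http://www.google.com/others/", "http://www.google.de/"),
]
non_self_links = [link for link in links if not is_site_self_link(*link)]
print(non_self_links)
```

Under these assumptions only the first link is dropped, because both of its URLs belong to the site wlv.ac.uk. The link from www.google.com to www.google.de is kept, since google.com and google.de are different sites.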

List 3: List of unique links

Below is a list of the unique links in the second list. This is the same as the second list except that the second occurrence of the link http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/ has been removed.

http://cybermetrics.wlv.ac.uk/ http://www.netscape.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/
http://cybermetrics.wlv.ac.uk/ http://www.google.com/about.html
http://cybermetrics.wlv.ac.uk/ http://www.google.co.uk/
http://cybermetrics.wlv.ac.uk/people.html http://www.google.es/
http://cybermetrics.wlv.ac.uk/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.yahoo.com/
http://cybermetrics.wlv.ac.uk/database/ http://www.bham.ac.uk/home.html
http://www.wlv.ac.uk/ http://www.yahoo.com/
http://www.bham.ac.uk/ http://www.yahoo.com/
http://www.google.com/others/ http://www.google.de/about.html
http://www.google.com/others/ http://www.google.co.uk/
http://www.google.com/others/ http://www.google.de/
http://www.google.com/others/ http://www.google.es/ 
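
Removing repeated links while keeping the original order can be sketched in Python as below. The excerpt is taken from List 2; using `dict.fromkeys` for order-preserving de-duplication is one convenient choice, not necessarily how LexiURL does it.

```python
# Excerpt from List 2 as (source, target) pairs; the third entry repeats the second.
links = [
    ("http://cybermetrics.wlv.ac.uk/", "http://www.yahoo.com/"),
    ("http://cybermetrics.wlv.ac.uk/database/", "http://www.yahoo.com/"),
    ("http://cybermetrics.wlv.ac.uk/database/", "http://www.yahoo.com/"),
    ("http://cybermetrics.wlv.ac.uk/database/", "http://www.bham.ac.uk/home.html"),
]

# dict keys are unique and keep insertion order, so only the first
# occurrence of each (source, target) pair survives.
unique_links = list(dict.fromkeys(links))
print(len(links), len(unique_links))
```

Note that the first two links in the excerpt are not duplicates: they share a target but have different source URLs.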

List 4: List of links by unique domains

For each of the URLs in the link lists above, its domain is the portion after the initial http:// and before the next slash. When counting link URLs, it is sometimes useful to treat two links as duplicates if they have the same source domain and target URL. To convert list 3 into a list of links with domain sources and URL targets, two stages are needed. First, each link source URL is converted into its domain, and second, duplicate links are removed.

Stage 1: List of unique links converted into a list of links with domain sources

This list is list 3 except that each link source URL has been replaced by its domain.

cybermetrics.wlv.ac.uk http://www.netscape.com/
cybermetrics.wlv.ac.uk http://www.google.com/
cybermetrics.wlv.ac.uk http://www.google.com/about.html
cybermetrics.wlv.ac.uk http://www.google.co.uk/
cybermetrics.wlv.ac.uk http://www.google.es/
cybermetrics.wlv.ac.uk http://www.yahoo.com/
cybermetrics.wlv.ac.uk http://www.yahoo.com/
cybermetrics.wlv.ac.uk http://www.bham.ac.uk/home.html
www.wlv.ac.uk http://www.yahoo.com/
www.bham.ac.uk http://www.yahoo.com/
www.google.com http://www.google.de/about.html
www.google.com http://www.google.co.uk/
www.google.com http://www.google.de/
www.google.com http://www.google.es/ 

Stage 2: Unique domain source link list

This list is the stage 1 list except that identical links have been removed. These are links with the same source domain and target URL. One duplicate link was identified and removed (cybermetrics.wlv.ac.uk http://www.yahoo.com/).

cybermetrics.wlv.ac.uk http://www.netscape.com/
cybermetrics.wlv.ac.uk http://www.google.com/
cybermetrics.wlv.ac.uk http://www.google.com/about.html
cybermetrics.wlv.ac.uk http://www.google.co.uk/
cybermetrics.wlv.ac.uk http://www.google.es/
cybermetrics.wlv.ac.uk http://www.yahoo.com/
cybermetrics.wlv.ac.uk http://www.bham.ac.uk/home.html
www.wlv.ac.uk http://www.yahoo.com/
www.bham.ac.uk http://www.yahoo.com/
www.google.com http://www.google.de/about.html
www.google.com http://www.google.co.uk/
www.google.com http://www.google.de/
www.google.com http://www.google.es/ 
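
The two stages above can be sketched in Python as follows, using a short excerpt from List 3. The variable names are hypothetical; the domain is taken from the standard library's `urlsplit`, which returns the part of the URL between http:// and the next slash as `hostname`.

```python
from urllib.parse import urlsplit

# Excerpt from List 3 as (source URL, target URL) pairs.
list3_excerpt = [
    ("http://cybermetrics.wlv.ac.uk/people.html", "http://www.google.es/"),
    ("http://cybermetrics.wlv.ac.uk/", "http://www.yahoo.com/"),
    ("http://cybermetrics.wlv.ac.uk/database/", "http://www.yahoo.com/"),
    ("http://www.wlv.ac.uk/", "http://www.yahoo.com/"),
]

# Stage 1: replace each source URL by its domain; targets stay as URLs.
stage1 = [(urlsplit(source).hostname, target) for source, target in list3_excerpt]

# Stage 2: drop links that now have the same source domain and target URL.
stage2 = list(dict.fromkeys(stage1))

for source_domain, target in stage2:
    print(source_domain, target)
```

In this excerpt the two links from cybermetrics.wlv.ac.uk pages to http://www.yahoo.com/ collapse into one, while the link from www.wlv.ac.uk survives because its source domain is different.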

List 5: List of links by unique sites

To convert list 4 (stage 2) into a list of links from sites, two stages are needed. First, each link source domain is converted into its site, and second, duplicate links are removed.

Stage 1: List of domain source links converted into a list of site source links

This list is list 4 (stage 2) except that each link source domain has been replaced by its site.

wlv.ac.uk http://www.netscape.com/
wlv.ac.uk http://www.google.com/
wlv.ac.uk http://www.google.com/about.html
wlv.ac.uk http://www.google.co.uk/
wlv.ac.uk http://www.google.es/
wlv.ac.uk http://www.yahoo.com/
wlv.ac.uk http://www.bham.ac.uk/home.html
wlv.ac.uk http://www.yahoo.com/
bham.ac.uk http://www.yahoo.com/
google.com http://www.google.de/about.html
google.com http://www.google.co.uk/
google.com http://www.google.de/
google.com http://www.google.es/ 

Stage 2: Unique site source link list

This list is the stage 1 list except that duplicate site source links have been removed. One of these was found (wlv.ac.uk http://www.yahoo.com/).

wlv.ac.uk http://www.netscape.com/
wlv.ac.uk http://www.google.com/
wlv.ac.uk http://www.google.com/about.html
wlv.ac.uk http://www.google.co.uk/
wlv.ac.uk http://www.google.es/
wlv.ac.uk http://www.yahoo.com/
wlv.ac.uk http://www.bham.ac.uk/home.html
bham.ac.uk http://www.yahoo.com/
google.com http://www.google.de/about.html
google.com http://www.google.co.uk/
google.com http://www.google.de/
google.com http://www.google.es/ 
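
A comparable Python sketch for the site stages is given below, starting from an excerpt of list 4 (stage 2). As before, `SECOND_LEVEL_ENDINGS` is an assumed and incomplete set of second-level endings, and `site_of_domain` is a hypothetical helper name.

```python
# Assumption: endings that follow a second-level naming standard (not exhaustive).
SECOND_LEVEL_ENDINGS = {"ac.uk", "co.uk", "edu.au"}


def site_of_domain(domain):
    """Keep the standard ending plus one more segment of the domain name."""
    segments = domain.split(".")
    keep = 3 if ".".join(segments[-2:]) in SECOND_LEVEL_ENDINGS else 2
    return ".".join(segments[-keep:])


# Excerpt from list 4 (stage 2) as (source domain, target URL) pairs.
list4_excerpt = [
    ("cybermetrics.wlv.ac.uk", "http://www.yahoo.com/"),
    ("cybermetrics.wlv.ac.uk", "http://www.bham.ac.uk/home.html"),
    ("www.wlv.ac.uk", "http://www.yahoo.com/"),
    ("www.bham.ac.uk", "http://www.yahoo.com/"),
]

# Stage 1: replace each source domain by its site.
stage1 = [(site_of_domain(domain), target) for domain, target in list4_excerpt]

# Stage 2: remove links with the same source site and target URL.
stage2 = list(dict.fromkeys(stage1))

for source_site, target in stage2:
    print(source_site, target)
```

Here cybermetrics.wlv.ac.uk and www.wlv.ac.uk both map to the site wlv.ac.uk, so their links to http://www.yahoo.com/ become duplicates and only one is kept.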

Counting link targets in the list

The number of each TLD, second-level domain, site, domain and URL link targets in the list depends upon the method of counting. In particular it depends upon which of the lists above are chosen as the basic data for counting. The examples below give the alternative possible answers and counting methods.
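
The five levels of aggregation mentioned above (TLD, SLD, site, domain and URL) can be illustrated for a single link target with a small Python sketch. The function `levels` and the set `SECOND_LEVEL_ENDINGS` are hypothetical and the set is not exhaustive. The sketch also assumes that, where a TLD has no second-level naming standard (e.g. com, de, es), the SLD level simply reports the TLD.

```python
from urllib.parse import urlsplit

# Assumption: endings that follow a second-level naming standard (not exhaustive).
SECOND_LEVEL_ENDINGS = {"ac.uk", "co.uk", "edu.au"}


def levels(url):
    """Return the TLD, SLD, site, domain and URL for one link target."""
    domain = urlsplit(url).hostname
    segments = domain.split(".")
    ending_length = 2 if ".".join(segments[-2:]) in SECOND_LEVEL_ENDINGS else 1
    return {
        "tld": segments[-1],
        "sld": ".".join(segments[-ending_length:]),
        "site": ".".join(segments[-(ending_length + 1):]),
        "domain": domain,
        "url": url,
    }


print(levels("http://www.google.co.uk/"))
print(levels("http://www.yahoo.com/"))
```

For http://www.google.co.uk/ this gives the TLD uk, the SLD co.uk, the site google.co.uk and the domain www.google.co.uk.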

Link target TLDs

The table below summarises the frequency of the four link target TLDs in the data set, using three different methods of counting. For example, there are 7 (unique) com URL-URL link targets in the data set, 6 (unique) com domain-URL link targets, and 5 (unique) com site-URL link targets.

TLD    # unique non-site self-links    # unique domains       # unique sites
       (List 3)                        (List 4, stage 2)      (List 5, stage 2)
uk     3                               3                      3
com    7                               6                      5
de     2                               2                      2
es     2                               2                      2
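
The TLD counts for the three counting methods can be reproduced from List 1 with a Python sketch such as the one below. It is an illustration under stated assumptions rather than LexiURL's own procedure: `SECOND_LEVEL_ENDINGS` is an assumed, incomplete set of second-level endings, and all function and variable names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

SECOND_LEVEL_ENDINGS = {"ac.uk", "co.uk", "edu.au"}  # assumed, not exhaustive

C = "http://cybermetrics.wlv.ac.uk/"
G = "http://www.google.com/others/"
LIST1 = [  # List 1 as (source URL, target URL) pairs
    (C, "http://www.scit.wlv.ac.uk/"),
    (C, "http://www.netscape.com/"),
    (C, "http://www.google.com/"),
    (C, "http://www.google.com/about.html"),
    (C, "http://www.google.co.uk/"),
    (C + "people.html", "http://www.google.es/"),
    (C, "http://www.yahoo.com/"),
    (C + "database/", "http://www.yahoo.com/"),
    (C + "database/", "http://www.yahoo.com/"),
    (C + "database/", "http://www.bham.ac.uk/home.html"),
    ("http://www.wlv.ac.uk/", "http://www.yahoo.com/"),
    ("http://www.bham.ac.uk/", "http://www.yahoo.com/"),
    (G, "http://www.google.de/about.html"),
    (G, "http://www.google.co.uk/"),
    (G, "http://www.google.de/"),
    (G, "http://www.google.es/"),
]


def domain_of(url):
    return urlsplit(url).hostname


def site_of_domain(domain):
    segments = domain.split(".")
    keep = 3 if ".".join(segments[-2:]) in SECOND_LEVEL_ENDINGS else 2
    return ".".join(segments[-keep:])


def site_of(url):
    return site_of_domain(domain_of(url))


def unique(items):
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def target_tld_counts(links):
    """Count the TLD of each link target (the second element of each pair)."""
    return Counter(domain_of(target).rsplit(".", 1)[-1] for _, target in links)


list2 = [(s, t) for s, t in LIST1 if site_of(s) != site_of(t)]  # no site self-links
list3 = unique(list2)                                            # unique links
list4 = unique((domain_of(s), t) for s, t in list3)              # unique domain sources
list5 = unique((site_of_domain(d), t) for d, t in list4)         # unique site sources

for name, links in [("List 3", list3), ("List 4", list4), ("List 5", list5)]:
    print(name, len(links), dict(target_tld_counts(links)))
```

With these assumptions the three lists contain 14, 13 and 12 links respectively, and the com counts come out as 7, 6 and 5, in agreement with the table.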

SLDs

The table below summarises the frequency of the five link target SLDs in the data set, using three different methods of counting. For example, there are 7 (unique) com URL-URL link targets in the data set, 6 (unique) com domain-URL link targets, and 5 (unique) com site-URL link targets.

SLD      # unique non-site self-links    # unique domains       # unique sites
         (List 3)                        (List 4, stage 2)      (List 5, stage 2)
co.uk    2                               2                      2
ac.uk    1                               1                      1
com      7                               6                      5
de       2                               2                      2
es       2                               2                      2

Sites

The table below summarises the frequency of the seven link target sites in the data set, using three different methods of counting. For example, there are 4 (unique) yahoo.com URL-URL link targets in the data set, 3 (unique) yahoo.com domain-URL link targets, and 2 (unique) yahoo.com site-URL link targets.

Site            # unique non-site self-links    # unique domains       # unique sites
                (List 3)                        (List 4, stage 2)      (List 5, stage 2)
google.co.uk    2                               2                      2
bham.ac.uk      1                               1                      1
yahoo.com       4                               3                      2
google.com      2                               2                      2
netscape.com    1                               1                      1
google.de       2                               2                      2
google.es       2                               2                      2

Domains

The table below summarises the frequency of the seven link target domains in the data set, using three different methods of counting. For example, there are 4 (unique) www.yahoo.com URL-URL link targets in the data set, 3 (unique) www.yahoo.com domain-URL link targets, and 2 (unique) www.yahoo.com site-URL link targets.

Domain              # unique non-site self-links    # unique domains       # unique sites
                    (List 3)                        (List 4, stage 2)      (List 5, stage 2)
www.google.co.uk    2                               2                      2
www.bham.ac.uk      1                               1                      1
www.yahoo.com       4                               3                      2
www.google.com      2                               2                      2
www.netscape.com    1                               1                      1
www.google.de       2                               2                      2
www.google.es       2                               2                      2

URLs

The table below summarises the frequency of the nine link target URLs in the data set, using three different methods of counting. For example, there are 4 (unique) http://www.yahoo.com/ URL-URL link targets in the data set, 3 (unique) http://www.yahoo.com/ domain-URL link targets, and 2 (unique) http://www.yahoo.com/ site-URL link targets.

URL                                 # unique non-site self-links    # unique domains       # unique sites
                                    (List 3)                        (List 4, stage 2)      (List 5, stage 2)
http://www.google.co.uk/            2                               2                      2
http://www.bham.ac.uk/home.html     1                               1                      1
http://www.yahoo.com/               4                               3                      2
http://www.google.com/about.html    1                               1                      1
http://www.google.com/              1                               1                      1
http://www.netscape.com/            1                               1                      1
http://www.google.de/               1                               1                      1
http://www.google.de/about.html     1                               1                      1
http://www.google.es/               2                               2                      2

Counting link sources in the list

The number of each TLD, second-level domain, site, domain and URL link sources in the list again depends upon the method of counting. In particular it depends upon which of the lists above are chosen as the basic data for counting. The examples above for link targets can be applied to the same link lists to give the alternative possible answers and counting methods for link sources.