Generating URL or Link Lists
LexiURL does not itself generate URL or link lists. The following methods are suggested; one of them may be appropriate for a given application.
- Manual identification through browsing the web.
- URL lists generated by search engine searches (e.g. all pages returned by Google for the search “heavy metal umlaut”).
- Link or URL lists generated by search engine advanced link searches (e.g., for the Google search link:www.scit.wlv.ac.uk).
- Referrer URLs from web server log files.
- Link lists of link structure files generated by a web crawler such as SocSciBot.
- [recommended] Automatic generation of Google, MSN or Yahoo! search results using the LexiURL Searcher application available via the installation page.
If you use any of the first four methods above, you must create a text file of links or URLs in one of the standard formats described below. The last two options produce link lists that are already in a format that LexiURL can use.
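As an illustration of the log file method, the following Python sketch pulls referrer URLs out of web server log lines and pairs each with the page that was requested. It assumes the Apache "combined" log format; the function name referrer_links and the site_root parameter are hypothetical and are not part of LexiURL. Other log formats would need a different pattern.

```python
import re

# Assumption: Apache "combined" log format, i.e.
# ... "METHOD PATH PROTOCOL" STATUS BYTES "REFERRER" "USER-AGENT"
LOG_PATTERN = re.compile(
    r'"(?:\S+) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)


def referrer_links(log_lines, site_root):
    """Return (source URL, target URL) pairs taken from referrer fields.

    site_root is the (hypothetical) base URL of the logged site; it is
    prepended to each requested path to form a full target URL.
    """
    links = []
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # line is not in the assumed format
        referrer = match.group("referrer")
        if referrer in ("", "-"):
            continue  # no referrer was recorded for this request
        target = site_root.rstrip("/") + match.group("path")
        links.append((referrer, target))
    return links
```

The resulting pairs still have to be written out in one of the file formats described below.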
URL List Format
The URL list must be a plain text file (Windows format) with one full URL per line. The file should not contain comment lines as these will cause error messages. Blank lines will be ignored.
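A URL list gathered by hand or copied from search results can be tidied into this format with a short script. The Python sketch below is one possible approach, not a LexiURL feature: the function names are hypothetical, "Windows format" is assumed to mean CRLF line endings, lines starting with # are assumed to be the comment lines to drop, and a "full URL" is assumed to be one with an http or https scheme and a host name.

```python
from urllib.parse import urlsplit


def clean_url_list(lines):
    """Keep only full URLs; drop blank lines, comments and partial URLs."""
    urls = []
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # blank line or (assumed) comment line
        parts = urlsplit(url)
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(url)
    return urls


def write_url_list(urls, path):
    """Write one URL per line with CRLF (Windows) line endings."""
    # newline="\r\n" makes Python translate each "\n" written into "\r\n".
    with open(path, "w", encoding="utf-8", newline="\r\n") as handle:
        for url in urls:
            handle.write(url + "\n")
```

For example, write_url_list(clean_url_list(open("raw.txt")), "urls.txt") would convert a hypothetical raw.txt into a cleaned urls.txt.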
Link List Format
There are two permissible link list file formats. For both, the list must be a plain text file (Windows format). The file should not contain comment lines as these will cause error messages. Blank lines will be ignored.
- Source-target format. Each line should contain the full URL of the source page, followed by a tab and the full URL of the target page. URLs should be grouped first by source URL, second by source URL domain name and third by source web site.
- Link structure file. This is the format used by SocSciBot. It contains the targets of links of pages in a list, followed by the URL of the source page. In both cases the URLs should be in a short format: the initial http:// should be removed and, where a www. starts a domain name, the www. should also be removed. Link target URLs should be preceded by a tab and page source URLs should be followed by a tab and the number 1. See the Link structure file example. More information and examples.
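The two formats above can be illustrated with a small Python sketch. The helper names are hypothetical and the code is not taken from LexiURL or SocSciBot. The sort order used here (source domain, then source URL) is only one reading of the grouping rule: it keeps links from the same source page, and from the same domain, on adjacent lines. Grouping by web site is not attempted because it depends on how a site is defined.

```python
from urllib.parse import urlsplit


def source_target_lines(links):
    """Format (source URL, target URL) pairs as tab-separated lines.

    Lines are ordered so that links sharing a source domain, and within
    that a source URL, appear together (an assumed interpretation).
    """
    ordered = sorted(
        links,
        key=lambda link: (urlsplit(link[0]).netloc, link[0], link[1]),
    )
    return [f"{source}\t{target}" for source, target in ordered]


def short_url(url):
    """Shorten a URL as described for link structure files:
    drop an initial http:// and then a www. that starts the domain name."""
    if url.startswith("http://"):
        url = url[len("http://"):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url
```

For instance, short_url("http://www.scit.wlv.ac.uk/index.html") gives scit.wlv.ac.uk/index.html. The lines returned by source_target_lines would still need to be saved as a Windows-format plain text file.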
