Making Web Queries for a Set of Documents

These instructions use one type of web indicator: the number of citations from Wikipedia to the documents. For other types of web indicator, substitute the indicator type at the relevant stage. Before starting, copy the folder of raw data (e.g., Citation counts structured names) and name the copy Wikipedia citations. Then delete the copies of AllData.txt and Reports.txt from the new Wikipedia citations folder. The remaining data files can have any names, including structured names.

The first step is to create a set of Wikipedia citation searches. To do this, start Webometric Analyst, close the Wizard and select Make Wikipedia searches for a set of Scopus/WoS/Other journal articles or books from the Make Searches menu (change this to a different type of search if you want different data). Answer the question about where the data is from and select any file in the Wikipedia citations folder. Accept the default answers to all of the questions asked and answer Yes when asked whether to do the same for the remaining files in the folder. The process may end with a warning about “duplicate queries”. This means that some publications will be ignored because the queries generated from them are identical to each other.

Once these query files have been made they will all have the same file name ending, such as _wiki.txt.

Title lengths

Web queries are unreliable for articles with short titles because the resulting query can be ambiguous. A solution is to exclude all publications with short titles (e.g., with fewer than 3 words) from the query set before running the queries. To do this, when using Webometric Analyst to make the search files from the WoS or Scopus data files, answer 3 to the question about the minimum number of words in a title to make a query for the publication.
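The effect of the minimum title length can be illustrated with a short Python sketch. This is not Webometric Analyst's own code: the function names are hypothetical, and it assumes that a word is any run of characters separated by whitespace, which may differ from how the program counts words.

```python
def has_enough_words(title, min_words=3):
    """Return True if the title has at least min_words whitespace-separated words."""
    return len(title.split()) >= min_words


def filter_short_titles(titles, min_words=3):
    """Keep only the titles long enough to give a reasonably unambiguous query."""
    return [title for title in titles if has_enough_words(title, min_words)]


if __name__ == "__main__":
    # Hypothetical titles, for illustration only.
    example_titles = [
        "Nature",
        "Web indicators",
        "Citation counts from Wikipedia",
    ]
    for title in filter_short_titles(example_titles):
        print(title)
```

With a minimum of 3, the one-word and two-word titles are dropped and only the four-word title would produce a query.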

Random sampling to avoid submitting too many queries

Web queries from Bing need to be paid for if there are more than 5000. If the query files contain more than 5000 lines in total (including the world reference sets, if you are using them), it should be possible to get reasonable results by restricting each file to a random sample of at most 500 queries. Follow the instructions below to do this.
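To check whether sampling is needed, the total number of query lines can be counted outside Webometric Analyst. The following is a minimal Python sketch, assuming one query per non-blank line and query files whose names end in wiki.txt. The function name, the folder name in the usage line and the suffix default are illustrative assumptions.

```python
from pathlib import Path


def total_query_lines(folder, suffix="wiki.txt"):
    """Count the non-blank lines in every file in folder whose name ends with suffix."""
    total = 0
    for path in Path(folder).iterdir():
        if path.is_file() and path.name.endswith(suffix):
            with path.open(encoding="utf-8", errors="replace") as handle:
                total += sum(1 for line in handle if line.strip())
    return total


# Example usage (the folder name is an assumption):
# if total_query_lines("Wikipedia citations") > 5000:
#     print("More than 5000 queries: consider random sampling.")
```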

First, make a new folder called temp and copy all of the web query files into it. For example, if the queries are for Wikipedia, copy only the files ending in -wiki.txt and leave the rest behind.
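If there are many files, the copying step can be scripted. The sketch below is one possible way to do it in Python, assuming the query files can be recognised by a name ending in wiki.txt. The function name and the folder names in the usage line are hypothetical.

```python
import shutil
from pathlib import Path


def copy_query_files(source_folder, temp_folder, suffix="wiki.txt"):
    """Copy files whose names end with suffix into temp_folder.

    Returns the names of the copied files. All other files are left behind.
    """
    source = Path(source_folder)
    target = Path(temp_folder)
    target.mkdir(parents=True, exist_ok=True)  # create temp if it does not exist
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.name.endswith(suffix):
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied


# Example usage (folder names are assumptions):
# copy_query_files("Wikipedia citations", "temp")
```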

Start Webometric Analyst and close the Start-up Wizard. From the Make Searches menu, select the option Replace search file(s) with a random sample up to a maximum number, choose the temp folder containing the queries and enter 500 as the maximum number of queries per file. Each original file will be replaced with a file of the same name containing a random sample of at most 500 of its queries.
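The sampling that this menu option performs can be pictured with the following Python sketch. It is an illustration of the idea rather than Webometric Analyst's actual implementation. It assumes one query per non-blank line, and the function name and the seed parameter are hypothetical. Like the menu option, it overwrites the file, so it should only be run on copies.

```python
import random
from pathlib import Path


def sample_query_file(path, max_queries=500, seed=None):
    """Replace a query file with a random sample of at most max_queries lines.

    Files that already have max_queries lines or fewer are left unchanged.
    Returns the number of queries in the file afterwards.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) <= max_queries:
        return len(lines)
    rng = random.Random(seed)  # pass a seed to make the sample reproducible
    # Sample line positions without replacement and keep the original order.
    chosen = sorted(rng.sample(range(len(lines)), max_queries))
    path.write_text("\n".join(lines[i] for i in chosen) + "\n", encoding="utf-8")
    return max_queries


# Example usage on the copies in the temp folder:
# for query_file in Path("temp").glob("*wiki.txt"):
#     sample_query_file(query_file, max_queries=500)
```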

Finally, copy the new files from the temp folder back to the original folder.
