Creating networks and key term/user lists from Twitter (Advanced)

The basic (much simpler) instructions are here.

**Please see also a PowerPoint presentation from Kim Holmberg summarising ways of creating Twitter Networks in Webometric Analyst (please save as filetype .pptx rather than .zip to view it)**

These notes summarise how to identify the key terms and users in a set of Twitter data and how to create two types of network from the collected tweets. They assume that you have already collected the tweets, so the first step is to collect the data using Webometric Analyst (see below for important advice) or another tool. The notes here describe how to create the following types of information.

Summary of key steps

Collecting tweets

The tweets should be collected by listing users to follow, keywords to search for in tweets, or both. You must create these lists manually, using trial and error and intuition to identify appropriate terms. Avoid terms that could generate many false matches, for example because they are ambiguous. Check all terms carefully, as bad queries can cause many problems later on.

See the instructions for collecting tweets using Webometric Analyst or use another tool. If using another tool, the data must be saved in, or converted to, tab-separated plain text format, with one column containing the tweets collected. If creating direct tweet networks, one column must contain the name of the tweeter and no other information. If there are multiple types of tweet, one column must also contain a label for each group of tweets, such as the query used to generate them.
In summary, the text should contain, in any order:

Note: To group several labels together, rename them all to one common label, such as "dh_all", by repeatedly using the text substitution facility in Webometric Analyst. This needs the menu item: Tab-sep. text| Replace text in column n with a different text.
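If the tweets come from another tool, the file layout and the label merging described above can also be prepared outside Webometric Analyst. The following is a minimal Python sketch, not part of Webometric Analyst: the function names, the column order (label, tweeter, tweet) and the sample rows are all assumptions made for illustration.

```python
def merge_labels(rows, label_col, old_labels, new_label):
    """Return copies of rows with any label in old_labels replaced by new_label."""
    merged = []
    for row in rows:
        row = list(row)
        if row[label_col] in old_labels:
            row[label_col] = new_label
        merged.append(row)
    return merged


def to_tsv(rows):
    """Serialise rows as tab-separated plain text, one tweet per line."""
    lines = []
    for row in rows:
        # Collapse tabs and newlines inside a cell so they cannot
        # break the one-row-per-tweet, tab-separated layout.
        cells = [" ".join(str(cell).split()) for cell in row]
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical rows in the assumed order: label, tweeter, tweet text.
rows = [
    ["dh", "alice", "Reading about #digitalhumanities today"],
    ["digitalhumanities", "bob", "@alice great\tlink"],
]
merged = merge_labels(rows, 0, {"dh", "digitalhumanities"}, "dh_all")
tsv_text = to_tsv(merged)
print(tsv_text, end="")
```

The tweeter column holds only the user name, as required for direct tweet networks, and both rows end up with the common label "dh_all".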

Identifying key terms and/or users for the tweets

Webometric Analyst can identify the most important words, hashtags and users based upon their relative frequency in the texts for each label. For each label, the most important terms are those that occur frequently for that label and rarely for other labels. Webometric Analyst uses the chi-squared metric to estimate the importance of terms. It will produce a list of terms for each label, with their importance ratings, as follows.
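To illustrate the idea of scoring terms by label, the sketch below computes a standard 2x2 chi-squared statistic for each term (tweets with the label versus all other tweets, containing the term versus not). It is an assumption that this resembles what Webometric Analyst does internally; the exact formula, tokenisation and filtering used by the program are not documented here, and the sample texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and return its distinct words, #hashtags and @users."""
    return set(re.findall(r"[#@]?\w+", text.lower()))


def chi_squared_terms(labelled_texts, target_label):
    """Rank terms over-represented in target_label by a 2x2 chi-squared score."""
    in_label = Counter()   # tweets with target_label that contain the term
    elsewhere = Counter()  # tweets with any other label that contain the term
    n_label = n_other = 0
    for label, text in labelled_texts:
        terms = tokenize(text)
        if label == target_label:
            n_label += 1
            in_label.update(terms)
        else:
            n_other += 1
            elsewhere.update(terms)
    total = n_label + n_other
    scores = {}
    for term in set(in_label) | set(elsewhere):
        a = in_label[term]   # label, term present
        b = n_label - a      # label, term absent
        c = elsewhere[term]  # other labels, term present
        d = n_other - c      # other labels, term absent
        denominator = (a + b) * (c + d) * (a + c) * (b + d)
        # Skip degenerate tables and terms that are not more common in the label.
        if denominator == 0 or a * d <= b * c:
            continue
        scores[term] = total * (a * d - b * c) ** 2 / denominator
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical labelled tweets: (label, tweet text).
data = [
    ("history", "medieval archives"),
    ("history", "archives and maps"),
    ("history", "old archives"),
    ("physics", "quantum fields"),
    ("physics", "quantum maps"),
    ("physics", "quantum gravity"),
]
history_terms = chi_squared_terms(data, "history")
print(history_terms[0])
```

In this toy data "archives" appears in every history tweet and in no physics tweet, so it receives the highest score for the history label, whereas "maps" appears equally often in both groups and is dropped.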

If you only have one set of queries then it is impossible to identify the most important terms because a comparison is needed for this. Instead you can follow the above procedure but sort on the raw term frequencies instead of the chi-squared values. The top terms will be common words like "it" and "the" and you will have to manually identify topic-related words from the sorted list.
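The single-label fallback amounts to a plain frequency count. A minimal sketch, assuming a simple tokenisation and hypothetical sample texts (neither is taken from Webometric Analyst):

```python
import re
from collections import Counter


def term_frequencies(texts):
    """Count every word, #hashtag and @user occurrence, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[#@]?\w+", text.lower()))
    return counts.most_common()


# Hypothetical tweets sharing a single label.
texts = ["the archive is open", "the map and the archive"]
ranked = term_frequencies(texts)
print(ranked[:2])
```

Here "the" ranks above "archive", which shows why the sorted list still needs manual inspection to pick out the topic-related words.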

Example: The spreadsheet here covers words extracted from different scholarly disciplines or fields.

Creating co-mention networks for words, #hashtags or @users

Co-mention networks are networks based upon how often words, hashtags or users co-occur in tweets. For instance, if the data set contains the following three tweets:

Then in terms of the three types of co-mention:

To obtain a co-mention network of tweets, click the

ALTERNATIVE METHOD: To obtain co-mention networks of any type complete the following instructions.
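Separately from the Webometric Analyst menu steps, the concept of a co-mention network can be illustrated in a few lines of Python. This is a sketch under stated assumptions rather than the program's own method: the regular expressions, the function name `co_mention_edges` and the sample tweets are hypothetical, and each pair is counted at most once per tweet.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed patterns for the three kinds of item that can be co-mentioned.
PATTERNS = {
    "words": r"(?<![#@\w])\w+",  # plain words, excluding #hashtags and @users
    "hashtags": r"#\w+",
    "users": r"@\w+",
}


def co_mention_edges(tweets, kind="hashtags"):
    """Count, for each pair of items, the number of tweets containing both."""
    edges = Counter()
    for tweet in tweets:
        items = sorted(set(re.findall(PATTERNS[kind], tweet.lower())))
        for pair in combinations(items, 2):
            edges[pair] += 1
    return edges


# Hypothetical tweets.
tweets = [
    "#dh meets #history via @alice and @bob",
    "#dh and #history again",
    "@alice thanks @bob for #dh",
]
hashtag_edges = co_mention_edges(tweets, "hashtags")
user_edges = co_mention_edges(tweets, "users")
print(hashtag_edges)
print(user_edges)
```

Each key is an undirected pair and each value is the edge weight, so #dh and #history are linked with weight 2, as are @alice and @bob.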

Creating direct tweet networks for @users

Direct tweet networks are based upon how often tweets from one @user contain the names of other @users. For example, consider the tweets:

Then in terms of messaging:

To obtain direct tweet networks complete the following instructions.
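Independently of the Webometric Analyst steps, the idea of a direct tweet network can be sketched as a set of directed, weighted edges from each tweeter to the @users named in their tweets. The function name, the input layout (tweeter, tweet text) and the sample rows below are assumptions for illustration; counting a mentioned user once per tweet and ignoring self-mentions are design choices of this sketch, not documented behaviour of the program.

```python
import re
from collections import Counter


def direct_tweet_edges(rows):
    """Count tweets from each tweeter that name each other @user."""
    edges = Counter()
    for tweeter, text in rows:
        source = "@" + tweeter.lstrip("@").lower()
        # A set ensures each mentioned user counts once per tweet.
        for target in set(re.findall(r"@\w+", text.lower())):
            if target != source:
                edges[(source, target)] += 1
    return edges


# Hypothetical rows: (tweeter, tweet text).
rows = [
    ("alice", "@bob have you seen this?"),
    ("alice", "thanks @bob and @carol"),
    ("bob", "@alice yes"),
]
edges = direct_tweet_edges(rows)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

Unlike co-mention edges, these edges have a direction: @alice to @bob has weight 2, while @bob to @alice has weight 1.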

Example: The network below was created from digital humanities tweets, using the option to ignore @ and # symbols when processing the data, so hashtags, usernames and keywords are all mixed up. The network was drawn with Webometric Analyst and manually tidied up by moving nodes around to make the pattern clearer and recolouring the digitalhumanities node from blue to red.