Creating Twitter Conversation Networks

**These instructions are based on Kim Holmberg's PowerPoint presentation (please save as filetype .pptx rather than .zip to view it)**

These notes summarise how to create two types of network of conversations in Twitter. The notes describe how to create the networks from tweets that you have already collected, so the first step is to collect the data using Webometric Analyst (see below for important advice). The notes here describe how to create the following types of information.

If you want lists of top terms or hashtag networks, see the advanced network creation instructions.

Summary of key steps

Collecting tweets

The tweets should be collected by listing users to follow, keywords to search for in tweets, or both. You must create these lists manually, using trial and error and intuition to identify appropriate terms. Avoid terms that could generate many false matches, for example because they are ambiguous. Check all terms carefully in search.twitter.com, as bad queries can generate a lot of problems later on.

See the instructions for collecting tweets using Webometric Analyst, or use another tool. If using another tool, the data must be saved in, or converted to, tab-separated plain text format, with one column containing the tweets collected. If creating direct tweet networks, one column must contain the name of the tweeter and no other information. If there are multiple types of tweet, one column should also contain a label for each group of tweets, such as the query used to generate them.
In summary, the text should contain, in any order:

Note: To group several labels together, rename them all to one common label, such as "dh_all", by repeatedly using the text substitution facility in Webometric Analyst until every label in the group is the same. This needs the menu item: Tab-sep. text| Replace text in column n with a different text.
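If the data is being prepared outside Webometric Analyst, the same relabelling can be sketched in a few lines of Python. This is only an illustration: the three-column layout (tweeter, tweet, label), the sample rows and the function name merge_labels are assumptions made for the example, not part of Webometric Analyst.

```python
import csv
import io


def merge_labels(rows, label_col, old_labels, new_label):
    """Return copies of the rows with any label in old_labels replaced by new_label."""
    merged = []
    for row in rows:
        row = list(row)  # copy, so the input rows are left unchanged
        if row[label_col] in old_labels:
            row[label_col] = new_label
        merged.append(row)
    return merged


# Hypothetical tab-separated data: tweeter, tweet text, query label.
sample = (
    "alice\t@bob digital humanities talk today\tdh_query1\n"
    "bob\t@alice thanks #dh\tdh_query2\n"
    "carol\tunrelated tweet\tother\n"
)

# QUOTE_NONE treats quotation marks inside tweets as ordinary characters.
rows = list(csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE))
merged = merge_labels(
    rows, label_col=2, old_labels={"dh_query1", "dh_query2"}, new_label="dh_all"
)

for row in merged:
    print("\t".join(row))
```

To process a real file, replace io.StringIO(sample) with an open file handle and write the merged rows back out with tabs between the columns.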

Filtering out spam and duplicate tweets

To remove the pure retweets (i.e., tweets starting with RT)

Removing Via tweets

To remove any occurrences where the tweet has been sent “via @username”

In the next step, extracting the usernames from the tweets, these retweets will not be included, because Webometric Analyst uses the @-sign as an identifier of usernames.
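For readers cleaning data with another tool, the two filters described above can be approximated as follows. This is a minimal sketch and not how Webometric Analyst does it: the two regular expressions are assumptions about what counts as a pure retweet (the text starts with RT as a separate, upper-case token) and as a via tweet (the text contains "via @username" anywhere).

```python
import re

# Assumption: a pure retweet starts with "RT" as a whole token (case-sensitive).
PURE_RT = re.compile(r"^\s*RT\b")
# Assumption: a via tweet contains "via @username" somewhere in the text.
VIA = re.compile(r"\bvia\s+@\w+", re.IGNORECASE)


def keep_tweet(text):
    """Return True if the tweet is neither a pure retweet nor a via tweet."""
    return not PURE_RT.match(text) and not VIA.search(text)


def filter_tweets(tweets):
    """Keep only the tweets that pass keep_tweet, preserving their order."""
    return [t for t in tweets if keep_tweet(t)]


# Hypothetical sample tweets.
tweets = [
    "RT @jim great talk",
    "@naz morning",
    "Nice article via @news",
    "RTFM @bob",
]
print(filter_tweets(tweets))
```

The word boundary after RT keeps tweets such as "RTFM @bob", which merely begin with those two letters.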

Creating Twitter Conversation networks (see pictures in this PowerPoint presentation; please save as filetype .pptx rather than .zip to view it)

The files below are produced by the above analysis.

The ..._centrality.txt file contains the (social network analysis) centrality scores of the nodes (i.e., usernames). Both Arrow.Info files are used by the built-in network visualization tool in Webometric Analyst.

The ...communicators.net file contains a network in which the nodes are Twitter users and the arrows between them have thicknesses proportional to the number of tweets sent from the arrow source node to the arrow target node.

The ...cotweeted.net file contains a network in which the nodes are Twitter users and the lines between them have thicknesses proportional to the number of tweets sent simultaneously to both of them (e.g., “@jim @naz morning” is a co-tweet between jim and naz, no matter who sent the tweet).

These .net files can be analyzed in WA Network, Gephi or Pajek. For many research goals, choosing the appropriate type of network from the five options earlier and analysing the resulting network files in WA Network or Gephi would be enough. However, if you want to use the number of connections between the tweeters and tweetees (instead of using combinations of the number of tweets sent and received), you’ll need to continue with the ...TweeterTweetee.txt file.

The ...TweeterTweetee.txt file contains a list of the tweet sources and targets. If a tweet is sent to multiple targets, then there is one line per source-target pair. This file can be used to create a simple network of the conversational connections in the tweets. In the Networks menu there’s a function called: Convert columns of text into co-occurrence or link network (Pajek). This will convert the TweeterTweetee file into a network of the people who either send or receive a tweet.
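To show what such a conversion involves, here is a small Python sketch that turns source-target pairs into Pajek .net text. It is not the Webometric Analyst function itself: the function name pairs_to_pajek and the sample pairs are hypothetical, and the choice to use the number of repeated pairs as the arc weight is an assumption made for the example.

```python
from collections import Counter


def pairs_to_pajek(pairs):
    """Build Pajek .net text from a list of (source, target) pairs.

    Repeated pairs are merged into one arc whose weight is the repeat count.
    """
    weights = Counter(pairs)

    # Number the vertices from 1 in order of first appearance.
    ids = {}
    for source, target in pairs:
        for name in (source, target):
            if name not in ids:
                ids[name] = len(ids) + 1

    lines = [f"*Vertices {len(ids)}"]
    lines += [f'{number} "{name}"' for name, number in ids.items()]
    lines.append("*Arcs")
    lines += [f"{ids[s]} {ids[t]} {w}" for (s, t), w in weights.items()]
    return "\n".join(lines) + "\n"


# Hypothetical source-target pairs, one per line of a tweeter/tweetee list.
pairs = [("jim", "naz"), ("jim", "naz"), ("naz", "kim")]
net_text = pairs_to_pajek(pairs)
print(net_text)
```

The resulting text can be saved with a .net extension and opened in Pajek or Gephi. Reading the pairs from a real TweeterTweetee file is left out because its exact column layout is not described here.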

What are co-mention networks?

Co-mention networks are networks based upon how often words, hashtags or users co-occur in tweets. For instance, if the data set contains the following three tweets:

Then in terms of the three types of co-mention:

To obtain co-mention networks of any type complete the following instructions.
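As a rough illustration of the counting behind a co-mention network, the following Python sketch counts how often pairs of users, hashtags or words occur in the same tweet. It is an assumption-laden approximation, not the Webometric Analyst procedure: the regular expressions, the lower-casing, the rule that each pair is counted at most once per tweet, and the sample tweets are all choices made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed definitions of the three item types.
PATTERNS = {
    "users": re.compile(r"@\w+"),
    "hashtags": re.compile(r"#\w+"),
    # Plain words: runs of word characters not attached to @ or #.
    "words": re.compile(r"(?<![@#\w])\w+"),
}


def co_mention_counts(tweets, kind):
    """Count unordered pairs of items of the given kind that share a tweet."""
    pattern = PATTERNS[kind]
    counts = Counter()
    for text in tweets:
        # A set means each item, and so each pair, counts once per tweet.
        items = sorted({item.lower() for item in pattern.findall(text)})
        counts.update(combinations(items, 2))
    return counts


# Hypothetical sample tweets.
tweets = [
    "@jim @naz morning #dh",
    "@naz @jim see #dh #data",
    "@jim hello",
]
print(co_mention_counts(tweets, "users"))
print(co_mention_counts(tweets, "hashtags"))
```

Each counted pair corresponds to one line in the network, and the count gives its weight.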

Creating direct tweet networks for @users

Direct tweet networks are based upon how often tweets from one @user contain the names of other @users. For example, consider the tweets:

Then in terms of messaging:

To obtain direct tweet networks complete the following instructions.
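The counting behind a direct tweet network can be sketched in Python as below. This is an illustration only and not the Webometric Analyst implementation: the (tweeter, tweet) record layout, the lower-casing of usernames, the decision to ignore self-mentions, and the sample records are assumptions.

```python
import re
from collections import Counter

# Assumption: a username is "@" followed by letters, digits or underscores.
MENTION = re.compile(r"@(\w+)")


def direct_tweet_counts(records):
    """Count, for each (sender, target) pair, the tweets from sender naming target."""
    counts = Counter()
    for tweeter, text in records:
        sender = tweeter.lstrip("@").lower()
        # A set means a user named twice in one tweet is counted once.
        targets = {name.lower() for name in MENTION.findall(text)}
        targets.discard(sender)  # assumption: ignore self-mentions
        for target in targets:
            counts[(sender, target)] += 1
    return counts


# Hypothetical records: tweeter name and tweet text.
records = [
    ("kim", "@jim @naz morning"),
    ("kim", "@jim again"),
    ("jim", "@kim hi @jim"),
]
for (sender, target), n in sorted(direct_tweet_counts(records).items()):
    print(sender, "->", target, n)
```

Each (sender, target) pair corresponds to an arrow in the network, with the count giving its thickness.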

Example: The network below was created from digital humanities tweets, using the option to ignore @ and # symbols when processing the data, so hashtags, usernames and keywords are all mixed together. The network was drawn with Webometric Analyst and manually tidied up by moving nodes around to make the pattern clearer and by recolouring the digitalhumanities node from blue to red.