The Story So Far
We have been looking at an archive of tweets tagged with #MLA14, the hashtag of the 2014 MLA (Modern Language Association) Annual Convention, held in Chicago from Thursday 9 to Sunday 12 January 2014. You can still browse or search the 2014 sessions in the online Program.
The archive studied comprises a dataset of 27,491 unique tweets, collected between Sunday 1 September 2013 at 20:35:07 and Wednesday 15 January 2014 at 16:16:41 Central Time.
The dataset studied in this series of posts was collected and cleaned by Chris Zarate and myself.
After deduplication we were down to 27,491 tweets. A subset containing only the tweets posted during the four convention days totals 21,915 tweets.
We have been offering some key figures and some basic visualisations of the data.
For the first part of this series, click here.
For the second part of this series, click here.
For the third part of this series, click here.
Text Analysis
We used Voyant Tools (previously the unfortunately named Voyeur), a web-based reading and analysis environment for digital texts developed by Stéfan Sinclair and Geoffrey Rockwell, to obtain the most frequent words in all the tweets (including RTs and replies) posted with #MLA14 on each day of the convention.
Below we share some word clouds to visualise this. As most people now know, word clouds are visual representations of keywords extracted from a text, differentiated according to how frequently each word is used in that text. Voyant uses Cirrus, a “visualization tool that displays a word cloud relating to the frequency of words appearing in one or more documents. […] The larger the word, the more frequent the term.”
In this case we are sharing static image files exported from Voyant itself, together with the top 5 most frequent words in each set of tweets. In all cases we applied a customised English (“Taporware”) stop word list globally, which included words such as #mla14, MLA, RT, panel, session, http, t.co, etc.
Numbered hashtags corresponding to sessions were not included in the stop word list, as one of our aims was to reveal which sessions were mentioned most frequently each day. (To find out which session corresponds to each numbered hashtag, check the online Program.)
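For readers who want to approximate this kind of count outside Voyant, here is a minimal Python sketch that computes total words, unique words and the top 5 words once stop words are removed. It is not Voyant's implementation: the tokeniser, the short `STOP_WORDS` set and the function names are hypothetical stand-ins, and we assume that total and unique counts include stop words while only the frequency ranking excludes them. Session hashtags such as #s80 are kept as single tokens, in line with our decision not to stop-list them.

```python
import re
from collections import Counter

# Hypothetical stop list: a few of the terms we added to the customised
# Taporware list plus some common English words. The real list was much longer.
STOP_WORDS = {"#mla14", "mla", "rt", "panel", "session", "http", "t.co",
              "the", "a", "of", "in", "and", "to", "at", "on", "is"}

# Hypothetical tokeniser: keeps hashtags and @mentions as single tokens
# (so #s80 survives), and keeps dotted runs such as "t.co" together.
TOKEN_RE = re.compile(r"[#@]?\w+(?:\.\w+)*")


def tokenize(text):
    """Lower-case a tweet and split it into word-like tokens."""
    return TOKEN_RE.findall(text.lower())


def summarise(tweets, stop_words=STOP_WORDS, n=5):
    """Return (total words, unique words, top-n non-stop words) for a set of tweets.

    Total and unique counts include stop words; only the ranking excludes them.
    """
    tokens = [tok for tweet in tweets for tok in tokenize(tweet)]
    counts = Counter(tokens)
    content = Counter({w: c for w, c in counts.items() if w not in stop_words})
    return len(tokens), len(counts), content.most_common(n)
```

Because tokenisation choices (how URLs, hashtags, apostrophes or punctuation are split) change total and unique word counts, figures from a sketch like this should not be expected to match Voyant's exactly.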
Limitations and Fair Warning
After running the four corpora through Voyant more than once, we discovered that the tool did not reproduce the same results, particularly for the total and unique word counts. The top 5 most frequent words showed only minor variations of little significance, which suggests the results we share for them are reasonably reliable, though not 100% exact.
Understandably, we were disappointed that the same tool could not reproduce its results on the same corpora (we don’t consider any of the corpora too large for reliable text analysis). We will keep looking into it, will keep aiming to reproduce the results with different tools, and will update this post with any findings.
As a research progress update, we present here only the figures and clouds obtained in the fourth trial, after clearing caches and making sure the corpora were complete.
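Part of the reproducibility problem is knowing whether two runs really received identical input. A minimal sketch, assuming each day's corpus is available locally as a list of tweet strings, is to fingerprint it with SHA-256 before every run; the function names below are hypothetical. If the fingerprints of two runs match but their word counts differ, the variation comes from the tool rather than from the data.

```python
import hashlib


def corpus_fingerprint(tweets):
    """Return (tweet count, SHA-256 hex digest) for a list of tweet strings.

    Any added, dropped, edited or reordered tweet changes the digest.
    """
    h = hashlib.sha256()
    for tweet in tweets:
        data = tweet.encode("utf-8")
        # Length prefix keeps tweet boundaries unambiguous, so splitting the
        # same text into tweets differently produces a different digest.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return len(tweets), h.hexdigest()


def same_input(run_a, run_b):
    """True if two runs were fed exactly the same corpus, in the same order."""
    return corpus_fingerprint(run_a) == corpus_fingerprint(run_b)
```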
Thursday 9 January 2014
Total number of tweets: 4,558
Total number of words: 71,630
Total number of unique words: 9,142
Top 5 most frequent words in the corpus: #s80 (271), #s66 (199), humanities (188), #s130 (156), #s173 (150).
Friday 10 January 2014
Total number of tweets: 7,417
Total number of words: 131,500
Top 5 most frequent words in the corpus: data (381), #s299 (378), students (354), #s339 (342), reading (342).
Saturday 11 January 2014
Total number of tweets: 6,265
Total number of words: 112,482
Top 5 most frequent words in the corpus: #s577 (562), digital (543), work (413), humanities (340), #medievaltwitter (283).
Sunday 12 January 2014
Total number of tweets: 3,675
Total number of words: 66,426
Total number of unique words: 8,206
Top 5 most frequent words in the corpus: #s679 (626), digital (266), #s738 (212), @adelinekoh (174), #s708 (173).
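As a quick consistency check on the figures above, the four daily tweet totals add up to the 21,915 convention-day tweets mentioned at the start of this post, roughly 79.7% of the 27,491-tweet archive. A few lines of Python make the arithmetic explicit; the variable names are our own.

```python
# Daily tweet totals as reported above, Thursday to Sunday.
daily_tweets = {
    "Thursday 9 January": 4_558,
    "Friday 10 January": 7_417,
    "Saturday 11 January": 6_265,
    "Sunday 12 January": 3_675,
}
archive_total = 27_491  # unique tweets in the full #MLA14 archive

convention_tweets = sum(daily_tweets.values())
share = convention_tweets / archive_total

print(f"{convention_tweets:,} convention-day tweets ({share:.1%} of the archive)")
# prints: 21,915 convention-day tweets (79.7% of the archive)
```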
Tool Citation
Sinclair, S. and G. Rockwell (2014). Voyant Tools: Reveal Your Texts. Voyant. Retrieved January 22, 2014 from http://voyeurtools.org/