#MLA 14: A First Look (IV) | Far Away, Yet Close

The Story So Far

We have been looking at an archive of tweets tagged with #MLA14, the hashtag of the 2014 MLA (Modern Language Association) Annual Convention, which was held in Chicago from Thursday 9 to Sunday 12 January 2014. You can still browse or search 2014 sessions in the online Program.

The studied archive comprises a dataset of 27,491 unique tweets, collected between Sunday September 01 2013 at 20:35:07 and Wednesday January 15 2014 at 16:16:41 Central Time.

The dataset studied in this series of posts was collected and cleaned by Chris Zarate and myself.

After deduplication we were down to 27,491 tweets. The sub-set of tweets posted during the actual convention days totals 21,915 tweets.

We have been offering some key figures and some basic visualisations of the data.

For the first part of this series, click here.

For the second part of this series, click here.

For the third part of this series, click here.

Text Analysis

We used Voyant Tools (previously the unfortunately-named Voyeur), a web-based reading and analysis environment for digital texts developed by Stéfan Sinclair and Geoffrey Rockwell, to obtain the most frequent words in the text of all the tweets (including RTs and replies) posted with #MLA14 during each day of the convention.

Below we share some word clouds to visualise this. As most people know by now, word clouds are visual presentations of keywords extracted from a text, which are visually differentiated according to their position and frequency of use in that text. Voyant uses Cirrus, which is a “visualization tool that displays a word cloud relating to the frequency of words appearing in one or more documents. […] The larger the word, the more frequent the term.”

In this case we are sharing static image files exported from Voyant itself. We are also including the top 5 most frequent words in each set of tweets. In all cases we applied globally a customised English (“Taporware”) stop word list, to which we added terms such as #mla14, MLA, RT, panel, session, http, t.co, etc.

Numbered hashtags corresponding to sessions were not included in the stop word list, as one of our intentions was to reveal which sessions were mentioned most frequently each day. (To find out which session corresponds to each numbered hashtag, check the online Program.)
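For readers who want to try a similar count outside Voyant, here is a minimal Python sketch of the same idea: lower-case the tweets, drop stop words, keep the numbered session hashtags, and report the most frequent tokens. This is not how Voyant works internally. The short stop list, the tokenising pattern, the URL stripping and the function name `top_words` are all our own assumptions for illustration, so the figures it produces should not be expected to match the ones reported here.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only. The post used a customised
# Taporware English list inside Voyant, which is not reproduced here.
STOP_WORDS = {
    "#mla14", "mla", "rt", "panel", "session", "http", "t.co",
    "the", "a", "an", "of", "and", "to", "in", "is", "on", "at", "was",
}

# Assumption: links are removed before counting so URL fragments are not counted.
URL_RE = re.compile(r"https?://\S+")
# A token is a word, optionally prefixed by # (hashtag) or @ (mention).
TOKEN_RE = re.compile(r"[#@]?\w+(?:[.']\w+)*")


def top_words(tweets, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent tokens as (token, count) pairs."""
    counts = Counter()
    for text in tweets:
        cleaned = URL_RE.sub(" ", text.lower())
        for token in TOKEN_RE.findall(cleaned):
            if token not in stop_words:
                counts[token] += 1
    # Sort by count (descending), then alphabetically, so ties are stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up sample tweets, not taken from the archive.
sample = [
    "RT Great panel on digital humanities #s80 #mla14",
    "Digital reading at #s80 http://t.co/abc #MLA14",
    "#s80 was packed",
]
print(top_words(sample, n=2))
```

Because session hashtags such as #s80 are deliberately left out of the stop list, they surface in the ranking alongside ordinary words, which is the behaviour described above.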

Limitations and Fair Warning

After running the four different corpora through Voyant more than once, we discovered the tool was unable to reproduce the same results, particularly regarding word and unique word counts. The top 5 most frequent words remained the same, with only minimal variations in their counts, which suggests the results we share in that regard are more or less reliable, though not 100% exact.

We were understandably disappointed at the failure to ensure reproducibility using the same corpora and the same tool (we don’t consider any of the corpora too large for reliable text analysis). We will keep looking into it, will keep aiming to reproduce the results with different tools, and will update any findings here.

Here we are presenting, as a research progress update only, the figures and clouds obtained after the fourth trial, once we had cleared caches and ensured the corpora were complete.
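One way of checking results like these with a different tool is to compute the counts with a small deterministic script and to fingerprint the corpus, so that two runs can be confirmed to have used exactly the same input. The sketch below is only an illustration under our own assumptions: the function name `corpus_stats` and the tokenising pattern are hypothetical, and because Voyant tokenises text differently the word counts will not match the figures reported in this post.

```python
import hashlib
import re

# Assumption: a token is a run of word characters, optionally prefixed
# by # or @. Voyant's own tokenisation rules are different.
TOKEN_RE = re.compile(r"[#@]?\w+")


def corpus_stats(text):
    """Return a fingerprint of the corpus plus word and unique word counts."""
    tokens = TOKEN_RE.findall(text.lower())
    return {
        # Identical input text always yields the identical digest.
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "words": len(tokens),
        "unique_words": len(set(tokens)),
    }


# Made-up sample text, not taken from the archive.
sample_corpus = "Data data #s299 students"
first_run = corpus_stats(sample_corpus)
second_run = corpus_stats(sample_corpus)
print(first_run["words"], first_run["unique_words"], first_run == second_run)
```

If the digests of two runs differ, the input changed between runs (for example, an incomplete upload); if the digests match but the counts differ, the variation comes from the tool rather than from the corpus.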

Thursday 9 January 2014

Total number of tweets: 4,558

Total number of words: 71,630

Total number of unique words: 9,142

Top 5 most frequent words in the corpus: #s80 (271), #s66 (199), humanities (188), #s130 (156), #s173 (150).

#MLA14 Thursday 9 January Cirrus Word Cloud. Retrieved January 22, 2014 from http://voyeurtools.org/tool/Cirrus/

Friday 10 January 2014

Total number of tweets: 7,417

Total number of words: 131,500

Total number of unique words: 13,367

Top 5 most frequent words in the corpus: data (381), #s299 (378), students (354), #s339 (342), reading (342).

#MLA14 Friday 10 January Cirrus Word Cloud. Retrieved January 22, 2014 from http://voyeurtools.org/tool/Cirrus/

Saturday 11 January 2014

Total number of tweets: 6,265

Total number of words: 112,482

Total number of unique words: 11,954

Top 5 most frequent words in the corpus: #s577 (562), digital (543), work (413), humanities (340), #medievaltwitter (283).

#MLA14 Saturday 11 January Cirrus Word Cloud. Retrieved January 22, 2014 from http://voyeurtools.org/tool/Cirrus/

Sunday 12 January 2014

Total number of tweets: 3,675

Total number of words: 66,426

Total number of unique words: 8,206

Top 5 most frequent words in the corpus: #s679 (626), digital (266), #s738 (212), @adelinekoh (174), #s708 (173).

#MLA14 Sunday 12 January Cirrus Word Cloud. Retrieved January 22, 2014 from http://voyeurtools.org/tool/Cirrus/
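As a quick sanity check, the four daily tweet totals reported above can be added up and compared with the convention-days sub-set total given at the start of this post. The short Python sketch below does just that and also works out each day's share of the total; the variable and function names are our own and purely illustrative.

```python
# Daily tweet totals as reported in this post.
daily_tweets = {
    "Thursday 9 January": 4558,
    "Friday 10 January": 7417,
    "Saturday 11 January": 6265,
    "Sunday 12 January": 3675,
}

convention_total = sum(daily_tweets.values())


def share(day):
    """Percentage of the convention-days tweets posted on the given day."""
    return 100 * daily_tweets[day] / convention_total


for day, count in daily_tweets.items():
    print(f"{day}: {count:,} tweets ({share(day):.1f}% of {convention_total:,})")
```

The daily figures add up to 21,915, which matches the total reported for the convention-days sub-set, with Friday as the day with the most tweets in this dataset.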

Tool Citation

Sinclair, S. and G. Rockwell (2014). Voyant Tools: Reveal Your Texts. Voyant. Retrieved January 22, 2014 from http://voyeurtools.org/


5 thoughts on “#MLA 14: A First Look (IV)”

Brian Croxall 01/28/2014 at 16:45

This is fascinating work, and I’m really surprised to see that there are more tweets on Friday than Saturday, which I tend to think of as the biggest and busiest day of the Convention.

Thanks for sharing.

Ernesto Priego 01/28/2014 at 16:55

Thank you, Brian. It is indeed surprising there would be more tweets on Friday than Saturday. Whilst following in real time, it definitely looked like there were more tweets per minute on Saturday than on any other day.

It has to be said it is possible there might have been more tweets on Saturday, though. We noticed some longer gaps between tweets that we would have to look into further, but had assumed they corresponded with breaks, night time, etc. It’s also possible that on Saturday, a busier day, people might actually have tweeted correspondingly less. But it’s all hypothetical at this stage.

So the right way of putting it is that there are fewer tweets for Saturday than for Friday *in the dataset we were able to collect and dedupe*. We will be sharing this dataset asap, and others will be able to look at it directly. We are more or less satisfied the dataset is the closest to a complete set of the tagged tweets during the period of the conference, but due to several factors it’s possible the figures are variable.

If anyone else archived tweets over Saturday and has a clean set, we could make comparisons and find out whether the results are reproducible.

Thank you for reading and commenting!