Twitter Sentiment Analysis: Analyzing the use of hashtags in the Black Lives Matter Movement
Authors: Sia Phulambrikar (Project Lead), Jason Clark, Memphis Lau, Zoe Powers and Nathanael Nam
Roughly a decade has passed since the hashtag #BlackLivesMatter was first used in 2013, in response to the acquittal of George Zimmerman for the 2012 shooting of Trayvon Martin in Florida. Since then, the movement has resurged in cycles, with its most significant spike in May 2020 following the death of George Floyd in Minnesota. The momentum for political change brought about by the reemergence of #BLM has been phenomenal: it has grown into a global movement, drawing a diverse population across the world to call for action on systemic racism and police brutality.
What distinguishes this era of the largest social movement of the 21st century is its ability to harness social media to propagate its message. The immediacy of Twitter and similar platforms made them a ready tool for organizing protests, spreading activism and increasing visibility. In 2020, politically motivated Twitter activity spiked among both left-leaning and right-leaning groups and protesters. In this article, we draw on a range of sources to understand how public opinion surrounding the BLM movement was expressed on Twitter. By conducting a sentiment analysis of BLM-related tweets, we try to decode the power of hashtags as a tool for activism in the movement.
Using Twitter data from May 2020 to August 2020, we extracted tweets related to BLM and created word clouds of the hashtags used in these tweets to identify the most popular ones.
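The hashtag frequencies behind a word cloud can be computed in a few lines of Python. This is a minimal sketch, not our exact pipeline: it assumes each tweet is available as a plain string and treats hashtags case-insensitively, so #BLM and #blm are counted together. The function name hashtag_counts is illustrative.

```python
import re
from collections import Counter

# Matches '#' followed by one or more word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_counts(tweets):
    """Count case-insensitive hashtag occurrences across a list of tweet strings."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


sample = [
    "Justice now #GeorgeFloyd #ICantBreathe",
    "#BLM #PoliceBrutality must end",
    "Still marching #policebrutality #BLM",
]
print(hashtag_counts(sample).most_common(3))
```

The resulting frequency mapping could then be handed to a word-cloud renderer, for example the generate_from_frequencies method of the third-party wordcloud package.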
At the beginning of summer 2020, immediately after the death of George Floyd, social media hashtags focused heavily on Floyd himself, demanding justice for him with hashtags such as #georgefloyd, #icantbreathe and #justiceforgeorgefloyd. As the summer progressed, however, much of the activity unified around #policebrutality, followed by #BLM, showing momentum toward the larger movement that was set alight by George Floyd’s death.
Using the same dataset, we tracked how the popularity of the hashtags #icantbreathe, #policebrutality and #peacefulprotest varied over time, focusing on relevant historical events, protests and political action.
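A daily time series for each hashtag can be built with pandas. The sketch below is illustrative: the column names created_at and text and the function daily_hashtag_counts are hypothetical, and each tweet is counted at most once per hashtag.

```python
import re

import pandas as pd


def daily_hashtag_counts(df, hashtags):
    """Count tweets per day that contain each hashtag (case-insensitive).

    Assumes hypothetical columns: 'created_at' (timestamp) and 'text' (tweet body).
    """
    days = pd.to_datetime(df["created_at"]).dt.floor("D")
    lowered = df["text"].str.lower()
    columns = {}
    for tag in hashtags:
        # The trailing \b stops '#police' from also matching '#policebrutality'.
        pattern = "#" + re.escape(tag.lower()) + r"\b"
        mentions = lowered.str.contains(pattern, regex=True)
        columns[tag] = mentions.groupby(days).sum()
    daily = pd.DataFrame(columns)
    # Add zero-count rows for calendar days with no collected tweets.
    return daily.asfreq("D", fill_value=0).astype(int)
```

Plotting each column of the result against its date index gives one trendline per hashtag.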
In the graph below, which compares the trendlines of three hashtags, #icantbreathe and #policebrutality initially gained roughly equal momentum. Starting in June, however, #policebrutality grew to almost four times the volume of #icantbreathe. This suggests a shift in the direction of activism, from anger over a single event to a broader movement against police brutality.
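The comparison between #policebrutality and #icantbreathe boils down to a ratio of monthly tweet volumes. As a sketch, assuming a table of daily counts with one column per hashtag and a date index (a hypothetical layout), the monthly totals and their ratio could be computed as follows.

```python
import pandas as pd


def monthly_ratio(daily, numerator, denominator):
    """Ratio of monthly tweet totals between two hashtag columns.

    `daily` is assumed to have a DatetimeIndex and one count column per hashtag.
    Months where the denominator is zero produce inf or NaN.
    """
    # "MS" buckets rows by calendar month, labeling each bucket by its first day.
    monthly = daily.resample("MS").sum()
    return monthly[numerator] / monthly[denominator]
```

A June value near 4 for monthly_ratio(daily, "policebrutality", "icantbreathe") would correspond to the roughly fourfold gap seen in the graph.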
Zooming in on #peacefulprotest, we tried to associate its trendline from June to August 2020 with events that could explain spikes and dips in the hashtag’s usage. One of the largest spikes appears on June 3, 2020, when the Justice for George Floyd petition hit 3 million signatures. Another short peak appears around June 14, 2020, after the police shooting of Rayshard Brooks in Atlanta on June 12 led to protests.
We see similar peaks in the timeline of #policebrutality tweets from June to August 2020. Two major peaks appear around June 2, 2020, the day protesters gathered at the site where George Floyd was killed by a police officer. Another spike appears on June 5, 2020, when Minneapolis banned police chokeholds and restrictions were imposed on Denver police against using tear gas, plastic bullets and flash grenades on protesters.
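One simple way to flag candidate spike days before matching them to news events is a rolling z-score. This is a sketch of one possible heuristic, not the method behind our figures: a day is flagged when its count exceeds the mean of the preceding window days by more than threshold standard deviations. The defaults window=7 and threshold=2.0 are arbitrary assumptions.

```python
import pandas as pd


def find_spikes(counts, window=7, threshold=2.0):
    """Return days whose count is unusually high relative to the preceding `window` days.

    `counts` is a daily Series indexed by date. The baseline uses only earlier
    days (shift(1)), so a spike does not inflate its own baseline.
    """
    rolling = counts.rolling(window, min_periods=window)
    baseline = rolling.mean().shift(1)
    spread = rolling.std().shift(1)
    # A perfectly flat baseline (zero spread) gives an infinite z-score, so any increase is flagged.
    z = (counts - baseline) / spread
    # Days without a full window of history have NaN z-scores and are never flagged.
    return counts[z > threshold]
```

Each flagged date is only a candidate; linking it to an event such as a protest or policy change still requires checking the news record.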
Sentiment Analysis
To delve deeper into how hashtags and tweets conveyed public opinion about the movement, we conducted a sentiment analysis on several datasets of BLM-related tweets.
One such dataset contained over 300,000 tweets from Minnesota posted between April 2020 and July 2020. The tweets were pre-classified into emotions based on Robert Plutchik’s Wheel of Emotions, which comprises eight emotions: Anger, Sadness, Disgust, Surprise, Trust, Joy, Fear, and Anticipation.
From our exploratory analysis, we found that the most common emotion in this dataset was Trust, followed by Fear and Anticipation. Surprisingly, according to the given classification, Anger and Sadness appear in a much smaller proportion of tweets than Trust and Anticipation. However, without context for a tweet, an emotion such as Trust is difficult to decode: it could signal anything from a lack of trust in the justice system to positive trust in the power of the movement. Because Trust proved the most difficult emotion to interpret from the given data, we excluded it from the rest of our analysis.
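Computing the emotion distribution with Trust excluded amounts to a normalized value count. In this sketch the labels are assumed to be plain strings such as "Trust" or "Fear"; the real dataset’s label format may differ, and the function name emotion_shares is hypothetical.

```python
import pandas as pd


def emotion_shares(labels, exclude=("Trust",)):
    """Share of tweets per emotion, after dropping labels listed in `exclude`.

    Shares are renormalized over the remaining emotions, so they sum to 1.
    """
    s = pd.Series(list(labels))
    s = s[~s.isin(list(exclude))]
    return s.value_counts(normalize=True)
```

Because the remaining shares are renormalized, excluding Trust raises every other emotion’s share without changing their relative order.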
The list below shows the most common words found in tweets labeled with each sentiment.
Common Words per Sentiment
Anger: outrage, protest
Anticipation: time
Disgust: shit
Fear: police
Joy: happy, love
Sadness: mourn
Surprise: Trump
Trust: justice
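A word list like this one can be produced by counting tokens separately for each sentiment label. The sketch assumes (text, label) pairs and uses a tiny hand-written stopword list purely for illustration; a real analysis would need a fuller list, and the output depends heavily on which words are filtered out. The names STOPWORDS and top_words_by_sentiment are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Illustrative stopword list only; a fuller list (e.g. from NLTK) would be needed in practice.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "for", "on", "this", "we", "are", "rt"}
# Lowercase words, optionally with one apostrophe suffix such as "can't".
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def top_words_by_sentiment(rows, n=2):
    """rows: iterable of (text, label) pairs. Returns {label: [(word, count), ...]}."""
    counters = defaultdict(Counter)
    for text, label in rows:
        # Remove URLs and @mentions; '#' is not matched by TOKEN_RE, so hashtags become plain words.
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        words = [w for w in TOKEN_RE.findall(text) if w not in STOPWORDS]
        counters[label].update(words)
    return {label: c.most_common(n) for label, c in counters.items()}
```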
The most common words found in the dataset were “blacklivesmatter”, “people”, “protest”, “white”, “support”, and “trump”. In the graphs that follow, we use the pre-classified sentiments to model the sentiment distribution for each of the most common words, and analyze which emotions these words were most commonly used to express.
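The per-word sentiment distributions can be computed by selecting the tweets that contain each word and normalizing their label counts. This is a sketch under assumptions: the column names text and emotion and the function word_emotion_distribution are hypothetical, and a word is matched on word boundaries, so "blacklivesmatter" also matches inside "#blacklivesmatter".

```python
import re

import pandas as pd


def word_emotion_distribution(df, words, text_col="text", label_col="emotion"):
    """For each word, the share of tweets containing it that carry each emotion label.

    Returns a DataFrame with one row per word and one column per emotion.
    """
    lowered = df[text_col].str.lower()
    rows = {}
    for word in words:
        mask = lowered.str.contains(r"\b" + re.escape(word) + r"\b", regex=True)
        rows[word] = df.loc[mask, label_col].value_counts(normalize=True)
    # Emotions that never co-occur with a word get a share of 0.
    return pd.DataFrame(rows).T.fillna(0.0)
```

Each row of the result sums to 1 (unless the word never appears), which makes it suitable for a stacked bar chart per word.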
Next, we attempted our own binary classification of tweets. For this, we used a dataset of tweets related to #GeorgeFloydFuneral and classified them as positive or negative. We used a Hugging Face pipeline with a pre-trained transformer model for sentiment analysis. The pipeline loads a pre-trained English sentiment model and returns, for each string passed, a positive or negative label along with a score giving the model’s confidence in that label.
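The classification step can be sketched around the Hugging Face transformers pipeline API. The helpers label_tweets and make_default_classifier are hypothetical; label_tweets accepts any callable with the pipeline’s calling convention (a list of strings in, a list of dicts with "label" and "score" keys out), so it can be tested without downloading a model. Calling pipeline("sentiment-analysis") without a model name loads the library’s default English sentiment model, which is not necessarily the model we used.

```python
def label_tweets(texts, classifier):
    """Apply a sentiment classifier to tweets and return (label, score) pairs.

    `classifier` is any callable following the transformers pipeline convention:
    a list of strings in, a list of {"label": ..., "score": ...} dicts out.
    Labels are lower-cased so 'POSITIVE' and 'positive' compare equal.
    """
    results = classifier(list(texts))
    return [(r["label"].lower(), float(r["score"])) for r in results]


def make_default_classifier():
    """Build the transformers sentiment pipeline (requires the transformers package).

    With no model argument, the library falls back to its default English
    sentiment model and downloads it on first use.
    """
    from transformers import pipeline  # imported lazily so label_tweets works without it

    return pipeline("sentiment-analysis")
```

Usage would look like label_tweets(tweets, make_default_classifier()); pinning an explicit model name in the pipeline call would make results reproducible across library versions.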
After pre-processing the Twitter data and fine-tuning the pre-trained model, we labeled each tweet as positive or negative. Overall, the tweets in the #GeorgeFloydFuneral dataset displayed an overwhelmingly negative sentiment.
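Tweet pre-processing can be sketched with a few regular expressions. The steps below (dropping a leading retweet marker and URLs, replacing @mentions with a placeholder, and stripping the # symbol) are one plausible set of choices, not necessarily the exact ones we applied; negative_share then gives the fraction of tweets labeled negative. Both function names are hypothetical.

```python
import re


def clean_tweet(text):
    """Light tweet normalization before classification (one possible choice of steps)."""
    text = re.sub(r"^RT\s+", "", text)  # drop a leading retweet marker
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"@\w+", "@user", text)  # anonymize mentions
    text = text.replace("#", "")  # keep hashtag words, drop the '#'
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


def negative_share(labels):
    """Fraction of labels equal to 'negative' (0.0 for an empty input)."""
    labels = list(labels)
    if not labels:
        return 0.0
    return sum(lab == "negative" for lab in labels) / len(labels)
```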
To show how the model classifies individual tweets, we randomly sampled three positive-labeled and three negative-labeled tweets to check whether the classification matches our intuition. The tweets in the blue box were labeled ‘positive’ by the model, while those in the red box were labeled ‘negative’. A tweet that says “Passerby takes selfie as police officer wrestles suspect” conveys a negative sentiment, while a tweet about the “most powerful moments” from George Floyd’s funeral was classified as positive. In these samples, the model’s labels match our intuitive understanding of positive and negative sentiment, suggesting that it performs the binary classification task reasonably well.
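Drawing a fixed number of examples per label is a one-liner with pandas’ groupby(...).sample, available since pandas 1.1. The DataFrame layout with a label column and the helper name sample_per_label are assumptions; fixing random_state makes the draw reproducible.

```python
import pandas as pd


def sample_per_label(df, n=3, label_col="label", seed=0):
    """Randomly draw n tweets for each predicted label.

    Sampling is without replacement, so every label needs at least n tweets;
    pandas raises a ValueError otherwise.
    """
    return df.groupby(label_col).sample(n=n, random_state=seed)
```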
That said, we found the scores attached to the labels difficult to interpret. Because the scores were heavily skewed, with most close to 0.9, we could not use them to analyze the polarization of tweets or to place a set of tweets on a spectrum from negative to positive. In other words, given two tweets classified as negative, we could not tell which one was ‘more’ negative and which was closer to neutral. This is partly expected, since the pipeline’s score is the model’s confidence in the predicted label rather than a measure of how strongly positive or negative a tweet is. This skew in scores is illustrated in the boxplots below.
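The skew can be quantified before plotting by summarizing the score distribution per label. The sketch assumes label and score columns and a hypothetical helper name, score_summary; with matplotlib installed, df.boxplot(column="score", by="label") would draw boxplots of the same data.

```python
import pandas as pd


def score_summary(df, label_col="label", score_col="score"):
    """Five-number summary of model scores per label.

    A box squeezed into a narrow band near 1.0 indicates that the scores
    carry little information beyond the label itself.
    """
    stats = df.groupby(label_col)[score_col].describe()
    return stats[["min", "25%", "50%", "75%", "max"]]
```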
This leaves room for improvement: the model could be fine-tuned so that its scores are interpretable and useful for further analysis. Alternatively, we could explore other pre-trained models, or clustering algorithms that treat sentiment analysis as an unsupervised problem.
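As a sketch of the clustering alternative, tweets could be grouped with TF-IDF features and k-means from scikit-learn. This is an illustrative swap-in, not an approach we evaluated: the clusters are unlabeled, and with two clusters there is no guarantee they separate positive from negative tweets rather than, say, different topics, so each cluster would still need manual inspection. The function name cluster_tweets is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_tweets(texts, n_clusters=2, seed=0):
    """Assign each tweet a cluster id using TF-IDF vectors and k-means.

    Cluster ids are arbitrary integers; they carry no sentiment meaning by themselves.
    """
    vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(vectors)
```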
Conclusion
The use of social media as a tool for political activism is on the rise. The Black Lives Matter movement is one of the best examples of a movement that grew on the shoulders of Twitter hashtags. Given the movement’s far-reaching popularity, it is important for activists, social media users, and readers to understand how tweets of no more than 280 characters can generate powerful sentiments. For activists, this could mean using sentiment analysis to choose words that convey their political goals effectively. Social media users, in turn, should understand how specific words and hashtags in tweets can be used to manipulate and sway their emotions. Our method is only one way of discerning sentiment in tweets, and it leaves much room for improvement, for example by fine-tuning the model and correcting the imbalance in pre-labeled datasets.
Works Cited
- https://github.com/MeysamAsgariC/BLMT
- https://www.kaggle.com/datasets/carlsonhoo/baselinedataset
- https://www.kaggle.com/datasets/jl18pg052/tweets-for-georgefloydfuneral
- https://www.pewresearch.org/internet/2018/07/11/an-analysis-of-blacklivesmatter-and-other-twitter-hashtags-related-to-political-or-social-issues/
- https://www.pnas.org/doi/10.1073/pnas.2205767119#data-availability
- https://medium.com/towards-artificial-intelligence/unsupervised-sentiment-analysis-with-real-world-data-500-000-tweets-on-elon-musk-3f0653135558
- https://towardsdatascience.com/twitter-sentiment-analysis-based-on-news-topics-during-covid-19-c3d738005b55
- https://thinkingneuron.com/sentiment-analysis-of-tweets-using-bert/
- https://huggingface.co/learn/nlp-course/chapter1/3?fw=pt