“Go eat a bat, Chang!”: On the Emergence of Sinophobic Behavior on Web Communities in the Face of CO
Link: https://dl.acm.org/doi/10.1145/3442381.3450024
Conference: WWW 2021
Keywords: COVID-19, Sinophobia, Hate Speech, Twitter, 4chan
My notes
Summary
The authors collect five months of posts from two online communities (4chan's /pol/ and Twitter), measure trends in Sinophobic content, and use word2vec to investigate newly emerging terms.
Related Technique: crawler/scraper, word2vec, similarity calculation, data visualization
Pros
Timely topic approached from distinctive angles (hate speech and conspiracy theories), focused on Sinophobia
Nice intuitive visualization
Comprehensive investigation using an NLP technique (word2vec)
Cons
Some statements are subjective inferences without ground truth. Some slurs target Asians in general rather than Chinese people only (e.g. zipperhead, ricenigger)
Doesn't explain the advantages of the techniques adopted
Details
Abstract
As COVID-19 spread and people kept social distance, online media became more active, but they also spread potentially harmful and disturbing content (e.g. conspiracy theories and hateful speech).
We collected and analyzed 2 large datasets from Twitter and 4chan (time period: 5 months) to investigate whether there is a rise or important differences with regard to the dissemination of Sinophobic content.
In conclusion, COVID-19 boosted Sinophobia online, and it is a cross-platform phenomenon (from fringe web communities like /pol/ ("Politically Incorrect") to mainstream platforms like Twitter).
Findings
Discussions related to China and Chinese people on Twitter and 4chan's /pol/ increased after the outbreak of the COVID-19 pandemic. The increase in these discussions and in Sinophobic slurs coincides with real-world events related to the pandemic.
With word embeddings, we find that various racial slurs are used in these contexts on both Twitter and /pol/, which shows it is a cross-platform phenomenon.
Analyzing word embeddings over time, we discover new emerging slurs and terms related to Sinophobic behavior, as well as the COVID-19 pandemic (e.g. asshoe, Kungflu).
By comparing data from before and after the COVID-19 outbreak, we observe shifts in the content posted by users on Twitter (towards blaming China and Chinese people) and on /pol/ (towards using more and new Sinophobic slurs).
Data
Twitter: collection tool: Streaming API; time period: 1 Nov 2019 - 22 Mar 2020; amount: 222,212,841 tweets (English only).
4chan /pol/: collection tool: JSON API; time period: 1 Nov 2019 - 22 Mar 2020; amount: 16,808,191 posts.
Temporal Analysis
1. The rises on 4chan match major real-world events. Figure: mentions of the terms “china” and “chinese” on 4chan's /pol/.
Major events
Event 1, 2019-12-12: President Donald Trump signs an initial trade deal with China.
Event 2, 2020-01-23: The Chinese government announces a lock-down in Wuhan.
Event 3, 2020-01-30: The World Health Organization declares a public health emergency.
Event 4, 2020-02-23: 11 municipalities in Lombardy, Italy are locked down [19].
Event 5, 2020-03-09: Italy extends restrictions in the northern region of the country.
Event 6, 2020-03-16: Donald Trump refers to COVID-19 as “Chinese Virus” on Twitter.
3. Temporal dynamics of Sinophobic racial slurs on 4chan's /pol/ and Twitter
Slurs picked: “chink,” “bugland,” “chankoro,” “chinazi,” “gook,” “insectoid,” “bugmen,” and “chingchong”
Top 20 most similar words, along with their cosine similarities, to the words “china,” “chinese,” and “virus” obtained from the word2vec models trained for the whole period (November 2019 - March 2020).
Takeaway of Temporal Analysis
Users on 4chan and Twitter heavily discuss China in relation to COVID-19, and this discussion accelerated rapidly once the Western world became affected.
Differences in the use of slurs on /pol/ and Twitter.
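The temporal analysis above boils down to counting, per day, how many posts mention a given term. A minimal sketch of that counting step, assuming posts arrive as (date, text) pairs; the sample posts, the `daily_mention_share` helper, and the choice to report a share rather than a raw count are assumptions for illustration, not the authors' pipeline:

```python
import re
from collections import Counter
from datetime import date

# Hypothetical sample posts as (date, text) pairs; not taken from the datasets.
posts = [
    (date(2020, 1, 23), "China announces a lockdown in Wuhan"),
    (date(2020, 1, 23), "Chinese officials confirm new cases"),
    (date(2020, 1, 23), "Unrelated post about sports"),
    (date(2020, 1, 24), "More news from china today"),
]

TERMS = ("china", "chinese")
# Word-boundary match so "china" does not match inside longer tokens.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, TERMS)) + r")\b", re.IGNORECASE
)


def daily_mention_share(posts):
    """Return {day: fraction of that day's posts mentioning any term}."""
    total = Counter()
    hits = Counter()
    for day, text in posts:
        total[day] += 1
        if PATTERN.search(text):
            hits[day] += 1
    return {day: hits[day] / total[day] for day in sorted(total)}


shares = daily_mention_share(posts)
for day, share in shares.items():
    print(day.isoformat(), f"{share:.2f}")
```

Plotting these daily shares against the dates of the major events listed above is what makes the coincidence between spikes and events visible.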
Content Analysis
Technique: word2vec (skip-gram)
Method: 3 groups of word2vec models for each of Twitter and /pol/:
Group 1: $W_a$, trained on all posts made between October 28, 2019 and March 22, 2020, to study the use of words over the entire duration of the study.
Group 2: one distinct word2vec model $W_{t=i}$, $i \in T$, for each week between October 28, 2019 and March 22, 2020 ($i$ is the $i$-th week in $T$), to study changes in the use of words over time.
Group 3: $W_c$, trained on all posts from July 1, 2016 to November 1, 2019, to investigate the emergence of new terms during the period of the study.
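Before the weekly models of Group 2 can be trained, the posts have to be split into one corpus per week. A rough sketch of that bucketing step, assuming posts are (date, text) pairs and whitespace tokenization is good enough; `week_index` and `split_by_week` are hypothetical helper names, and the word2vec training itself is left out:

```python
from collections import defaultdict
from datetime import date

STUDY_START = date(2019, 10, 28)  # a Monday; first day of week 0
STUDY_END = date(2020, 3, 22)     # a Sunday; last day of the final week


def week_index(day):
    """Return the 0-based week of `day` in the study period, or None if outside."""
    if not (STUDY_START <= day <= STUDY_END):
        return None
    return (day - STUDY_START).days // 7


def split_by_week(posts):
    """Group tokenized posts into one corpus per week: {week: [tokens, ...]}."""
    corpora = defaultdict(list)
    for day, text in posts:
        week = week_index(day)
        if week is not None:
            corpora[week].append(text.lower().split())
    return dict(corpora)


# Hypothetical posts, for illustration only.
posts = [
    (date(2019, 10, 28), "first post of the study period"),
    (date(2019, 11, 3), "still week zero"),
    (date(2019, 11, 4), "week one begins"),
    (date(2020, 3, 22), "last day of the study period"),
    (date(2020, 3, 23), "outside the period and ignored"),
]
weekly = split_by_week(posts)
print({week: len(docs) for week, docs in sorted(weekly.items())})
```

With these dates the study period covers exactly 21 weeks (indices 0 to 20), so each weekly corpus would feed one model $W_{t=i}$.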
2 categories of profanities:
Insults addressing Asian people: racist variations of “china” and “chinese” (e.g. chingchong) and culturally oriented racist terms (e.g. ricenigger, yellownigger)
Sexual stereotypes (e.g., pindick)
Content Evolution
4chan /pol/
4chan's /pol/ users typically used racial slurs targeting Chinese people even before the outbreak of the COVID-19 pandemic.
A rise in the use of this term in discussions related to China.
Slurs used against Chinese people increased.
Twitter
New Term Discovery: During important real-world events, such as the COVID-19 pandemic, language evolves and new terms emerge on Web communities like Twitter.
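One simple way to picture new term discovery is to look for tokens that are frequent during the study period but absent from the earlier baseline period. The sketch below is a simplification under stated assumptions: the paper compares word2vec models ($W_c$ against the study-period models), whereas this only compares vocabularies, and the `emerging_terms` helper, the `min_count` threshold, and the sample posts are all hypothetical:

```python
from collections import Counter


def token_counts(posts):
    """Count lowercase whitespace-separated tokens across all posts."""
    counts = Counter()
    for text in posts:
        counts.update(text.lower().split())
    return counts


def emerging_terms(baseline_posts, study_posts, min_count=2):
    """Return tokens seen at least `min_count` times in the study period
    that never appear in the baseline period, sorted alphabetically."""
    baseline_vocab = set(token_counts(baseline_posts))
    study_counts = token_counts(study_posts)
    return sorted(
        term
        for term, n in study_counts.items()
        if n >= min_count and term not in baseline_vocab
    )


# Hypothetical posts; "kungflu" is one of the emerging terms named in these notes.
baseline = ["the flu season is here", "news about the virus"]
study = [
    "kungflu trends again",
    "people keep posting kungflu",
    "the virus news",
]
new_terms = emerging_terms(baseline, study)
print(new_terms)
```

The frequency threshold keeps one-off typos out of the result; a real analysis would still need manual review of the candidates.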
Semantic Changes between Words
For each weekly model $W_{t=i}$, we extract the cosine similarity between two terms and plot it over time. Finding: as time goes on, “chinese” becomes more similar to “virus” and to “chink”.
Cosine similarities between various terms over time.
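The similarity-over-time idea can be sketched with plain Python. This assumes each weekly model is available as a dict from term to embedding vector; the 2-D vectors below are made up for illustration (real word2vec vectors have many more dimensions), and `similarity_over_time` is a hypothetical helper:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_over_time(models, term_a, term_b):
    """Return one similarity per weekly model, or None when a term is missing."""
    series = []
    for vectors in models:
        if term_a in vectors and term_b in vectors:
            series.append(cosine_similarity(vectors[term_a], vectors[term_b]))
        else:
            series.append(None)
    return series


# Hypothetical weekly embeddings, chosen so the two terms drift closer together.
weekly_vectors = [
    {"chinese": [1.0, 0.0], "virus": [0.0, 1.0]},
    {"chinese": [1.0, 0.0], "virus": [1.0, 1.0]},
    {"chinese": [1.0, 0.0], "virus": [2.0, 0.0]},
    {"chinese": [1.0, 0.0]},  # "virus" missing from this week's vocabulary
]
series = similarity_over_time(weekly_vectors, "chinese", "virus")
print(series)
```

Returning None for weeks where a term is missing keeps the series aligned with the week index, so gaps show up in a plot instead of shifting later points.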
Conclusion
The paper aims to understand Sinophobic language related to COVID-19 on the social web.
The study collects five months of data from 4chan's /pol/ and Twitter and, using word embeddings, reveals a rise in Sinophobic content that is a cross-platform phenomenon spanning fringe web communities and mainstream platforms.
Sinophobic behavior evolves quickly and substantially, especially after world-changing events like the COVID-19 pandemic.