Deerwalk Journal

Automated Detection of Hate Speech in Twitter Using Natural Language Processing

Keywords: NLP, Hate Speech Classification, SVM, NLP Classifier, LSTM, Chrome Extension

Authors:
Aayam Ojha

Published Date: 2024-09-09


ABSTRACT

This research project aimed to develop a Chrome extension that classifies tweets containing
hate speech and also performs sentiment and topic analysis. Hate speech is a persistent and
concerning issue on Twitter, yet the platform has made little effort to address it. To tackle
this problem, a series of experiments compared Support Vector Machine (SVM), Random
Forest, and Long Short-Term Memory (LSTM) neural network classifiers. The SVM classifier
combined with word2vec [1] features outperformed all the other methods tested.
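
The SVM-plus-word2vec approach summarized above can be illustrated with a minimal Python sketch. It is not the author's implementation, and several pieces are assumptions: the tiny embedding table `EMBEDDINGS` and the labelled examples in `TRAIN` are hypothetical stand-ins for a trained word2vec model and a real tweet dataset; each tweet is represented by the average of its word vectors (one common way to turn word2vec output into tweet features, assumed here); and the SVM is a linear one trained with the Pegasos sub-gradient method, swapped in for whatever SVM library the study actually used. The names `tweet_vector`, `train_linear_svm`, and `predict` are illustrative only.

```python
import random

# Hypothetical 3-dimensional word vectors, made up for illustration. A real
# system would look these up in a word2vec model trained on tweet text.
EMBEDDINGS = {
    "hate": [0.9, 0.1, 0.0],
    "kill": [0.8, 0.2, 0.1],
    "stupid": [0.7, 0.0, 0.2],
    "love": [-0.8, 0.1, 0.0],
    "great": [-0.7, 0.2, 0.1],
    "thanks": [-0.6, 0.0, 0.3],
    "you": [0.0, 0.5, 0.5],
    "all": [0.0, 0.4, 0.6],
}
DIM = 3

# Hypothetical labelled tweets: +1 = hate speech, -1 = not hate speech.
TRAIN = [
    ("i hate you all", 1),
    ("you stupid kill", 1),
    ("hate stupid", 1),
    ("love you all", -1),
    ("great thanks", -1),
    ("thanks love", -1),
]


def tweet_vector(text):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [EMBEDDINGS[tok] for tok in text.lower().split() if tok in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    A constant 1.0 feature is appended so the last weight acts as a bias;
    for simplicity that bias is regularised along with the other weights.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (gradient step on the regulariser) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and, if the hinge loss is active, step towards the example.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, text):
    """Return +1 (hate speech) or -1 (not hate speech)."""
    x = tweet_vector(text) + [1.0]
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [tweet_vector(text) for text, _ in TRAIN]
    y = [label for _, label in TRAIN]
    w = train_linear_svm(X, y)
    for text in ["hate kill stupid", "love great thanks"]:
        print(text, "->", "hate" if predict(w, text) == 1 else "not hate")
```

In a real pipeline the embedding table would come from a trained word2vec model and the hand-written trainer would be replaced by a library SVM. Averaging word vectors discards word order, a trade-off that sequence models such as the LSTM mentioned above do not make.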

The Chrome extension was then developed in a monolithic repository (monorepo) using React
and Django. With the extension installed, users can automatically analyze live tweets to
identify hate speech and obtain sentiment and topic analysis. The outcome of this research
project could contribute significantly to a more positive online environment and to curbing
the prevalence of hate speech on Twitter.

REFERENCES

[1] T. Mikolov, I. Sutskever, K. Chen, G. Corrado and J. Dean, "Distributed
Representations of Words and Phrases and their Compositionality," CoRR, vol.
abs/1310.4546, 16 Oct 2013.
[2] K. Saha, E. Chandrasekharan and M. D. Choudhury, "Prevalence and Psychological
Effects of Hateful Speech in Online College Communities," Association for
Computing Machinery, New York, 2019.
[3] C. Calvert, "Hate Speech and Its Harms: A Communication Theory Perspective,"
Journal of Communication, vol. 47, no. 1, pp. 4-19, 1997.
[4] F. Barbieri, L. E. Anke and J. Camacho-Collados, "XLM-T: Multilingual Language
Models in Twitter for Sentiment Analysis and Beyond," in 13th Conference on
Language Resources and Evaluation, Marseille, 2022.
[5] T. Kuzman, "Comparison of genre datasets: CORE, GINCO and FTD," 2022.
[Online]. Available: https://github.com/TajaKuzman/Genre-Datasets-Comparison.
[6] Twitter, "Hateful conduct policy," Twitter Help Center, 08 2021. [Online]. Available:
https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy. [Accessed 05
03 2023].
[7] S. Frenkel and K. Conger, "Hate Speech's Rise on Twitter Is Unprecedented,
Researchers Find," The New York Times, 02 12 2022. [Online]. Available:
https://www.nytimes.com/2022/12/02/technology/twitter-hate-speech.html.
[Accessed 05 03 2023].
[8] S. Abra, S. Shaikh and Z. Hussain, "Automatic Hate Speech Detection using Machine
Learning: A Comparative Study," ResearchGate, vol. 11, no. 8, 2020.
[9] A. Bisht, A. Singh, H. S. Bhadauria and V. Jitendra, "Detection of Hate Speech and
Offensive Language in Twitter Data Using LSTM Model," ResearchGate, pp. 243-
264, 2020.
[10] J. Pennington, R. Socher and C. D. Manning, "GloVe: Global Vectors for Word
Representation," 2014.
[11] G. L. D. L. P. Sarracen, R. G. Pons, C. E. M. Cuza and P. Rosso, "Hate Speech
Detection using Attention-based LSTM," EVALITA, pp. 235-238, 2018.
[12] S. A. Kokatnoor and B. Krishnan, "Twitter Hate Speech Detection using Stacked
Weighted Ensemble (SWE) Model," ResearchGate, pp. 87-92, 2020.
[13] A. U. Iyer, "Toxic Tweets Dataset," Kaggle.com, 2021. [Online]. Available:
https://www.kaggle.com/datasets/ashwiniyer176/toxic-tweets-dataset.
