Deerwalk Journal of Computer Science & Technology

Automated Detection of Hate Speech in Twitter Using Natural Language Processing

Keywords: NLP, Hate Speech Classification, SVM, NLP Classifier, LSTM, Chrome Extension

Authors:
Aayam Ojha

Published Date: 2024-09-09

ABSTRACT

This research project aimed to develop an efficient and effective Chrome extension that
classifies tweets containing hate speech and also performs sentiment and topic analysis.
Hate speech is a persistent and concerning issue on Twitter, yet the platform has done
little to address it. To tackle this problem, a series of experiments was conducted with
Support Vector Machine (SVM), Random Forest, and Long Short-Term Memory (LSTM) neural
network classifiers. The results showed that the SVM classifier, combined with word2vec [1]
feature engineering, outperformed the other classifiers tested.
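
The SVM-plus-word2vec pipeline can be illustrated with a minimal, dependency-free sketch,
assuming each tweet is represented by the average of its word vectors (a common way to turn
word2vec embeddings into fixed-length features) and that labels are +1 for hate speech and
-1 otherwise. The embedding table `TOY_VECTORS`, the training tweets, and the helper names
are hypothetical; a real pipeline would load vectors from a trained word2vec model. The
classifier is a linear SVM trained with the Pegasos stochastic subgradient method, swapped
in so the example runs without libraries; the paper's actual SVM implementation, kernel,
and hyperparameters are not stated in this abstract.

```python
import random

# Hypothetical 3-dimensional embeddings standing in for a trained word2vec
# model (real word2vec vectors usually have hundreds of dimensions).
TOY_VECTORS = {
    "hate": [0.9, 0.1, 0.0],
    "stupid": [0.8, 0.2, 0.1],
    "kill": [0.9, 0.0, 0.2],
    "love": [0.0, 0.9, 0.1],
    "great": [0.1, 0.8, 0.0],
    "thanks": [0.0, 0.9, 0.2],
}
DIM = 3


def tweet_vector(tweet):
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    tokens = [t for t in tweet.lower().split() if t in TOY_VECTORS]
    if not tokens:
        return [0.0] * DIM
    return [sum(TOY_VECTORS[t][i] for t in tokens) / len(tokens) for i in range(DIM)]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos: stochastic subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1. A constant 1.0
    feature is appended so the bias is learned (and, as a simplification,
    regularized) together with the weights."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # The regularizer's gradient step shrinks the weights...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ...and a margin violation adds the hinge-loss subgradient.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (hate speech) or -1 (not hate speech)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical labeled tweets: +1 = hate speech, -1 = not hate speech.
TRAIN_TEXTS = [
    "i hate you stupid",
    "kill them all",
    "i hate this stupid thing",
    "love this great day",
    "thanks so much",
    "great work love it",
]
TRAIN_LABELS = [1, 1, 1, -1, -1, -1]

w = train_linear_svm([tweet_vector(t) for t in TRAIN_TEXTS], TRAIN_LABELS)
print(predict(w, tweet_vector("hate stupid")), predict(w, tweet_vector("love great")))
```

Averaging the word vectors discards word order, which is one reason a sequence model such
as the LSTM is a natural point of comparison; this sketch does not reproduce that comparison.
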
The Chrome extension was then developed using a monorepo (monolithic repository)
architecture, with React for the front end and Django for the back end. With this
extension, users can automatically analyze live tweets to identify hate speech and obtain
sentiment and topic analysis. This work could contribute to a more positive online
environment and help curb the prevalence of hate speech on Twitter.
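
The division of work between the extension and the Django back end can be sketched as a
JSON request/response contract. In this minimal sketch, assuming the extension collects the
visible tweets and POSTs them in one batch, a single handler returns a hate-speech label, a
sentiment label, and a topic for each tweet. The payload fields (`tweets`, `id`, `text`),
the function `analyze_tweets`, and the keyword-based stub classifiers are all hypothetical
placeholders for the trained models; only the standard library is used, so this is not the
project's actual Django code.

```python
import json

# Hypothetical keyword lists for the stub classifiers below.
HATE_TERMS = {"hate", "kill"}
POSITIVE_TERMS = {"love", "great", "thanks"}
NEGATIVE_TERMS = {"hate", "awful", "stupid"}


def classify_hate(text):
    """Stub standing in for the trained SVM: flags tweets containing a listed term."""
    tokens = set(text.lower().split())
    return "hate" if tokens & HATE_TERMS else "not_hate"


def classify_sentiment(text):
    """Stub standing in for a sentiment model: positive minus negative term count."""
    tokens = set(text.lower().split())
    score = len(tokens & POSITIVE_TERMS) - len(tokens & NEGATIVE_TERMS)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_topic(text):
    """Placeholder for a topic model; always returns one generic label."""
    return "general"


def analyze_tweets(request_body):
    """Parse {"tweets": [{"id": ..., "text": ...}]} and return the results as JSON."""
    payload = json.loads(request_body)
    results = []
    for tweet in payload.get("tweets", []):
        text = tweet.get("text", "")
        results.append({
            "id": tweet.get("id"),
            "hate_speech": classify_hate(text),
            "sentiment": classify_sentiment(text),
            "topic": classify_topic(text),
        })
    return json.dumps({"results": results})


request = json.dumps({"tweets": [
    {"id": "1", "text": "I hate this"},
    {"id": "2", "text": "What a great day"},
]})
print(analyze_tweets(request))
```

In a Django view, the same logic could read `request.body` and return the result dictionary
through `JsonResponse`; sending tweets in batches keeps the extension to one network round
trip per batch rather than one per tweet.
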

REFERENCES

[1] T. Mikolov, I. Sutskever, K. Chen, G. Corrado and J. Dean, "Distributed
Representations of Words and Phrases and their Compositionality," CoRR, vol.
abs/1310.4546, 16 Oct 2013.
[2] K. Saha, E. Chandrasekharan and M. De Choudhury, "Prevalence and Psychological
Effects of Hateful Speech in Online College Communities," Association for
Computing Machinery, New York, 2019.
[3] C. Calvert, "Hate Speech and Its Harms: A Communication Theory Perspective,"
Journal of Communication, vol. 47, no. 1, pp. 4-19, 1997.
[4] F. Barbieri, L. Espinosa Anke and J. Camacho-Collados, "XLM-T: Multilingual Language
Models in Twitter for Sentiment Analysis and Beyond," in 13th Conference on
Language Resources and Evaluation, Marseille, 2022.
[5] T. Kuzman, "Comparison of genre datasets: CORE, GINCO and FTD," 2022. 
[Online]. Available: https://github.com/TajaKuzman/Genre-Datasets-Comparison. 
[6] Twitter, "Hateful conduct policy," Twitter Help Center, 08 2021. [Online]. Available: 
https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy. [Accessed 05 
03 2023]. 
[7] S. Frenkel and K. Conger, "Hate Speech's Rise on Twitter Is Unprecedented, 
Researchers Find," The New York Times, 02 12 2022. [Online]. Available: 
https://www.nytimes.com/2022/12/02/technology/twitter-hate-speech.html. 
[Accessed 05 03 2023]. 
[8] S. Abro, S. Shaikh and Z. Hussain, "Automatic Hate Speech Detection using Machine
Learning: A Comparative Study," ResearchGate, vol. 11, no. 8, 2020.
[9] A. Bisht, A. Singh, H. S. Bhadauria and V. Jitendra, "Detection of Hate Speech and
Offensive Language in Twitter Data Using LSTM Model," ResearchGate, pp. 243-264,
2020.
[10] J. Pennington, R. Socher and C. D. Manning, "GloVe: Global Vectors for Word 
Representation," 2014. 
[11] G. L. D. L. P. Sarracen, R. G. Pons, C. E. M. Cuza and P. Rosso, "Hate Speech 
Detection using Attention-based LSTM," EVALITA, pp. 235-238, 2018. 
[12] S. A. Kokatnoor and B. Krishnan, "Twitter Hate Speech Detection using Stacked
Weighted Ensemble (SWE) Model," ResearchGate, pp. 87-92, 2020.
[13] A. U. Iyer, "Toxic Tweets Dataset," Kaggle.com, 2021. [Online]. Available:
https://www.kaggle.com/datasets/ashwiniyer176/toxic-tweets-dataset.