Bridging the Gap: Enhancing Kurdish News Classification with RFO-CNN Hybrid Model
Abstract
Effective organization and retrieval of news content rely heavily on accurate news classification. While extensive research has been conducted on well-resourced languages like English and Chinese, research on under-resourced languages such as Kurdish is severely lacking. To address this gap, this paper introduces a hybrid approach called RFO-CNN, which combines an improved version of the red fox optimization (RFO) algorithm with a convolutional neural network (CNN), using RFO to fine-tune the CNN's parameters. The model's efficacy was evaluated on two widely used Kurdish news datasets, KNDH and KDC-4007, both of which contain news articles classified into various categories. We compared RFO-CNN with state-of-the-art deep learning models such as bidirectional long short-term memory (BiLSTM) networks and bidirectional encoder representations from transformers (BERT), as well as with classical machine learning approaches: multinomial naive Bayes, support vector machine, and k-nearest neighbors. Each dataset was trained and tested under four train:test split scenarios: 60:40, 70:30, 80:20, and 90:10. The experimental results show that RFO-CNN outperforms the benchmark BERT model and the other machine learning models in accuracy and F1-score across all scenarios.
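The abstract does not specify which CNN hyperparameters RFO tunes, what the "improved" variant changes, or how fitness is measured. The following is therefore only a simplified, red-fox-inspired search loop written under those unknowns: every name, the three-dimensional search space (log learning rate, dropout, filter count), and `mock_validation_error`, which stands in for "train the CNN and return its validation error", are hypothetical. It keeps the population-based phases of the original RFO in rough form (foxes jump toward the best fox, explore locally, and the worst foxes are replaced), but it substitutes a shrinking Gaussian random walk for RFO's angle-based local move, so it should not be read as the paper's method.

```python
import math
import random

# Hypothetical CNN hyperparameter ranges (not taken from the paper):
# log10(learning rate), dropout rate, number of convolution filters.
BOUNDS = [(-5.0, -1.0), (0.0, 0.7), (16.0, 256.0)]


def decode(u):
    """Map a fox position in the unit cube to concrete hyperparameters.
    Searching in the unit cube lets one step size serve all dimensions."""
    return [lo + ui * (hi - lo) for ui, (lo, hi) in zip(u, BOUNDS)]


def mock_validation_error(params):
    """Hypothetical stand-in for 'train the CNN, return validation error'.
    A smooth bowl with its minimum at lr=1e-3, dropout=0.3, 128 filters."""
    log_lr, dropout, filters = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2 + ((filters - 128.0) / 64.0) ** 2


def clip01(u):
    return [min(max(v, 0.0), 1.0) for v in u]


def rfo_style_search(fitness, n_foxes=20, iterations=60, seed=0):
    rng = random.Random(seed)
    dim = len(BOUNDS)
    foxes = [[rng.random() for _ in range(dim)] for _ in range(n_foxes)]
    scores = [fitness(decode(f)) for f in foxes]

    def try_move(i, candidate):
        # Greedy acceptance: a fox only relocates if the new spot is better.
        candidate = clip01(candidate)
        s = fitness(decode(candidate))
        if s < scores[i]:
            foxes[i], scores[i] = candidate, s

    for t in range(iterations):
        best = foxes[min(range(n_foxes), key=scores.__getitem__)]

        # Global exploration: jump a random distance alpha toward the best
        # fox along the sign of each coordinate difference.
        for i in range(n_foxes):
            alpha = rng.uniform(0.0, math.dist(foxes[i], best))
            try_move(i, [x + alpha * math.copysign(1.0, b - x) if b != x else x
                         for x, b in zip(foxes[i], best)])

        # Local exploitation (simplified): a shrinking Gaussian random walk
        # instead of RFO's angle-based "observation" move.
        sigma = 0.1 * (1.0 - t / iterations) + 0.01
        for i in range(n_foxes):
            try_move(i, [x + rng.gauss(0.0, sigma) for x in foxes[i]])

        # Reproduction: replace the worst 5% of foxes, half of the time near
        # the two best foxes and otherwise at a random location.
        order = sorted(range(n_foxes), key=scores.__getitem__)
        a, b = foxes[order[0]], foxes[order[1]]
        centre = [(x + y) / 2.0 for x, y in zip(a, b)]
        radius = math.dist(a, b) / 2.0
        for i in order[-max(1, n_foxes // 20):]:
            if rng.random() < 0.5:
                new = clip01([c + rng.uniform(-radius, radius) for c in centre])
            else:
                new = [rng.random() for _ in range(dim)]
            foxes[i], scores[i] = new, fitness(decode(new))

    i_best = min(range(n_foxes), key=scores.__getitem__)
    return decode(foxes[i_best]), scores[i_best]


params, err = rfo_style_search(mock_validation_error)
print(f"log10(lr)={params[0]:.2f} dropout={params[1]:.2f} "
      f"filters={round(params[2])} error={err:.4f}")
```

In a real pipeline, `fitness` would presumably train the CNN on the training portion of one split scenario (for example 80:20) and return a validation loss or 1 - F1-score. Because every evaluation is then a full training run, the population size and iteration count would have to be far smaller than in this toy setting.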
Copyright (c) 2024 Soran S. Badawi
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors who choose to publish their work with ARO agree to the following terms:
- Authors retain the copyright to their work and grant the journal the right of first publication. The work is simultaneously licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). This license allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are free to enter into separate agreements for the non-exclusive distribution of the journal's published version of the work, such as posting it to an institutional repository or publishing it in a book, provided its initial publication in this journal is acknowledged.
- Authors are encouraged to share and post their work online, including in institutional repositories or on their personal websites, both before and during the submission process. This can lead to productive exchanges and increase the visibility and citation of the published work.
By agreeing to these terms, authors acknowledge the importance of open access and the benefits it brings to the scholarly community.