Time Series-Based Spoof Speech Detection Using Long Short-Term Memory and Bidirectional Long Short-Term Memory

Keywords: Bidirectional Long Short-Term Memory, Constant Q cepstral coefficients, Countermeasure Spoofing, Long Short-Term Memory, Mel-frequency cepstral coefficients, Open-source speech and music interpretation by large-space extraction

Abstract

Detecting fake speech is crucial for the reliability of voice-based authentication systems. Traditional methods often struggle because they cannot model the complex patterns that unfold over time. Our study introduces a deep learning approach that uses Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) models, tailored to identify fake speech from its temporal characteristics. We represent speech signals with the cepstral features Mel-frequency cepstral coefficients (MFCC) and Constant Q cepstral coefficients (CQCC), and with features extracted by the open-source Speech and Music Interpretation by Large-space Extraction (OpenSMILE) toolkit, so that the models learn these temporal patterns directly. Testing on the ASVspoof 2019 Logical Access dataset, we report the minimum tandem detection cost function (min t-DCF), Equal Error Rate (EER), recall, precision, and F1-score. Our results show that LSTM and BiLSTM models significantly enhance the reliability of spoof speech detection systems.
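To make the evaluation metrics named in the abstract concrete, here is a minimal pure-Python sketch; it is not the authors' code. It approximates the EER by sweeping every observed detection score as a threshold and taking the point where the false acceptance and false rejection rates are closest, and it computes precision, recall, and F1-score from hard labels. It assumes that higher scores indicate bona fide speech (as in ASVspoof score files) and treats spoof (label 1) as the positive class; the function names and toy data are illustrative assumptions. The min t-DCF is omitted because it additionally requires automatic speaker verification scores and cost parameters.

```python
from typing import Sequence, Tuple


def equal_error_rate(bonafide_scores: Sequence[float],
                     spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumes higher scores mean "more likely bona fide"; a trial is accepted
    as bona fide when its score is >= the threshold.
    Returns (eer, threshold).
    """
    best_gap, best_eer, best_t = float("inf"), 1.0, 0.0
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False acceptance rate: spoofed trials wrongly accepted as bona fide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials wrongly rejected.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2, t
    return best_eer, best_t


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one class (assumed here: 1 = spoof)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy scores and labels, not results from the paper.
    print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75]))  # (0.25, 0.7)
    print(precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```

With the toy scores, the threshold 0.7 balances the two error rates at 0.25, so the EER is 25%. Official ASVspoof tooling may interpolate between thresholds rather than use this discrete sweep.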

Author Biographies

Arsalan R. Mirza, Department of Computer Science, Faculty of Science, Soran University, Soran, Kurdistan Region – F.R. Iraq

Arsalan R. Mirza is an Assistant Lecturer at Soran University. He received his B.Sc. in Software Engineering from Salahaddin University and his Master's degree in Software Engineering from Near East University, where his research focused on software engineering and artificial intelligence. He is a Ph.D. student in the Department of Computer Science at Soran University, specializing in the semantic web and deep learning. His current research interests include automatic speech recognition, speaker verification, and spoofing countermeasures.

Abdulbasit K. Al-Talabani, Department of Software Engineering, Faculty of Engineering, Koya University, Koya KOY45, Kurdistan Region - F.R. Iraq

Abdulbasit Al-Talabani is an Assistant Professor in the Department of Software Engineering, Faculty of Engineering, Koya University. He received a B.Sc. in Mathematics from Salahaddin University, Iraq, an M.Sc. in Computer Science from Koya University, Iraq, and a Ph.D. in Applied Computing from the University of Buckingham, UK. His research interests are machine learning, speech processing, and computer vision.

Published
2024-09-12
How to Cite
Mirza, A. R. and Al-Talabani, A. K. (2024) “Time Series-Based Spoof Speech Detection Using Long Short-Term Memory and Bidirectional Long Short-Term Memory”, ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 12(2), pp. 119-129. doi: 10.14500/aro.11636.