Real-time speech-to-text translation in Spanish secondary classrooms: a mixed-methods study on refugee student inclusion

Ricardo Scott; Clara Vila; Daniel Pérez-Alcaraz; Olga Vaello; José Manuel Pérez-Torres; Ricardo Ibanco-Cañete; Jorge Brotons-Mas; Cristina de-la-Peña; María José Álvarez-Alonso; Teresa Pozo-Rico

doi:10.25304/rlt.v33.3418

Ricardo Scott, Department of Developmental Psychology and Didactics, University of Alicante, Alicante, Spain
Clara Vila, Secondary Education Teachers of Public Schools of the Valencian Community, Alicante, Spain
Daniel Pérez-Alcaraz, Secondary Education Teachers of Public Schools of the Valencian Community, Alicante, Spain
Olga Vaello, Secondary Education Teachers of Public Schools of the Valencian Community, Alicante, Spain
José Manuel Pérez-Torres, Secondary Education Teachers of Public Schools of the Valencian Community, Alicante, Spain
Ricardo Ibanco-Cañete, Department of Developmental Psychology and Didactics, University of Alicante, Alicante, Spain
Jorge Brotons-Mas, Institute of Neurosciences UMH-CSIC, Alicante, Spain / Cardenal Herrera Oria University, Elche, Spain
Cristina de-la-Peña, International University of La Rioja, Madrid, Spain
María José Álvarez-Alonso, Alfonso X El Sabio University, Madrid, Spain
Teresa Pozo-Rico, Department of Developmental Psychology and Didactics, University of Alicante, Alicante, Spain (ORCID: https://orcid.org/0000-0002-5849-4600)

Keywords: AI transcription processing, refugee education, language barriers, educational inclusion

Abstract

Following the 2022 invasion of Ukraine, thousands of Ukrainian children enrolled in schools across Europe. In Spain, most lacked prior knowledge of Spanish. This study examines whether real-time speech-to-text translation technology (STTT) can reduce classroom language barriers. Two activities – a fable reading and a neuroscience lecture – were conducted with 12–15-year-old Spanish-speaking students (n = 23) and Ukrainian students unfamiliar with Spanish but bilingual in Ukrainian and Russian. Using PowerPoint 365, the teacher’s speech was transcribed and translated into Russian – which at the time was far more reliably supported by automatic translation tools than Ukrainian – and projected onto a shared classroom display. Although this choice was based on technical and pedagogical criteria, it later drew some resistance, reflecting the sociopolitical sensitivities surrounding language use in wartime contexts. Comprehension was assessed using content-specific questionnaires. Ukrainian students scored lower than their Spanish peers but significantly higher than a control group (n = 22; p < 0.001; Cliff’s delta indicated large effect sizes). Qualitative analysis of teacher interviews highlighted improvements in comprehension and inclusion, along with implementation challenges. Taken together, these findings indicate that STTT has the potential to support newly arrived refugee students and help address multilingual education challenges.
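
For readers unfamiliar with the effect size named in the abstract, Cliff's delta compares every score in one group with every score in another and reports the proportion of pairs where the first is higher minus the proportion where it is lower. The following is a minimal sketch of that definition, not the authors' analysis code: the function name `cliffs_delta` and the score lists are hypothetical and invented purely for illustration.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Return Cliff's delta: P(x > y) - P(x < y) over all (x, y) pairs."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    # Ties count toward neither side, so they pull delta toward zero.
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical questionnaire scores (NOT data from the study).
with_sttt = [6, 7, 5, 8, 4]
control = [2, 3, 1, 4, 5]

delta = cliffs_delta(with_sttt, control)
print(f"Cliff's delta = {delta:.2f}")
```

The statistic ranges from -1 to 1, with 0 meaning the two groups overlap completely. A commonly cited rule of thumb treats absolute values above roughly 0.47 as large; the abstract does not state which thresholds the authors applied.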
