Content Construction for an Intelligent Tutor System in languages: a pilot study on the OneStopEnglish corpus
Abstract
During foreign language acquisition, reading is one of the main opportunities for learners to engage with the language. However, inappropriate texts can give students a negative experience, so in regular courses teachers rely on their own experience or on an editorial team to select readings. In an automatic system such as an Intelligent Tutor System, recommending texts that fit the student's profile is a priority, and knowing the language level of a text is not enough. This work uses existing tools to classify a sample of texts from the OneStopEnglish corpus according to the Common European Framework of Reference for Languages (CEFR). We then create thematic groups using Latent Semantic Analysis (LSA) and use three popular readability metrics as a guide for suggesting texts to students.
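As an illustration of how a readability metric could guide text suggestions, the following is a minimal sketch in Python. The abstract does not name the three metrics used in the study, so Flesch Reading Ease appears here only as an assumed example of a well-known formula. The syllable counter is a rough heuristic, and the function names (`count_syllables`, `flesch_reading_ease`, `rank_by_ease`) are hypothetical, not taken from the study.

```python
import re


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def rank_by_ease(texts):
    """Return text names ordered from easiest to hardest by the score above."""
    return sorted(texts, key=lambda name: flesch_reading_ease(texts[name]),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical sample texts, not drawn from the OneStopEnglish corpus.
    sample = {
        "easy": "The cat sat. The dog ran.",
        "hard": "Institutional heterogeneity complicates international "
                "collaboration considerably.",
    }
    for name in rank_by_ease(sample):
        print(name, round(flesch_reading_ease(sample[name]), 2))
```

A tutoring system could combine such a score with the CEFR level and the thematic group of each text when ordering candidates for a student. How the signals are weighted is a design choice that this sketch leaves open.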