Factors Influencing the Academic Performance of Freshmen: Insights from a Data-driven Analysis

Thao-Trang Huynh-Cam1, Long Sheng Chen2, Khai Vinh Huynh3
1 Foreign Languages and Informatics Center, Dong Thap University, Vietnam
2 Department of Industrial Engineering and Management, National Taipei University of Technology, Taiwan
3 Office of Quality Assurance, Dong Thap University, Vietnam

Abstract

Student academic performance has long been examined by theorists and practitioners worldwide. Yet the academic performance of freshmen, who as newcomers often face challenges and need assistance, has received little attention. This pilot study identifies key factors influencing the academic performance of freshmen at a technical and vocational university in Taiwan and explores how data-driven predictive analytics can help improve students' performance. The analysis used recursive feature elimination (RFE) for feature selection with two AI-driven models, a Neural Network (NN) and a Decision Tree (DT). The research sample comprised 1,928 freshmen. The input factors covered demographic, socio-economic, and family background variables. The output factor was the grade point average (GPA) for the first term of the 2020/2021 academic year. Of the two models tested, the DT outperformed the NN, with accuracy, precision, recall, and F1 of 86.0% (D1) and approximately 90.0% (D2), and an ROC-AUC of 86% (D1) and 87% (D2). Three factors (students' fathers' careers, major, and the average monthly income of students' parents) had the highest positive and significant impact on freshmen’s academic performance. This work provides AI-driven models that can serve as an early warning system and identifies the strongest predictors of academic performance among freshmen. It is expected to assist educational policymakers in developing proactive measures to increase the number of excellent students and to reduce the number of underperforming or at-risk students at early stages.
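As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes accuracy, precision, recall, and F1 from predicted and true labels. It is not the authors' code: the binary labelling (1 = at-risk, 0 = not at-risk), the function name binary_metrics, and the ten example labels are all assumptions made here for illustration, and the abstract does not state how the study defined its classes.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels (assumption): 1 = at-risk freshman, 0 = not at-risk.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these made-up labels there are four true positives, four true negatives, one false positive, and one false negative, so all four measures come out at 0.80. In an early warning setting, recall on the at-risk class is usually the measure to watch, since a missed at-risk student is costlier than a false alarm.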
