Nicolas Hubert
Title of the thesis
Data Mining, Machine Learning and Recommender System for Education: towards a support system for academic guidance in higher education
Abstract
The work carried out during this PhD thesis will contribute to the AILES project (Accompagnement à l’Intégration des Lycéens dans l’Enseignement Supérieur) of the National French Research, Action “Innovation and territories”.
The aim of the thesis project is to design a decision-support system based on a set of historical data (traces of the past) drawn from multiple, heterogeneous sources. Examples include expert knowledge in the field of guidance, guidance trace data from previous years, and profile data of high school students. The final goal is to recommend academic tracks to high school and university students.
Several scientific obstacles will be studied during the thesis, some related to algorithms and others to the application context. The first scientific challenge is to identify the factors that influence decision making, whether they are factors currently used by high school students or factors that can be identified automatically in the data to be mined.
The second challenge, which will be central to the thesis project, consists in designing a decision-support system on data coming from multiple, heterogeneous sources whose reliability can be questioned. The lack of data will be a problem throughout the thesis: not only will it be impossible to have enough data for each “situation”, but not all possible situations will appear in the data. Mechanisms for inference on missing data will therefore have to be designed.
The decision-support system will have to be designed for a dynamic environment: new student profiles, new training programs (or the disappearance of existing ones), constraints specific to some students, etc. Finally, the nature of the recommendation (single or multiple, sequential or not), the need for explainable algorithms, etc. will have to be studied.