Student course dropout prediction

The project was to design an artificial intelligence model that would predict, as a percentage, the risk of university students dropping out of a course.
We delivered this project for a university based in Sweden. The university had noticed that a large number of students did not complete their studies, and we wanted to design a solution that would help monitor and ultimately reverse this trend.

The amount of time we spent on the project
Our brilliant, highly experienced team, with its excellent knowledge of machine learning, managed to build the project in only 6 weeks.
The final model reached 92% accuracy in course dropout predictions.
Machine learning techniques

Choosing a learning algorithm is one of the most important stages of a machine learning project.
We decided to employ the Random Forest algorithm in the project. It is a computationally intensive ensemble learning method for classification and regression that operates by constructing a multitude of decision trees, each based on a random subset of the data.
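To make the idea concrete, here is a minimal sketch of a random forest written with the Python standard library only. It is not the model built in the project: for brevity every tree is a single-split "stump", and the two features (weekly logins, submitted assignments), the toy data, and all function names are assumptions made for illustration. Each tree is trained on a bootstrap sample of the rows and a random subset of the features, and the share of trees voting "dropout" is reported as a risk percentage.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """Most common 0/1 label (ties resolve to 0)."""
    return int(2 * sum(labels) > len(labels))

def fit_stump(rows, labels, feature_ids):
    """Fit a one-split decision tree using only the given features."""
    best_score = None
    # Fallback when no valid split exists: every row goes left, majority label.
    stump = (feature_ids[0], float("inf"), majority(labels), majority(labels))
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best_score is None or score < best_score:
                best_score = score
                stump = (f, threshold, majority(left), majority(right))
    return stump

def predict_stump(stump, row):
    """Return the 0/1 label predicted by a single stump."""
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and random features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per tree
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # rows drawn with replacement
        feature_ids = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample],
                                feature_ids))
    return forest

def dropout_risk(forest, row):
    """Share of trees voting 'dropout' (label 1), as a percentage."""
    votes = sum(predict_stump(stump, row) for stump in forest)
    return 100.0 * votes / len(forest)

# Hypothetical toy data: [logins per week, assignments submitted]; 1 = dropped out.
rows = [[1, 0], [2, 1], [0, 0], [1, 1], [8, 5], [9, 6], [7, 4], [10, 6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(rows, labels)
low_activity_risk = dropout_risk(forest, [0, 0])
high_activity_risk = dropout_risk(forest, [10, 6])
print(f"low activity: {low_activity_risk:.0f}%, high activity: {high_activity_risk:.0f}%")
```

In practice one would more likely reach for an established library implementation with deeper trees; the sketch only shows how random subsets of the data and a vote across many trees combine into a single risk figure.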

The Random Forest algorithm proved to be the most efficient of the algorithms we compared, owing to its suitability for big data and its insensitivity to gaps in the data. This algorithm consisted of
0 random trees
Database size (number of observations)
Initial number of observations
Number of observations obtained as a result of database transformation
Languages and frameworks used in the project

As a result, we built a machine learning model that is capable of predicting the risk of a student dropping out by analysing that student's activity on a learning platform.

The most difficult task when working with the data was fully understanding and formulating the problem definition, that is, working out which data indicate students' success and which indicate students' failure. Another challenge was preparing the dataset, because the enormous dataset the client had given us needed cleaning and supplementing.

We decided to take into consideration not only a student's total activity on the learning portal at the moment of course completion, but also the student's activities and achievements over shorter intervals during the course.
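As an illustration of this kind of feature, the sketch below turns a raw activity log into per-interval counts alongside a total. The event format, the seven-day interval, the four-interval course length, and the function name `interval_features` are all assumptions made for the example, not details of the client's data.

```python
from collections import defaultdict

def interval_features(events, interval_days=7, n_intervals=4):
    """Aggregate activity per student into fixed-length course intervals.

    events: iterable of (student_id, day_of_course, n_actions), where
    day_of_course counts from 0. Events outside the covered intervals are ignored.
    """
    per_student = defaultdict(lambda: [0] * n_intervals)
    for student_id, day, n_actions in events:
        slot = day // interval_days
        if 0 <= slot < n_intervals:
            per_student[student_id][slot] += n_actions
    return {
        student_id: {"per_interval": counts, "total": sum(counts)}
        for student_id, counts in per_student.items()
    }

# Hypothetical activity log: (student, day of course, number of actions).
events = [("s1", 0, 3), ("s1", 8, 2), ("s1", 9, 1), ("s2", 20, 5)]
features = interval_features(events)
print(features["s1"])
print(features["s2"])
```

Keeping the per-interval counts next to the total lets a model see when activity starts to fall away, not just how much there was overall.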


Our team put real effort into getting to the core of the data and understanding the problem we had to solve.

A comprehensive definition of the research problem was established. The cleaned and supplemented dataset was thoroughly analysed. On those grounds, we chose the most suitable features and the best model, which was later tested and optimised.
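For readers unfamiliar with how such a model is tested, the sketch below shows one simple way to compute accuracy from predicted risk percentages on held-out students. The 50% decision threshold, the function name `accuracy`, and the sample numbers are assumptions made for illustration; they are not the project's evaluation setup or results.

```python
def accuracy(predicted_risks, actual_dropouts, threshold=50.0):
    """Fraction of students whose outcome was predicted correctly.

    predicted_risks: dropout risk per student, in percent.
    actual_dropouts: 1 if the student dropped out, 0 otherwise.
    threshold: risk at or above which a dropout is predicted (assumed value).
    """
    predictions = [risk >= threshold for risk in predicted_risks]
    correct = sum(p == bool(a) for p, a in zip(predictions, actual_dropouts))
    return correct / len(actual_dropouts)

# Hypothetical held-out students: predicted risk (%) and what actually happened.
risks = [80.0, 10.0, 55.0, 30.0]
outcomes = [1, 0, 0, 0]
score = accuracy(risks, outcomes)
print(f"accuracy: {score:.0%}")
```

Accuracy alone can look flattering when dropouts are rare, so it is usually worth checking it alongside how many actual dropouts the model catches.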

The challenge turned out to be a great success. We managed to build an algorithm that is able to predict the risk of student course dropout with more than 90 per cent accuracy.


Would you like to use machine learning to gain a competitive advantage over your rivals?