Telstra - Prediction of Severity of Faults on Telecommunication Network
02 Mar 2016
Telstra is Australia’s largest telecommunications, media, and network technology company, offering a full range of communications services. It ran its first-ever “recruitment” competition on Kaggle, asking participants to predict the severity of service disruptions on its network: whether a disruption is a momentary glitch or a total interruption of connectivity. The algorithms and models developed for this competition will help Telstra predict the severity of service disruptions and provide better service to its customers.
The competition ran from 25-Nov-2015 to 29-Feb-2016, and 974 individuals from across the globe participated.
The dataset comprised multiple files, each containing different features extracted from log files collected at various locations and times. The target feature, fault severity, has 3 categories (0: No Fault, 1: Few Faults, and 2: Many Faults) and had to be predicted from the given datasets. The prediction had to be the probability of each severity type (multi-class) for the given test dataset.
In this competition, multi-class logarithmic loss was used to evaluate the predicted multi-class probabilities.
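To make the metric concrete, here is a minimal Python sketch of multi-class logarithmic loss, the negative mean log of the probability assigned to the true class. The function name `multiclass_log_loss`, the clipping constant `eps`, and the toy labels and probabilities are assumptions for illustration; they are not taken from the competition data or from Kaggle's scoring code.

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class labels (0, 1 or 2 here).
    y_prob: list of per-row probability lists, one value per class.
    eps:    assumed clipping constant that keeps log() away from 0.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip to avoid log(0), then renormalise so the row sums to 1.
        clipped = [min(max(p, eps), 1 - eps) for p in probs]
        norm = sum(clipped)
        total += math.log(clipped[label] / norm)
    return -total / len(y_true)

# Hypothetical toy data: two rows, three severity classes.
y_true = [0, 2]
y_prob = [[0.7, 0.2, 0.1],
          [0.1, 0.3, 0.6]]
loss = multiclass_log_loss(y_true, y_prob)
print(round(loss, 4))
```

A lower value is better. A model that is confidently wrong is penalised heavily, which is why averaging several models' probabilities often helps on this metric.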
I built the following set of models, first submitting each individually and then ensembling them:
- Generalized Boosted Regression Model (GBM) with Out-of-Bag (OOB) estimator, repeatedcv, and 5 separate 10-fold cross-validations
- Random Forest Model with Out-of-Bag (OOB) estimator, repeatedcv, and 8 separate 10-fold cross-validations
- Stacking (Meta-Ensembling) of Random Forest and GBM Models
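The repeated 10-fold cross-validation mentioned in the list above can be sketched as plain index splitting. This is only an illustration in Python of the general idea, not the code used for the competition (the name `repeatedcv` suggests a resampling option from an R package, though the post does not say so). The function name `repeated_kfold_indices`, the seed, and the row count of 100 are all hypothetical.

```python
import random

def repeated_kfold_indices(n_rows, n_folds=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, valid_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_rows))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            # Every n_folds-th shuffled row forms the validation fold.
            valid_idx = indices[fold::n_folds]
            valid_set = set(valid_idx)
            train_idx = [i for i in indices if i not in valid_set]
            yield repeat, fold, train_idx, valid_idx

# Hypothetical usage: 5 repeats of 10-fold CV over 100 rows.
splits = list(repeated_kfold_indices(n_rows=100))
print(len(splits))
```

Each repeat reshuffles the rows, so a model is validated on every row once per repeat, and averaging over repeats reduces the variance of the validation estimate.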
After creating the above models, I first took the arithmetic mean of all their predictions. I then built a weighted ensemble for submission, assigning a different weight to each model’s predictions.
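The averaging step can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs: the function name `weighted_average`, the three toy prediction tables, and the weights are hypothetical and do not reflect the actual models or weights used.

```python
def weighted_average(predictions, weights):
    """Blend per-model probability tables row by row.

    predictions: list of tables, one per model; each table is a list
                 of rows with one probability per class.
    weights:     one non-negative weight per model.
    """
    total_weight = sum(weights)
    n_rows = len(predictions[0])
    n_classes = len(predictions[0][0])
    blended = []
    for r in range(n_rows):
        row = [
            sum(w * model[r][c] for model, w in zip(predictions, weights))
            / total_weight
            for c in range(n_classes)
        ]
        blended.append(row)
    return blended

# Hypothetical predictions from three models for two test rows.
gbm_pred = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
rf_pred = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
stack_pred = [[0.8, 0.1, 0.1], [0.3, 0.4, 0.3]]
models = [gbm_pred, rf_pred, stack_pred]

simple_mean = weighted_average(models, [1, 1, 1])  # arithmetic mean
weighted = weighted_average(models, [2, 1, 3])     # assumed weights
print(simple_mean[0], weighted[0])
```

Dividing by the sum of the weights keeps each blended row summing to 1 as long as every input row does, so the result is still a valid set of class probabilities.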
The first few rows of the submission file are shown below: