How significant is the "order" of libraries loaded in R?
28 Dec 2016
I was working on a binary classification challenge in which I had to compute performance metrics for all the predictive models. Using the XGBoost, H2O, GBM and mlr packages, I developed five models, and I had to compute the AUC of each one with ROCR.
The libraries were loaded in the script in the following order:
library(caret)
library(ROCR)
library(scales)
library(mlr)
For one of the models (GLMNet), I used the following code to predict the target feature:
glmNetPred <- predict(glmNetModel$glmnet.fit, ...)
After prediction, I ran the following code to create the ROCR prediction object, and it executed successfully:
ROCRpred <- prediction(glmNetPred, testSetActual)
But when I ran the following code to compute the Area Under the Curve (AUC),
AUC <- as.numeric(performance(ROCRpred, "auc")@y.values)
it gave me the following error:
Error in performance(ROCRpred, "auc") :
  Assertion on ‘pred’ failed: Must have class ‘Prediction’, but has class ‘prediction’.
Now what? I searched the web for a solution but did not find one. On further analysis, I found that the ROCRpred object had been created by the ROCR package's prediction function, but was then passed to the performance function of the mlr package. The mlr package, however, expects that object to be of class Prediction.
A careful look at the messages printed when the packages were loaded confirmed this:
> library(mlr)
Attaching package: ‘mlr’
The following object is masked from 'package:ROCR':
    performance
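The rule behind that message is that R resolves an unqualified name by walking the search path, and each newly loaded package is placed at position 2, ahead of everything loaded earlier. The following is a minimal base-R sketch of that rule. It assumes nothing beyond base R: two plain environments, given the hypothetical names fake:pkgA and fake:pkgB, stand in for ROCR and mlr, and both define a function called performance.

```r
# Two hypothetical stand-in "packages": plain environments that each define `performance`
pkgA <- new.env()
pkgA$performance <- function(...) "performance() from fake:pkgA"
pkgB <- new.env()
pkgB$performance <- function(...) "performance() from fake:pkgB"

# attach() inserts at position 2 of the search path by default, as library() does,
# so the environment attached last sits in front of the one attached earlier.
attach(pkgA, name = "fake:pkgA")
attach(pkgB, name = "fake:pkgB")   # R prints a message that `performance` is masked from fake:pkgA

print(search()[2:3])        # "fake:pkgB" "fake:pkgA"
print(find("performance"))  # every environment on the search path defining it, in lookup order
winner <- performance()     # unqualified call: the first match on the search path is used
print(winner)               # "performance() from fake:pkgB"

# Restore the search path
detach("fake:pkgB")
detach("fake:pkgA")
```

This mirrors the script above. ROCR was attached first and mlr last, so the unqualified performance() call reached mlr's version.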
What this means is that both the mlr and ROCR packages contain a function named performance. The two functions share a name, but they are different functions with different signatures. The performance function in the mlr package expects its argument to be of class Prediction, whereas the one in the ROCR package expects an object of class prediction. Because mlr was loaded after ROCR, the unqualified call resolved to mlr's version, hence the error.
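The clash comes down to a single capital letter, because R class names are case-sensitive. The sketch below reproduces the failure without either package. The object is a simplified S3 stand-in for what ROCR::prediction() returns (ROCR actually defines prediction as an S4 class), and assert_mlr_prediction is a hypothetical function that only mimics the kind of class assertion mlr's performance() makes.

```r
# Simplified S3 stand-in for the object returned by ROCR::prediction();
# only the class name matters for this illustration
rocr_like <- structure(list(), class = "prediction")

# Hypothetical check mimicking the class assertion in mlr's performance()
assert_mlr_prediction <- function(pred) {
  if (!inherits(pred, "Prediction")) {
    stop("Must have class 'Prediction', but has class '", class(pred)[1], "'")
  }
  invisible(TRUE)
}

# Capture the error message instead of stopping the script
msg <- tryCatch(assert_mlr_prediction(rocr_like), error = conditionMessage)
print(msg)                                # "Must have class 'Prediction', but has class 'prediction'"
print(inherits(rocr_like, "prediction"))  # TRUE
print(inherits(rocr_like, "Prediction"))  # FALSE: class names are case-sensitive
```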
There are two ways to solve this issue:
a. Supply the package name explicitly when calling the function. With this approach, the package name has to be specified in every model file, which may reduce readability. It is also easy to miss it in some places, leading to undesired results.
ROCRpred <- ROCR::prediction(glmNetPred, testSetActual)
AUC <- as.numeric(ROCR::performance(ROCRpred, "auc")@y.values)
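The pkg::name form works because it looks inside the named package's namespace directly and ignores the search path entirely. Here is a small base-R sketch of the same effect. A user-defined mean, a purely illustrative definition, masks base::mean for unqualified calls, just as mlr's performance masked ROCR's.

```r
# A user-defined `mean` masks base::mean for unqualified calls
mean <- function(x, ...) "masked mean"

unqualified <- mean(1:4)        # resolves to the global `mean` defined above
qualified   <- base::mean(1:4)  # `pkg::name` bypasses the search path
print(unqualified)  # "masked mean"
print(qualified)    # 2.5

rm(mean)  # remove the masking definition
```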
b. Load the packages in a particular order. In my case, ROCR needed to be loaded after mlr so that its performance function masks mlr's, rather than the other way round:
library(caret)
library(mlr)
library(scales)
library(ROCR)
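To check which package a name actually resolves to after the library() calls, you can look up the function from the global environment and ask for the name of its enclosing namespace. The sketch below does this with a hypothetical helper, which_package, built only from base R calls. It is demonstrated on stats::sd so it runs without any extra packages installed.

```r
# Hypothetical helper: report which package an unqualified call to `fname`
# made from the global environment would use
which_package <- function(fname) {
  fn <- get(fname, envir = globalenv(), mode = "function")  # follows the search path
  environmentName(environment(fn))                          # namespace the function lives in
}

print(which_package("sd"))  # "stats"
print(find("sd"))           # "package:stats"
```

After loading the packages in the corrected order, which_package("performance") should return "ROCR". find("performance") should list both packages, with package:ROCR first.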
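Since I loaded the libraries in only one place in my script for all the models, the second approach was easier for me: I did not need to specify the package name everywhere. Note that mlr's performance function is then masked in turn, so any call that needs it must be written as mlr::performance.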
So, does the order in which libraries are loaded in R really matter? Often it does not. But when two packages export functions with the same name, the package loaded last wins, and then the order matters a great deal!