How significant is the "order" of libraries loaded in R?
28 Dec 2016
I was working on a binary classification challenge in which I had to compute performance metrics for all the predictive models. Using the XGBoost, H2O, GBM, and MLR packages, I developed five models, and for each of them the AUC had to be computed with ROCR.
Following is the order in which the libraries were loaded in the script:
library(caret)
library(ROCR)
library(scales)
library(mlr)
For one of the models (GLMNet), I used the code below to predict the target feature:
glmNetPred <- predict(glmNetModel$glmnet.fit, ...)
After prediction, I ran the code below to build the ROCR prediction object, and it executed successfully:
ROCRpred <- prediction(glmNetPred, testSetActual)
But when I executed the code below to compute the area under the curve (AUC),
AUC <- as.numeric(performance(ROCRpred, "auc")@y.values)
it gave me the following error:
Error in performance(ROCRpred, "auc") :
Assertion on ‘pred’ failed: Must have class ‘Prediction’, but has class ‘prediction’.
Now what? I searched the web for a solution but did not find one. On further analysis, I found that the ROCRpred object was created by the ROCR package's prediction function but was then passed to the performance function of the mlr package, which expects an object of class Prediction.
A careful look at the console output from when the packages were loaded confirmed this:
> library(mlr)
Attaching package: ‘mlr’
The following object is masked from 'package:ROCR':
performance
This means that both mlr and ROCR export a function named performance, but the two have different signatures. The performance function in mlr expects its argument to be of class Prediction, whereas the one in ROCR expects class prediction. Because mlr was loaded after ROCR, a bare call to performance resolved to the mlr version, hence the error!
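To make the masking behaviour concrete, here is a minimal sketch in base R that needs neither package installed. The names fake_ROCR and fake_mlr are hypothetical stand-ins, not the real packages, and their performance functions only return a label. The sketch attaches them in the same order as the script and uses find() to see which search-path entries define the name.

```r
# Hypothetical stand-ins for ROCR and mlr; each defines its own performance().
fake_ROCR <- list(performance = function(prediction.obj, measure) "ROCR-style result")
fake_mlr  <- list(performance = function(pred, measures) "mlr-style result")

attach(fake_ROCR, name = "fake_ROCR")
attach(fake_mlr,  name = "fake_mlr")   # attached second, so it masks fake_ROCR

# Every search-path entry that defines the name, in lookup order.
where <- find("performance")

# A bare call resolves to the first entry in that list.
result <- performance(NULL, "auc")

# Clean up the search path.
detach("fake_mlr", character.only = TRUE)
detach("fake_ROCR", character.only = TRUE)
```

With real packages, the same find("performance") call should list the package entries in the order R searches them, which is a quick way to check which version a bare call will use.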
There are two ways to solve this issue:
a. Supply the package name explicitly when calling the function. With this approach the package name has to be specified in every model file, which may reduce readability. There is also a risk of missing it in some places, leading to unexpected results.
ROCRpred <- ROCR::prediction(glmNetPred, testSetActual)
AUC <- as.numeric(ROCR::performance(ROCRpred, "auc")@y.values)
b. Load the packages in a particular order, so that the package whose function you want to call unqualified is loaded last. In my case the order should have been
library(caret)
library(mlr)
library(scales)
library(ROCR)
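Why the reordering works can be sketched in base R alone. The lists pkg_a and pkg_b and the helper call_after_attaching are hypothetical names made up for this illustration. Each list plays the role of a package that defines performance, and the helper attaches them in a given order, makes a bare call, and then detaches them again.

```r
# Hypothetical stand-ins for two packages that both define performance().
pkg_a <- list(performance = function(...) "from pkg_a")
pkg_b <- list(performance = function(...) "from pkg_b")

# Attach both in the given order, call performance(), then clean up.
call_after_attaching <- function(order) {
  pkgs <- list(pkg_a = pkg_a, pkg_b = pkg_b)
  for (nm in order) {
    attach(pkgs[[nm]], name = nm, warn.conflicts = FALSE)
  }
  on.exit(for (nm in order) detach(nm, character.only = TRUE))
  performance()
}

a_last <- call_after_attaching(c("pkg_b", "pkg_a"))  # pkg_a attached last
b_last <- call_after_attaching(c("pkg_a", "pkg_b"))  # pkg_b attached last
```

In both runs the definition attached last is the one the bare call reaches, which is the same reason loading ROCR after mlr makes performance resolve to the ROCR version.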
Since I loaded the libraries in only one place in my script for all the models, the second approach was easier for me, as I did not need to specify the package name everywhere.
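If the explicit approach is preferred after all, one possible way to limit the repetition is to bind the qualified functions to local names once and use those names throughout. The sketch below assumes ROCR is installed and is skipped otherwise. The alias names rocr_prediction and rocr_performance are hypothetical, and the scores and labels are toy values invented for illustration, not data from the challenge.

```r
# Assumes ROCR is installed; the block is skipped otherwise.
if (requireNamespace("ROCR", quietly = TRUE)) {
  # Local aliases (hypothetical names) that always point at the ROCR versions,
  # regardless of which packages are attached later.
  rocr_prediction  <- ROCR::prediction
  rocr_performance <- ROCR::performance

  # Toy scores and labels, invented purely for illustration.
  scores <- c(0.9, 0.8, 0.4, 0.2)
  labels <- c(1, 1, 0, 0)

  toy_pred <- rocr_prediction(scores, labels)
  toy_auc  <- rocr_performance(toy_pred, "auc")@y.values[[1]]
}
```

The aliases are defined in one place, so the model files can call them without repeating the package prefix, at the cost of introducing non-standard function names.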
So, is it really important to load libraries in a particular order in R? In many cases it makes no difference, but when packages export functions with the same name, yes, it is!