June 1, 2018
mkNER.R:
bigNER <- fread(bigFile, header=TRUE, sep=",") bigNER <- bigNER[!duplicated(bigNER) & bigNER$numFact==1,] samps <- sapply(nerSize, function(x) sample(1:n, x, replace = FALSE)) nerss <- map(samps, function(x) bigNER[x,]) for(i in 1:length(nerSize)){ trsName <- paste(paste("NERtrain",nerSize[i], sep = "-"),"csv",sep=".") write.csv(nerss[[i]],trsName, row.names = FALSE) }
For each of the 21 training sets, we'll build 9 models
naive Bayes, knn
Each model build using 10-fold cross validation and "Kappa" error metric
library(caret) library(doParallel) cl <- makeCluster(detectCores()) registerDoParallel(cl) tr1.rfs <- train(numFact~const+lin+quad+cube+nSign+pSign+sigReal, data=trs1, method="rf", metric = "Kappa", trControl = trainControl(method="cv", number = 10, allowParallel = TRUE)) tst$rfs <- predict(tr1.rfs, tst, type = "prob")[,2] stopCluster(cl)
Sample of Data Frame with 1134 confusion matrices!
mdl | TN | TP | FP | FN | ner | theta |
---|---|---|---|---|---|---|
lrs | 8000 | 0 | 0 | 16 | 2000 | 0.30 |
rfpp | 7908 | 10 | 92 | 6 | 2500 | 0.25 |
rfpp | 7877 | 8 | 123 | 8 | 2400 | 0.25 |
rfs | 7986 | 10 | 14 | 6 | 2000 | 0.50 |
knn | 7699 | 16 | 301 | 0 | 2500 | 0.20 |
rfsq | 7850 | 14 | 150 | 2 | 1400 | 0.25 |
TP | mdl | count | avg.FP | min.ner | max.ner | max.theta |
---|---|---|---|---|---|---|
16 | knn | 32 | 498.1250 | 500 | 2500 | 0.20 |
16 | rfs | 17 | 251.0000 | 500 | 1600 | 0.30 |
16 | rfsq | 7 | 378.5714 | 500 | 800 | 0.25 |
16 | rfp | 6 | 334.8333 | 500 | 800 | 0.30 |
15 | rfs | 63 | 145.6825 | 500 | 2500 | 0.50 |
15 | rfp | 59 | 188.0169 | 600 | 2400 | 0.40 |
15 | rfsq | 37 | 199.2432 | 600 | 2500 | 0.30 |
15 | knn | 36 | 320.3056 | 500 | 1900 | 0.40 |
mdl | TN | TP | FP | FN | ner | theta |
---|---|---|---|---|---|---|
rfs | 7830 | 16 | 170 | 0 | 1200 | 0.25 |
rfs | 7816 | 16 | 184 | 0 | 600 | 0.30 |
rfs | 7812 | 16 | 188 | 0 | 800 | 0.25 |
rfs | 7799 | 16 | 201 | 0 | 1200 | 0.20 |
rfs | 7790 | 16 | 210 | 0 | 1100 | 0.15 |
rfs | 7781 | 16 | 219 | 0 | 500 | 0.30 |
rfs | 7778 | 16 | 222 | 0 | 1600 | 0.15 |
variable | rfs6 | rfs8 | rfs12 |
---|---|---|---|
const | 100.00 | 100.00 | 98.74 |
lin | 78.07 | 84.51 | 100.00 |
quad | 42.50 | 43.38 | 64.10 |
cube | 68.57 | 75.10 | 99.82 |
sigReal | 40.78 | 33.22 | 34.99 |
mdl | TN | TP | FP | FN | ner | theta |
---|---|---|---|---|---|---|
knn | 7699 | 16 | 301 | 0 | 2500 | 0.15 |
knn | 7699 | 16 | 301 | 0 | 2500 | 0.20 |
knn | 7682 | 16 | 318 | 0 | 2100 | 0.20 |
knn | 7681 | 16 | 319 | 0 | 2100 | 0.15 |
knn | 7667 | 16 | 333 | 0 | 2300 | 0.15 |
knn | 7667 | 16 | 333 | 0 | 2300 | 0.20 |
knn | 7664 | 16 | 336 | 0 | 2000 | 0.20 |
variable | knn21 | knn23 | knn25 |
---|---|---|---|
const | 0.00 | 0.00 | 0.00 |
lin | 16.19 | 16.19 | 15.94 |
quad | 12.49 | 3.37 | 2.86 |
cube | 12.46 | 9.49 | 8.59 |
nSign | 28.35 | 25.41 | 25.92 |
pSign | 28.42 | 25.17 | 25.78 |
sigReal | 100.00 | 100.00 | 100.00 |
poly | knn.mean | rfs.mean | rfp.mean | n | min.ner | max.ner |
---|---|---|---|---|---|---|
-12,-49,94,96 | 0.400 | 0.319 | 0.141 | 2 | 2300 | 2500 |
-14,-57,110,112 | 0.200 | 0.458 | 0.156 | 1 | 2500 | 2500 |
12,-49,-94,96 | 0.600 | 0.510 | 0.236 | 1 | 2300 | 2300 |
2,29,105,112 | 0.200 | 0.524 | 0.184 | 1 | 2300 | 2300 |
4,44,140,140 | 0.381 | 0.149 | 0.069 | 6 | 600 | 2500 |
The following are missed by KNN
poly | knn.mean | rfs.mean | rfp.mean | n | min.ner | max.ner |
---|---|---|---|---|---|---|
-1,15,-56,63 | 0.20 | 0.328 | 0.492 | 1 | 2500 | 2500 |
-14,-57,110,112 | 0.20 | 0.391 | 0.281 | 3 | 2100 | 2500 |
2,29,105,112 | 0.19 | 0.631 | 0.471 | 6 | 600 | 2500 |
poly | pSign | nSign | disc | sigReal | ht | L2ht | numFact |
---|---|---|---|---|---|---|---|
-14,-57,110,112 | 2 | 1 | 307396996 | 3 | 112 | 167 | 2 |
-12,-49,94,96 | 2 | 1 | 165938692 | 3 | 96 | 143 | 2 |
-1,15,-56,63 | 2 | 1 | -1967 | 1 | 63 | 85 | 2 |
2,29,105,112 | 0 | 3 | 7441 | 3 | 112 | 156 | 2 |
4,44,140,140 | 0 | 3 | -35840 | 1 | 140 | 202 | 2 |
12,-49,-94,96 | 2 | 1 | 165938692 | 3 | 96 | 143 | 2 |
poly | pSign | nSign | disc | sigReal | ht | L2ht | numFact | knn.mean | rfs.mean | n | min.ner | max.ner |
---|---|---|---|---|---|---|---|---|---|---|---|---|
-4,-9,7,4 | 2 | 1 | 32353 | 3 | 9 | 12 | 2 | 0.933 | 0.907 | 6 | 600 | 2500 |
-8,-17,15,8 | 2 | 1 | 513409 | 3 | 17 | 25 | 2 | 0.880 | 0.857 | 5 | 600 | 2300 |
-2,-9,7,40 | 2 | 1 | 41273 | 3 | 40 | 41 | 2 | 0.920 | 0.867 | 5 | 600 | 2500 |
1,-3,-1,1 | 2 | 1 | 148 | 3 | 3 | 3 | 2 | 1.000 | 0.794 | 5 | 600 | 2300 |
10,-2,-6,2 | 2 | 1 | 2368 | 3 | 10 | 12 | 2 | 1.000 | 0.788 | 5 | 600 | 2300 |
13,23,9,1 | 0 | 3 | 148 | 3 | 23 | 27 | 2 | 1.000 | 0.779 | 4 | 600 | 2100 |
22,-45,-43,22 | 2 | 1 | 29292649 | 3 | 45 | 69 | 2 | 0.950 | 0.800 | 4 | 600 | 2500 |
24,-97,-190,192 | 2 | 1 | 2654410756 | 3 | 192 | 288 | 2 | 0.450 | 0.963 | 4 | 1200 | 2500 |