June 1, 2018

The Problem

Background

  • We say a polynomial \(f(x)\) is irreducible over \(\mathbb{Q}\), or simply irreducible, if it cannot be written as \(f(x) = g(x)h(x)\), a product of two (or more) rational polynomials of lesser degree.
  • For quadratics and cubics, this is equivalent to having no rational roots, i.e. there is no rational \(a\) such that \(f(a) = 0\).
  • If \(f(x)\) is reducible, then the first iterate \(f(f(x))\) must be reducible, since \(f(x) = g(x)h(x)\) gives \(f(f(x)) = g(f(x))\,h(f(x))\).
  • If \(f(x)\) is irreducible, then \(f(f(x))\) is not necessarily irreducible.
  • Why is this?
  • Given \(f(x)\) can we predict what will happen?

Emergent Reducibility: Examples

  • We say a polynomial has emergent reducibility (ER) if \(f(x)\) is irreducible but \(f(f(x))\) is reducible.
  • \(f(x) = x^2-8x+20\), then \[ f(f(x)) = (x^2-8x+20)^2-8(x^2-8x+20)+20 \] \[ \qquad = (x^2-10x+26)(x^2-6x+10) \] (a numerical check of this factorization follows the list).
  • \(f(x) = x^2-x-1\) is irreducible but \(f(f(f(x)))\) factors.
  • \(f_a(x) = -8ax^3-(8a+2)x^2+(4a-1)x+a\) is irreducible but \(f_a(f_a(x))\) factors for every positive integer \(a\).
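
Such factorizations are easy to sanity-check numerically: two quartics that agree at more than four points are identical. A minimal base-R check of the first example:

  # Verify f(f(x)) = (x^2-10x+26)(x^2-6x+10) for f(x) = x^2-8x+20
  f   <- function(x) x^2 - 8*x + 20
  rhs <- function(x) (x^2 - 10*x + 26) * (x^2 - 6*x + 10)
  x <- -5:5                    # 11 sample points, more than enough
  all.equal(f(f(x)), rhs(x))   # TRUE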

Why Do Mathematicians Care?

  • The roots of iterates form a tree
  • The automorphism groups of the trees have a natural Galois structure, and are structurally similar to \(p\)-adic Lie groups

Frequency and Imbalance

  • Brute-force search of over \(200\) million irreducible cubics
  • Found \(59\) examples of ER
  • Typical solution to class imbalance: re-balance training sets
  • How does re-balancing affect different models?

Machine Learning Process

Build Sets

  • Read data in with data.table (11GB)
  • Remove duplicates
  • Build 21 training sets
  • Each has same 43 ER cases
  • Number of non-ER varies from 500 to 2500 by 100
  • Non-ER cases are sampled from main dataset
  • One test set: 16 ER cases and 8000 non-ER cases

R Code to Build the Non-ER Sets

mkNER.R:

  library(data.table)
  library(purrr)

  # bigFile: path to the full search output (~11GB), defined elsewhere.
  # Keep unique rows of irreducible cubics (numFact == 1), the non-ER pool.
  bigNER <- fread(bigFile, header = TRUE, sep = ",")
  bigNER <- bigNER[!duplicated(bigNER) & bigNER$numFact == 1, ]

  n <- nrow(bigNER)
  nerSize <- seq(500, 2500, by = 100)   # non-ER counts for the 21 training sets

  # Sample row indices for each size, then pull the corresponding rows
  samps <- lapply(nerSize, function(x) sample(1:n, x, replace = FALSE))
  nerss <- map(samps, function(x) bigNER[x, ])

  # Write each non-ER sample to its own CSV, e.g. NERtrain-500.csv
  for (i in seq_along(nerSize)) {
    trsName <- paste(paste("NERtrain", nerSize[i], sep = "-"), "csv", sep = ".")
    write.csv(nerss[[i]], trsName, row.names = FALSE)
  }
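
The 43 shared ER cases are then combined with each non-ER sample to form the actual training sets. A hypothetical sketch, assuming the ER cases live in a file erFile (not shown in the original code):

  erCases <- fread(erFile)   # the 43 ER cases shared by every training set
  trs1 <- rbind(erCases, fread("NERtrain-500.csv"))   # first training set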

Models

For each of the 21 training sets, we'll build 9 models:

  • 3 logistic regressions with regularization (glmnet)
  • 4 random forests
  • 1 naive Bayes and 1 \(k\)-nearest neighbors (knn)

  • Each model is built using 10-fold cross-validation with the "Kappa" metric

One of the models:

  library(caret)
  library(doParallel)

  # Run the cross-validation folds in parallel
  cl <- makeCluster(detectCores())
  registerDoParallel(cl)

  # Random forest on training set 1, tuned by 10-fold CV on the Kappa metric
  tr1.rfs <- train(numFact ~ const + lin + quad + cube + nSign + pSign + sigReal,
                   data = trs1, method = "rf", metric = "Kappa",
                   trControl = trainControl(method = "cv", number = 10,
                                            allowParallel = TRUE))

  # Predicted probability of the ER class on the test set
  tst$rfs <- predict(tr1.rfs, tst, type = "prob")[, 2]

  stopCluster(cl)
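
The same call pattern repeats over the 21 training sets and the other model types. A hypothetical sketch of that loop (trainSets and the exact method list are assumptions; "glmnet", "rf", "nb", and "knn" are standard caret method names):

  fml <- numFact ~ const + lin + quad + cube + nSign + pSign + sigReal
  fits <- list()
  for (trs in trainSets) {                       # 21 training sets
    for (m in c("glmnet", "rf", "nb", "knn")) {  # one fit per model type
      fits[[length(fits) + 1]] <-
        train(fml, data = trs, method = m, metric = "Kappa",
              trControl = trainControl(method = "cv", number = 10))
    }
  }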

Confusion Data Frame

A sample from the data frame of 1134 confusion matrices (one per combination of the 21 training sets, 9 models, and 6 thresholds \(\theta\)):

mdl   TN    TP  FP   FN  ner   theta
lrs   8000   0    0  16  2000  0.30
rfpp  7908  10   92   6  2500  0.25
rfpp  7877   8  123   8  2400  0.25
rfs   7986  10   14   6  2000  0.50
knn   7699  16  301   0  2500  0.20
rfsq  7850  14  150   2  1400  0.25
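
Each row comes from thresholding a model's predicted ER probability at \(\theta\). A minimal sketch of the counting, using a hypothetical helper (truth is assumed to be 1 for ER cases and 0 otherwise):

  confRow <- function(prob, truth, theta) {
    pred <- prob >= theta   # classify as ER when P(ER) >= theta
    data.frame(TN = sum(!pred & truth == 0), TP = sum(pred & truth == 1),
               FP = sum(pred & truth == 0), FN = sum(!pred & truth == 1),
               theta = theta)
  }
  # e.g. confRow(tst$rfs, tst$er, theta = 0.25), with tst$er a 0/1 label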

ROC: Fix ner, vary \(\theta\)

ROC: Fix \(\theta\), vary ner

Animated ROC

[animated ROC plots: aniroc, anirocft]

Results Overview

Model/training-set combinations grouped by true positives (of the 16 ER test cases):

TP  mdl   count  avg.FP    min.ner  max.ner  max.theta
16  knn   32     498.1250  500      2500     0.20
16  rfs   17     251.0000  500      1600     0.30
16  rfsq   7     378.5714  500       800     0.25
16  rfp    6     334.8333  500       800     0.30
15  rfs   63     145.6825  500      2500     0.50
15  rfp   59     188.0169  600      2400     0.40
15  rfsq  37     199.2432  600      2500     0.30
15  knn   36     320.3056  500      1900     0.40
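
An overview like this can be produced by aggregating the confusion data frame; a sketch assuming it is named conf, using dplyr:

  library(dplyr)
  conf %>%
    group_by(TP, mdl) %>%
    summarise(count = n(), avg.FP = mean(FP),
              min.ner = min(ner), max.ner = max(ner),
              max.theta = max(theta), .groups = "drop") %>%
    arrange(desc(TP), desc(count))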

Results: RF

RFS configurations that catch all 16 ER test cases, sorted by true negatives:

mdl  TN    TP  FP   FN  ner   theta
rfs  7830  16  170  0   1200  0.25
rfs  7816  16  184  0    600  0.30
rfs  7812  16  188  0    800  0.25
rfs  7799  16  201  0   1200  0.20
rfs  7790  16  210  0   1100  0.15
rfs  7781  16  219  0    500  0.30
rfs  7778  16  222  0   1600  0.15

RFS Variable Importance

variable  rfs6    rfs8    rfs12
const     100.00  100.00   98.74
lin        78.07   84.51  100.00
quad       42.50   43.38   64.10
cube       68.57   75.10   99.82
sigReal    40.78   33.22   34.99
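
Importances of this kind can be extracted with caret's varImp(), which by default scales the top predictor to 100; for example, for the model trained earlier:

  varImp(tr1.rfs, scale = TRUE)   # per-predictor importance, top variable = 100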

Results: KNN

KNN configurations that catch all 16 ER test cases, sorted by true negatives:

mdl  TN    TP  FP   FN  ner   theta
knn  7699  16  301  0   2500  0.15
knn  7699  16  301  0   2500  0.20
knn  7682  16  318  0   2100  0.20
knn  7681  16  319  0   2100  0.15
knn  7667  16  333  0   2300  0.15
knn  7667  16  333  0   2300  0.20
knn  7664  16  336  0   2000  0.20

KNN Variable Importance

variable  knn21   knn23   knn25
const       0.00    0.00    0.00
lin        16.19   16.19   15.94
quad       12.49    3.37    2.86
cube       12.46    9.49    8.59
nSign      28.35   25.41   25.92
pSign      28.42   25.17   25.78
sigReal   100.00  100.00  100.00

Missed ER by RF

The following ER cases are missed by the random forests:

poly             knn.mean  rfs.mean  rfp.mean  n  min.ner  max.ner
-12,-49,94,96    0.400     0.319     0.141     2  2300     2500
-14,-57,110,112  0.200     0.458     0.156     1  2500     2500
12,-49,-94,96    0.600     0.510     0.236     1  2300     2300
2,29,105,112     0.200     0.524     0.184     1  2300     2300
4,44,140,140     0.381     0.149     0.069     6   600     2500

Missed ER by KNN

The following ER cases are missed by KNN:

poly             knn.mean  rfs.mean  rfp.mean  n  min.ner  max.ner
-1,15,-56,63     0.20      0.328     0.492     1  2500     2500
-14,-57,110,112  0.20      0.391     0.281     3  2100     2500
2,29,105,112     0.19      0.631     0.471     6   600     2500

Properties of Misses

Here ht is the height (largest coefficient in absolute value), L2ht the truncated \(\ell_2\) norm of the coefficient vector, and sigReal the number of real roots:

poly             pSign  nSign  disc       sigReal  ht   L2ht  numFact
-14,-57,110,112  2      1      307396996  3        112  167   2
-12,-49,94,96    2      1      165938692  3         96  143   2
-1,15,-56,63     2      1      -1967      1         63   85   2
2,29,105,112     0      3      7441       3        112  156   2
4,44,140,140     0      3      -35840     1        140  202   2
12,-49,-94,96    2      1      165938692  3         96  143   2

Properties of Hits

poly             pSign  nSign  disc        sigReal  ht   L2ht  numFact  knn.mean  rfs.mean  n  min.ner  max.ner
-4,-9,7,4        2      1      32353       3          9   12   2        0.933     0.907     6   600     2500
-8,-17,15,8      2      1      513409      3         17   25   2        0.880     0.857     5   600     2300
-2,-9,7,40       2      1      41273       3         40   41   2        0.920     0.867     5   600     2500
1,-3,-1,1        2      1      148         3          3    3   2        1.000     0.794     5   600     2300
10,-2,-6,2       2      1      2368        3         10   12   2        1.000     0.788     5   600     2300
13,23,9,1        0      3      148         3         23   27   2        1.000     0.779     4   600     2100
22,-45,-43,22    2      1      29292649    3         45   69   2        0.950     0.800     4   600     2500
24,-97,-190,192  2      1      2654410756  3        192  288   2        0.450     0.963     4  1200     2500

Next

  • Incorporate the classifiers into the search code
  • Interpret the RFS models in detail
  • See http://jpreszler.rbind.io for the cubic family paper and related work.