Wednesday, October 9, 2019

Predicting 2019 Outcomes: Part II

Here is a summary and snapshot of my latest latent class predictions.  Use at your own risk.  -RMF

Summary

View this post in wide mode on your cell phone. My best modeling almost always shows potential Democratic votes exceeding Republican votes in any countywide, high-turnout election. My modeling tends to overestimate Democrats, but this is difficult to confirm completely because predicted Democrats have historically turned out at lower rates than predicted Republicans. GE 2018 was a very strong turnout for Whatcom County. Because 2019 is a local election, the turnout is unknown. The data below give model predictions and GE 2018 votes for Senate*.
  • MC = Maria Cantwell
  • SH = Susan Hutchison

   sumD.Model sumMC* sumR.Model sumSH*
1:      95087 64971      49916 43757

*Does not include votes from hidden precincts.

The Whatcom County voter database has experienced net growth year over year. This is probably because of increased registration efforts and increased net migration rates in Western WA, and in Whatcom County especially.
  • October 2019 WM Active = 145,414
  • October 2018 WM Active = 139,876
This gives us a 5,538 net increase year over year. Comparing registrations between the two points of 10/2018 and 10/2019:
  • 9130 new active registration numbers
  • 2098 lost (once active) registration numbers
Those numbers are hardly indicative of voter "churn" or "flux". The WA SoS provides an excellent description of the effects of voter mobility on the voter database. Many voters are like light bulbs on a board with sketchy wiring. They toggle between active and inactive status depending on migration, in-county moves, or simply forgetting to register a new address after the last move. You can also be deactivated if you fail to vote in two consecutive federal elections.
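
A comparison like the one above can be sketched as a set difference on registration IDs between two snapshots. This is only a minimal sketch on toy data: the tables vr2018 and vr2019, the column name RegistrationNumber, and the IDs themselves are assumptions for illustration, not the actual voter database extract.

```r
library(data.table)

# Toy snapshots of active registrations (hypothetical IDs and column name).
vr2018 <- data.table(RegistrationNumber = c(1001, 1002, 1003, 1004))
vr2019 <- data.table(RegistrationNumber = c(1002, 1003, 1004, 1005, 1006))

# IDs active in the later snapshot but not the earlier one, and the reverse.
new_ids  <- setdiff(vr2019$RegistrationNumber, vr2018$RegistrationNumber)
lost_ids <- setdiff(vr2018$RegistrationNumber, vr2019$RegistrationNumber)

# Net change in the size of the active roll between the two snapshots.
net_change <- nrow(vr2019) - nrow(vr2018)

length(new_ids)
length(lost_ids)
net_change
```

In a real extract, voters who toggle between active and inactive status would need to be handled by filtering each snapshot on its status field before taking the difference.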

Precinct Snapshot:

VR[CountyCode == "WM" & StatusCode == "Active",.N,.(PrecinctCode)][order(PrecinctCode)]:

     PrecinctCode    N
  1:          101 1086
  2:          102  716
  3:          103  868
  4:          104  514
  5:          105  493
 ---                  
175:          609  976
176:          610  938
177:          611  697
178:          701  882
179:          801  839

   Precincts GTR.1000 LT.500 mean min  max mad  sd   var
1:       179       47     20  812   2 1446 249 291 84856
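
A one-row summary like the one above could be produced along these lines. This is a sketch on a toy table, not the real voter file. The cutoffs (strictly greater than 1000, strictly less than 500) and the use of R's default scaled mad() are assumptions read off the column names.

```r
library(data.table)

# Toy voter table (hypothetical rows); column names follow the query above.
VR <- data.table(
  CountyCode   = "WM",
  StatusCode   = c("Active", "Active", "Active", "Inactive", "Active", "Active"),
  PrecinctCode = c(101, 101, 102, 102, 103, 103)
)

# Active registrations per precinct.
pc <- VR[CountyCode == "WM" & StatusCode == "Active", .N, .(PrecinctCode)][order(PrecinctCode)]

# One-row summary of precinct sizes.
smry <- pc[, .(
  Precincts = .N,
  GTR.1000  = sum(N > 1000),  # assumed cutoff: more than 1000 active voters
  LT.500    = sum(N < 500),   # assumed cutoff: fewer than 500 active voters
  mean      = mean(N),
  min       = min(N),
  max       = max(N),
  mad       = mad(N),         # R's default mad() is scaled by 1.4826
  sd        = sd(N),
  var       = var(N)
)]
smry
```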


I describe my use of the poLCA library here in this post. To reiterate, latent class analysis or regression is not usually described as a "big data" or "machine learning" technique. However, the approach to prediction is essentially the same. Without 'high dimensional' data, I attempt to regress a latent class composed of manifests (age, gender, location) against covariates (roughly the 'training set' in ML speak), here derived from the AVBallotParty field of the May 24, 2016 Presidential Primary. In this primary, WA residents were asked to declare a party, recorded in the AVBallotParty field as 'Democratic' or 'Republican'.

Latent class analysis requires 'parsimony' for accuracy. The manifests should be informative enough to regress accurately against the covariates or latent classes. In this case I judge the accuracy of latent class regression by how close the prediction comes to recent precinct results (Senate GE 2018). To do this, I divide the precincts into groups by level of support for the Democratic candidate. I then regress my manifests against covariates specific to those groups and recombine all the posteriors into a separate individual score, as in the table below. I make no guarantees that I have any real idea what I am doing, but at least I am searching for an electoral prediction solution that doesn't involve spending thousands or millions of dollars to purchase someone's private Facebook data!
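
The general shape of a latent class regression in poLCA can be sketched as follows. This is a hedged illustration on synthetic data, not the model behind this post. The variable names (AgeBand, Area, Primary), the four age bands, and the choice of two classes are all assumptions. poLCA expects manifest variables coded as integers starting at 1, so a continuous age would have to be binned first.

```r
library(poLCA)

set.seed(2019)
n <- 600

# Synthetic voters (NOT real data). Manifests are coded as integers 1..K.
toy <- data.frame(
  AgeBand = sample(1:4, n, replace = TRUE),  # hypothetical age bands
  Gender  = sample(1:2, n, replace = TRUE),
  Area    = sample(1:3, n, replace = TRUE),  # hypothetical location code
  Primary = factor(sample(c("Democratic", "Republican"), n, replace = TRUE))
)

# Manifests on the left of the formula, covariate on the right.
f <- cbind(AgeBand, Gender, Area) ~ Primary

# Two latent classes; nrep restarts the estimation from several random starts.
fit <- poLCA(f, data = toy, nclass = 2, nrep = 3, verbose = FALSE)

# Posterior class membership probabilities, one row per voter.
head(round(fit$posterior, 2))
```

Because the toy data are random, the fitted classes carry no meaning here; the point is only the formula layout and the posterior matrix that per-voter scores would be read from.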


      Age Gender PID     AVBallotParty DemPred RepubPred Region Party
 1: 70.00      1 251                      1.00      0.00     SD    SD
 2: 67.00      2 251                      1.00      0.00     SD    SD
 3: 51.00      1 509        Republican    0.00      1.00     LR    SR
 4: 78.00      2 224                      1.00      0.00     SD    SD
 5: 42.00      2 104 NO PARTY SELECTED    0.00      1.00     LR    SR
 6: 88.00      1 115        Democratic    1.00      0.00     SR    SD
 7: 61.00      1 115 NO PARTY SELECTED    0.00      1.00     SR    SR
 8: 50.00      2 115 NO PARTY SELECTED    0.00      1.00     SR    SR
 9: 80.00      2 115        Democratic    1.00      0.00     SR    SD
10: 63.00      2 115        Republican    0.00      1.00     SR    SR
11: 63.00      1 115        Republican    0.00      1.00     SR    SR
12: 29.00      2 141                      1.00      0.00     SR    SD
13: 79.00      2 145                      0.00      1.00     SR    SR
14: 69.00      1 145                      0.00      1.00     SR    SR
15: 50.00      1 604                      0.00      1.00     SR    SR
16: 22.00      2 243                      0.00      1.00     SD    SR

Precinct Division by GE2018 Democratic Senate Vote:

 t1[div <= .55 & div >= .45,Region:= "Ind"] # Independent
 t1[div > .55 & div <= .65,Region:= "LD"] # Light Democrat
 t1[div > .65,Region:= "SD"] # Strong Democrat
 t1[div < .45 & div >= .35,Region:= "LR"] # Light Republican
 t1[div < .35,Region:= "SR"] # Strong Republican
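
To show the binning on concrete numbers, here is a small sketch using three precincts from the GE 2018 Senate figures listed under "Problem Precincts for LCA Prediction" later in this post. The definition of div as the Democratic share of the two-candidate vote is an assumption; it is consistent with the regional totals in this post but is not stated explicitly.

```r
library(data.table)

# Three precincts taken from the GE 2018 Senate figures in this post.
t1 <- data.table(
  PrecinctID     = c(105, 108, 245),
  MariaCantwell  = c(207, 393, 635),
  SusanHutchison = c(161, 428, 65)
)

# ASSUMPTION: div is the Democratic share of the two-candidate vote.
t1[, div := MariaCantwell / (MariaCantwell + SusanHutchison)]

# Same cut points as the division rules above.
t1[div <= .55 & div >= .45, Region := "Ind"] # Independent
t1[div > .55 & div <= .65, Region := "LD"]   # Light Democrat
t1[div > .65, Region := "SD"]                # Strong Democrat
t1[div < .45 & div >= .35, Region := "LR"]   # Light Republican
t1[div < .35, Region := "SR"]                # Strong Republican

t1[]
```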

   Regions Cantwell Hutchison RegionTotal
1:      SD    40292     11485       67202
2:      SR     4073     11529       20870
3:     Ind     7069      7200       20044
4:      LD     7485      4841       16631
5:      LR     6052      8702       20256

Latent Class Regression Prediction Percentages by aggregated GE 2018 Senate Precinct vote (See Precinct Division above):


SD SR IND LD LR
1 86.00 28.00 57.00 75.00 43.00
2 14.00 72.00 43.00 25.00 57.00

(Row 1 = predicted Democratic percent, row 2 = predicted Republican percent.)

Modeled by Predicted Party vote without Independents (i.e., voters pushed to one party or the other based on individual score).

m1[,.N,.(Party)]
   Party     N
1:    SD 93951
2:    SR 48849
3:    LD  1136
4:    LR  1067

Model vs GE 2018 Senate Vote (excludes hidden precincts in GE 2018)

MC= Maria Cantwell
SH = Susan Hutchison
   sumD.Model sumMC sumR.Model sumSH
1:      95087 64971      49916 43757
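
The model sums in this table are the strong and light classes added together on each side (93951 + 1136 and 48849 + 1067 from the party counts above). The sketch below reproduces that step and then compares two-party Democratic shares. The share comparison is an added illustration, not a statistic computed elsewhere in this post, and the name m1_counts is made up for the example.

```r
library(data.table)

# Predicted party counts as printed by m1[,.N,.(Party)] above.
m1_counts <- data.table(Party = c("SD", "SR", "LD", "LR"),
                        N     = c(93951, 48849, 1136, 1067))

# Strong + light classes on each side give the model sums.
sumD.Model <- m1_counts[Party %in% c("SD", "LD"), sum(N)]
sumR.Model <- m1_counts[Party %in% c("SR", "LR"), sum(N)]

# GE 2018 Senate votes (hidden precincts excluded).
sumMC <- 64971
sumSH <- 43757

# Two-party Democratic share: modeled electorate vs. actual vote.
model_share  <- sumD.Model / (sumD.Model + sumR.Model)
actual_share <- sumMC / (sumMC + sumSH)
round(100 * c(model = model_share, actual = actual_share), 1)
```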

Problem Precincts for LCA Prediction

These are the precinct predictions with the greatest distance from the Senate GE 2018 vote: the absolute difference between the projected and actual Senate GE 2018 Democratic percentages exceeds 25 points. Most of these precinct predictions are biased toward Democrats. The extreme Democratic votes of the WWU dorm district precincts (245, 252) cause the poLCA library's Newton-Raphson estimation to reverse their vote patterns almost entirely!


PrecinctID MariaCantwell SusanHutchison D.lca R.lca D.pct.abs.diff
1 105 207 161 404 89 25.70
2 108 393 428 943 268 29.10
3 159 115 86 117 17 30.10
4 162 341 254 623 133 25.10
5 166 418 317 839 157 27.30
6 182 530 403 1149 201 28.30
7 226 577 47 480 394 37.60
8 245 635 65 7 537 89.40
9 247 713 71 671 359 25.80
10 252 269 25 16 215 84.60
11 253 874 118 742 534 29.90
12 257 568 51 382 365 40.70
13 501 373 417 830 269 27.40
14 502 301 283 653 165 27.50
15 503 336 375 665 241 25.90
16 504 373 362 862 248 26.70
17 507 287 298 682 213 26.20
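
The difference column can be approximately reproduced from the other four columns. The sketch below uses three rows copied from the table above and assumes the difference is taken between two-party Democratic percentages. The results agree with the published column to within about 0.1, which suggests, but does not confirm, that this is the definition used.

```r
library(data.table)

# Three rows copied from the table above.
pp <- data.table(
  PrecinctID     = c(105, 226, 245),
  MariaCantwell  = c(207, 577, 635),
  SusanHutchison = c(161, 47, 65),
  D.lca          = c(404, 480, 7),
  R.lca          = c(89, 394, 537)
)

# ASSUMPTION: both percentages are two-party Democratic shares.
pp[, D.pct.vote := 100 * MariaCantwell / (MariaCantwell + SusanHutchison)]
pp[, D.pct.lca  := 100 * D.lca / (D.lca + R.lca)]
pp[, D.pct.abs.diff := abs(D.pct.lca - D.pct.vote)]

# Keep precincts whose prediction misses by more than 25 points.
pp[D.pct.abs.diff > 25, .(PrecinctID, D.pct.abs.diff = round(D.pct.abs.diff, 1))]
```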

By LCA Predicted Region: Senate GE 2018 Totals and Precinct Prediction Percent

Regions Cantwell Hutchison RegionTotal LiklDemPct LiklRepPct CantPct HutchPct
1 Ind 7069 7200 20044 57.1 42.9 49.50 50.50
2 LD 7485 4841 16631 74.8 25.2 60.70 39.30
3 LR 6052 8702 20256 43.3 56.7 41.00 59.00
4 SD 40292 11485 67202 86.3 13.7 77.80 22.20
5 SR 4073 11529 20870 28.0 72.0 26.10 73.90


Prediction By Category with LastVoted

LastVoted LD LR SD SR
1 2019-08-06 369 395 36412 20838
2 2018-11-06 389 368 31609 15200
3 No Last Vote Record 131 110 10197 5316
4 2016-11-08 0 0 6475 2829
5 2019-02-12 0 0 2272 2086
6 2012-11-06 0 0 1431 406
7 2018-08-07 0 0 662 293
8 2017-11-07 0 0 641 339
9 2008-11-04 0 0 515 136
10 2014-11-04 0 0 407 0
11 2018-02-13 0 0 371 118
12 2016-05-24 0 0 365 181
13 2013-11-05 0 0 250 0
14 2010-11-02 0 0 244 0
15 2017-08-01 0 0 221 0
16 2004-11-02 0 0 211 0
17 2015-11-03 0 0 161 0
18 2016-02-09 0 0 154 0


Prediction: By Age Decade and Predicted Party

Decade LD LR SD SR
1 10 0 0 23 24
2 9 2 2 787 621
3 8 0 11 3299 2965
4 7 0 76 10324 7166
5 6 490 706 15353 8525
6 5 589 262 14021 6953
7 4 55 10 14584 6449
8 3 0 0 19292 4315
9 2 0 0 15565 9520
10 1 0 0 703 2311
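
A cross-tab of this shape can be built by binning age and spreading the party counts into columns. The sketch below reuses a few Age and Party pairs from the scored sample table earlier in this post. Treating Decade as age divided by ten and rounded down (so 2 means the twenties) is an assumption.

```r
library(data.table)

# A few Age/Party pairs from the scored sample table earlier in this post.
m1 <- data.table(
  Age   = c(22, 29, 42, 51, 63, 67, 70, 88),
  Party = c("SR", "SD", "SR", "SR", "SR", "SD", "SD", "SD")
)

# ASSUMPTION: Decade is age divided by ten, rounded down.
m1[, Decade := floor(Age / 10)]

# Count voters per decade and party, then spread parties into columns.
tab <- dcast(m1[, .N, .(Decade, Party)], Decade ~ Party,
             value.var = "N", fill = 0L)

# Oldest decade first, as in the table above.
tab[order(-Decade)]
```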

Prediction: 42nd by Party

Party N
1 SR 39211
2 SD 58221
3 LD 1037
4 LR 969


Prediction: 40th by Party

Party N
1 SD 35730
2 SR 9638
3 LD 99
4 LR 98


poLCA citation:

Linzer, Drew A. and Jeffrey Lewis. 2013. "poLCA: Polytomous Variable Latent Class Analysis." R package version 1.4. http://dlinzer.github.com/poLCA.
Linzer, Drew A. and Jeffrey Lewis. 2011. "poLCA: an R Package for Polytomous Variable Latent Class Analysis." Journal of Statistical Software. 42(10): 1-29. http://www.jstatsoft.org/v42/i10
