---
title: 'FEEG6025: Week 7 - Logistic and Probit Regression'
author: "you@soton.ac.uk"
date: 'Last run at: `r Sys.time()`'
output:
  html_document:
    fig_caption: yes
    number_sections: yes
    toc: yes
    toc_float: yes
  html_notebook:
    fig_caption: yes
    number_sections: yes
    toc: yes
  pdf_document:
    fig_caption: yes
    number_sections: yes
    toc: yes
  word_document:
    fig_caption: yes
    toc: yes
---

# Set up libraries

You may need to install the following libraries first:

 * ggplot2
 * data.table
 * readr
 * stargazer
 * car
 * MASS
 * broom
 * knitr

You may also need to update all packages. Make sure you include `dependencies = TRUE` if you do this.
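
For example, a minimal sketch of the install/update step (run it once in the console rather than knitting it - hence `eval=FALSE` - and adjust the list to whatever is missing on your machine):

```{r installPackages, eval=FALSE}
# run manually in the console if any of the libraries below fail to load
install.packages(c("ggplot2", "data.table", "readr", "stargazer",
                   "car", "MASS", "broom", "knitr"),
                 dependencies = TRUE)

# optionally bring all installed packages up to date as well
update.packages(ask = FALSE, dependencies = TRUE)
```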

```{r setup, include=FALSE}
# DO NOT EDIT ANYTHING IN THIS CHUNK!!!

startTime <- Sys.time()

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(eval = TRUE) # set to FALSE for the Task desc

# if these fail to load R does not stop but will generate errors later when their functions are called
library(ggplot2) # for graphs
library(readr) # for fast data reads
library(data.table) # for data munching
library(stargazer) # for nice (but slightly complex) tables
library(car) # for regression tools & tests
library(MASS) # for probit regression
library(broom) # for tidy regression outputs
library(knitr) # for nicely formatted tables using kable

# set input file name & location
needfile <- "http://www.soton.ac.uk/~ba1e12/data/feeg6025/need_public_use_file_2014.csv.gz"
#needfile <- "~/OneDrive - University of Southampton/FEE/FEEG6025/Generic data/NEED/need_public_use_file_2014.csv.gz"
```

# Background

Today's work will use the DECC NEED public use file, a sample of 49,815 records selected to be representative of the housing stock and available to all via the DECC website and data.gov.uk: https://data.gov.uk/dataset/nationalenergyefficiencydataframeworkanonymiseddata.

We will use it to explore two research questions:

 * What predicts high household gas consumption in 2012?
 * What predicts being a high, medium or low gas consumer in 2012?
 
These questions will require different modelling approaches.

# Load and process data

First load the data.

```{r loadNeed, echo = FALSE}
# Code will not show up in output
# DO NOT EDIT ANYTHING IN THIS CHUNK!!!
print(paste0("Loading: ", needfile))
# this may take a few seconds depending on the location you are loading from - it is quite large!
needPulf2014DT <- as.data.table(read_csv(needfile))
print(paste0("That produced a file of ", uniqueN(needPulf2014DT$HH_ID), " households."))
```

Now process the data to remove years we are not interested in and tidy up variable labels etc. This leaves the following variables:

```{r processData, echo=FALSE}
# Code will not show up in output
# DO NOT EDIT ANYTHING IN THIS CHUNK!!!

# create/code some factor variables ----
# see https://blackboard.soton.ac.uk/webapps/blackboard/content/listContentEditable.jsp?content_id=_3196230_1&course_id=_170962_1 -> "Excel workbook explaining variable codes"

# Gas readings - too big/too small/missing/no gas/valid (see documentation)
needPulf2014DT$Gcons2012Valid <- factor(needPulf2014DT$Gcons2012Valid, 
                                        labels = c("> 50k", 
                                                   "< 100", 
                                                   "Missing",
                                                   "No gas",
                                                   "Valid 100 - 25k" ))

# Same for electricity - only two
needPulf2014DT$Econs2012Valid <- factor(needPulf2014DT$Econs2012Valid, 
                                        labels = c("> 25k", 
                                                   #"< 100", 
                                                   #"Missing",
                                                   "Valid 100 - 25k" ))

# Main heat fuel
needPulf2014DT$MAIN_HEAT_FUEL <- factor(needPulf2014DT$MAIN_HEAT_FUEL, 
                                        labels = c("Gas", "Electricity"))

# Do they have economy 7 electricity?
needPulf2014DT$E7Flag2012 <- factor(needPulf2014DT$E7Flag2012, 
                                    labels = c("Econ7 = Yes"))

# Age of property
needPulf2014DT$PROP_AGE <- factor(needPulf2014DT$PROP_AGE, 
                                  labels = c("before 1930",
                                             "1930-1949",
                                             "1950-1966",
                                             "1967-1982",
                                             "1983-1995",
                                             "1996 onwards"))

# Type of property
needPulf2014DT$PROP_TYPE <- factor(needPulf2014DT$PROP_TYPE, 
                                   labels = c("Detached",
                                              "Semi-detached",
                                              "End terrace",
                                              "Mid terrace",
                                              "Bungalow",
                                              "Flat (incl. maisonette)"))

# Floor area
needPulf2014DT$FLOOR_AREA_BAND <- factor(needPulf2014DT$FLOOR_AREA_BAND, 
                                         labels = c("1 to 50 m2",
                                                    "51-100 m2",
                                                    "101-150 m2",
                                                    "> 151 m2"))

# Energy efficiency band
needPulf2014DT$EE_BAND <- factor(needPulf2014DT$EE_BAND, 
                                 labels = c("A or B",
                                            "C",
                                            "D",
                                            "E",
                                            "F",
                                            "G"))

# Wall construction
needPulf2014DT$WALL_CONS <- factor(needPulf2014DT$WALL_CONS, 
                                   labels = c("Cavity",
                                              "Other"))

# Cavity wall insulation
needPulf2014DT$CWI <- ifelse(is.na(needPulf2014DT$CWI), 
                             "Not known",
                             "Installed")

# Cavity wall insulation year
needPulf2014DT$CWI_YEAR <- ifelse(is.na(needPulf2014DT$CWI_YEAR), 
                             "Not known",
                             needPulf2014DT$CWI_YEAR)

# Boiler installation year
needPulf2014DT$BOILER_YEAR <- ifelse(is.na(needPulf2014DT$BOILER_YEAR), 
                             "Not known",
                             needPulf2014DT$BOILER_YEAR)

# Loft depth
needPulf2014DT$LOFT_DEPTH <- factor(needPulf2014DT$LOFT_DEPTH,
                                    labels = c("Under 150 mm", 
                                               "> 150 mm",
                                               "Unknown")
                                    ) # 99 = NA (helpfully!)

# Region
needPulf2014DT$REGION <- factor(needPulf2014DT$REGION,
                                labels = c("North East", 
                                           "North West", 
                                           "Yorkshire & The Humber",
                                           "East Midlands",
                                           "West Midlands",
                                           "East of England",
                                           "London",
                                           "South East",
                                           "South West",
                                           "Wales"))

# leave li_year & boiler_year to later

# remove elec to form a simple data table just with gas & household attribute data ----
needGasDT <- needPulf2014DT[, !grep("Econs|E7Flag|200[0-9]|2010|2011", names(needPulf2014DT)), with = FALSE] 

names(needGasDT)
```

# Inspecting the data

What is in the data and what are the central tendencies?

```{r inspectData}
# Hint: use summary() and wrap the kable function around it to get a pretty table
kable(caption = "Summary of DECC NEED 2014 PUL file",
  summary(needGasDT)
)
```

# Non-parametric tests of distributions

## Cavity wall insulation by dwelling age
We want to know if the distribution of the age of dwellings differs for those with/without cavity wall insulation.

```{r cwiByAgeTableN}
kable(caption = "Cavity wall insulation by age of dwelling (counts)",
  table(needGasDT$PROP_AGE,needGasDT$CWI)
)
```

Now re-run the table and create row proportions for ease of comparison:

```{r cwiByAgeTablePct}
kable(caption = "Cavity wall insulation by age of dwelling (row proportions)",
  round(prop.table(table(needGasDT$PROP_AGE,
                         needGasDT$CWI
                         )
                   ,1 # margin = 1 gives row proportions
                   ) 
        ,2
        ) # round to 2 d.p
)
```

**Please answer this and all the following questions in the blackboard test, NOT in the RMarkdown**

 * Q1: Is there a difference in the distribution of cavity wall insulation between dwelling age groups?
 
We now need to test whether the differences are statistically significant using a chi-squared test:

```{r testCwiByAge}
chisq.test(needGasDT$PROP_AGE,needGasDT$CWI)
```

 * Q2: What would you conclude from the cavity wall insulation chi sq test?

## Loft depth by dwelling age

In this example we want to test the 3 possible loft depth values:

```{r ldByAgeTablePct}
kable(caption = "Loft depth by age of dwelling (row proportions)",
  round(prop.table(table(needGasDT$PROP_AGE,
                         needGasDT$LOFT_DEPTH
                         )
                   ,1 # margin = 1 gives row proportions
                   ) 
        ,2
        ) # round to 2 d.p
)

```

 * Q3: What would you conclude from the Loft depth table of row proportions?
 
We now need to test whether the differences are statistically significant using a log-linear model:

```{r logCwiByAge}
# make xtab
loftXt <- xtabs(~ PROP_AGE + LOFT_DEPTH, needGasDT)
# use it in log linear model
loglm(~ PROP_AGE + LOFT_DEPTH, loftXt, fit=TRUE)
```

 * Q4: Does the log linear test suggest a significant difference between some of the groups?

# Predicting low gas consumption households

We want to test if the following characteristics are associated with low gas consumption:

 * property age (PROP_AGE)
 * property type (PROP_TYPE)
 * loft depth (LOFT_DEPTH)
 * floor area (FLOOR_AREA_BAND)
 
## Prepare the data

We define low gas consumption as being in the bottom 20% of consumers. First create an indicator for these households so we can use it as our outcome variable:
 
```{r setLowCons}
# note that there are some NAs in the data so we have to tell R to exclude these
  
gasCut <- quantile(needGasDT$Gcons2012, probs = c(0.2), na.rm = TRUE)

# create an indicator for the lowest 20% vs the rest
needGasDT$Gcons2012Lo <- ifelse(needGasDT$Gcons2012 > gasCut,
                                     0, # not low
                                     1 # low
                                     )

needGasDT$Gcons2012Lo <- factor(needGasDT$Gcons2012Lo,
                                     labels = c(
                                       "Not low",
                                       "Lowest 20%"
                                     )
                                     )

# check 
#prop.table(table(needGasDT$Gcons2012Lo))

# check that we have valid gas data for all of our variables
kable(caption="Check PROP_AGE",
      table(needGasDT$PROP_AGE,needGasDT$Gcons2012Lo, useNA = "always")
)

kable(caption="Check PROP_TYPE",
      table(needGasDT$PROP_TYPE,needGasDT$Gcons2012Lo, useNA = "always")
)

kable(caption="Check LOFT_DEPTH",
      table(needGasDT$LOFT_DEPTH,needGasDT$Gcons2012Lo, useNA = "always")
)

kable(caption="Check FLOOR_AREA_BAND",
      table(needGasDT$FLOOR_AREA_BAND,needGasDT$Gcons2012Lo, useNA = "always")
)

```

If we see 'NA' as a value in the row or column labels it means we are missing observations for that variable.
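
If you want to check this directly, one option (a sketch only, not required for the questions) is to count the missing values in the outcome and each candidate predictor:

```{r checkNAs}
# count NAs in the outcome and each candidate predictor
sapply(needGasDT[, .(Gcons2012Lo, PROP_AGE, PROP_TYPE, LOFT_DEPTH, FLOOR_AREA_BAND)],
       function(x) sum(is.na(x)))
```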

## Construct models

Now construct a logistic regression model to test which of the variables above predict low gas consumption in 2012.

```{r modelLoCons}

logitGasLoM1 <- glm(formula = Gcons2012Lo ~ PROP_AGE + PROP_TYPE + LOFT_DEPTH + FLOOR_AREA_BAND, 
                   family = binomial(logit), needGasDT)
```

## Model diagnostics

Before we interpret the results we need to check the model assumptions. These are a subset of the ones we run for linear OLS models.

```{r diagnosticsLoCon}
# Diagnostics: independence of errors
# If p < 0.05 then a problem as implies autocorrelation
# but beware large sample sizes - you will get a small p value with a large sample. Instead look at the D-W statistic:
# if D-W is less than 1.0, there may be cause for alarm as small values of D-W indicate successive error terms are, on average, close in value to one another, or positively correlated. 
# Values around 2 are usually considered acceptable
# If D-W > 2, successive error terms are, on average, very different in value from one another, i.e., negatively correlated.
durbinWatsonTest(logitGasLoM1)

# Diagnostics: collinearity (vif)
# if any values > 10 -> problem
vif(logitGasLoM1)

# Diagnostics: collinearity (tolerance)
# if any values < 0.2 -> possible problem
# if any values < 0.1 -> definitely a problem
1/vif(logitGasLoM1)

```

 * Q5: Which two variables appear to be to some extent collinear?

Although the values are not greater than 10 (see above) we will remove the variable with the highest variance inflation factor (vif) score, re-run the models and re-test the diagnostics.

 * Q6: Which variable should you remove?

Now re-run your corrected model:

```{r modelHiConsCorrected}

logitGasLoM2 <- glm(formula = Gcons2012Lo ~ PROP_AGE + LOFT_DEPTH + FLOOR_AREA_BAND , 
                   family = binomial(logit), needGasDT)

# Diagnostics: independence of errors
# If p < 0.05 then a problem as implies autocorrelation
# but beware large sample sizes - you will get a small p value with a large sample. Instead look at the D-W statistic:
# if D-W is less than 1.0, there may be cause for alarm as small values of D-W indicate successive error terms are, on average, close in value to one another, or positively correlated. 
# Values around 2 are usually considered acceptable
# If D-W > 2, successive error terms are, on average, very different in value from one another, i.e., negatively correlated.
durbinWatsonTest(logitGasLoM2)

# Diagnostics: collinearity (vif)
# if any values > 10 -> problem
vif(logitGasLoM2)

# Diagnostics: collinearity (tolerance)
# if any values < 0.2 -> possible problem
# if any values < 0.1 -> definitely a problem
1/vif(logitGasLoM2)

```

 * Q7: Has removing that variable produced a more acceptable model?

## Model Results
Now get stargazer to produce tidy outputs of the final model:

```{r modelHiConsResults, results='asis'}
# NB: we need to force knitr to keep the html (asis)
stargazer(logitGasLoM2,
          title = "Gas model results",
          ci = TRUE, 
          single.row = TRUE,
          type = "html")

# visualise using ggplot
library(broom) # in case you didn't
logitGasLoM2DF <- tidy(logitGasLoM2)
logitGasLoM2DF$ci_lower <- logitGasLoM2DF$estimate - qnorm(0.975) * logitGasLoM2DF$std.error
logitGasLoM2DF$ci_upper <- logitGasLoM2DF$estimate + qnorm(0.975) * logitGasLoM2DF$std.error
logitGasLoM2DF$model <- "Model 1"

plotModel <- ggplot(logitGasLoM2DF, aes(x = term, y = estimate)) + 
    geom_bar(position=position_dodge(), stat="identity") +
    geom_errorbar(aes(ymin=ci_lower, ymax=ci_upper),
                  width=.2,                    # Width of the error bars
                  colour = "black", # make them visible
                  position=position_dodge(.9)
                  ) +
  labs(y = "Log odds") +
  coord_flip() # turn it round to make it legible
plotModel
```

Note that stargazer has reported `log odds`, which range from -ve values (large negative effect) through 0 (no effect) to +ve values (large positive effect). Note also that the first category of each factor is not displayed because the coefficients represent comparisons with this reference (baseline) category.
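
If you find log odds hard to interpret, a common trick (shown here as a sketch, not required for the questions) is to exponentiate the coefficients to get odds ratios: values above 1 raise the odds of being in the low gas group, values below 1 lower them.

```{r modelLoConsOddsRatios}
# odds ratios for the corrected model
exp(coef(logitGasLoM2))

# approximate (Wald) 95% confidence intervals on the odds-ratio scale
exp(confint.default(logitGasLoM2))
```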

Questions:

 * Q8: How many observations are in the model?
 * Q9: Why is this not all of them? 
 * Q10: What is the AIC score of the model? 
 * Q11: What can we conclude about the relationship between property age and the probability of being a low gas consumer?
 * Q12: Which property age is the least likely to be in the low gas consumption group?
 * Q13: What can we conclude about the relationship between floor area and the probability of being a low gas consumer? 
 * Q14: If your floor area is > 151 m2, how many times less likely are you to be in the low gas consumption group than if your floor area is 1 to 50 m2?
 * Q15: If we were to re-run this model for electricity do you think the model would perform as well?
 * Q16: Give reasons for your answer. 
 
# Predicting low gas consumption in England and Wales?

You will notice that there are two IMD_ variables:

 * IMD_ENG: English index of deprivation decile for the dwellings in England
 * IMD_WALES: Welsh index of deprivation decile for the dwellings in Wales
 
We would like to know whether households in more deprived areas (low IMD decile) are more likely to be low gas consumers, controlling for floor area etc. To do this we need to run two models: one for England and one for Wales. Before we do that we will check the data using a bar chart.

```{r imdPlots}
# Create a ggplot of the mean of a value (with 95% CI) across a factor variable
ggplot(needGasDT, aes(IMD_ENG, Gcons2012)) + 
  stat_summary(fun.y = "mean", geom = "bar", fill = "grey70") +
  stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width = 0.4) + 
  labs(title = "Mean annual gas consumption by English IMD decile (Error bars = 96% CI for the mean)")

ggplot(needGasDT, aes(IMD_WALES, Gcons2012)) + 
  stat_summary(fun.y = "mean", geom = "bar", fill = "grey70") +
  stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width = 0.4) + 
  labs(title = "Mean annual gas consumption by Welsh IMD decile (Error bars = 96% CI for the mean)")
```

Now run your models.
 
```{r imdModels}

logitGasLoM2Eng <- glm(formula = Gcons2012Lo ~ PROP_AGE + LOFT_DEPTH + FLOOR_AREA_BAND + as.factor(IMD_ENG), 
                   family = binomial(logit), needGasDT)

logitGasLoM2Wal <- glm(formula = Gcons2012Lo ~ PROP_AGE + LOFT_DEPTH + FLOOR_AREA_BAND + as.factor(IMD_WALES), 
                   family = binomial(logit), needGasDT)

```

## Model Results
Normally we should also run the diagnostic checks, but for now we will get stargazer to produce tidy outputs of the two models:

```{r modelHiConsResultsIMD, results='asis'}
# NB: we need to force knitr to keep the html (asis)
stargazer(logitGasLoM2Wal,logitGasLoM2Eng,
          title = "Gas model results",
          ci = TRUE, 
          single.row = TRUE,
          type = "html")

library(broom) # in case you didn't
logitGasLoM2WalDF <- tidy(logitGasLoM2Wal)
logitGasLoM2WalDF$ci_lower <- logitGasLoM2WalDF$estimate - qnorm(0.975) * logitGasLoM2WalDF$std.error
logitGasLoM2WalDF$ci_upper <- logitGasLoM2WalDF$estimate + qnorm(0.975) * logitGasLoM2WalDF$std.error
logitGasLoM2WalDF$model <- "Wales"

logitGasLoM2EngDF <- tidy(logitGasLoM2Eng)
logitGasLoM2EngDF$ci_lower <- logitGasLoM2EngDF$estimate - qnorm(0.975) * logitGasLoM2EngDF$std.error
logitGasLoM2EngDF$ci_upper <- logitGasLoM2EngDF$estimate + qnorm(0.975) * logitGasLoM2EngDF$std.error
logitGasLoM2EngDF$model <- "England"


# combine the results
modelsDF <- rbind(logitGasLoM2WalDF, logitGasLoM2EngDF)

# draw the plots using 'model' to separate them
plotModels <- ggplot(modelsDF, aes(x = term, y = estimate)) + 
    geom_bar(aes(fill = model), 
             position=position_dodge(), stat="identity") +
    geom_errorbar(aes(ymin=ci_lower, ymax=ci_upper, group = model),
                  width=.2,                    # Width of the error bars
                  colour = "black", # make them visible
                  position=position_dodge(.9)
                  ) + 
  labs(y = "Log odds") +
  coord_flip() # turn it round to make it legible
  
plotModels
```

Questions:

 * Q17: If you live in a high deprivation area (low IMD decile) are you more or less likely to be a low gas user in Wales?
 * Q18: If you live in a low deprivation area (high IMD decile) are you more or less likely to be a low gas user in Wales?
 * Q19: Do we see the same pattern in England?
 * Q20: Overall what would you conclude from these models about whether people in deprived areas use more or less gas?
 
# Predicting low, medium and high gas users

In contrast to the previous questions, we now want to know what predicts being a low, medium or high gas user. To do this we will use an ordered logit (probit) regression using the same variables:

 * property age (PROP_AGE)
 * loft depth (LOFT_DEPTH)
 * floor area (FLOOR_AREA_BAND)

## Prepare the data
First we need to create the low/medium/high categories. In this case we will define low as the bottom 20%, high as the top 20% and medium as the middle 60%.

```{r gasCuts}
# note that there are some NAs in the data so we have to tell R to exclude these
  
gasCutLo <- quantile(needGasDT$Gcons2012, probs = c(0.2), na.rm = TRUE)
gasCutHi <- quantile(needGasDT$Gcons2012, probs = c(0.8), na.rm = TRUE)

# code the three consumption groups: 1 = low, 2 = middle, 3 = high
needGasDT$Gcons2012cuts <- ifelse(needGasDT$Gcons2012 <= gasCutLo,
                                     1, # low
                                     2 # rest (recoded to 3 below where high)
                                     )
needGasDT$Gcons2012cuts <- ifelse(needGasDT$Gcons2012 >= gasCutHi,
                                     3, # high
                                     needGasDT$Gcons2012cuts # copy previous
                                     )

needGasDT$Gcons2012cuts <- factor(needGasDT$Gcons2012cuts,
                                     labels = c(
                                       "Lowest 20%",
                                       "Middle 60%",
                                       "Highest 20%"
                                     )
                                     )

# check 
#prop.table(table(needGasDT$Gcons2012cuts))

```
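
An equivalent and arguably more idiomatic way to build banded factors is `cut()`. The sketch below is for illustration only and is not used later; note that, unlike the `ifelse()` code above, a value exactly equal to the 80th percentile falls in the middle band rather than the top band.

```{r gasCutsAlt}
# a sketch: the same three bands built with cut()
# (-Inf, gasCutLo] = low, (gasCutLo, gasCutHi] = middle, (gasCutHi, Inf) = high
needGasDT$Gcons2012cutsAlt <- cut(needGasDT$Gcons2012,
                                  breaks = c(-Inf, gasCutLo, gasCutHi, Inf),
                                  labels = c("Lowest 20%", "Middle 60%", "Highest 20%"))
```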

## Run the model:

```{r modelGasCuts}
library(MASS) # in case you didn't do so before (install first)
probitGasCutsM1 <- glm(Gcons2012cuts ~ PROP_AGE + LOFT_DEPTH + FLOOR_AREA_BAND,
                       needGasDT, family=binomial(link="probit"))
```
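
Note that `glm()` with a binomial family treats a multi-level factor response as 'first level vs everything else', so the call above is really a binary probit of 'Lowest 20%' against the rest. If you wanted a genuinely *ordered* probit, `MASS::polr()` is one option - the sketch below is for illustration only and is not needed for the questions that follow.

```{r modelGasCutsPolr}
# a sketch only: an ordered probit fitted with polr() from MASS
polrGasCutsM1 <- polr(Gcons2012cuts ~ PROP_AGE + LOFT_DEPTH + FLOOR_AREA_BAND,
                      data = needGasDT, method = "probit", Hess = TRUE)
summary(polrGasCutsM1)
```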

## Model Results
Normally we would again conduct model diagnostics but for now just print the model results:

```{r modelHiGasCutsResults}
# print the standard glm summary
summary(probitGasCutsM1)
```

In the output above, the first thing we see is the call; this is R reminding us what model we ran and what options we specified.

Next we see the deviance residuals, which are a measure of model fit. This part of the output shows the distribution of the deviance residuals for the individual cases used in the model.

The next part of the output shows the coefficients, their standard errors, the z-statistics (sometimes called Wald z-statistics) and the associated p-values.

Below the table of coefficients are fit indices, including the null and residual deviance and the AIC. The sketch below shows one way to use these values to help assess model fit.
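
For example, one way (a sketch, not required for the questions) to use the deviance figures is a likelihood-ratio test of the fitted model against the intercept-only (null) model; a small p-value suggests the predictors improve the fit:

```{r probitFitCheck}
# chi-squared test on the drop in deviance from the null model to the fitted model
with(probitGasCutsM1,
     pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))
```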

Note that, unlike the logit models above, a probit model's coefficients are not log odds: they are on the probit (z-score) scale. They can still be read in the same direction, ranging from -ve values (negative effect) through 0 (no effect) to +ve values (positive effect).

Questions:

 * Q21: What is the AIC score of the model?
 * Q22: What would you conclude from this model?
 
We can use the `confint` function to obtain confidence intervals for the coefficient estimates. For a glm these are profile-likelihood confidence intervals by default (created by profiling the likelihood function), so they are not necessarily symmetric about the estimate.

```{r probitConfint}
confint(probitGasCutsM1)
```

# About

Analysis completed in: `r round(difftime(Sys.time(), startTime, units = "secs"), 2)` seconds using [knitr](https://cran.r-project.org/package=knitr) in [RStudio](http://www.rstudio.com) with `r R.version.string` running on `r R.version$platform`.