# Log compile time:
startTime <- proc.time()
require(spatialec) # for parameters etc
## Loading required package: spatialec
myPackages <- c(
"data.table", # data munching
"ggplot2", # plot linear imputation model results
"here", # easy path management - https://speakerdeck.com/jennybc/zen-and-the-art-of-workflow-maintenance?slide=49
"kableExtra", # fancy tables
"skimr" # descripive stats
)
spatialec::loadLibraries(myPackages) # Beware: this will try to install packages that may not be installed
## Loading required package: data.table
## Loading required package: ggplot2
## Loading required package: here
## here() starts at /Users/ben/git.soton/ba1e12/spatialec
## Loading required package: kableExtra
## Loading required package: skimr
##
## Attaching package: 'skimr'
## The following object is masked from 'package:kableExtra':
##
## kable
## The following object is masked from 'package:stats':
##
## filter
## data.table ggplot2 here kableExtra skimr
## TRUE TRUE TRUE TRUE TRUE
spatialec::setup()
Additional R packages used in this report (data.table, ggplot2, here, kableExtra, skimr):
To process NZ Census 2013 data (from ) into a form that can be used in an ipf model of national/regional/local area demand.
It:
NZ Area Unit data is loaded as separate files.
2013 NZ Census data from NZ Stats at area unit level. For simplicity we use one file per constraint:
We could also use other tables with other constraints if they are available - e.g. N bedrooms.
NB: these files, when downloaded form the NZStats data extractor come with higher levels of aggregation in the tables. These have to be removed by extracting just area unit rows.
First load area labels as we use these to select the right data rows.
AU2013_code | AU2013_label | REGC2013_label | nMBs |
---|---|---|---|
500100 | Awanui | Northland Region | 8 |
500202 | Karikari Peninsula-Maungataniwha | Northland Region | 48 |
500203 | Taipa Bay-Mangonui | Northland Region | 29 |
500204 | Herekino | Northland Region | 36 |
500205 | Ahipara | Northland Region | 17 |
500206 | North Cape | Northland Region | 10 |
## Loading: ~/Data/NZ_Census/data/raw/areaUnits/fuelSource/TABLECODE8100_Data_47b7b3fc-0e40-431f-b313-141de4fb0013.csv
## N rows loaded: 69,366
## N unique area units (fuel data): 2020
fuel_totalHouseholds | fuel_totalStatedHouseholds | heatSourceCoal | heatSourceElectricity | heatSourceGas | heatSourceOther | heatSourceWood | |
---|---|---|---|---|---|---|---|
Bottled gas | 0 | 0 | 0 | 0 | 6036 | 0 | 0 |
Coal | 0 | 0 | 6036 | 0 | 0 | 0 | 0 |
Electricity | 0 | 0 | 0 | 6036 | 0 | 0 | 0 |
Mains gas | 0 | 0 | 0 | 0 | 6036 | 0 | 0 |
No fuels used in this dwelling | 0 | 0 | 0 | 0 | 0 | 6036 | 0 |
Not elsewhere included | 0 | 0 | 0 | 0 | 0 | 6036 | 0 |
Other fuel(s) | 0 | 0 | 0 | 0 | 0 | 6036 | 0 |
Solar power | 0 | 0 | 0 | 6036 | 0 | 0 | 0 |
Total dwellings stated | 0 | 6036 | 0 | 0 | 0 | 0 | 0 |
Total dwellings, fuel type used to heat dwelling | 6036 | 0 | 0 | 0 | 0 | 0 | 0 |
Wood | 0 | 0 | 0 | 0 | 0 | 0 | 6036 |
AU2013_code | AU2013_label | fuel_totalHouseholds | fuel_totalStatedHouseholds | heatSourceCoal | heatSourceElectricity | heatSourceGas | heatSourceOther | heatSourceWood | |
---|---|---|---|---|---|---|---|---|---|
Length:2012 | Length:2012 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | Min. : 0 | Min. : 0.00 | Min. : 0.0 | |
Class :character | Class :character | 1st Qu.: 233.2 | 1st Qu.: 219.0 | 1st Qu.: 6.0 | 1st Qu.: 156.0 | 1st Qu.: 45 | 1st Qu.: 18.00 | 1st Qu.: 96.0 | |
Mode :character | Mode :character | Median : 685.5 | Median : 646.5 | Median : 15.0 | Median : 492.0 | Median : 138 | Median : 51.00 | Median : 213.0 | |
NA | NA | Mean : 776.3 | Mean : 733.4 | Mean : 30.3 | Mean : 592.6 | Mean : 201 | Mean : 76.61 | Mean : 269.5 | |
NA | NA | 3rd Qu.:1197.0 | 3rd Qu.:1131.0 | 3rd Qu.: 30.0 | 3rd Qu.: 918.0 | 3rd Qu.: 300 | 3rd Qu.: 105.75 | 3rd Qu.: 375.0 | |
NA | NA | Max. :5568.0 | Max. :4920.0 | Max. :930.0 | Max. :3342.0 | Max. :1371 | Max. :2202.00 | Max. :1764.0 |
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 141.0 153.8 154.8 168.0 233.3 123
AU2013_code | AU2013_label | fuel_totalHouseholds | fuel_totalStatedHouseholds | heatSourceCoal | heatSourceElectricity | heatSourceGas | heatSourceOther | heatSourceWood | calcSum | calcSumPc |
---|---|---|---|---|---|---|---|---|---|---|
500100 | Awanui | 123 | 105 | 3 | 78 | 27 | 27 | 39 | 174 | 141.46 |
500202 | Karikari Peninsula-Maungataniwha | 1764 | 1629 | 33 | 837 | 363 | 270 | 930 | 2433 | 137.93 |
500203 | Taipa Bay-Mangonui | 819 | 762 | 3 | 552 | 159 | 135 | 195 | 1044 | 127.47 |
500204 | Herekino | 726 | 660 | 15 | 360 | 138 | 129 | 396 | 1038 | 142.98 |
500205 | Ahipara | 399 | 342 | 6 | 204 | 99 | 99 | 108 | 516 | 129.32 |
500206 | North Cape | 174 | 150 | 0 | 84 | 30 | 57 | 51 | 222 | 127.59 |
## Warning: Removed 123 rows containing non-finite values (stat_bin).
Figure 3.1: Distribution of check sum %
## Loading: ~/Data/NZ_Census/data/raw/areaUnits/nKids/TABLECODE8141_Data_e6f03066-7bbf-4ba0-94b0-1821d5a4665a.csv
## N rows loaded: 56,754
## N unique area units (kids data): 2020
nKids_1m | nkids_totalFamilies | |
---|---|---|
Couple with adult child(ren) and dependent child(ren) under 18 only | 6036 | 0 |
Couple with adult child(ren) and dependent children under 18 and at least one child of unknown dependency status | 6036 | 0 |
Couple with dependent child(ren) under 18 and at least one child of unknown dependency status | 6036 | 0 |
Couple with dependent child(ren) under 18 only | 6036 | 0 |
One parent with adult child(ren) and dependent children under 18 only | 6036 | 0 |
One parent with adult child(ren), dependent children under 18 and at least one child of unknown dependency status | 6036 | 0 |
One parent with dependent child(ren) under 18 and at least one child of unknown dependency status | 6036 | 0 |
One parent with dependent child(ren) under 18 only | 6036 | 0 |
Total families | 0 | 6036 |
AU2013_code | AU2013_label | nKids_1m | nkids_totalFamilies | nKids_0_families | |
---|---|---|---|---|---|
Length:2012 | Length:2012 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | |
Class :character | Class :character | 1st Qu.: 69.0 | 1st Qu.: 168.0 | 1st Qu.: 96.0 | |
Mode :character | Mode :character | Median : 222.0 | Median : 502.5 | Median : 262.5 | |
NA | NA | Mean : 261.6 | Mean : 564.8 | Mean : 303.2 | |
NA | NA | 3rd Qu.: 402.0 | 3rd Qu.: 861.0 | 3rd Qu.: 453.0 | |
NA | NA | Max. :1335.0 | Max. :2460.0 | Max. :1728.0 |
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 100 100 100 100 100 100 126
AU2013_code | AU2013_label | nKids_1m | nkids_totalFamilies | nKids_0_families | calcSum | calcSumPc |
---|---|---|---|---|---|---|
## Warning: Removed 126 rows containing non-finite values (stat_bin).
Figure 3.2: Distribution of check sum %
## Loading: ~/Data/NZ_Census/data/raw/areaUnits/nPeople/TABLECODE8169_Data_bfad6f1a-c9af-4adb-a141-e13a83e175d0.csv
## N rows loaded: 57,186
## N unique area units (people data): 2020
nPeople_1 | nPeople_2 | nPeople_3 | nPeople_4m | npeople_totalHouseholds | |
---|---|---|---|---|---|
Eight or more usual residents | 0 | 0 | 0 | 6036 | 0 |
Five usual residents | 0 | 0 | 0 | 6036 | 0 |
Four usual residents | 0 | 0 | 0 | 6036 | 0 |
One usual resident | 6036 | 0 | 0 | 0 | 0 |
Seven usual residents | 0 | 0 | 0 | 6036 | 0 |
Six usual residents | 0 | 0 | 0 | 6036 | 0 |
Three usual residents | 0 | 0 | 6036 | 0 | 0 |
Total households | 0 | 0 | 0 | 0 | 6036 |
Two usual residents | 0 | 6036 | 0 | 0 | 0 |
AU2013_code | AU2013_label | nPeople_1 | nPeople_2 | nPeople_3 | nPeople_4m | npeople_totalHouseholds | |
---|---|---|---|---|---|---|---|
Length:2012 | Length:2012 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | |
Class :character | Class :character | 1st Qu.: 48.0 | 1st Qu.: 84.0 | 1st Qu.: 33.0 | 1st Qu.: 54.0 | 1st Qu.: 231.0 | |
Mode :character | Mode :character | Median : 132.0 | Median : 228.0 | Median :108.0 | Median : 168.0 | Median : 681.0 | |
NA | NA | Mean : 176.6 | Mean : 262.2 | Mean :126.6 | Mean : 204.7 | Mean : 770.3 | |
NA | NA | 3rd Qu.: 261.0 | 3rd Qu.: 390.8 | 3rd Qu.:198.0 | 3rd Qu.: 306.0 | 3rd Qu.:1188.8 | |
NA | NA | Max. :1797.0 | Max. :2199.0 | Max. :948.0 | Max. :1167.0 | Max. :5367.0 |
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 99.69 100.00 98.39 100.24 125.00 116
AU2013_code | AU2013_label | nPeople_1 | nPeople_2 | nPeople_3 | nPeople_4m | npeople_totalHouseholds | calcSum | calcSumPc |
---|---|---|---|---|---|---|---|---|
500205 | Ahipara | 93 | 141 | 69 | 90 | 390 | 393 | 100.77 |
500401 | Kaeo | 48 | 54 | 24 | 42 | 165 | 168 | 101.82 |
500700 | Omapere and Opononi | 69 | 63 | 18 | 33 | 180 | 183 | 101.67 |
500801 | Hokianga North | 195 | 219 | 93 | 159 | 660 | 666 | 100.91 |
501100 | Paihia | 213 | 330 | 93 | 99 | 732 | 735 | 100.41 |
501614 | Kapiro | 186 | 441 | 147 | 252 | 1023 | 1026 | 100.29 |
## Warning: Removed 116 rows containing non-finite values (stat_bin).
Figure 3.3: Test check sum %
## Loading: ~/Data/NZ_Census/data/raw/areaUnits/nRooms/TABLECODE8098_Data_62c5ce5c-23cf-44a2-b25e-b287fe9645e7.csv
## N rows loaded: 69,366
## N unique area units (rooms data): 2020
nrooms_statedtotalHouseholds | nrooms_totalHouseholds | nRooms1_4 | nRooms5_6 | nRooms7m | |
---|---|---|---|---|---|
Eight or more rooms | 0 | 0 | 0 | 0 | 6036 |
Five rooms | 0 | 0 | 0 | 6036 | 0 |
Four rooms | 0 | 0 | 6036 | 0 | 0 |
Not elsewhere included | 0 | 0 | 0 | 0 | 6036 |
One room | 0 | 0 | 6036 | 0 | 0 |
Seven rooms | 0 | 0 | 0 | 0 | 6036 |
Six rooms | 0 | 0 | 0 | 6036 | 0 |
Three rooms | 0 | 0 | 6036 | 0 | 0 |
Total dwellings stated | 6036 | 0 | 0 | 0 | 0 |
Total dwellings, number of rooms | 0 | 6036 | 0 | 0 | 0 |
Two rooms | 0 | 0 | 6036 | 0 | 0 |
AU2013_code | AU2013_label | nRooms1_4 | nRooms5_6 | nRooms7m | nrooms_statedtotalHouseholds | nrooms_totalHouseholds | |
---|---|---|---|---|---|---|---|
Length:2012 | Length:2012 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | Min. : 0.0 | |
Class :character | Class :character | 1st Qu.: 24.0 | 1st Qu.: 84.0 | 1st Qu.: 108.0 | 1st Qu.: 219.0 | 1st Qu.: 233.2 | |
Mode :character | Mode :character | Median : 72.0 | Median : 261.0 | Median : 297.0 | Median : 643.5 | Median : 685.5 | |
NA | NA | Mean : 121.3 | Mean : 312.3 | Mean : 342.6 | Mean : 731.2 | Mean : 776.3 | |
NA | NA | 3rd Qu.: 165.0 | 3rd Qu.: 489.0 | 3rd Qu.: 504.8 | 3rd Qu.:1131.0 | 3rd Qu.:1197.0 | |
NA | NA | Max. :4122.0 | Max. :1758.0 | Max. :1992.0 | Max. :4968.0 | Max. :5568.0 |
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 99.69 100.00 98.50 100.28 137.50 123
AU2013_code | AU2013_label | nRooms1_4 | nRooms5_6 | nRooms7m | nrooms_statedtotalHouseholds | nrooms_totalHouseholds | calcSum | calcSumPc |
---|---|---|---|---|---|---|---|---|
500100 | Awanui | 27 | 54 | 45 | 105 | 123 | 126 | 102.44 |
500203 | Taipa Bay-Mangonui | 144 | 351 | 330 | 762 | 819 | 825 | 100.73 |
500204 | Herekino | 90 | 291 | 351 | 657 | 726 | 732 | 100.83 |
500205 | Ahipara | 69 | 153 | 180 | 342 | 399 | 402 | 100.75 |
500208 | Motutangi-Kareponia | 102 | 258 | 300 | 570 | 657 | 660 | 100.46 |
500301 | Kaitaia West | 216 | 477 | 447 | 960 | 1134 | 1140 | 100.53 |
## Warning: Removed 123 rows containing non-finite values (stat_bin).
Figure 3.4: Test check sum %
## Merging on AU2013_code * AU2013_label
## N rows of data: 2020
## N areas: 2020
As the number of kids constraint is derived from a families base we need to impute the same values for the larger base of all households so that the totals of kids + no kids households is consistent with the other constraints. In the absence of household level data from the Census on the number of children in households we assume that:
0 kids
constraintAU2013_code | nKids_0_families | nKids_0 | nKids_1m | nkids_totalFamilies | nkids_totalHouseholds | npeople_totalHouseholds |
---|---|---|---|---|---|---|
500100 | 33 | 81 | 42 | 75 | 123 | 123 |
500202 | 690 | 1200 | 522 | 1212 | 1722 | 1722 |
500203 | 300 | 606 | 165 | 465 | 771 | 771 |
500204 | 273 | 456 | 267 | 540 | 723 | 723 |
500205 | 144 | 261 | 129 | 273 | 390 | 390 |
500206 | 51 | 108 | 60 | 111 | 168 | 168 |
Save data for re-use
## Warning in `[<-.data.table`(x, j = name, value = value): Column 'count.NA'
## does not exist to remove
## Data saved as: ~/Data/NZ_Census/data/processed/2013IpfInput.csv
## Variables saved:
## [1] "AU2013_code" "AU2013_label"
## [3] "nRooms1_4" "nRooms5_6"
## [5] "nRooms7m" "nrooms_statedtotalHouseholds"
## [7] "nrooms_totalHouseholds" "calcSum"
## [9] "calcSumPc" "nPeople_1"
## [11] "nPeople_2" "nPeople_3"
## [13] "nPeople_4m" "npeople_totalHouseholds"
## [15] "i.calcSum" "i.calcSumPc"
## [17] "nKids_1m" "nkids_totalFamilies"
## [19] "nKids_0_families" "i.calcSum.2"
## [21] "i.calcSumPc.2" "fuel_totalHouseholds"
## [23] "fuel_totalStatedHouseholds" "heatSourceCoal"
## [25] "heatSourceElectricity" "heatSourceGas"
## [27] "heatSourceOther" "heatSourceWood"
## [29] "i.calcSum.1" "i.calcSumPc.1"
## [31] "REGC2013_label" "nMBs"
## [33] "nKids_0" "nkids_totalHouseholds"
Totals:
These will vary depending on the source table (families vs households vs dwellings etc)
We focus on households/families/dwellings not individuals as the spatial microsimulation will operate at the household level.
This Action has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 700386 (SPATIALEC).
This work is (c) 2019 the University of Southampton.
Analysis completed in 45.187 seconds ( 0.75 minutes) using knitr in RStudio with R version 3.5.2 (2018-12-20) running on x86_64-apple-darwin15.6.0.
## Skim summary statistics
## n obs: 2020
## n variables: 34
##
## ── Variable type:character ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## variable missing complete n min max empty n_unique
## AU2013_code 0 2020 2020 6 6 0 2020
## AU2013_label 0 2020 2020 4 34 0 2020
## REGC2013_label 0 2020 2020 12 24 0 17
##
## ── Variable type:integer ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## variable missing complete n mean sd p0
## calcSum 8 2012 2020 776.19 624.19 0
## fuel_totalHouseholds 8 2012 2020 776.29 623.96 0
## fuel_totalStatedHouseholds 8 2012 2020 733.37 590.5 0
## heatSourceCoal 8 2012 2020 30.3 62.1 0
## heatSourceElectricity 8 2012 2020 592.64 498.79 0
## heatSourceGas 8 2012 2020 200.97 205.12 0
## heatSourceOther 8 2012 2020 76.61 98.01 0
## heatSourceWood 8 2012 2020 269.51 245.25 0
## i.calcSum 8 2012 2020 770.13 618.81 0
## i.calcSum.1 8 2012 2020 1170.03 898.84 0
## i.calcSum.2 8 2012 2020 564.81 452.99 0
## nKids_0 8 2012 2020 508.75 433.1 0
## nKids_0_families 8 2012 2020 303.24 247 0
## nKids_1m 8 2012 2020 261.57 220.29 0
## nkids_totalFamilies 8 2012 2020 564.81 452.99 0
## nkids_totalHouseholds 8 2012 2020 770.32 618.78 0
## nMBs 0 2020 2020 23.09 16.9 1
## nPeople_1 8 2012 2020 176.58 168.71 0
## nPeople_2 8 2012 2020 262.21 217.39 0
## nPeople_3 8 2012 2020 126.64 106.58 0
## nPeople_4m 8 2012 2020 204.7 182.03 0
## npeople_totalHouseholds 8 2012 2020 770.32 618.78 0
## nrooms_statedtotalHouseholds 8 2012 2020 731.16 589.34 0
## nrooms_totalHouseholds 8 2012 2020 776.29 623.96 0
## nRooms1_4 8 2012 2020 121.33 182.91 0
## nRooms5_6 8 2012 2020 312.3 263.44 0
## nRooms7m 8 2012 2020 342.56 285.93 0
## p25 p50 p75 p100 hist
## 231 685.5 1194.75 5562 ▇▅▂▁▁▁▁▁
## 233.25 685.5 1197 5568 ▇▅▂▁▁▁▁▁
## 219 646.5 1131 4920 ▇▅▂▁▁▁▁▁
## 6 15 30 930 ▇▁▁▁▁▁▁▁
## 156 492 918 3342 ▇▅▃▁▁▁▁▁
## 45 138 300 1371 ▇▃▂▁▁▁▁▁
## 18 51 105.75 2202 ▇▁▁▁▁▁▁▁
## 96 213 375 1764 ▇▅▂▁▁▁▁▁
## 231 684 1191 5364 ▇▅▂▁▁▁▁▁
## 393 1071 1773.75 5922 ▇▆▅▂▁▁▁▁
## 168 502.5 861 2460 ▇▅▅▃▁▁▁▁
## 152.25 430.5 771 4929 ▇▃▁▁▁▁▁▁
## 96 262.5 453 1728 ▇▆▃▁▁▁▁▁
## 69 222 402 1335 ▇▅▃▂▁▁▁▁
## 168 502.5 861 2460 ▇▅▅▃▁▁▁▁
## 231 681 1188.75 5367 ▇▅▂▁▁▁▁▁
## 9 20 34 124 ▇▆▃▁▁▁▁▁
## 48 132 261 1797 ▇▃▁▁▁▁▁▁
## 84 228 390.75 2199 ▇▅▁▁▁▁▁▁
## 33 108 198 948 ▇▅▂▁▁▁▁▁
## 54 168 306 1167 ▇▅▃▁▁▁▁▁
## 231 681 1188.75 5367 ▇▅▂▁▁▁▁▁
## 219 643.5 1131 4968 ▇▅▂▁▁▁▁▁
## 233.25 685.5 1197 5568 ▇▅▂▁▁▁▁▁
## 24 72 165 4122 ▇▁▁▁▁▁▁▁
## 84 261 489 1758 ▇▅▃▂▁▁▁▁
## 108 297 504.75 1992 ▇▆▃▁▁▁▁▁
##
## ── Variable type:numeric ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## variable missing complete n mean sd p0 p25 p50 p75
## calcSumPc 131 1889 2020 98.5 12.46 0 99.69 100 100.28
## i.calcSumPc 124 1896 2020 98.39 12.52 0 99.69 100 100.24
## i.calcSumPc.1 131 1889 2020 154.81 25.82 0 141.04 153.8 167.98
## i.calcSumPc.2 134 1886 2020 100 0 100 100 100 100
## p100 hist
## 137.5 ▁▁▁▁▁▇▁▁
## 125 ▁▁▁▁▁▁▇▁
## 233.33 ▁▁▁▁▅▇▂▁
## 100 ▁▁▁▇▁▁▁▁
Arino de la Rubia, Eduardo, Hao Zhu, Shannon Ellis, Elin Waring, and Michael Quinn. 2017. Skimr: Skimr. https://github.com/ropenscilabs/skimr.
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.
Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.