1 Setup

# Log compile time:
startTime <- proc.time()

require(spatialec) # for parameters etc
## Loading required package: spatialec
myPackages <- c(
  "data.table", # data munching
  "ggplot2", # plot linear imputation model results
  "here", # easy path management - https://speakerdeck.com/jennybc/zen-and-the-art-of-workflow-maintenance?slide=49
  "kableExtra", # fancy tables
  "skimr" # descripive stats
)
spatialec::loadLibraries(myPackages) # Beware: this will try to install packages that may not be installed
## Loading required package: data.table
## Loading required package: ggplot2
## Loading required package: here
## here() starts at /Users/ben/git.soton/ba1e12/spatialec
## Loading required package: kableExtra
## Loading required package: skimr
## 
## Attaching package: 'skimr'
## The following object is masked from 'package:kableExtra':
## 
##     kable
## The following object is masked from 'package:stats':
## 
##     filter
## data.table    ggplot2       here kableExtra      skimr 
##       TRUE       TRUE       TRUE       TRUE       TRUE
spatialec::setup()

Additional R packages used in this report (data.table, ggplot2, here, kableExtra, skimr):

  • data.table - for fast data munching (Dowle et al. 2015)
  • ggplot2 - for slick graphics (Wickham 2009)
  • here - for path management
  • kableExtra - for pretty tables (Zhu 2018)
  • knitr - to create this document (Xie 2016)
  • skimr - for data summaries (Arino de la Rubia et al. 2017)

2 Report Purpose

To process NZ Census 2013 data (from ) into a form that can be used in an ipf model of national/regional/local area demand.

It:

  • Loads NZ Census area unit 2013 data
  • processes to ipf form and matching coding to Green Grid survey

3 Load data

NZ Area Unit data is loaded as separate files.

2013 NZ Census data from NZ Stats at area unit level. For simplicity we use one file per constraint:

  • n people
  • n dependent children
  • fuel source (all counted - may cause confusion as sum to > 100% of households)
  • n rooms

We could also use other tables with other constraints if they are available - e.g. N bedrooms.

NB: these files, when downloaded form the NZStats data extractor come with higher levels of aggregation in the tables. These have to be removed by extracting just area unit rows.

3.1 Load areas labels

First load area labels as we use these to select the right data rows.

Table 3.1: Example label data
AU2013_code AU2013_label REGC2013_label nMBs
500100 Awanui Northland Region 8
500202 Karikari Peninsula-Maungataniwha Northland Region 48
500203 Taipa Bay-Mangonui Northland Region 29
500204 Herekino Northland Region 36
500205 Ahipara Northland Region 17
500206 North Cape Northland Region 10

3.2 Fuel source

## Loading: ~/Data/NZ_Census/data/raw/areaUnits/fuelSource/TABLECODE8100_Data_47b7b3fc-0e40-431f-b313-141de4fb0013.csv
## N rows loaded: 69,366
## N unique area units (fuel data): 2020
Table 3.2: Cross check (all years)
fuel_totalHouseholds fuel_totalStatedHouseholds heatSourceCoal heatSourceElectricity heatSourceGas heatSourceOther heatSourceWood
Bottled gas 0 0 0 0 6036 0 0
Coal 0 0 6036 0 0 0 0
Electricity 0 0 0 6036 0 0 0
Mains gas 0 0 0 0 6036 0 0
No fuels used in this dwelling 0 0 0 0 0 6036 0
Not elsewhere included 0 0 0 0 0 6036 0
Other fuel(s) 0 0 0 0 0 6036 0
Solar power 0 0 0 6036 0 0 0
Total dwellings stated 0 6036 0 0 0 0 0
Total dwellings, fuel type used to heat dwelling 6036 0 0 0 0 0 0
Wood 0 0 0 0 0 0 6036
Table 3.2: Summary of 2013 data
AU2013_code AU2013_label fuel_totalHouseholds fuel_totalStatedHouseholds heatSourceCoal heatSourceElectricity heatSourceGas heatSourceOther heatSourceWood
Length:2012 Length:2012 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0.00 Min. : 0.0
Class :character Class :character 1st Qu.: 233.2 1st Qu.: 219.0 1st Qu.: 6.0 1st Qu.: 156.0 1st Qu.: 45 1st Qu.: 18.00 1st Qu.: 96.0
Mode :character Mode :character Median : 685.5 Median : 646.5 Median : 15.0 Median : 492.0 Median : 138 Median : 51.00 Median : 213.0
NA NA Mean : 776.3 Mean : 733.4 Mean : 30.3 Mean : 592.6 Mean : 201 Mean : 76.61 Mean : 269.5
NA NA 3rd Qu.:1197.0 3rd Qu.:1131.0 3rd Qu.: 30.0 3rd Qu.: 918.0 3rd Qu.: 300 3rd Qu.: 105.75 3rd Qu.: 375.0
NA NA Max. :5568.0 Max. :4920.0 Max. :930.0 Max. :3342.0 Max. :1371 Max. :2202.00 Max. :1764.0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   141.0   153.8   154.8   168.0   233.3     123
Table 3.2: Example areas where > 100% households coded (multiple answers allowed so this table may be quite large)
AU2013_code AU2013_label fuel_totalHouseholds fuel_totalStatedHouseholds heatSourceCoal heatSourceElectricity heatSourceGas heatSourceOther heatSourceWood calcSum calcSumPc
500100 Awanui 123 105 3 78 27 27 39 174 141.46
500202 Karikari Peninsula-Maungataniwha 1764 1629 33 837 363 270 930 2433 137.93
500203 Taipa Bay-Mangonui 819 762 3 552 159 135 195 1044 127.47
500204 Herekino 726 660 15 360 138 129 396 1038 142.98
500205 Ahipara 399 342 6 204 99 99 108 516 129.32
500206 North Cape 174 150 0 84 30 57 51 222 127.59
## Warning: Removed 123 rows containing non-finite values (stat_bin).
Distribution of check sum %

Figure 3.1: Distribution of check sum %

3.3 n Kids

## Loading: ~/Data/NZ_Census/data/raw/areaUnits/nKids/TABLECODE8141_Data_e6f03066-7bbf-4ba0-94b0-1821d5a4665a.csv
## N rows loaded: 56,754
## N unique area units (kids data): 2020
Table 3.3: Cross check (all years)
nKids_1m nkids_totalFamilies
Couple with adult child(ren) and dependent child(ren) under 18 only 6036 0
Couple with adult child(ren) and dependent children under 18 and at least one child of unknown dependency status 6036 0
Couple with dependent child(ren) under 18 and at least one child of unknown dependency status 6036 0
Couple with dependent child(ren) under 18 only 6036 0
One parent with adult child(ren) and dependent children under 18 only 6036 0
One parent with adult child(ren), dependent children under 18 and at least one child of unknown dependency status 6036 0
One parent with dependent child(ren) under 18 and at least one child of unknown dependency status 6036 0
One parent with dependent child(ren) under 18 only 6036 0
Total families 0 6036
Table 3.3: Summary of 2013 data
AU2013_code AU2013_label nKids_1m nkids_totalFamilies nKids_0_families
Length:2012 Length:2012 Min. : 0.0 Min. : 0.0 Min. : 0.0
Class :character Class :character 1st Qu.: 69.0 1st Qu.: 168.0 1st Qu.: 96.0
Mode :character Mode :character Median : 222.0 Median : 502.5 Median : 262.5
NA NA Mean : 261.6 Mean : 564.8 Mean : 303.2
NA NA 3rd Qu.: 402.0 3rd Qu.: 861.0 3rd Qu.: 453.0
NA NA Max. :1335.0 Max. :2460.0 Max. :1728.0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     100     100     100     100     100     100     126
Table 3.3: Example areas where > 100% families coded
AU2013_code AU2013_label nKids_1m nkids_totalFamilies nKids_0_families calcSum calcSumPc
## Warning: Removed 126 rows containing non-finite values (stat_bin).
Distribution of check sum %

Figure 3.2: Distribution of check sum %

3.4 n People

## Loading: ~/Data/NZ_Census/data/raw/areaUnits/nPeople/TABLECODE8169_Data_bfad6f1a-c9af-4adb-a141-e13a83e175d0.csv
## N rows loaded: 57,186
## N unique area units (people data): 2020
Table 3.4: Cross check (all years)
nPeople_1 nPeople_2 nPeople_3 nPeople_4m npeople_totalHouseholds
Eight or more usual residents 0 0 0 6036 0
Five usual residents 0 0 0 6036 0
Four usual residents 0 0 0 6036 0
One usual resident 6036 0 0 0 0
Seven usual residents 0 0 0 6036 0
Six usual residents 0 0 0 6036 0
Three usual residents 0 0 6036 0 0
Total households 0 0 0 0 6036
Two usual residents 0 6036 0 0 0
Table 3.4: Summary of 2013 data
AU2013_code AU2013_label nPeople_1 nPeople_2 nPeople_3 nPeople_4m npeople_totalHouseholds
Length:2012 Length:2012 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
Class :character Class :character 1st Qu.: 48.0 1st Qu.: 84.0 1st Qu.: 33.0 1st Qu.: 54.0 1st Qu.: 231.0
Mode :character Mode :character Median : 132.0 Median : 228.0 Median :108.0 Median : 168.0 Median : 681.0
NA NA Mean : 176.6 Mean : 262.2 Mean :126.6 Mean : 204.7 Mean : 770.3
NA NA 3rd Qu.: 261.0 3rd Qu.: 390.8 3rd Qu.:198.0 3rd Qu.: 306.0 3rd Qu.:1188.8
NA NA Max. :1797.0 Max. :2199.0 Max. :948.0 Max. :1167.0 Max. :5367.0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   99.69  100.00   98.39  100.24  125.00     116
Table 3.4: Example areas where > 100% households coded - check how close to 100% they are
AU2013_code AU2013_label nPeople_1 nPeople_2 nPeople_3 nPeople_4m npeople_totalHouseholds calcSum calcSumPc
500205 Ahipara 93 141 69 90 390 393 100.77
500401 Kaeo 48 54 24 42 165 168 101.82
500700 Omapere and Opononi 69 63 18 33 180 183 101.67
500801 Hokianga North 195 219 93 159 660 666 100.91
501100 Paihia 213 330 93 99 732 735 100.41
501614 Kapiro 186 441 147 252 1023 1026 100.29
## Warning: Removed 116 rows containing non-finite values (stat_bin).
Test check sum %

Figure 3.3: Test check sum %

3.5 n Rooms

## Loading: ~/Data/NZ_Census/data/raw/areaUnits/nRooms/TABLECODE8098_Data_62c5ce5c-23cf-44a2-b25e-b287fe9645e7.csv
## N rows loaded: 69,366
## N unique area units (rooms data): 2020
Table 3.5: Cross check (all years)
nrooms_statedtotalHouseholds nrooms_totalHouseholds nRooms1_4 nRooms5_6 nRooms7m
Eight or more rooms 0 0 0 0 6036
Five rooms 0 0 0 6036 0
Four rooms 0 0 6036 0 0
Not elsewhere included 0 0 0 0 6036
One room 0 0 6036 0 0
Seven rooms 0 0 0 0 6036
Six rooms 0 0 0 6036 0
Three rooms 0 0 6036 0 0
Total dwellings stated 6036 0 0 0 0
Total dwellings, number of rooms 0 6036 0 0 0
Two rooms 0 0 6036 0 0
Table 3.5: Summary of 2013 data
AU2013_code AU2013_label nRooms1_4 nRooms5_6 nRooms7m nrooms_statedtotalHouseholds nrooms_totalHouseholds
Length:2012 Length:2012 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
Class :character Class :character 1st Qu.: 24.0 1st Qu.: 84.0 1st Qu.: 108.0 1st Qu.: 219.0 1st Qu.: 233.2
Mode :character Mode :character Median : 72.0 Median : 261.0 Median : 297.0 Median : 643.5 Median : 685.5
NA NA Mean : 121.3 Mean : 312.3 Mean : 342.6 Mean : 731.2 Mean : 776.3
NA NA 3rd Qu.: 165.0 3rd Qu.: 489.0 3rd Qu.: 504.8 3rd Qu.:1131.0 3rd Qu.:1197.0
NA NA Max. :4122.0 Max. :1758.0 Max. :1992.0 Max. :4968.0 Max. :5568.0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   99.69  100.00   98.50  100.28  137.50     123
Table 3.5: Example areas where > 100% households coded - check how close to 100% they are
AU2013_code AU2013_label nRooms1_4 nRooms5_6 nRooms7m nrooms_statedtotalHouseholds nrooms_totalHouseholds calcSum calcSumPc
500100 Awanui 27 54 45 105 123 126 102.44
500203 Taipa Bay-Mangonui 144 351 330 762 819 825 100.73
500204 Herekino 90 291 351 657 726 732 100.83
500205 Ahipara 69 153 180 342 399 402 100.75
500208 Motutangi-Kareponia 102 258 300 570 657 660 100.46
500301 Kaitaia West 216 477 447 960 1134 1140 100.53
## Warning: Removed 123 rows containing non-finite values (stat_bin).
Test check sum %

Figure 3.4: Test check sum %

4 Combine and save constraint files

## Merging on AU2013_code * AU2013_label
## N rows of data: 2020
## N areas: 2020

4.1 Impute household level kids counts

As the number of kids constraint is derived from a families base we need to impute the same values for the larger base of all households so that the totals of kids + no kids households is consistent with the other constraints. In the absence of household level data from the Census on the number of children in households we assume that:

  • households which are not families do not include children;
  • => the number of households with children = the number of households with children;
  • => we add the difference between the number of households and the nunmber of families to the 0 kids constraint
Table 4.1: Test nKids_0 imputation
AU2013_code nKids_0_families nKids_0 nKids_1m nkids_totalFamilies nkids_totalHouseholds npeople_totalHouseholds
500100 33 81 42 75 123 123
500202 690 1200 522 1212 1722 1722
500203 300 606 165 465 771 771
500204 273 456 267 540 723 723
500205 144 261 129 273 390 390
500206 51 108 60 111 168 168

4.2 Save data

Save data for re-use

## Warning in `[<-.data.table`(x, j = name, value = value): Column 'count.NA'
## does not exist to remove
## Data saved as: ~/Data/NZ_Census/data/processed/2013IpfInput.csv
## Variables saved:
##  [1] "AU2013_code"                  "AU2013_label"                
##  [3] "nRooms1_4"                    "nRooms5_6"                   
##  [5] "nRooms7m"                     "nrooms_statedtotalHouseholds"
##  [7] "nrooms_totalHouseholds"       "calcSum"                     
##  [9] "calcSumPc"                    "nPeople_1"                   
## [11] "nPeople_2"                    "nPeople_3"                   
## [13] "nPeople_4m"                   "npeople_totalHouseholds"     
## [15] "i.calcSum"                    "i.calcSumPc"                 
## [17] "nKids_1m"                     "nkids_totalFamilies"         
## [19] "nKids_0_families"             "i.calcSum.2"                 
## [21] "i.calcSumPc.2"                "fuel_totalHouseholds"        
## [23] "fuel_totalStatedHouseholds"   "heatSourceCoal"              
## [25] "heatSourceElectricity"        "heatSourceGas"               
## [27] "heatSourceOther"              "heatSourceWood"              
## [29] "i.calcSum.1"                  "i.calcSumPc.1"               
## [31] "REGC2013_label"               "nMBs"                        
## [33] "nKids_0"                      "nkids_totalHouseholds"

5 Check household totals

Totals:

These will vary depending on the source table (families vs households vs dwellings etc)

We focus on households/families/dwellings not individuals as the spatial microsimulation will operate at the household level.

6 Acknowledgements

7 About

This Action has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 700386 (SPATIALEC).

This work is (c) 2019 the University of Southampton.

Analysis completed in 45.187 seconds ( 0.75 minutes) using knitr in RStudio with R version 3.5.2 (2018-12-20) running on x86_64-apple-darwin15.6.0.

8 Annexes

8.1 Census data

## Skim summary statistics
##  n obs: 2020 
##  n variables: 34 
## 
## ── Variable type:character ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##        variable missing complete    n min max empty n_unique
##     AU2013_code       0     2020 2020   6   6     0     2020
##    AU2013_label       0     2020 2020   4  34     0     2020
##  REGC2013_label       0     2020 2020  12  24     0       17
## 
## ── Variable type:integer ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##                      variable missing complete    n    mean     sd p0
##                       calcSum       8     2012 2020  776.19 624.19  0
##          fuel_totalHouseholds       8     2012 2020  776.29 623.96  0
##    fuel_totalStatedHouseholds       8     2012 2020  733.37 590.5   0
##                heatSourceCoal       8     2012 2020   30.3   62.1   0
##         heatSourceElectricity       8     2012 2020  592.64 498.79  0
##                 heatSourceGas       8     2012 2020  200.97 205.12  0
##               heatSourceOther       8     2012 2020   76.61  98.01  0
##                heatSourceWood       8     2012 2020  269.51 245.25  0
##                     i.calcSum       8     2012 2020  770.13 618.81  0
##                   i.calcSum.1       8     2012 2020 1170.03 898.84  0
##                   i.calcSum.2       8     2012 2020  564.81 452.99  0
##                       nKids_0       8     2012 2020  508.75 433.1   0
##              nKids_0_families       8     2012 2020  303.24 247     0
##                      nKids_1m       8     2012 2020  261.57 220.29  0
##           nkids_totalFamilies       8     2012 2020  564.81 452.99  0
##         nkids_totalHouseholds       8     2012 2020  770.32 618.78  0
##                          nMBs       0     2020 2020   23.09  16.9   1
##                     nPeople_1       8     2012 2020  176.58 168.71  0
##                     nPeople_2       8     2012 2020  262.21 217.39  0
##                     nPeople_3       8     2012 2020  126.64 106.58  0
##                    nPeople_4m       8     2012 2020  204.7  182.03  0
##       npeople_totalHouseholds       8     2012 2020  770.32 618.78  0
##  nrooms_statedtotalHouseholds       8     2012 2020  731.16 589.34  0
##        nrooms_totalHouseholds       8     2012 2020  776.29 623.96  0
##                     nRooms1_4       8     2012 2020  121.33 182.91  0
##                     nRooms5_6       8     2012 2020  312.3  263.44  0
##                      nRooms7m       8     2012 2020  342.56 285.93  0
##     p25    p50     p75 p100     hist
##  231     685.5 1194.75 5562 ▇▅▂▁▁▁▁▁
##  233.25  685.5 1197    5568 ▇▅▂▁▁▁▁▁
##  219     646.5 1131    4920 ▇▅▂▁▁▁▁▁
##    6      15     30     930 ▇▁▁▁▁▁▁▁
##  156     492    918    3342 ▇▅▃▁▁▁▁▁
##   45     138    300    1371 ▇▃▂▁▁▁▁▁
##   18      51    105.75 2202 ▇▁▁▁▁▁▁▁
##   96     213    375    1764 ▇▅▂▁▁▁▁▁
##  231     684   1191    5364 ▇▅▂▁▁▁▁▁
##  393    1071   1773.75 5922 ▇▆▅▂▁▁▁▁
##  168     502.5  861    2460 ▇▅▅▃▁▁▁▁
##  152.25  430.5  771    4929 ▇▃▁▁▁▁▁▁
##   96     262.5  453    1728 ▇▆▃▁▁▁▁▁
##   69     222    402    1335 ▇▅▃▂▁▁▁▁
##  168     502.5  861    2460 ▇▅▅▃▁▁▁▁
##  231     681   1188.75 5367 ▇▅▂▁▁▁▁▁
##    9      20     34     124 ▇▆▃▁▁▁▁▁
##   48     132    261    1797 ▇▃▁▁▁▁▁▁
##   84     228    390.75 2199 ▇▅▁▁▁▁▁▁
##   33     108    198     948 ▇▅▂▁▁▁▁▁
##   54     168    306    1167 ▇▅▃▁▁▁▁▁
##  231     681   1188.75 5367 ▇▅▂▁▁▁▁▁
##  219     643.5 1131    4968 ▇▅▂▁▁▁▁▁
##  233.25  685.5 1197    5568 ▇▅▂▁▁▁▁▁
##   24      72    165    4122 ▇▁▁▁▁▁▁▁
##   84     261    489    1758 ▇▅▃▂▁▁▁▁
##  108     297    504.75 1992 ▇▆▃▁▁▁▁▁
## 
## ── Variable type:numeric ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##       variable missing complete    n   mean    sd  p0    p25   p50    p75
##      calcSumPc     131     1889 2020  98.5  12.46   0  99.69 100   100.28
##    i.calcSumPc     124     1896 2020  98.39 12.52   0  99.69 100   100.24
##  i.calcSumPc.1     131     1889 2020 154.81 25.82   0 141.04 153.8 167.98
##  i.calcSumPc.2     134     1886 2020 100     0    100 100    100   100   
##    p100     hist
##  137.5  ▁▁▁▁▁▇▁▁
##  125    ▁▁▁▁▁▁▇▁
##  233.33 ▁▁▁▁▅▇▂▁
##  100    ▁▁▁▇▁▁▁▁

References

Arino de la Rubia, Eduardo, Hao Zhu, Shannon Ellis, Elin Waring, and Michael Quinn. 2017. Skimr: Skimr. https://github.com/ropenscilabs/skimr.

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.

Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.