diff --git a/EPCsAndCarbon/carbonCosts.Rmd b/EPCsAndCarbon/carbonCosts.Rmd
index fc0f5d50da6c3501e4b6f50bb8d7c14c329a3774..7cb938af4f950e21328b2fe80cd33733c1e2fea3 100644
--- a/EPCsAndCarbon/carbonCosts.Rmd
+++ b/EPCsAndCarbon/carbonCosts.Rmd
@@ -26,6 +26,7 @@ output:
     toc_depth: 2
     fig_width: 5
 bibliography: '`r path.expand("~/github/dataknut/refs/refs.bib")`'
+always_allow_html: true
 ---
 
 ```{r setup, include=FALSE}
@@ -34,6 +35,7 @@ knitr::opts_chunk$set(echo = TRUE)
 library(data.table)
 library(ggplot2)
 library(kableExtra)
+library(readxl)
 ```
 
 # Energy Performance Certificates (EPCs)
@@ -42,7 +44,12 @@ Apart from a few exempted buildings, a dwelling must have an EPC when constructe
 
 EPCs are not necessarily up to date. For example if a property has not been sold or let since a major upgrade, the effects of that upgrade may not be visible in the data.
 
-> check what feeds in automatically e.f. RGI installs etc
+Further reading: 
+
+ * https://epc.opendatacommunities.org/docs/guidance#technical_notes
+ * https://en.wikipedia.org/wiki/Energy_Performance_Certificate_(United_Kingdom)#Procedure
+
+> check what feeds in automatically e.f. RHI installs etc
 
 We have to assume the data we have is the _current state of play_ for these dwellings.
 
@@ -58,7 +65,7 @@ The EPC data file has `r nrow(allEPCs_DT)` records for Southampton and `r ncol(a
  * PROPERTY_TYPE: Describes the type of property such as House, Flat, Maisonette etc. This is the type differentiator for dwellings;
  * BUILT_FORM: The building type of the Property e.g. Detached, Semi-Detached, Terrace etc. Together with the Property Type, the Build Form produces a structured description of the property;
  * ENVIRONMENT_IMPACT_CURRENT: A measure of the property's current impact on the environment in terms of carbon dioxide (COâ‚‚) emissions. The higher the rating the lower the COâ‚‚ emissions. (COâ‚‚ emissions in tonnes / year) **NB this is a categorised scale calculated from CO2_EMISSIONS_CURRENT**;
- * ENERGY_CONSUMPTION_CURRENT: Current estimated total energy consumption for the property in a 12 month period (kWh/m2). Displayed on EPC as the current primary energy use per square metre of floor area. **Nb: this covers heat and hot water (and lightng?)**
+ * ENERGY_CONSUMPTION_CURRENT: Current estimated total energy consumption for the property in a 12 month period (**kWh/m2**). Displayed on EPC as the current primary energy use per square metre of floor area. **Nb: this covers heat and hot water (and lightng?)**
  * CO2_EMISSIONS_CURRENT: COâ‚‚ emissions per year in tonnes/year **NB: this is calculated from the modelled kWh energy input using (possibly) outdated carbon intensity values**;
  * TENURE: Describes the tenure type of the property. One of: Owner-occupied; Rented (social); Rented (private).
  
@@ -66,7 +73,7 @@ The EPC data file has `r nrow(allEPCs_DT)` records for Southampton and `r ncol(a
  
   * WIND_TURBINE_COUNT: Number of wind turbines; 0 if none;
   * PHOTO_SUPPLY: Percentage of photovoltaic area as a percentage of total roof area. 0% indicates that a Photovoltaic Supply is not present in the property;
-  * TOTAL_FLOOR_AREA: The total useful floor area is the total of all enclosed spaces measured to the internal face of the external walls, i.e. the gross floor area as measured in accordance with the guidance issued from time to time by the Royal Institute of Chartered Surveyors or by a body replacing that institution. (m²);
+  * TOTAL_FLOOR_AREA: The total useful floor area is the total of all enclosed spaces measured to the internal face of the external walls, i.e. the gross floor area as measured in accordance with the guidance issued from time to time by the Royal Institute of Chartered Surveyors or by a body replacing that institution. (m²) - to allow for the calculation of total energy demand;
   * POSTCODE - to allow linkage to other datasets
   * LOCAL_AUTHORITY_LABEL - for checking
   
@@ -110,14 +117,30 @@ As we can see that we have `r uniqueN(dt$BUILDING_REFERENCE_NUMBER)` unique prop
 
 This is not surprising since the kWh/y and TCO2/y values are estimated using a model but before we go any further we'd better check if these are significant in number.
 
-## Check missing property rates
+## Check 'missing' EPC rates
+
+We will do this mostly at MSOA level as it allows us to link to other MSOA level datasets. Arguably it would be better to do this at LSOA level but...
 
 First we'll use the BEIS 2018 MSOA level annual electricity data to estimate the number of meters (not properties) - some addresses can have 2 meters (e.g. standard & economy 7). This is more useful than the number of gas meters since not all dwellings have mains gas but all have an electricity meter.
 
 ```{r, checkBEIS}
 beisElecDT <- data.table::fread("~/data/beis/MSOA_DOM_ELEC_csv/MSOA_ELEC_2018.csv")
-sotonElecDT <- beisElecDT[LAName %like% "Southampton"]
-sotonElecDT[, .(nElecMeters = sum(METERS)), keyby = .(LAName)]
+sotonElecDT <- beisElecDT[LAName %like% "Southampton", .(nElecMeters = METERS,  
+                                                         beisElecMWh = KWH/1000, 
+                                                         MSOACode, LAName)
+                          ]
+
+
+beisGasDT <- data.table::fread("~/data/beis/MSOA_DOM_GAS_csv/MSOA_GAS_2018.csv")
+sotonGasDT <- beisGasDT[LAName %like% "Southampton", .(nGasMeters = METERS,  
+                                                         beisGasMWh = KWH/1000, 
+                                                         MSOACode)]
+
+setkey(sotonElecDT, MSOACode)
+setkey(sotonGasDT, MSOACode)
+sotonEnergyDT <- sotonGasDT[sotonElecDT]
+sotonEnergyDT[, beisEnergyMWh := beisElecMWh + beisGasMWh]
+#head(sotonEnergyDT)
 ```
 
 Next we'll check for the number of households reported by the 2011 Census.
@@ -134,7 +157,7 @@ setkey(deprivationDT, MSOACode)
 setkey(sotonElecDT, MSOACode)
 # link LA name from Soton elec for now
 sotonDep_DT <- deprivationDT[sotonElecDT[, .(MSOACode, LAName)]]
-sotonDep_DT[, nHHs_deprivationn := `Household Deprivation: All categories: Classification of household deprivation; measures: Value`]
+sotonDep_DT[, nHHs_deprivation := `Household Deprivation: All categories: Classification of household deprivation; measures: Value`]
 
 #sotonDep_DT[, .(nHouseholds = sum(totalHouseholds)), keyby = .(LAName)]
 
@@ -159,7 +182,7 @@ setkey(sotonTenureDT, MSOACode)
 
 sotonCensus2011_DT <- sotonTenureDT[sotonDep_DT]
 
-t <- sotonCensus2011_DT[, .(sum_Deprivation = sum(nHHs_deprivationn),
+t <- sotonCensus2011_DT[, .(sum_Deprivation = sum(nHHs_deprivation),
                             sum_Tenure = sum(nHHs_tenure)), keyby = .(LAName)]
 kableExtra::kable(t, caption = "Census derived household counts")
 ```
@@ -171,11 +194,23 @@ That's lower (as expected) but doesn't allow for dwellings that were empty on ce
 # but we can use it to check which Soton postcodes are missing from the EPC file
 soPostcodesDT <- data.table::fread(path.expand("~/data/UK_postcodes/NSPL_AUG_2020_UK/Data/multi_csv/NSPL_AUG_2020_UK_SO.csv"))
 
-soPostcodesDT <- soPostcodesDT[!is.na(doterm)] # keep current
+soPostcodesDT <- soPostcodesDT[is.na(doterm)] # keep current
 
 sotonPostcodesDT <- soPostcodesDT[laua == "E06000045"] # keep Southampton City
 
+sotonPostcodesReducedDT <- sotonPostcodesDT[, .(pcd, pcd2, pcds, laua, msoa11, lsoa11)]
+
+sotonPostcodesReducedDT[, c("pc_chunk1","pc_chunk2" ) := tstrsplit(pcds, 
+                                                                   split = " "
+                                                                   )
+                        ]
+sotonPostcodesReducedDT[, .(nEPCs = .N), keyby = .(pc_chunk1)]
+```
+We should not have single digit postcodes in the postcode data - i.e. S01 should not be there (since 1993). Southampton City is unusual in only having [double digit postcodes](https://en.wikipedia.org/wiki/SO_postcode_area).
+
+```{r, aggregateEPCsToPostcodes}
 # EPC
+# set up counters
 sotonUniqueEPCsDT[, epcIsSocialRent := ifelse(TENURE == "rental (social)", 1, 0)]
 sotonUniqueEPCsDT[, epcIsPrivateRent := ifelse(TENURE == "rental (private)", 1, 0)]
 sotonUniqueEPCsDT[, epcIsOwnerOcc := ifelse(TENURE == "owner-occupied", 1, 0)]
@@ -183,63 +218,175 @@ sotonUniqueEPCsDT[, epcIsUnknownTenure := ifelse(TENURE == "NO DATA!" |
                                           TENURE == "" , 1, 0)]
 # aggregate EPCs to postcodes
 sotonEpcPostcodes_DT <- sotonUniqueEPCsDT[, .(nEPCs = .N,
+                                              sumEPC_tCO2 = sum(CO2_EMISSIONS_CURRENT, na.rm = TRUE),
                                      n_epcIsSocialRent = sum(epcIsSocialRent, na.rm = TRUE),
                                      n_epcIsPrivateRent = sum(epcIsPrivateRent, na.rm = TRUE),
                                      n_epcIsOwnerOcc = sum(epcIsOwnerOcc, na.rm = TRUE),
                                      n_epcIsUnknownTenure = sum(epcIsUnknownTenure, na.rm = TRUE),
-                               sumEpcMWh = sum(ENERGY_CONSUMPTION_CURRENT)/1000),
+                               sumEpcMWh = sum(ENERGY_CONSUMPTION_CURRENT* TOTAL_FLOOR_AREA)/1000), # crucial as ENERGY_CONSUMPTION_CURRENT = kWh/m2
                            keyby = .(POSTCODE, LOCAL_AUTHORITY_LABEL)]
 
+sotonEpcPostcodes_DT[, c("pc_chunk1","pc_chunk2" ) := tstrsplit(POSTCODE, 
+                                                                   split = " "
+                                                                   )
+                        ]
+sotonEpcPostcodes_DT[, .(nEPCs = .N), keyby = .(pc_chunk1)]
+
+# check original EPC data for Soton - which postcodes are covered?
+allEPCs_DT[, c("pc_chunk1","pc_chunk2" ) := tstrsplit(POSTCODE, 
+                                                                   split = " "
+                                                                   )
+                        ]
+allEPCs_DT[, .(nEPCs = .N), keyby = .(pc_chunk1)]
+```
+It looks like we have EPCs for each postcode sector which is good.
+
+
+```{r, matchPostcodesToEPCPostcodes}
 # match the EPC postcode summaries to the postcode extract
+sotonPostcodesReducedDT[, POSTCODE_s := stringr::str_remove(pcds, " ")]
+setkey(sotonPostcodesReducedDT, POSTCODE_s)
+sotonPostcodesReducedDT[, MSOACode := msoa11]
+message("Number of postcodes: ",uniqueN(sotonPostcodesReducedDT$POSTCODE_s))
+
 sotonEpcPostcodes_DT[, POSTCODE_s := stringr::str_remove(POSTCODE, " ")]
 setkey(sotonEpcPostcodes_DT, POSTCODE_s)
-#head(sotonEpcPostcodes_DT)
-
-pcDT <- sotonPostcodesDT[, .(pcd, pcd2, pcds, MSOACode = msoa11)] # only keep postcode vars we need
-pcDT[, c("pc_chunk1","pc_chunk2" ) := tstrsplit(pcds, split = " ")]
-table(pcDT$pc_chunk1)
-pcDT[, POSTCODE_s := stringr::str_remove(pcds, " ")]
-setkey(pcDT, POSTCODE_s)
-#head(pcDT)
-
-sotonEpcPostcodes_DT[, c("pc_chunk1","pc_chunk2" ) := tstrsplit(POSTCODE, split = " ")]
-table(sotonEpcPostcodes_DT$pc_chunk1)
-
-dt <- sotonEpcPostcodes_DT[pcDT]
-
-sotonEpcMSOA_DT <- dt[, .(nEPCs = sum(nEPCs), 
-                                        n_epcIsSocialRent = sum(n_epcIsSocialRent),
-                                        n_epcIsPrivateRent = sum(n_epcIsPrivateRent),
-                                        n_epcIsOwnerOcc = sum(n_epcIsOwnerOcc),
-                                        n_epcIsUnknownTenure = sum(n_epcIsUnknownTenure),
-                                        sumMWh = sum(sumMWh)
+message("Number of postcodes with EPCs: ",uniqueN(sotonEpcPostcodes_DT$POSTCODE_s))
+
+dt <- sotonEpcPostcodes_DT[sotonPostcodesReducedDT]
+
+# aggregate to MSOA - watch for NAs where no EPCs in a given postcode
+sotonEpcMSOA_DT <- dt[, .(nEPCs = sum(nEPCs, na.rm = TRUE), 
+                          sumEPC_tCO2 = sum(sumEPC_tCO2, na.rm = TRUE),
+                                        n_epcIsSocialRent = sum(n_epcIsSocialRent, na.rm = TRUE),
+                                        n_epcIsPrivateRent = sum(n_epcIsPrivateRent, na.rm = TRUE),
+                                        n_epcIsOwnerOcc = sum(n_epcIsOwnerOcc, na.rm = TRUE),
+                                        n_epcIsUnknownTenure = sum(n_epcIsUnknownTenure, na.rm = TRUE),
+                                        sumEpcMWh = sum(sumEpcMWh, na.rm = TRUE)
                                         ),
                                     keyby = .(MSOACode) # change name on the fly for easier matching
                                     ] 
+
+#summary(sotonEpcMSOA_DT)
 ```
 
+So we have some postcodes with no EPCs.
+
 Join the estimates together at MSOA level for comparison. There are `r uniqueN(sotonElecDT$MSOACode)` MSOAs in Southampton.
 
-```{r, compareEpcEstimates}
+
+```{r, joinMSOA}
 # 32 LSOAs in Soton
 # add deprivation
-setkey(sotonElecDT, MSOACode)
+setkey(sotonEnergyDT, MSOACode)
 setkey(sotonCensus2011_DT, MSOACode)
 setkey(sotonEpcMSOA_DT, MSOACode)
 
-sotonMSOA_DT <- sotonCensus2011_DT[sotonElecDT]
+sotonMSOA_DT <- sotonCensus2011_DT[sotonEnergyDT]
 #names(sotonMSOA_DT)
 sotonMSOA_DT <- sotonEpcMSOA_DT[sotonMSOA_DT]
 #names(sotonMSOA_DT)
 
+# add MSOA names from the postcode LUT
+
+msoaNamesDT <- data.table::as.data.table(readxl::read_xlsx(path.expand("~/data/UK_postcodes/NSPL_AUG_2020_UK/Documents/MSOA (2011) names and codes UK as at 12_12.xlsx")))
+msoaNamesDT[, MSOACode := MSOA11CD]
+msoaNamesDT[, MSOAName := MSOA11NM]
+setkey(msoaNamesDT, MSOACode)
+
+sotonMSOA_DT <- msoaNamesDT[sotonMSOA_DT]
+
+#names(sotonMSOA_DT)
+```
+
+```{r, compareEpcEstimates}
 t <- sotonMSOA_DT[, .(nHouseholds_2011 = sum(nHHs_tenure),
-                      nElecMeters_2018 = sum(METERS),
+                      nElecMeters_2018 = sum(nElecMeters),
                       nEPCs_2020 = sum(nEPCs),
-                      sumMWh = sum(KWH)/1000)]
+                      sumEPCMWh = sum(sumEpcMWh),
+                  sumBEISMWh = sum(beisEnergyMWh),
+                  sumEPC_tCO2 = sum(sumEPC_tCO2)
+                  )]
+
+kableExtra::kable(t, caption = "Comparison of different estimates of the number of dwellings and energy demand") %>%
+  kable_styling()
+
+nHouseholds_2011f <- sum(sotonMSOA_DT$nHHs_tenure)
+nElecMeters_2018f <- sum(sotonMSOA_DT$elecMeters)
+nEPCs_2020f <- sum(sotonMSOA_DT$nEPCs)
+
+makePC <- function(x,y,r){
+  # make a percent of x/y and round it to r decimal places
+  pc <- round(100*(x/y),r)
+  return(pc)
+}
+
+```
+
+We can see that the number of EPCs we have is:
+
+  * `r makePC(nEPCs_2020f,nHouseholds_2011f,1)`% of Census 2011 households
+  * `r makePC(nEPCs_2020f,nElecMeters_2018f,1)`% of the recorded 2018 electricity meters
+
+We can also see that despite having 'missing' EPCs, the estimated total EPC-derived energy demand is marginally higher than the BEIS-derived weather corrected energy demand data. Given that the BEIS data accounts for all heating, cooking, hot water, lighting and appliance use we would expect the EPC data to be lower _even if no EPCs were missing..._
 
-kableExtra::kable(t, caption = "Comparison of different estimates of the number of dwellings")
+```{r, missingEPCbyMSOA, fig.cap="% 'missing' rates comparison"}
 
+sotonMSOA_DT[, dep0_pc := 100*(`Household Deprivation: Household is not deprived in any dimension; measures: Value`/nHHs_deprivation)]
+sotonMSOA_DT[, socRent_pc := 100*(census2011_socialRent/nHHs_tenure)]
+sotonMSOA_DT[, privRent_pc := 100*(census2011_privateRent/nHHs_tenure)]
+sotonMSOA_DT[, ownerOcc_pc := 100*(census2011_ownerOccupy/nHHs_tenure)]
+
+t <- sotonMSOA_DT[, .(MSOAName, MSOACode, nHHs_tenure,nElecMeters,nEPCs,
+                      dep0_pc, socRent_pc, privRent_pc, ownerOcc_pc,sumEpcMWh, beisEnergyMWh )]
+
+t[, pc_missingHH := makePC(nEPCs,nHHs_tenure,1)]
+t[, pc_missingMeters := makePC(nEPCs,nElecMeters,1)]
+t[, pc_energyBEIS := makePC(sumEpcMWh,beisEnergyMWh,1)]
+
+kableExtra::kable(t[order(-pc_missingHH)], digits = 2, caption = "EPC records as a % of n census households and n meters per MSOA") %>%
+  kable_styling()
+
+ggplot2::ggplot(t, aes(x = pc_missingHH, 
+                       y = pc_missingMeters,
+                       colour = round(ownerOcc_pc))) +
+  geom_abline(alpha = 0.2, slope=1, intercept=0) +
+  geom_point() +
+  scale_color_continuous(name = "% owner occupiers \n(Census 2011)", high = "red", low = "green") +
+  #theme(legend.position = "bottom") +
+  labs(x = "EPCs 2020 as % of Census 2011 households",
+       y = "EPCs 2020 as % of electricity meters 2018",
+       caption = "x = y line included for clarity")
+
+outlierMSOA <- t[pc_missingHH > 100]
 ```
+Figure \@ref(tab:missingEPCbyMSOA) suggests that rates vary considerably by MSOA but are relatively consistent across the two baseline 'truth' estimates with the exception of `r outlierMSOA$MSOACode` which appears to have many more EPCs than Census 2011 households. It is worth noting that [this MSOA](https://www.localhealth.org.uk/#c=report&chapter=c01&report=r01&selgeo1=msoa_2011.E02003577&selgeo2=eng.E92000001) covers the city centre and dock areas which have had substantial new build since 2011 and so may have households inhabiting dwellings that did not exist at Census 2011. This is also supported by the considerably higher EPC derived energy demand data compared to BEIS's 2018 data - although it suggests the dwellings are either very new (since 2018) or are yet to be occupied.
+
+As we would expect those MSOAs with the lowest EPC coverage on both baseline measures tend to have higher proportions of owner occupiers. 
+
+We can use the same approach to compare estimates of total energy demand at the MSOA level. To do this we compare:
+
+ * estimated total energy demand in MWh/year derived from the EPC estimates. This energy only relates to `current primary energy` (space heating, hot water and lighting) and of course also suffers from missing EPCs (see above)
+ * observed electricity and gas demand collated by BEIS for their sub-national statistical series. This applies to all domestic energy demand but the most recent data is for 2018 so will suffer from the absence of dwellings that are present in the most recent EPC data (see above).
+
+We should therefore not expect the values to match but we might reasonably expect a correlation.
+ 
+```{r, energyMSOAPlot, fig.cap="Energy demand comparison"}
+ggplot2::ggplot(t, aes(x = sumEpcMWh, 
+                       y = beisEnergyMWh,
+                       colour = round(ownerOcc_pc))) +
+  geom_abline(alpha = 0.2, slope=1, intercept=0) +
+  geom_point() +
+  scale_color_continuous(name = "% owner occupiers \n(Census 2011)", high = "red", low = "green") +
+  #theme(legend.position = "bottom") +
+  labs(x = "EPC 2020 derived total MWh/year",
+       y = "BEIS 2018 derived total MWh/year",
+       caption = "x = y line included for clarity")
+
+outlier <- t[sumEpcMWh > 70000]
+```
+
+\@ref(fig:energyMSOAPlot) shows that both of these are true. MSOAs with a high proportion of owner occupiers (and therefore more likely to have missing EPCs) tend to have higher observed energy demand than the EOC data suggests - they are above the reference line. MSOAs with a lower proportion of owner occupiers (and therefore more likely to have more complete EPC coverage) tend to be on or below the line. As before we have the same notable outlier (`r outlier$MSOACode`) and for the same reasons... In this case this produces a much higher energy demand estimate than the BEIS 2018 data records
 
 ## Check ENERGY_CONSUMPTION_CURRENT
 
@@ -351,7 +498,7 @@ ggplot2::ggplot(allEPCs_DT, aes(x = CO2_EMISSIONS_CURRENT,
 
 Repeat the coding for total floor area using 5 m2 as the threshold of interest.
 
-```{r, checkEmissions, fig.cap="Histogram of TOTAL_FLOOR_AREA"}
+```{r, checkFloorArea, fig.cap="Histogram of TOTAL_FLOOR_AREA"}
 ggplot2::ggplot(sotonUniqueEPCsDT, aes(x = TOTAL_FLOOR_AREA)) +
   geom_histogram(binwidth = 1)
 
@@ -398,11 +545,63 @@ finalDT <- sotonUniqueEPCsDT[ENERGY_CONSUMPTION_CURRENT > 0 &
 skimr::skim(finalDT)
 ```
 
-This leaves us with a total of `r prettyNum(nrow(finalDT), big.mark = ",")'` properties.
+This leaves us with a total of `r prettyNum(nrow(finalDT), big.mark = ",")` properties.
 `
 # Current estimated annual CO2 emmisions
 
-We can now use the cleaned data to estimated the annual CO2 emissions of all of these properties. Obviously this will not be the total CO2 emissions for **all** Southampton properties since we know ~ x% are not represented in the EPC data. 
+We can now use the cleaned data to estimated the annual CO2 emissions at:
+
+ * MSOA level for Southampton using
+   * BEIS observed data
+   * aggregated EPC data
+ * Dwelling level for Southampton using
+   * aggregated EPC data
+ 
+Obviously the EPC-derived totals will not be the total CO2 emissions for **all** Southampton properties since we know not all dwellings are represented in the EPC data (see above).
+
+## MSOA estimates
+
+Method:
+
+```{r, setCarbonFactors, echo = TRUE}
+elecCF <- 200 # CO2e/kWh https://www.icax.co.uk/Grid_Carbon_Factors.html
+gasCF <- 215 # https://www.icax.co.uk/Carbon_Emissions_Calculator.html
+```
+
+BEIS: apply 2019 mean grid carbon intensity for:
+  * electricity: `r elecCF` g 
+  * gas: `r gasCF` g CO2e/kWh 
+EPC: use estimated CO2 values - note based on 'old' electricity grid carbon intensity values ()
+
+```{r, co2BEIS}
+sotonMSOA_DT[, sumBEIS_tCO2 := (beisElecMWh/1000)*elecCF + (beisGasMWh/1000)*gasCF]
+```
+
+```{r, co2MSOAPlot, fig.cap="Energy demand comparison"}
+ggplot2::ggplot(sotonMSOA_DT, aes(x = sumBEIS_tCO2, 
+                       y = sumEPC_tCO2,
+                       colour = round(ownerOcc_pc))) +
+  geom_abline(alpha = 0.2, slope=1, intercept=0) +
+  geom_point() +
+  scale_color_continuous(name = "% owner occupiers \n(Census 2011)", high = "red", low = "green") +
+  #theme(legend.position = "bottom") +
+  labs(x = "EPC 2020 derived total T CO2/year",
+       y = "BEIS 2018 derived total T CO2/year",
+       caption = "x = y line included for clarity")
+
+#outlier <- t[sumEpcMWh > 70000]
+```
+
+\@ref(fig:energyMSOAPlot) shows that 
+
+```
+
+Scenarios
+
+National Grid’s Future Energy Scenarios:
+
+ * 2030 emissions level for electricity of 0.102 kgCO2/kWh
+ * gas unchanged
 
 # R packages used
 
@@ -412,5 +611,6 @@ We can now use the cleaned data to estimated the annual CO2 emissions of all of
  * data.table [@data.table]
  * ggplot2 [@ggplot2]
  * kableExtra [@kableExtra]
+ * readxl [@readxl]
  
 # References