Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.

ONS-SCD.png

wickhams-tidy-tools-only-get-you-90-pct-the-way

Hadley Wickham’s tidy tools

In this video, at 8 minutes 50 seconds, he says “these four tools do 90% of the job”:

  • subset,
  • transform,
  • summarise, and
  • arrange
  • TODO: I noticed that at the website for an RStudio course, transform has been replaced by mutate as one of the “four basic verbs of data manipulation”.

Tidy Data from Drew Conway on Vimeo.

So I wondered: what makes up the other 10%? Here are a few contenders from my own work (a couple of which are sketched just after the list):

  • merge
  • reshape::cast and reshape::melt
  • unlist
  • t() transpose
  • sprintf or paste
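A quick sketch of a couple of these, using the reshape package’s melt/cast idiom on the built-in airquality data; the month-name lookup table in the merge step is just a made-up example:

require(reshape)
# melt airquality into long format, dropping missing values
aqm <- melt(airquality, id = c("Month", "Day"), na.rm = TRUE)
head(aqm)
# cast back to wide format, averaging each variable within Month
aqw <- cast(aqm, Month ~ variable, mean)
# merge on a small (made-up) lookup table of month names
month_names <- data.frame(Month = 5:9, month_name = month.name[5:9])
merge(aqw, month_names, by = "Month")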

R-subset

# Filter rows by criteria
subset(airquality, Temp > 90, select = c(Ozone, Temp))

## NB This is a convenience function intended for use interactively.  For
## programming it is better to use the standard subsetting functions like
## ‘[’, and in particular the non-standard evaluation of argument
## ‘subset’ can have unanticipated consequences.

with(airquality,
     airquality[Temp > 90, c("Ozone", "Temp")]
     )

# OR

airquality[airquality$Temp > 90, c("Ozone", "Temp")]

R-transform

# New columns that are functions of other columns       
df <- transform(airquality,
                new = -Ozone,
                Temp2 = (Temp-32)/1.8
                )
head(df)

R-mutate

require(plyr)
# same thing as transform
df <- mutate(airquality, new = -Ozone, Temp = (Temp - 32) / 1.8)    
# Things transform can't do
df <- mutate(airquality, Temp = (Temp - 32) / 1.8, OzT = Ozone / Temp)

# mutate is rather faster than transform
system.time(transform(baseball, avg_ab = ab / g))
system.time(mutate(baseball, avg_ab = ab / g))

R-summarise

# New data.frame where columns are functions of existing columns
require(plyr)    
df <- ddply(.data = airquality,
            .variables = "Month",
            .fun = summarise,
            tmax = max(Temp),
            tav = mean(Temp),
            ndays = length(unique(Day))
            )
head(df)

Passing variables to ddply for summary

# Notice how the name of the variable Temp doesn't need quotes?
# This means the names are hard coded.
# If you want to pass variables to this inside a function you need a
# different approach.

summarise_df  <- function(x, by, var1, var2, var3)
  {
    data_out <- ddply(x,
                      by,
                      function(df) return(
                        c(
                          tmax = max(df[,var1]),
                          tav = mean(df[,var2]),
                          ndays = length(unique(df[,var3]))
                          )
                        )
                      )
    return(data_out)
  }

df2 <- summarise_df(x = airquality, by = "Month",
                   var1 = "Temp", var2 = "Temp", var3 = "Day"
                   )

head(df2)
all.equal(df,df2)
# TRUE

Another alternative, if we want to pass the name of the dataset as a string too:

summarise_df2  <- function(x, by, var1, var2, var3)
  {
    data_out <- eval(
      parse(
        text =
        sprintf(
          "ddply(.data = %s,
            .variables = '%s',
            .fun = summarise,
            tmax = max(%s),
            tav = mean(%s),
            ndays = length(unique(%s))
            )", x, by, var1, var2, var3
          )
        )
      )
    return(data_out)
  }

df3 <- summarise_df2(x = "airquality", by = "Month",
                     var1 = "Temp", var2 = "Temp", var3 = "Day"
                     )
head(df3)
all.equal(df, df3)
# TRUE
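
Yet another alternative, which avoids eval(parse()) altogether, is to fetch the data frame by name with get(). A minimal sketch, reusing summarise_df from above (summarise_df3 is just an illustrative name):

# get() fetches the data.frame by name from the calling environment
summarise_df3  <- function(x, by, var1, var2, var3)
  {
    summarise_df(get(x), by, var1, var2, var3)
  }

df4 <- summarise_df3(x = "airquality", by = "Month",
                     var1 = "Temp", var2 = "Temp", var3 = "Day")
all.equal(df, df4)
# TRUE
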
R-arrange

# Re-order the rows of a data.frame
df <- arrange(airquality, Temp, Ozone)
head(df)

Posted in  research methods


test-data-for-classification-trees

A fictitious sample dataset

For discussion, I’ll use a fictional example dataset that I’m using to work through some statistical theory related to Classification and Regression Trees (CART). In the motivating example use case we are interested in predicting the civil status (married, single, divorced/widowed) of individuals from their sex (male, female) and sector of activity (primary, secondary, tertiary). The data set is composed of 273 cases.

The data (and related statistical theory) come from:

  • Ritschard, G. (2006). Computing and using the deviance with classification trees. In Compstat 2006 - Proceedings in Computational Statistics 17th Symposium Held in Rome, Italy, 2006. Retrieved from This Link

  • Ritschard, G., Pisetta, V., & Zighed, D. (2008). Inducing and evaluating classification trees with statistical implicative criteria. Statistical Implicative Analysis. Studies in Computational Intelligence Volume 127, pp 397-419. Retrieved from This Link

Code:

# copy and paste the data from the PDF (Table 1 in both papers)
civst_gend_sector  <- read.csv(textConnection(
    "civil_status gender activity_sector number_of_cases
         married   male         primary              50
         married   male       secondary              40
         married   male        tertiary               6
         married female         primary               0
         married female       secondary              14
         married female        tertiary              10
          single   male         primary               5
          single   male       secondary               5
          single   male        tertiary              12
          single female         primary              50
          single female       secondary              30
          single female        tertiary              18
divorced/widowed   male         primary               5
divorced/widowed   male       secondary               8
divorced/widowed   male        tertiary              10
divorced/widowed female         primary               6
divorced/widowed female       secondary               2
divorced/widowed female        tertiary               2
"),sep = "")

# save this to my personal R utilities package "disentangle" 
# for use later when I am exploring functions
dir.create("inst/extdata", recursive=T)
write.csv(civst_gend_sector, "inst/extdata/civst_gend_sector.csv", row.names = F)

That is fine and good; we can use the case-weights option to include number_of_cases, but sometimes we want one row per person. In the next chunk of code I'll reformat the data, add another fictitious variable called income, and contrive an example where a certain group earns less based on their activity sector.

Code:

# start with an empty copy of the first three columns
# (this keeps the column names and factor levels for rbind)
df <- civst_gend_sector[0, 1:3]
for(i in 1:nrow(civst_gend_sector))
    {
      n <- civst_gend_sector$number_of_cases[i]
      if(n == 0) next
      for(j in 1:n)
        {
          df <- rbind(df, civst_gend_sector[i, 1:3])
        }
    }

df$income  <- rnorm(nrow(df), 1000,200)
# Let us say secondary-sector men earn less
secondary_men <- with(df, gender == "male" & activity_sector == "secondary")
df$income[secondary_men] <- df$income[secondary_men] - 500
str(df)
# save this for use later
write.csv(df, "inst/extdata/civst_gend_sector_full.csv", row.names = F)

Motivating reason for using these data

Classification and Regression Tree models (also referred to as Decision Trees) are one of the building blocks of data mining and a great tool for Exploratory Data Analysis.

I’ve mostly used Regression Trees in the past but recently got some work with social science data where Classification Trees were needed. I wanted to assess the deviance as well as the misclassification error rate for measuring the descriptive power of the tree. While this is easy with Regression Trees, it became obvious that it is not so easy with Classification Trees, because Classification Trees are most often evaluated by means of the error rate alone. The problem with the error rate is that it is not very helpful for assessing the descriptive capacity of the tree.

For example, if we look at the reduction in deviance between the Null model and the fitted tree, we can say that the tree explains about XYZ% of the variation. We can also test whether this is a statistically significant reduction using a chi-squared test.
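
As a sketch of that arithmetic (a hypothetical helper, not tied to any particular tree package), given the null and fitted deviances and their degrees of freedom:

# D0, df_null: deviance and degrees of freedom of the null model
# D, df_tree: deviance and degrees of freedom of the fitted tree
deviance_summary <- function(D0, D, df_null, df_tree)
  {
    c(pct_explained = 100 * (1 - D / D0),
      p_value = pchisq(D0 - D, df = df_null - df_tree, lower.tail = FALSE))
  }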

Consider this example from page 310 of Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. 2nd Edition:

  • in a two-class problem with 400 observations in each class (denote this by (400, 400))
  • suppose one split created nodes (300, 100) and (100, 300),
  • the other created nodes (200, 400) and (200, 0).
  • Both splits produce a misclassification rate of 0.25, but the second split produces a pure node and is probably preferable (checked numerically in the sketch below).
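
Here is a quick numeric check of that example; misclass, gini and split_impurity are little helper functions I made up for the illustration:

# impurity measures for a two-class node, given the class counts
misclass <- function(n) min(n) / sum(n)
gini <- function(n) 2 * prod(n / sum(n))
# size-weighted impurity across the child nodes of a split
split_impurity <- function(nodes, impurity)
  {
    sizes <- sapply(nodes, sum)
    sum(sizes / sum(sizes) * sapply(nodes, impurity))
  }
split1 <- list(c(300, 100), c(100, 300))
split2 <- list(c(200, 400), c(200, 0))
split_impurity(split1, misclass) # 0.25
split_impurity(split2, misclass) # 0.25, identical error rates
split_impurity(split1, gini)     # 0.375
split_impurity(split2, gini)     # 0.333, the Gini index prefers the pure node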

During the course of my research to identify the best available method to implement in my analysis, I found a useful series of papers by Ritschard, with a worked example using SPSS. I hope to translate that to R in the future, but the first thing I did was grab the example data used in several of those papers out of the PDF. Seeing as this is a public dataset (I use a lot of restricted data), and because I want to be able to use it to demonstrate any R functions I find or write, I thought I would publish it properly.

The Tree Model

So before we leave Ritschard and the CART method, let’s fit the model. Let’s also install my R utilities package “disentangle” to test that we can access the data from it.

In this analysis the civil status is the outcome (or response or decision or dependent) variable, while sex and activity sector are the predictors (or condition or independent variables).

Code:

# func
require(rpart)
require(partykit) 
require(devtools)
install_github("disentangle", "ivanhanigan")

# load
fpath <- system.file(file.path("extdata", "civst_gend_sector.csv"),
                     package = "disentangle"
                     )
civst_gend_sector <- read.csv(fpath)

# clean
str(civst_gend_sector)

# do
fit <- rpart(civil_status ~ gender + activity_sector,
             data = civst_gend_sector, weights = number_of_cases,
             control=rpart.control(minsplit=1))
# NB need minsplit to be adjusted for weights.
summary(fit)
  
# report
dir.create("images")
png("images/fit1.png", 1000, 480)
plot(as.party(fit))
dev.off()

The Result

fit1.png

Posted in  Data Documentation


simple-example-using-nmmaps

I will use the NMMAPSlite datasets for a simple example of what I am trying to do.

Code: get nmmaps data

# func
if(!require(NMMAPSlite)) install.packages('NMMAPSlite')
require(NMMAPSlite)
require(mgcv)
require(splines)

######################################################
# load  
setwd('data')
initDB('data/NMMAPS') # this requires that we connect to the web,
                      # so let's get local copies
setwd('..')
cities <- getMetaData('cities')
head(cities)
citieslist <- cities$cityname
# write out a few cities for access later
for(city_i in citieslist[sample(1:nrow(cities), 9)])
{
 city <- subset(cities, cityname == city_i)$city
 data <- readCity(city)
 write.table(data, file.path('data', paste(city_i, '.csv',sep='')),
 row.names = F, sep = ',')
}
# these are all tiny, so get some big ones
for(city_i in c('New York', 'Los Angeles', 'Madison', 'Boston'))
{
 city <- subset(cities, cityname == city_i)$city
 data <- readCity(city)
 write.table(data, file.path('data', paste(city_i, '.csv',sep='')),
 row.names = F, sep = ',')
}

######################################################
# now we can use these locally
dir("data")
city <- "Chicago"
data <- read.csv(sprintf("data/%s.csv", city), header=T)
str(data)
data$yy <- substr(data$date,1,4)
data$date <- as.Date(data$date)
######################################################
# check
par(mfrow=c(2,1), mar=c(4,4,3,1))
with(subset(data[,c(1,15:25)], agecat == '75p'),
  plot(date, tmax)
 )
with(subset(data[,c(1,4,15:25)], agecat == '75p'),
        plot(date, cvd, type ='l', col = 'grey')
        )
with(subset(data[,c(1,4,15:25)], agecat == '75p'),
        lines(lowess(date, cvd, f = 0.015))
        )
# I am worried about that outlier
data$date[which(data$cvd > 100)]
# [1] "1995-07-15" "1995-07-16"
 
######################################################
# do standard NMMAPS timeseries poisson GAM model
numYears<-length(names(table(data$yy)))
df <- subset(data, agecat == '75p')
df$time <- as.numeric(df$date)
fit <- gam(cvd ~ s(pm10tmean) + s(tmax) + s(dptp) + s(time, k= 7*numYears, fx=T), data = df, family = poisson)
# plot of response functions
par(mfrow=c(2,2))
plot(fit)
dev.off()
 
######################################################
# some diagnostics
summary(fit)
# note the R-sq.(adj) =   0.21
gam.check(fit)
# note the lack of a leverage plot.  for that we need glm
 
######################################################
# do same model as glm
fit2 <- glm(cvd ~ pm10tmean + ns(tmax, df = 8) + ns(dptp, df = 4) + ns(time, df = 7*numYears), data = df, family = poisson)
# plot responses
par(mfrow=c(2,2))
termplot(fit2, se =T)
dev.off()
 
# plot prediction
df$predictedCvd <- predict(fit2, df, 'response')
# baseline is given by the intercept
fit3 <- glm(cvd ~ 1, data = df, family = poisson)
df$baseline <-  predict(fit3, df, 'response')
with(subset(df, date>=as.Date('1995-01-01') & date <= as.Date('1995-07-31')),
 plot(date, cvd, type ='l', col = 'grey')
        )
with(subset(df, date>=as.Date('1995-01-01') & date <= as.Date('1995-07-31')),
        lines(date,predictedCvd)
        )
with(subset(df, date>=as.Date('1995-01-01') & date <= as.Date('1995-07-31')),
 lines(date,baseline)
        )
######################################################
# some diagnostics
# need to load a function to calculate a poisson-adjusted R squared
# original S code from:
# The formula for pseudo-R^2 is taken from G. S. Maddala,
# Limited-dependent and Qualitative Variables in Econometrics,
# Cambridge: Cambridge Univ. Press, 1983, page 40, equation 2.50.
RsquaredGlm <- function(o) {
 n <- length(o$residuals)
 ll <- logLik(o)[1]
 ll_0 <- logLik(update(o,~1))[1]
 R2 <- (1 - exp(((-2*ll) - (-2*ll_0))/n))/(1 - exp( - (-2*ll_0)/n))
 names(R2) <- 'pseudo.Rsquared'
 R2
 }
RsquaredGlm(fit2)
# 0.51
# the difference from the gam R-sq.(adj) is presumably due to differing
# treatments of unexplainable variance in the poisson distribution?
 
# significance of spline terms
drop1(fit2, test='Chisq')
# also note AIC. best model includes all of these terms
# BIC can be computed instead (but still labelled AIC) using
drop1(fit2, test='Chisq', k = log(nrow(df))) # n should match the fitted data (df, not data)
 
# diagnostic plots
par(mfrow=c(2,2))
plot(fit2)
dev.off()
# note high leverage plus residuals points are labelled
# leverage doesn't seem to be too high though which is good
# NB the numbers refer to the row.names attribute which still refer to the original dataset, not this subset
df[row.names(df) %in% c(9354,9356),]$date
# as suspected [1] "1995-07-15" "1995-07-16"
 
######################################################
# so let's re-run without these observations
df2 <- df[!row.names(df) %in% c(9354,9356),]
# to avoid duplicating code, just re-fit fit2, replacing data = df with data = df2
# tmax still significant but not so extreme
# check diagnostic plots again
par(mfrow=c(2,2))
plot(fit2)
dev.off()
# looks like a well behaved model now.
 
# if we were still worried about any high leverage values we could identify these with
df3 <- na.omit(df2[,c('cvd','pm10tmean','tmax','dptp','time')])
df3$hatvalue <- hatvalues(fit2)
df3$res <- residuals(fit2, 'pearson')
with(df3, plot(hatvalue, res))
# this is the same as the fourth default glm diagnostic plot, which they label x-axis as leverage
summary(df3$hatvalue)
# gives us an idea of the distribution of hat values
# decide on a threshold and look at it
hatThreshold <- 0.1
with(subset(df3, hatvalue > hatThreshold), points(hatvalue, res, col = 'red', pch = 16))
abline(0,0)
segments(hatThreshold,-2,hatThreshold,15)
dev.off()
 
fit3 <- glm(cvd ~ pm10tmean + ns(tmax, df = 8) + ns(dptp, df = 4) + ns(time, df = 7*numYears), data = subset(df3, hatvalue < hatThreshold), family = poisson)
par(mfrow=c(2,2))
termplot(fit3, se = T)
# same same
plot(fit3)
# no better
 
# or we could go nuts with a whole number of ways of estimating influence
# check all influential observations
infl <- influence.measures(fit2)
# which observations 'are' influential
inflk <- which(apply(infl$is.inf, 1, any))
length(inflk)
 
 
######################################################
# now what about serial autocorrelation in the residuals?
 
par(mfrow = c(2,1))
with(df3, acf(res))
with(df3, pacf(res))
dev.off()
 
 
 
######################################################
# just check for overdispersion
fit <- gam(cvd ~ s(pm10tmean) + s(tmax) + s(dptp) + s(time, k= 7*numYears, fx=T), data = df, family = quasipoisson)
summary(fit)
# note the Scale est. = 1.1627
# alternatively check the glm
fit2 <- glm(cvd ~ pm10tmean + ns(tmax, df = 8) + ns(dptp, df = 4) + ns(time, df = 7*numYears), data = df, family = quasipoisson)
summary(fit2)
# (Dispersion parameter for quasipoisson family taken to be 1.222640)
# this is probably near enough to support a standard poisson model...
 
# if we have overdispersion we can use QAIC (a quasi- model does not have
# a likelihood and so does not have an AIC, by definition)
# we can use the poisson model and calculate the overdispersion
fit2 <- glm(cvd ~ pm10tmean + ns(tmax, df = 8) + ns(dptp, df = 4) + ns(time, df = 7*numYears), data = df, family = poisson)
1- pchisq(deviance(fit2), df.residual(fit2))
 
# QAIC: c_hat is the variance inflation factor, the ratio of the residual
# deviance of the global (most complicated) model to its residual degrees
# of freedom
c_hat <- deviance(fit2) / df.residual(fit2)
k <- length(coef(fit2)) + 1  # number of parameters, plus one for c_hat
QAIC.1 <- -2 * logLik(fit2) / c_hat + 2 * k
QAIC.1
 
# Actually let's use QAICc, which is more conservative about parameters
n <- nrow(na.omit(df[, c('cvd', 'pm10tmean', 'tmax', 'dptp', 'time')]))
QAICc.1 <- QAIC.1 + 2 * k * (k + 1) / (n - k - 1)
QAICc.1
 
 
######################################################
# the following is old work, some of which may still be interesting,
# such as the use of a sinusoidal wave instead of a smooth function of time
 
 
# # sine wave
# timevar <- as.data.frame(names(table(df$date)))
# index <- 1:length(names(table(df$date)))
# timevar$time2 <- index / (length(index) / (length(index)/365.25))
# names(timevar) <- c('date','timevar')
# timevar$date <- as.Date(timevar$date)
# df <- merge(df,timevar)
 
# fit <- gam(cvd ~ s(tmax) + s(dptp) + sin(timevar * 2 * pi) + cos(timevar * 2 * pi) + ns(time, df = numYears), data = df, family = poisson)
# summary(fit)
# par(mfrow=c(3,2))
# plot(fit, all.terms = T)
# dev.off()
 
# # now just explore the season fit
# fit <- gam(cvd ~ sin(timevar * 2 * pi) + cos(timevar * 2 * pi) + ns(time, df = numYears), data = df, family = poisson)
# yhat <- predict(fit)
# head(yhat)
 
# with(df, plot(date,cvd,type = 'l',col='grey', ylim = c(15,55)))
# lines(df[,'date'],exp(yhat),col='red')
 
 
# # drop1(fit, test= 'Chisq')
 
# # drop1 only works in glm?
# # fit with weather variables, use degrees of freedom estimated by gam
# fit <- glm(cvd ~ ns(tmax,8) + ns(dptp,2) + sin(timevar * 2 * pi) + cos(timevar * 2 * pi) + ns(time, df = numYears), data = df, family = poisson)
# drop1(fit, test= 'Chisq')
# # use plot.glm for diagnostics
# par(mfrow=c(2,2))
# plot(fit)
# par(mfrow=c(3,2))
# termplot(fit, se=T)
# dev.off()
 
# # cyclic spline, overlay on prior sinusoidal
# with(df, plot(date,cvd,type = 'l',col='grey', ylim = c(0,55)))
# lines(df[,'date'],exp(yhat),col='red')
 
# df$daynum <- as.numeric(format(df$date, "%j"))
# df[360:370,c('date','daynum')]
# fit <- gam(cvd ~ s(daynum, k=3, fx=T, bs = 'cp') +  s(time, k = numYears, fx = T), data = df, family = poisson)
# yhat2 <- predict(fit)
# head(yhat2)
 
# lines(df[,'date'],exp(yhat2),col='blue')
 
 
# par(mfrow=c(1,2))
# plot(fit)
 
 
# # fit weather with season
# fit <- gam(cvd ~ s(tmax) + s(dptp) +
  # s(daynum, k=3, fx=T, bs = 'cp') +  s(time, k = numYears, fx = T), data = df, family = poisson)
# par(mfrow=c(2,2))
# plot(fit)
 
# summary(fit)

Posted in  spatial dependence


reviewing-lessons

Introduction

This post is an attempt to put together a standard framework for reviewing lessons. I’ve been to numerous workshops, master classes, tutorials and lectures and always take ad hoc notes that I generally re-write and re-organise afterwards. I hope to develop a clearer methodology for reviewing what I learnt in each lesson.

Context

  • When: Either just the date and time or include more context like the broader event such as a conference, season, public holidays.

  • Who: The presenter will give a bio at the start so make notes, especially to identify what disciplinary perspective they come from.

  • Where: Includes the address, city, lecture theatre. These are cues for memory.

  • Why: Both why the presenter is here talking and why I am here listening. It is important to critically reflect on what I want to get out of each lesson.

  • What: What is this lesson all about? This might start with a synopsis overview and will probably move on to a sequence of notes as the presenter’s narrative unfolds, and my thoughts on the topic evolve.

Three Stages: Thesis, Antithesis, Synthesis

A guiding principle I use for writing the ‘What’ section is the three stages: thesis, antithesis, synthesis. I am not a philosopher so I don’t know the proper use of these concepts in that discipline, but for me they are useful to structure my notes as I go through the process of a lesson. Here is how I think of these stages:

  • Thesis: This is where I might pick out the key topics that are being presented, and write down my prior knowledge and preconceptions about the topic.

  • Antithesis: What’s the main message(s) of the presenter? What are their priorities? What secondary (surrogate) topics emerge around the main points? If I bring questions to the lesson are they answered by the presenter? If not why?

  • Synthesis: My new understanding of the topic.

Other tools

Other things I use are:

  • Mind maps: a central topic with a spiderweb of links extending out in a circle.
  • TODO

Posted in  research methods


spatially-structured-timeseries-vs-spatiotemporal-modelling

Spatiotemporal Regression Modelling

1 Spatially Structured Timeseries Vs Spatiotemporal Modelling

In my last post about spatiotemporal regression modelling I mentioned that I am mostly interested in "spatially structured time-series models" rather than spatial models at a single point in time. By this I mean that we have several neighbouring areal units observed over a period of time. In this framework the general methods of time-series modelling are used to control for temporal autocorrelation. However, this makes spatial error and spatial lag methods tricky, because the spatial autocorrelation needs to be assessed at many points in time (a rough residual check is sketched below).
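
For instance, one pragmatic residual check is to compute Moran's I separately at each time point. A minimal sketch using spdep, in which nb (a neighbours object for the areal units) and dat (a data frame of model residuals by zone and time, sorted by zone within time) are hypothetical placeholders:

require(spdep)
# nb and dat are hypothetical placeholders, see the note above
lw <- nb2listw(nb)
moran_by_time <- sapply(split(dat$resid, dat$time),
                        function(r) moran.test(r, lw)$estimate[1])
summary(moran_by_time)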

I want to expand more on this topic because I want to be clear that the organisation of the material I am aiming to bring to this notebook topic is not aimed at purely spatial regression models (there is a lot of material and tools out there already for that). I am trying with these notes to document my learning steps toward integrating spatial methods with time-series methods to allow me to practice (and understand) spatiotemporal regression modelling.

1.1 Spatially Structured Time Series

In my most successful previous attempt to conduct a spatiotemporal analysis, of Suicide and Droughts, I built on my knowledge of time-series regression models from single-city air pollution studies, where the whole city is the unit of analysis and the temporal variation is modelled with techniques that control for temporal autocorrelation. These techniques are also valid for multi-city studies because it is pretty safe to assume the cities are all independent at each time point. I structured my study by eleven large zones (Census Statistical Divisions) of NSW, assumed each of these would vary over time independently of the others, and fitted a zone-specific time trend and cycle. This is what I call "spatially structured time-series" modelling.

I justify using this model in this case because aggregating up to these very large regions diminishes the possibility of spatial autocorrelation, and because droughts vary over large spatial zones too, we will not suffer from exposure misclassification bias.

So this model is a simple time-series regression (with trend and seasonality) plus an additional term for spatial zone; a code sketch follows the definitions below.

\begin{eqnarray*}
log({\color{red} O_{ijk}}) & = & s({\color{red} ExposureVariable}) + {\color{blue} OtherExplanators} \\
 & & + AgeGroup_{i} + Sex_{j} \\
 & & + {\color{blue} SpatialZone_{k}} \\
 & & + sin(Time \times 2 \times \pi) + cos(Time \times 2 \times \pi) \\
 & & + Trend \\
 & & + offset({\color{blue} log(Pop_{ijk})})
\end{eqnarray*}

Where:

  • \({\color{red} O_{ijk}}\) = Outcome (counts) by Age\(_{i}\), Sex\(_{j}\) and SpatialZone\(_{k}\)
  • \({\color{red} ExposureVariable}\) = Data with restrictive Intellectual Property (IP) (shown in red)
  • \({\color{blue} OtherExplanators}\) = Other, less restricted, explanatory variables (shown in blue)
  • s( ) = penalized regression splines
  • \({\color{blue} SpatialZone_{k}}\) = Less restricted data representing the \(SpatialZone_{k}\)
  • Trend = Long-term smooth trend(s)
  • \({\color{blue} Pop_{ijk}}\) = interpolated Census populations, by time in each group
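
A minimal mgcv sketch of this model; dat and all of the variable names are hypothetical placeholders for the real datasets:

require(mgcv)
# timevar: time in fractional years, driving the annual sinusoid
# trend: a numeric date, given a long-term smooth
fit <- gam(outcome ~ s(exposure) + other_explanators
           + agegroup + sex
           + spatial_zone
           + sin(timevar * 2 * pi) + cos(timevar * 2 * pi)
           + s(trend)
           + offset(log(pop)),
           family = poisson, data = dat)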

1.2 TODO Spatiotemporal modelling

In contrast to the above, when modelling exposures that have fine-resolution spatial variation (such as air pollution), the exposure misclassification introduced by aggregating up to very large spatial zones will counteract the benefits of avoiding spatially autocorrelated errors, and this might be unacceptable for certain research questions. Therefore it is important to move toward a spatiotemporal regression model that replaces the \(SpatialZone_{k}\) term with a spatial error or spatial lag approach.


Posted in  spatial dependence