Examples of basic uses of mgwrsar package

Ghislain Geniaux

2018-05-10

Introduction

mgwrsar package estimates linear and local linear models with spatial autocorrelation of the following forms:

\[y=\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (GWR)\]

\[y=\beta_c X_c+\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR)\]

\[y=\lambda Wy+\beta_c X_c+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(0,k,0))\]

\[y=\lambda Wy+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(0,0,k))\]

\[y=\lambda Wy+\beta_c X_c+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(0,k_c,k_v))\]

\[y=\lambda(u_i,v_i) Wy+\beta_c X_c+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(1,k,0))\]

\[y=\lambda(u_i,v_i)Wy+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(1,0,k))\]

\[y=\lambda(u_i,v_i)Wy+\beta_cX_c+\beta_v(u_i,v_i)X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (MGWR-SAR(1,k_c,k_v))\]

For more details on the estimation method, see Geniaux, G. and Martinetti, D. (2017). A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Regional Science and Urban Economics. (https://doi.org/10.1016/j.regsciurbeco.2017.04.001)

The estimation of the previous model can be done using the MGWRSAR wrapper function which requires a dataframe and a matrix of coordinates. Note that:

In addition to the ability of considering spatial autocorrelation in GWR/MGWR like models, MGWRSAR function introduces several useful technics for estimating local regression with space coordinates:

Using mgwrsar package

The MGWRSAR function requires a dataframe and a matrix of coordinates, and eventually a spatial weight matrix if model include spatiale dependence. The package includes a simulated data example which is used for this vignette.

library(mgwrsar)
#> Loading required package: Rcpp
#> Loading required package: sp
#> Loading required package: leaflet
#> Loading required package: Matrix
## loading data example
data(data_mgwrsar)
coord=as.matrix(mydata[,c("x_lat","y_lon")])
## Creating a spatial weigth matrix (sparce dgCMatrix) of 8 nearest neighbors
W=KNN(coord,8)

Estimation

Estimation of a GWR with a gauss kernel with and without parallel computation:

### with parallel computing
#ptm1<-proc.time()
#model_GWR<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, #fixed_vars=NULL,kernels=c('gauss'),H=0.13, Model = 'GWR',control=list(SE=TRUE,doMC=TRUE,ncore=4))
#(proc.time()-ptm1)[3]

### without parallel computing
ptm1<-proc.time()
model_GWR<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss'),H=0.13, Model = 'GWR',control=list(SE=TRUE,doMC=FALSE))
(proc.time()-ptm1)[3]
#> elapsed 
#>   1.004

Summary and plot of mgwrsar object using leaflet:

summary_mgwrsar(model_GWR)
#> Call:
#> MGWRSAR(formula = "Y_gwr~X1+X2+X3", data = mydata, coord = coord, 
#>     fixed_vars = NULL, kernels = c("gauss"), H = 0.13, Model = "GWR", 
#>     control = list(SE = TRUE, doMC = FALSE))
#> Model: GWR 
#> Kernels function: gauss 
#> Kernels type: GD 
#> bandwidth: 0.13 
#> Number of data points: 1000 
#> Varying parameters: Intercept X1 X2 X3 
#>         Intercept       X1       X2      X3
#> Min.     -2.63740  0.16880 -0.37306 -1.7588
#> 1st Qu.  -1.46100  0.38590  1.52729 -1.3070
#> Median   -1.08812  0.51409  1.94806 -1.0194
#> Mean     -0.97597  0.48879  1.94778 -1.0289
#> 3rd Qu.  -0.62307  0.59454  2.58871 -0.7439
#> Max.      1.06100  0.79570  3.37755 -0.3083
#> Effective degrees of freedom: 955.0674 
#> AIC 135.0056 
#> Residual sum of squares: 64.0684
plot_mgwrsar(model_GWR,type='B_coef',var='X2')
plot_mgwrsar(model_GWR,type='t_coef',var='X2')

Estimation of a GWR with spgwr package:

library(spgwr)
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge')`
#> NOTE: This package does not constitute approval of GWR
#> as a method of spatial analysis; see example(gwr)
mydataSP=mydata
coordinates(mydataSP)=c('x_lat','y_lon')
ptm1<-proc.time()
model_spgwr <- gwr(Y_gwr~X1+X2+X3, data=mydataSP, bandwidth=0.13,hatmatrix=TRUE)
(proc.time()-ptm1)[3]
#> elapsed 
#>  15.977
head(model_spgwr$SDF@data$gwr.e)
#> [1] -0.1585824 -0.1861144 -0.4450174 -0.1767796 -0.2014330  0.2670349
model_spgwr
#> Call:
#> gwr(formula = Y_gwr ~ X1 + X2 + X3, data = mydataSP, bandwidth = 0.13, 
#>     hatmatrix = TRUE)
#> Kernel function: gwr.Gauss 
#> Fixed bandwidth: 0.13 
#> Summary of GWR coefficient estimates at data points:
#>                  Min.  1st Qu.   Median  3rd Qu.     Max.  Global
#> X.Intercept. -2.63740 -1.46100 -1.08812 -0.62307  1.06100 -1.4845
#> X1            0.16880  0.38590  0.51409  0.59454  0.79570  0.5000
#> X2           -0.37306  1.52729  1.94806  2.58871  3.37755  2.5481
#> X3           -1.75876 -1.30704 -1.01941 -0.74386 -0.30828 -1.0762
#> Number of data points: 1000 
#> Effective number of parameters (residual: 2traceS - traceS'S): 62.85371 
#> Effective degrees of freedom (residual: 2traceS - traceS'S): 937.1463 
#> Sigma (residual: 2traceS - traceS'S): 0.2614678 
#> Effective number of parameters (model: traceS): 44.93259 
#> Effective degrees of freedom (model: traceS): 955.0674 
#> Sigma (model: traceS): 0.2590031 
#> Sigma (ML): 0.2531174 
#> AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): 186.462 
#> AIC (GWR p. 96, eq. 4.22): 135.0056 
#> Residual sum of squares: 64.0684 
#> Quasi-global R2: 0.9813492

all(abs(model_GWR$residuals-model_spgwr$SDF@data$gwr.e)<0.00000000001)
#> [1] TRUE

Estimation of a GWR with leave-one out CV, using an adaptive bisquare kernel (based on 20 nearest neighbors) and a local Outlier Detection/Removal procedure:

model_GWR<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('bisq_knn'),H=20, Model = 'GWR',control=list(isgcv=TRUE,remove_local_outlier=TRUE,outv=0.01))
summary_mgwrsar(model_GWR)
#> Call:
#> MGWRSAR(formula = "Y_gwr~X1+X2+X3", data = mydata, coord = coord, 
#>     fixed_vars = NULL, kernels = c("bisq_knn"), H = 20, Model = "GWR", 
#>     control = list(isgcv = TRUE, remove_local_outlier = TRUE, 
#>         outv = 0.01))
#> Model: GWR 
#> Kernels function: bisq_knn 
#> Kernels type: GD 
#> bandwidth: 20 
#> Number of data points: 1000 
#> Varying parameters: Intercept X1 X2 X3 
#>         Intercept        X1        X2      X3
#> Min.    -4.223718  0.001589 -1.182832 -2.0699
#> 1st Qu. -1.448438  0.347360  1.026678 -1.2974
#> Median  -0.773579  0.537315  1.660816 -1.0162
#> Mean    -0.797981  0.497172  1.665602 -1.0071
#> 3rd Qu. -0.173053  0.640754  2.456682 -0.7263
#> Max.     2.267588  0.911069  4.103331 -0.0933
#> Residual sum of squares: 16.54259

Estimation of a MGWR with stationary intercept using an adapative gaussian kernel (based on 20 nearest neighbors) + leave-one out CV estimation


model_MGWR<-MGWRSAR(formula = 'Y_mgwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=c('Intercept'),kernels=c('gauss_adapt'),H=20, Model = 'MGWR',control=list(SE=TRUE))
summary_mgwrsar(model_MGWR)
#> Call:
#> MGWRSAR(formula = "Y_mgwr~X1+X2+X3", data = mydata, coord = coord, 
#>     fixed_vars = c("Intercept"), kernels = c("gauss_adapt"), 
#>     H = 20, Model = "MGWR", control = list(SE = TRUE))
#> Model: MGWR 
#> Kernels function: gauss_adapt 
#> Kernels type: GD 
#> bandwidth: 20 
#> Number of data points: 1000 
#> Constant parameters: Intercept 
#> -0.5563375 
#> Varying parameters: X1 X2 X3 
#>              X1      X2      X3
#> Min.    0.17788 0.53761 -1.4822
#> 1st Qu. 0.37102 1.25611 -1.1637
#> Median  0.50101 1.62152 -1.0894
#> Mean    0.48714 1.62199 -1.0462
#> 3rd Qu. 0.58890 2.05394 -0.9554
#> Max.    0.82570 2.71433 -0.5656
#> Effective degrees of freedom: 924.9957 
#> AIC -272.9127 
#> Residual sum of squares: 41.38677
                       
                       
### leave-one out CV estimation
model_MGWR<-MGWRSAR(formula = 'Y_mgwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars='Intercept',kernels=c('gauss_adapt'),H=20, Model = 'MGWR',control=list(isgcv=TRUE,remove_local_outlier=FALSE,outv=NULL))
summary_mgwrsar(model_MGWR)
#> Call:
#> MGWRSAR(formula = "Y_mgwr~X1+X2+X3", data = mydata, coord = coord, 
#>     fixed_vars = "Intercept", kernels = c("gauss_adapt"), H = 20, 
#>     Model = "MGWR", control = list(isgcv = TRUE, remove_local_outlier = FALSE, 
#>         outv = NULL))
#> Model: MGWR 
#> Kernels function: gauss_adapt 
#> Kernels type: GD 
#> bandwidth: 20 
#> Number of data points: 1000 
#> Constant parameters: Intercept 
#> -0.5836998 
#> Varying parameters: X1 X2 X3 
#>              X1      X2      X3
#> Min.    0.17826 0.53262 -1.4776
#> 1st Qu. 0.37341 1.27013 -1.1609
#> Median  0.50187 1.64459 -1.0867
#> Mean    0.48805 1.63952 -1.0424
#> 3rd Qu. 0.58608 2.07933 -0.9529
#> Max.    0.82377 2.72156 -0.5559
#> Residual sum of squares: 50.18494

Estimation of a MGWR with stationary intercept, using an adapative gaussian kernel (based on 20 nearest neighbors):

model_MGWRSAR_1_0_kv<-MGWRSAR(formula = 'Y_mgwrsar_1_0_kv~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_1_0_kv',control=list(W=W,Lambdacor=TRUE,SE=TRUE))
summary_mgwrsar(model_MGWRSAR_1_0_kv)
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_1_0_kv~X1+X2+X3", data = mydata, 
#>     coord = coord, fixed_vars = NULL, kernels = c("gauss_adapt"), 
#>     H = 20, Model = "MGWRSAR_1_0_kv", control = list(W = W, Lambdacor = TRUE, 
#>         SE = TRUE))
#> Model: MGWRSAR_1_0_kv 
#> Method for spatial autocorrelation: MGWRSAR_1_0_kv 
#> Kernels function: gauss_adapt 
#> Kernels type: GD 
#> bandwidth: 20 
#> Number of data points: 1000 
#> Varying parameters: Intercept X1 X2 X3  
#>         Intercept       X1       X2       X3  lambda
#> Min.     -8.94883  0.10555 -2.65846 -2.10631 -0.2844
#> 1st Qu.  -1.74523  0.37095  0.81483 -1.29394  0.1355
#> Median   -0.99141  0.54257  2.11968 -1.03264  0.2965
#> Mean     -1.06465  0.50832  2.18127 -1.07079  0.3152
#> 3rd Qu.  -0.21901  0.63831  3.55568 -0.73756  0.4899
#> Max.      2.20903  0.94188  9.84625 -0.34718  0.9153
#> Effective degrees of freedom: 893.3478 
#> AIC 2158.486 
#> Residual sum of squares: 456.0999

Estimation of a mgwrsar(0,kc,kv) model with stationary intercept and stationnary spatial multiplier using an adapative gaussian kernel (based on 20 nearest neighbors):

model_MGWRSAR_0_kc_kv<-MGWRSAR(formula = 'Y_mgwrsar_0_kc_kv~X1+X2+X3', data = mydata,coord=coord, fixed_vars=c('Intercept'),kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_0_kc_kv',control=list(W=W))
summary_mgwrsar(model_MGWRSAR_0_kc_kv)
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_0_kc_kv~X1+X2+X3", data = mydata, 
#>     coord = coord, fixed_vars = c("Intercept"), kernels = c("gauss_adapt"), 
#>     H = 20, Model = "MGWRSAR_0_kc_kv", control = list(W = W))
#> Model: MGWRSAR_0_kc_kv 
#> Method for spatial autocorrelation: MGWRSAR_0_kc_kv 
#> Kernels function: gauss_adapt 
#> Kernels type: GD 
#> bandwidth: 20 
#> Number of data points: 1000 
#> Constant parameters: Intercept  
#> -0.3140325 0.2254189 
#> Varying parameters: X1 X2 X3 
#>               X1       X2      X3
#> Min.     0.18113 -0.19560 -1.6156
#> 1st Qu.  0.38038  1.30055 -1.2737
#> Median   0.50396  1.92790 -1.1129
#> Mean     0.49373  1.93858 -1.0928
#> 3rd Qu.  0.59936  2.80034 -0.9463
#> Max.     0.86079  3.67151 -0.4367
#> Residual sum of squares: 122.6273

Estimation of a mgwrsar(0,0,kv) model with stationnary spatial multiplier using an adapative gaussian kernel (based on 20 nearest neighbors):

model_MGWRSAR_0_0_kv<-MGWRSAR(formula = 'Y_mgwrsar_0_0_kv~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_0_0_kv',control=list(W=W))
summary_mgwrsar(model_MGWRSAR_0_0_kv)
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_0_0_kv~X1+X2+X3", data = mydata, 
#>     coord = coord, fixed_vars = NULL, kernels = c("gauss_adapt"), 
#>     H = 20, Model = "MGWRSAR_0_0_kv", control = list(W = W))
#> Model: MGWRSAR_0_0_kv 
#> Method for spatial autocorrelation: MGWRSAR_0_0_kv 
#> Kernels function: gauss_adapt 
#> Kernels type: GD 
#> bandwidth: 20 
#> Number of data points: 1000 
#> Constant parameters: 
#> 0.250141 
#> Varying parameters: Intercept X1 X2 X3 
#>         Intercept        X1        X2      X3
#> Min.    -4.928243  0.104342 -1.979095 -2.0943
#> 1st Qu. -1.429473  0.365563  1.262898 -1.2896
#> Median  -0.779578  0.536273  2.011070 -0.9916
#> Mean    -0.689637  0.502558  2.023040 -1.0512
#> 3rd Qu. -0.094623  0.630135  3.365939 -0.7410
#> Max.     2.327138  0.891468  5.401186 -0.3125
#> Residual sum of squares: 93.50372

Estimation of a mgwrsar(1,kv,kv) model with stationary intercept using an adapative gaussian kernel (based on 20 nearest neighbors):

model_MGWRSAR_1_kc_kv<-MGWRSAR(formula = 'Y_mgwrsar_1_kc_kv~X1+X2+X3', data = mydata,coord=coord, fixed_vars=c('Intercept'),kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_1_kc_kv',control=list(W=W))
summary_mgwrsar(model_MGWRSAR_1_kc_kv)
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_1_kc_kv~X1+X2+X3", data = mydata, 
#>     coord = coord, fixed_vars = c("Intercept"), kernels = c("gauss_adapt"), 
#>     H = 20, Model = "MGWRSAR_1_kc_kv", control = list(W = W))
#> Model: MGWRSAR_1_kc_kv 
#> Method for spatial autocorrelation: MGWRSAR_1_kc_kv 
#> Kernels function: gauss_adapt 
#> Kernels type: GD 
#> bandwidth: 20 
#> Number of data points: 1000 
#> Constant parameters: Intercept 
#> -0.6698754 
#> Varying parameters: X1 X2 X3  
#>               X1       X2       X3  lambda
#> Min.     0.19982 -0.46764 -1.90291 -0.2478
#> 1st Qu.  0.41206  1.13441 -1.27422  0.1194
#> Median   0.52169  1.84084 -1.09489  0.2813
#> Mean     0.50659  1.96433 -1.09162  0.3263
#> 3rd Qu.  0.60433  2.86682 -0.88669  0.5122
#> Max.     0.83615  4.86693 -0.44568  1.2080
#> Residual sum of squares: 869.4819

Bootstrap test

mgwrsar_bootstrap_test function allows to perform a bootstrap test for Betas for mgwrsar class model (with parallel computation). Could be long, not run here.

model_GWR<-MGWRSAR(formula = 'Y_mgwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=NULL,kernels=c('gauss_adapt'),H=20, Model = 'GWR',control=list(SE=TRUE))
summary_mgwrsar(model_GWR)

model_MGWR<-MGWRSAR(formula = 'Y_mgwr~X1+X2+X3', data = mydata,coord=coord, fixed_vars=c('Intercept'),kernels=c('gauss_adapt'),H=20, Model = 'MGWR',control=list(SE=TRUE))
summary_mgwrsar(model_MGWR)

test<-mgwrsar_bootstrap_test(model_MGWR,model_GWR,B=30,domc=FALSE,ncore=1,type='standard',eps='H1',df='H1',focal='median',D=NULL)

# result 
# > test
# > p_star   0
# > T 69.92265  

Prediction

In this example, we use rows 1 to 800 as insample data for model estimation and rows 801 to 1000 as outsample for prediction.

Prediction of GWR model using sheppard kernel with 8 neighbors:

model_GWR_insample<-MGWRSAR(formula = 'Y_gwr~X1+X2+X3', data = mydata[1:800,],coord=coord[1:800,], fixed_vars=NULL,kernels=c('gauss_adapt'),H=8, Model = 'GWR',control=list())
Y_pred=predict_mgwrsar(model_GWR_insample, newdata=mydata[801:1000,], newdata_coord=coord[801:1000,], k_extra = 8, kernel_extra = "sheppard")
head(Y_pred)
#> [1] 0.5111469 2.8323339 2.8674031 0.2058560 1.8421794 0.6162536
head(mydata$Y_gwr[801:1000])
#> [1] 0.2490662 2.8329445 2.7107657 0.1949003 1.7104271 0.6286141
sqrt(mean((mydata$Y_gwr[801:1000]-Y_pred)^2)) # RMSE
#> [1] 0.1343204

Prediction of MGWRSAR_1_0_kv model using sheppard kernel with 8 neighbors and Best Linear Unbiased Predictor (BLUP) :

model_MGWRSAR_1_0_kv_insample<-MGWRSAR(formula = 'Y_mgwrsar_1_0_kv~X1+X2+X3', data = mydata[1:800,],coord=coord[1:800,], fixed_vars=NULL,kernels=c('gauss_adapt'),H=20, Model = 'MGWRSAR_1_0_kv',control=list(W=W[1:800,1:800]))
summary_mgwrsar(model_MGWRSAR_1_0_kv_insample)
#> Call:
#> MGWRSAR(formula = "Y_mgwrsar_1_0_kv~X1+X2+X3", data = mydata[1:800, 
#>     ], coord = coord[1:800, ], fixed_vars = NULL, kernels = c("gauss_adapt"), 
#>     H = 20, Model = "MGWRSAR_1_0_kv", control = list(W = W[1:800, 
#>         1:800]))
#> Model: MGWRSAR_1_0_kv 
#> Method for spatial autocorrelation: MGWRSAR_1_0_kv 
#> Kernels function: gauss_adapt 
#> Kernels type: GD 
#> bandwidth: 20 
#> Number of data points: 800 
#> Varying parameters: Intercept X1 X2 X3  
#>         Intercept       X1       X2       X3  lambda
#> Min.     -7.05380  0.10543 -2.35128 -2.19224 -0.5765
#> 1st Qu.  -1.92244  0.39116  0.91679 -1.33372  0.0956
#> Median   -0.89820  0.54991  2.10489 -1.00608  0.2728
#> Mean     -0.99662  0.51309  2.61515 -1.09223  0.2588
#> 3rd Qu.  -0.20394  0.64410  4.09003 -0.72127  0.4379
#> Max.      1.98675  0.84522  9.77120 -0.43915  1.2524
#> Residual sum of squares: 352.3345

## Using Best Linear Unbiased Predictor 
Y_pred=predict_mgwrsar(model_MGWRSAR_1_0_kv_insample, newdata=mydata[801:1000,], newdata_coord=coord[801:1000,], k_extra = 12,  W = W, type = "BPN", kernel_extra = "sheppard")
head(Y_pred)
#>           [,1]
#> [1,] 0.8833344
#> [2,] 5.5199667
#> [3,] 4.1304483
#> [4,] 1.1905310
#> [5,] 2.9239718
#> [6,] 1.8684993
head(mydata$Y_mgwrsar_1_0_kv[801:1000])
#> [1] 0.5400965 5.0848714 3.9300001 1.1680551 2.4675708 0.9905510
sqrt(mean((mydata$Y_mgwrsar_1_0_kv[801:1000]-Y_pred)^2))
#> [1] 8.442365

## without Best Linear Unbiased Predictor
Y_pred=predict_mgwrsar(model_MGWRSAR_1_0_kv_insample, newdata=mydata[801:1000,], newdata_coord=coord[801:1000,], k_extra = 12,  W = W, type = "TC", kernel_extra = "sheppard")
head(Y_pred)
#> [1] 1.435196 5.666258 4.219273 1.388306 3.459105 1.706876
head(mydata$Y_mgwrsar_1_0_kv[801:1000])
#> [1] 0.5400965 5.0848714 3.9300001 1.1680551 2.4675708 0.9905510
sqrt(mean((mydata$Y_mgwrsar_1_0_kv[801:1000]-Y_pred)^2))
#> [1] 54.01216