diff --git a/fieldsVignette.html b/fieldsVignette.html
index 84db09a..1709eb9 100644
--- a/fieldsVignette.html
+++ b/fieldsVignette.html
@@ -1,28 +1,28 @@
fields vignette
For many of these functions, we present their basic usage in this vignette. We have edited out secondary arguments in the presentation to focus on the essential options.
We have tried to present this vignette as a narrative that introduces spatial statistics, describes how to use the functions in fields, and includes some relevant theory about splines and Kriging. The goal is to get the interested reader started quickly with many data-based examples. Core functions in fields return S3 objects that are compatible with generic functions such as plot, predict, surface, etc. Thus, the analysis flow will feel familiar from other methods such as lm. If the reader is familiar with Kriging, they can skip to Chapters 6-8 to see some examples.
-We begin by showing quick examples using some of fields’ most important functions. This gives the user an idea of what kind of problems can be solved and how to implement the functions in fields to solve them.
+We begin by showing quick examples using some of fields' most important functions. This gives the user an idea of what kind of problems can be solved and how to implement the functions in fields to solve them.
In the third chapter we discuss fitting splines to univariate data. We begin with a simple time series data set, but the discussion applies more generally to data with one predictor variable x and one response variable y. The functions splint and sreg are used to fit cubic splines to this type of data. Then we introduce Tps, a core function in the package that can fit thin plate splines.
-Chapters 4-6 build up to Kriging. We have tried give an intuitive presentation of the fundamental concepts in spatial statistics, and at the same time introduce the corresponding functions in fields. Initially, the Kriging linear algebra is done manually for exposition, and then the package’s most important function spatialProcess is presented in Chapter 7.
-There is a brief interlude to discuss the package’s plotting functions as well as some image processing functions in Chapters 8-9. Many of them will have been used up to this point, but here we give a thorough description of the plotting options to be used with images, surfaces, and the package’s models. See quilt.plot to get a quick plot of raw data to discern spatial patterns. There are also several functions for working with image data specifically. The functions image.smooth and smooth.2d are capable of smoothing images, and interp.surface performs fast bilinear interpolation (often useful when zooming in on images from a grid).
+Chapters 4-6 build up to Kriging. We have tried to give an intuitive presentation of the fundamental concepts in spatial statistics, and at the same time introduce the corresponding functions in fields. Initially, the Kriging linear algebra is done manually for exposition, and then the package's most important function spatialProcess is presented in Chapter 7.
+There is a brief interlude to discuss the package's plotting functions as well as some image processing functions in Chapters 8-9. Many of them will have been used up to this point, but here we give a thorough description of the plotting options to be used with images, surfaces, and the package's models. See quilt.plot to get a quick plot of raw data to discern spatial patterns. There are also several functions for working with image data specifically. The functions image.smooth and smooth.2d are capable of smoothing images, and interp.surface performs fast bilinear interpolation (often useful when zooming in on images from a grid).
In Chapter 10, we return to covariances and Kriging in more detail, including all of the covariance options in fields. Then we show how the user can write their own covariance function for use with all of the functionality in fields. This allows us to discuss three important wrappers on Krig in the following chapters.
In Chapter 11, we cover the Kriging optimization workhorse spatialProcess. The user only needs to specify the data and a covariance model, and then spatialProcess will perform a grid search for the optimal values of the rest of the parameters and return the spatial model. Optimization methods include restricted maximum likelihood (REML) and generalized cross validation (GCV). The user should exercise caution, however, when using black box optimization routines. We hope this chapter will help explain the optimization procedure and assist the user with fitting the right model parameters.
In Chapter 12, we introduce the concept of using a compactly supported covariance function to generate a sparse covariance matrix. The function mKrig (microKrig) is set up to take advantage of this sparsity in computation by using the spam package. We return to the Tps function in Chapter 13 with spatial data. The theory of thin plate splines is presented in the framework of Kriging. Finally, we show how the fastTps function is an analogue of mKrig that can similarly take advantage of sparse linear algebra.
spatialProcess with a covariate
fields makes it easy to fit a spatial model from data, predict at arbitrary locations or on a grid, and plot the results. Additional covariates (e.g. elevation) can also be included in the linear trend. We can use predictSurface to predict on a grid and output a surface object, much like predict but more convenient for plotting.
data( COmonthlyMet)
# predicting average daily minimum temps for spring in Colorado
obj<- spatialProcess( CO.loc, CO.tmin.MAM.climate, Z= CO.elev)
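As a hedged sketch of the next step (the exact arguments here are assumptions and not part of the original call sequence), one might predict on the Colorado grid with the elevation covariate and plot the result:
# hedged sketch: predict on a grid with the elevation covariate (CO.Grid and CO.elevGrid come with the dataset)
predSurf <- predictSurface( obj, grid.list = CO.Grid, ZGrid = CO.elevGrid)
surface( predSurf)
US( add = TRUE, col = "grey")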
@@ -337,7 +398,7 @@ 2.2 spatialProcess w
US(add=TRUE, col="grey")
points(CO.loc[,1], CO.loc[,2], col="magenta", pch=21, bg="white")
title("Standard errors for avg Spring daily min. temp in CO")
-Computing standard errors can be computationally expensive. An alternative is to use sim.spatialProcess to conditionally simulate the process, and then use these simulations to approximate the standard errors. In practice, a large M (e.g. 100) should be used to produce many simulations to reduce the variability when finding sample standard deviations in the Monte Carlo sample.
+Computing standard errors can be computationally expensive. An alternative is to use sim.spatialProcess to conditionally simulate the process, and then use these simulations to approximate the standard errors. In practice, a large M (e.g. 100) should be used to produce many simulations to reduce the variability when finding sample standard deviations in the Monte Carlo sample.
sim <- sim.spatialProcess(obj, xp=make.surface.grid(CO.Grid)[1:5,], Z=CO.elevGrid$z[1:5], M=30)
look <- as.surface(CO.Grid, t(sim[1,]))
look2 <- as.surface(CO.Grid, t(sim[2,]))
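A hedged sketch of turning such simulations into approximate standard errors (this assumes sim was generated over all prediction locations of interest, not just the five shown above):
# approximate SE at each location = standard deviation across the M conditional draws
se.approx <- apply( sim, 2, sd)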
@@ -426,7 +487,7 @@ 3.1 splint and
The splint function returns a vector of values of the interpolating spline evaluated at xgrid, while sreg returns a list of class sreg. Some of the relevant components of the sreg list are the original data x and y, the smoothness parameters lambda and df (same as trace), and the residuals and fitted.values. Note that the identical parameters lam and lambda both appear for convenience.
-First, we’ll use interpolation to fit the control group data. We use the x data coordinates as nodes to interpolate, and then evaluate the model at the grid locations to plot. Interpolation of a set of data points \((x_i,y_i)_{i=1}^n\) means that the interpolating function passes exactly through these points.
+First, we'll use interpolation to fit the control group data. We use the x data coordinates as nodes to interpolate, and then evaluate the model at the grid locations to plot. Interpolation of a set of data points \((x_i,y_i)_{i=1}^n\) means that the interpolating function passes exactly through these points.
grd <- seq(range(x)[1], range(x)[2], length.out = 400)
spl <- splint(x,y,grd)
plot(y~x, pch=".", cex=5, ylab="Median Food Intake", xlab="Days")
@@ -435,7 +496,7 @@ 3.1 splint and

-Often we want to smooth the data rather than interpolate, for example if we assumed possible error in data collection. Loosely, a smoothing function follows the general “trend” of the data, but it may not exactly interpolate the data points.
+Often we want to smooth the data rather than interpolate, for example if we assumed possible error in data collection. Loosely, a smoothing function follows the general "trend" of the data, but it may not exactly interpolate the data points.
The parameters df and lambda both control the smoothing in these functions. Interpolating the data corresponds to \(\lambda=0\) and df= the number of observations. As lambda increases, the spline gets smoother.
In place of lambda, a more useful scale is in terms of the effective number of parameters (degrees of freedom) associated with the estimate. We denote this by EDF (effective degrees of freedom). There is a 1-1 relationship between \(\lambda\) and EDF, and both are useful measures in slightly different contexts: \(\lambda\) for computing with the spatial process model versus EDF for data smoothing. The user should be aware that splint defaults to lambda = 0 (interpolation), while sreg defaults to using GCV to find lambda.
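For example, a minimal hedged sketch (reusing the x, y, and grd objects from above) of smoothing at a chosen effective degrees of freedom rather than interpolating:
# smooth rather than interpolate by fixing the effective degrees of freedom (df = 8 is an arbitrary choice)
fit.df <- sreg( x, y, df = 8)
lines( grd, predict( fit.df, grd), col = "blue", lwd = 2)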
#Note this small lambda almost interpolates, close to splint model fit above
@@ -488,7 +549,7 @@ 3.2 sreg and S3 meth
## lambda trA GCV shat converge
## GCV 11.13 7.446 2.378 1.387 5
## GCV.one 11.13 7.446 2.378 1.387 5
-plot gives a series of four diagnostic plots describing the fit. For sreg, plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2/\rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized. Finally, plot 4 shows a histogram of the standard residuals.
+plot gives a series of four diagnostic plots describing the fit. For sreg, plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2/\rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized. Finally, plot 4 shows a histogram of the standard residuals.
set.panel(2,2)
plot(fit) # four diagnostic plots of fit

@@ -518,7 +579,7 @@ 3.3 Tps
The Tps function is used for fitting curves and surfaces to interpolate or smooth data by thin plate spline regression. This is a very useful technique and is often as informative as more complex methods.
The assumed model for Tps is additive. Namely, \[ Y_i = f(\mathbf{X}_i) + \epsilon_i,\]
where \(f(\mathbf{X})\) is a d dimensional surface, and the object is to fit a thin plate spline surface to irregularly spaced data. \(\epsilon_i\) are uncorrelated random errors with zero mean and variances \(\sigma^2 / w_i\).
-This function also works for just a single dimension and is a special case of a spatial process estimate (Kriging). A “fast” version of this function uses a compactly supported Wendland covariance, computing the estimate for a fixed smoothing parameter.
+This function also works for just a single dimension and is a special case of a spatial process estimate (Kriging). A "fast" version of this function uses a compactly supported Wendland covariance, computing the estimate for a fixed smoothing parameter.
\({\large{\mathbf{Basic \> Usage}}}\)
@@ -530,7 +591,7 @@ 3.3 Tps
Returns a list of class Krig/mKrig (see below), which includes the predicted surface of fitted.values and residuals. Also includes the results of the GCV grid search in gcv.grid. Note that the GCV/REML optimization is done even if lambda or df is given. Please see the documentation on Krig for details of the returned arguments.
-Tps is a “wrapper function”, which is why a Krig object is returned. Moreover, any argument that can be used in a call to Krig can be used in Tps. For example, the user can specify lambda or df just as in sreg. As seen in the minimization problem, lambda determines the weight put on the smoothness condition, giving it the same interpretation. lambda is the most important argument for Tps, which is estimated by GCV if omitted. We can also includes covariates in the linear part of the model by passing the argument Z.
+Tps is a "wrapper function", which is why a Krig object is returned. Moreover, any argument that can be used in a call to Krig can be used in Tps. For example, the user can specify lambda or df just as in sreg. As seen in the minimization problem, lambda determines the weight put on the smoothness condition, giving it the same interpretation. lambda is the most important argument for Tps, which is estimated by GCV if omitted. We can also includes covariates in the linear part of the model by passing the argument Z.
Tps and fastTps are special cases of using the Krig and mKrig functions, respectively. The Tps estimator is implemented by passing the right generalized covariance function based on a radial basis function (RBF) to the more general function Krig. One advantage of this implementation is that once a Tps/Krig object is created the estimator can be found rapidly for other data and smoothing parameters provided the locations remain unchanged. This makes simulation within R efficient (see example below).
-Notice how similar the summary is to sreg’s summary. sreg is actually a special case of Tps (as shown below – note that lambda is not equal for the two functions!) The m=2 default for Tps and leaving the data unscaled results in the cubic smoothing spline sreg.
+Notice how similar the summary is to sreg's summary. sreg is actually a special case of Tps (as shown below -- note that lambda is not equal for the two functions!) The m=2 default for Tps and leaving the data unscaled results in the cubic smoothing spline sreg.
# compare sreg and Tps results to show the adjustment to lambda.
predict( fit)-> look
predict( fit.tps, lambda=fit$lambda*fit$N)-> look2
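A quick check (a suggested addition, not part of the original chunk) that the two sets of fitted values agree once lambda is rescaled:
# the maximum absolute difference should be essentially zero
max( abs( look - look2))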
@@ -617,13 +678,14 @@ 3.5 Confidence intervals with
-# or try the more visually honest:
plot( fit.tps$x, fit.tps$y)
lines( fit.tps$predicted, lwd=2)
@@ -636,20 +698,23 @@ 3.5 Confidence intervals with
x <- WorldBankCO2[,'Pop.urb']
y <- log10(WorldBankCO2[,'CO2.cap'])
out.fast <- fastTps(x,y,lambda=2, theta=20)
-plot(x,y)
-xgrid<- seq( min(x), max(x),,300)
+
+plot(x,y, xlab = "Urban Percent of Population", ylab = "CO2 Emissions Per Capita")
+
+xgrid<- seq( min(x), max(x), length.out = 300)
fhat.fast <- predict( out.fast,xgrid)
-#se.fast <- predictSE( out.fast)
+se.fast <- predictSE( out.fast, x = xgrid)
+
lines( xgrid, fhat.fast)
-#lines( xgrid, fhat.fast+1.96*se.fast, col=2, lty=2)
-#lines( xgrid, fhat.fast-1.96*se.fast, col=2, lty=2)
-title('fastTps fit')
-Here we’ll cover the fundamentals of spatial statistics. In particular, we will examine the basic definitions and concepts of the field before giving a geostatistics mini-course. As this is a vignette, we can only scratch the theory behind spatial statistics. A few useful resources are:
+Here we'll cover the fundamentals of spatial statistics. In particular, we will examine the basic definitions and concepts of the field before giving a geostatistics mini-course. As this is a vignette, we can only scratch the surface of the theory behind spatial statistics. A few useful resources are:
Statistics for Spatial Data by Noel Cressie
Handbook of Spatial Statistics by Alan E. Gelfand, Peter J. Diggle, Montserrat Fuentes, and Peter Guttorp
-Let’s return to a previous quilt.plot. Suppose we want to predict the maximum March/April/May temperature at a new location \((-103,39.5)\) in Colorado, which is shown below as the x.
+Let's return to a previous quilt.plot. Suppose we want to predict the maximum March/April/May temperature at a new location \((-103,39.5)\) in Colorado, which is shown below as the x.
An initial thought may be to use OLS regression with covariates of \(\texttt{longitude}\) and \(\texttt{latitude}\).
-Recall Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.” We see this reflected in the plot above, as temperatures in one area are similar to the temperature in nearby areas. This spatial autocorrelation violates many typical statistical assumptions – our observations are no longer independent! Fortunately, in the field of spatial statistics, we can relax our assumption of independence and work with observations that are spatially dependent.
-To make a “spatial prediction”, we need some way to quantify the change in our observation variable (temperature) as a function of distance.
+Recall Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things." We see this reflected in the plot above, as temperatures in one area are similar to the temperature in nearby areas. This spatial autocorrelation violates many typical statistical assumptions -- our observations are no longer independent! Fortunately, in the field of spatial statistics, we can relax our assumption of independence and work with observations that are spatially dependent.
+To make a "spatial prediction", we need some way to quantify the change in our observation variable (temperature) as a function of distance.
A covariance function \(\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x'))\) describes the joint variability between a stochastic process \(Y(\cdot)\) at two locations \(\mathbf{x}\) and \(\mathbf{x}'\). This covariance function is vital in spatial prediction. The fields package includes common parametric covariance families (e.g. exponential and Matern) as well as nonparametric models (e.g. radial and tensor basis functions).
When modeling \(\operatorname{Cov}(Y(\mathbf{x}), Y(\mathbf{x}'))\), we are often forced to make simplifying assumptions.
Stationarity assumes we can represent the covariance function as \[
@@ -679,7 +744,7 @@
Matern. \(\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x')) = C(r) = \rho \left( \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{ r}{\theta} \right)^\nu K_\nu \left( \frac{r}{\theta} \right) \right) + \sigma^2 \mathbf{1}_{ \mathbf x = \mathbf x'}\), where \(K_{\nu}\) is a modified Bessel function of the second kind, of order \(\nu\).
4.2 Covariance
Note that the Matern covariance function depends on parameters \((\rho,\theta,\nu,\sigma^2)\), and the Exponential covariance depends on parameters \((\rho,\theta,\sigma^2)\). The parameters \(\rho,\theta,\sigma^2\) and \(\nu\) respectively denote the marginal variance (or sill), range, nugget effect, and smoothness of our process.
-The range \(\theta\) of the process is the distance at which observations become uncorrelated. The sill \(\rho\) is the marginal variance of the spatial process. The nugget effect \(\sigma^2\) corresponds to small-scale variation such as measurement error. The smoothness \(\nu\) corresponds to how “smooth” our spatial process appears. An illustration in the vgram section shows the visual interpretation of these parameters.
+The range \(\theta\) of the process is the distance at which observations become uncorrelated. The sill \(\rho\) is the marginal variance of the spatial process. The nugget effect \(\sigma^2\) corresponds to small-scale variation such as measurement error. The smoothness \(\nu\) corresponds to how "smooth" our spatial process appears. An illustration in the vgram section shows the visual interpretation of these parameters.
For such isotropic covariances, the fields package relies on the function rdist, which is used to calculate the distance between two sets of locations.
rdist
It is easy to find the distance between two points on the quilt.plot. Here, we use rdist.earth since our vectors consist of longitude/latitude pairs.
three.locations <- grd.CO[1:3,]
#location one is (-109.10, 36.90)
@@ -734,7 +799,7 @@ 4.2.1 rdist
4.2.2 Exponential and Matern covariances
The Exponential and Matern functions produce these isotropic covariances in R. The inputs are a matrix of distances d, either the range parameter range (this is \(\theta\) in our notation) or alpha = 1/range, and finally the marginal variance phi (which is \(\rho\) in our notation), and the smoothness nu for the Matern covariance. Note that the exponential covariance is identical to the Matern covariance with smoothness nu = 0.5.
There are other (generalized) covariances in fields, such as RadialBasis and Wendland, that will be discussed later. To set the nugget variance \(\sigma^2\), we use the sigma2 argument in the functions Krig or mKrig that we also discuss later.
-Below are plots that illustrate the shape of these covariance functions. The Exponential, Matern, and Wendland correlations (marginal variances \(\rho=1\)) behave exactly how one would expect. That is, the covariance between two nearby objects is close to 1, indicating a strong positive correlation among the two observations. The appearance of the Radial-Basis correlation may be shocking – we will return to it later.
+Below are plots that illustrate the shape of these covariance functions. The Exponential, Matern, and Wendland correlations (marginal variances \(\rho=1\)) behave exactly how one would expect. That is, the covariance between two nearby objects is close to 1, indicating a strong positive correlation among the two observations. The appearance of the Radial-Basis correlation may be shocking -- we will return to it later.
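As a hedged sketch of how such correlation curves can be drawn with the functions named above (the parameter values are arbitrary choices):
d <- seq( 0, 3, length.out = 200)                       # distances
plot( d, Exponential( d, range = 1), type = "l", xlab = "distance", ylab = "correlation")
lines( d, Matern( d, range = 1, smoothness = 1.5), col = "blue")
lines( d, Wendland( d, theta = 1.5, dimension = 2, k = 2), col = "red")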
\({\large{\mathbf{Basic \> Usage}}}\)
@@ -799,7 +864,7 @@ 4.2.4 Exp.cov and
\({\large{\mathbf{Basic \> Usage}}}\)
Exp.cov(x1, x2=NULL, theta = 1, p=1)
-stationary.cov(x1, x2=NULL, Covariance = “Exponential”, Distance = “rdist”, theta = 1)
+stationary.cov(x1, x2=NULL, Covariance = "Exponential", Distance = "rdist", theta = 1)
\({\large{\mathbf{Value}}}\)
If the argument C is NULL, then the (cross-)covariance matrix is returned. If C is a vector of length n, then the returned value is the multiplication of the (cross-)covariance matrix with this vector. The C functionality is used internally by all fields Kriging-related functions.
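A small hedged example of evaluating a cross-covariance matrix between two sets of locations (the locations here are made up for illustration):
x1 <- matrix( runif( 20), ncol = 2)    # 10 locations in 2-d
x2 <- matrix( runif( 10), ncol = 2)    # 5 locations in 2-d
Sigma12 <- Exp.cov( x1, x2, theta = 0.5)
dim( Sigma12)                          # 10 x 5 cross-covariance matrix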
@@ -828,7 +893,7 @@ 4.2.5 Simulating Random Fields (<
Take the Cholesky Decomposition \(\Sigma = L L^T\).
Multiply \(L \boldsymbol{\epsilon}\), where \(\boldsymbol{\epsilon} \sim N(\mathbf 0, I)\).
-The resulting random vector \(L \boldsymbol \epsilon\) is an exact simulation of a Gaussian process. The only problem with this approach is the computations and storage grow rapidly for larger grids. For example, a \(128 \times 128\) image would mean that the dimension of \(\Sigma\) is huge (\(16,000 \times 16,000\)) and effectively prohibit the use of the Cholesky decomposition. For rectangular grids, one can sometimes simulate Gaussian random fields very efficiently using an alternative algorithm called “circulant embedding” that is based on the Fast Fourier Transform. The restrictions are that the covariance function needs to be stationary and the correlation range needs to the small relative to the size of the domain (see Chan and Wood, 1994).
+The resulting random vector \(L \boldsymbol \epsilon\) is an exact simulation of a Gaussian process. The only problem with this approach is the computations and storage grow rapidly for larger grids. For example, a \(128 \times 128\) image would mean that the dimension of \(\Sigma\) is huge (\(16,000 \times 16,000\)) and effectively prohibit the use of the Cholesky decomposition. For rectangular grids, one can sometimes simulate Gaussian random fields very efficiently using an alternative algorithm called "circulant embedding" that is based on the Fast Fourier Transform. The restrictions are that the covariance function needs to be stationary and the correlation range needs to the small relative to the size of the domain (see Chan and Wood, 1994).
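A minimal sketch of the Cholesky approach on a deliberately small grid, assuming an exponential covariance with range 0.3 (all values are illustrative):
grd <- make.surface.grid( list( x = seq( 0, 1, length.out = 20), y = seq( 0, 1, length.out = 20)))
Sigma <- Exp.cov( grd, grd, theta = 0.3)       # 400 x 400 covariance matrix
L <- t( chol( Sigma))                          # Sigma = L %*% t(L)
field <- c( L %*% rnorm( nrow( Sigma)))        # one exact draw from N(0, Sigma)
surface( as.surface( grd, field))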
The sim.rf function uses circulant embedding to simulate a stationary Gaussian random field (GRF) on a regular grid with unit marginal variance (i.e. \(\rho=1\)). Note that the marginal variance is readily changed by scaling the resulting GRF.
A limitation when using sim.rf presents itself with Error in sim.rf(obj) : FFT of covariance has negative values. This comes from the circulant embedding method used to create the GRF; in short, it occurs when the correlation range is too large. One fix is to increase the domain size so this correlation is then small relative to the size of the domain.
The input of sim.rf is a list that includes information about the covariance function, its FFT, and the grid for evaluation. Usually this is created by a setup call to one of the image.cov functions (i.e. Exp.image.cov, matern.image.cov, stationary.image.cov).
@@ -838,7 +903,7 @@ 4.2.5 Simulating Random Fields (<
sim.rf(obj)
matern.image.cov(ind1, ind2, Y, cov.obj = NULL, setup = FALSE, grid ,theta= 1.0, smoothness=.5)
-Exp.image.cov(ind1, ind2, Y, cov.obj = NULL, setup = FALSE, grid, …)
+Exp.image.cov(ind1, ind2, Y, cov.obj = NULL, setup = FALSE, grid, ...)
\({\large{\mathbf{Value}}}\)
A matrix with the random field values.
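For instance, a hedged sketch of the usual pattern: set up the covariance object once on a grid, then draw a field (the grid size and parameters are arbitrary and chosen small relative to the domain to avoid the FFT error mentioned above):
grid <- list( x = seq( 0, 10, length.out = 128), y = seq( 0, 10, length.out = 128))
obj <- matern.image.cov( grid = grid, theta = 0.8, smoothness = 1.0, setup = TRUE)
look <- sim.rf( obj)                     # one simulated field on the grid
image.plot( grid$x, grid$y, look)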
@@ -874,7 +939,7 @@ 4.3 Variograms
The definition is readily translated to an empirical variogram. Let \(N(h)\) be the set of pairs of observations \(y_i, y_j\) such that \(\| \mathbf x_i - \mathbf x_j \| = h\). The empirical variogram (vgram) is defined as \[
\widehat{\gamma}(h) = \frac{1}{2 \cdot|N(h)| } \sum_{(i,j) \in N(h)} ( y_i - y_j )^2
\]
-The default “cloud” variogram is done for each distance \(h\), but it is noisy and hard to interpret. Often, we specify a number of bins N in which values for nearby values are averaged. In this way, the choices for a variogram may seem similar to finding a histogram. The breaks argument allows the user to explicitly provide distances at which bins are created.
+The default "cloud" variogram is done for each distance \(h\), but it is noisy and hard to interpret. Often, we specify a number of bins N in which values for nearby values are averaged. In this way, the choices for a variogram may seem similar to finding a histogram. The breaks argument allows the user to explicitly provide distances at which bins are created.
crossCoVGram is the same as vgram but differences are taken across different variables rather than the same variable. boxplotVGram uses the base R boxplot function to display the variogram neatly.
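A hedged example of a binned variogram using the ozone2 data that appears later in this vignette (the number of bins is an arbitrary choice):
data( ozone2)
good <- !is.na( ozone2$y[16, ])
vg <- vgram( loc = ozone2$lon.lat[good, ], y = ozone2$y[16, good], N = 20, lon.lat = TRUE)
plot( vg)          # binned means; boxplotVGram( vg) shows the spread within each bin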
@@ -903,7 +968,7 @@ 4.3 Variograms
ind: a two column matrix giving the x and y increment used to compute the shifts
vgram.full: the variogram at each of these separations
-vgram.robust: Cressie’s version of a robust variogram statistic.
+vgram.robust: Cressie's version of a robust variogram statistic.
@@ -923,9 +988,9 @@ 4.3 Variograms
4.4 Simple Kriging
The additive spatial model is \[
-Y(\mathbf x) = \mathbf Z(\mathbf x) \mathbf{d} + g(\mathbf x) + \epsilon(\mathbf x),\] where \(Y(\cdot)\) is an observation, \(\mathbf Z \mathbf{d}\) is a deterministic (nonrandom) product of covariates \(\mathbf Z\) with a weight vector \(\mathbf d\) that acts as a mean function, \(g(\cdot) \sim N(\mathbf 0, \rho \mathbf K)\) is spatial process, and \(\epsilon(\cdot) \sim N( \mathbf 0, \sigma^2 \mathbf I)\) is error (i.e. white noise).
+Y(\mathbf x) = \mathbf Z(\mathbf x) \mathbf{d} + g(\mathbf x) + \epsilon(\mathbf x),\] where \(Y(\cdot)\) is an observation, \(\mathbf Z \mathbf{d}\) is a deterministic (nonrandom) product of covariates \(\mathbf Z\) with a weight vector \(\mathbf d\) that acts as a mean function, \(g(\cdot) \sim N(\mathbf 0, \rho \mathbf K)\) is a spatial process, and \(\epsilon(\cdot) \sim N( \mathbf 0, \sigma^2 \mathbf I)\) is error (i.e. white noise).
Thus, we assume that our observations \(Y(\cdot)\) are Gaussian; in particular, \(\mathbf Y \sim N( \mathbf{Zd}, \rho \mathbf K + \sigma^2 \mathbf I)\).
-For now, we’ll assume \(\mathbf Z\mathbf{d} \equiv \mathbf 0\) and focus on the spatial aspect \(g(\mathbf x)\). This is known as simple kriging, where the mean of the stochastic process is known. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\).
+For now, we'll assume \(\mathbf Z\mathbf{d} \equiv \mathbf 0\) and focus on the spatial aspect \(g(\mathbf x)\). This is known as simple kriging, where the mean of the stochastic process is known. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\).
Suppose we want to predict \(Y(\cdot)\) at a new location \(\mathbf x_0\). The kriging estimate \(\hat Y(\mathbf x_0)\) of our mean zero Gaussian process \(Y(\cdot)\) is given by: \[
\hat Y(\mathbf x_0) = \Sigma_0 \Sigma^{-1} \mathbf Y
\]
@@ -935,7 +1000,7 @@ 4.4 Simple Kriging
\(\Sigma = \left( \operatorname{Cov}[ Y(\mathbf x_i),Y(\mathbf x_j) ] \right)_{i,j=1}^n\) is the \(n \times n\) covariance matrix of \(Y(\cdot)\) with \(ij\)-entry equal to \(\operatorname{Cov}[Y(\mathbf x_i),Y(\mathbf x_j)]\).
Briefly, when the covariance parameters are known, the Kriging estimate represents the best prediction at \(\mathbf x_0\) based on a linear combination of the observations. By best we mean in terms of mean squared error.
-In the next section, we see an example of how to perform simple kriging. Other types of kriging (i.e. ordinary kriging and universal kriging) are explained in the appendix.
+In the next section, we see an example of how to perform simple kriging. Other types of kriging (i.e. ordinary kriging and universal kriging) are explained in the appendix.
-For now, we’ll assume a zero mean (\(\mathbf Z\mathbf{d} \equiv \mathbf 0\)) and focus on the spatial aspect, which is clearly an incorrect assumption. One can remove the mean trend using least squares, but then this requires universal Kriging instead of simple Kriging. We prefer to start with simple Kriging in this exposition. Details for universal Kriging can be done in the appendix. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\). In this example, \(n=213\) since we have 213 temperature observations.
+For now, we'll assume a zero mean (\(\mathbf Z\mathbf{d} \equiv \mathbf 0\)) and focus on the spatial aspect, which is clearly an incorrect assumption. One can remove the mean trend using least squares, but then this requires universal Kriging instead of simple Kriging. We prefer to start with simple Kriging in this exposition. Details for universal Kriging can be found in the appendix. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\). In this example, \(n=213\) since we have 213 temperature observations.
Suppose we want to predict the temperature \(Y(\cdot)\) at a new location \(\mathbf x_0\). Recall that the simple Kriging estimate \(\hat Y(\mathbf x_0)\) of our mean zero Gaussian process \(Y(\cdot)\) is given by: \[ \hat Y(\mathbf x_0) = \Sigma_0 \Sigma^{-1} \mathbf Y \]
-Now, the question is how to determine the covariance function of \(Y(\cdot)\). For this example, we’ll consider the (isotropic) exponential covariance: \(\operatorname{Cov}[ Y(\mathbf x), Y(\mathbf x')] = \rho e^{ - \| \mathbf x - \mathbf x'\|/\theta}\). Here, \(\theta\) is a range parameter that corresponds to the length of spatial autocorrelation of our process, and \(\rho\) is the overall variance of the process.
+Now, the question is how to determine the covariance function of \(Y(\cdot)\). For this example, we'll consider the (isotropic) exponential covariance: \(\operatorname{Cov}[ Y(\mathbf x), Y(\mathbf x')] = \rho e^{ - \| \mathbf x - \mathbf x'\|/\theta}\). Here, \(\theta\) is a range parameter that corresponds to the length of spatial autocorrelation of our process, and \(\rho\) is the overall variance of the process.
The covariance matrix \(\Sigma\) thus has \(i,j\)-th entry \[ \Sigma_{i,j} = e^{ - \| \mathbf x_i - \mathbf x_j\|/\theta} + \sigma^2 \mathbf{1}_{ \{ \mathbf x_i = \mathbf x_j \} }, \] where \(\mathbf{1}_{ \{ \mathbf x_i = \mathbf x_j \} }= \begin{cases} 1 & \mathbf x_i = \mathbf x_j \\ 0 & \text{else} \end{cases}\)
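A minimal hedged sketch of carrying out this calculation by hand, assuming locations x, zero-mean observations y, a new location x0, and already-chosen values of rho, theta, and sigma2 (all of these names are placeholders, not objects defined in the vignette):
# simple Kriging by hand (placeholder objects; rho, theta, sigma2 assumed known)
Sigma  <- rho * exp( -rdist( x, x) / theta) + sigma2 * diag( nrow( x))
Sigma0 <- rho * exp( -rdist( rbind( x0), x) / theta)
yhat0  <- Sigma0 %*% solve( Sigma, y)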
@@ -1068,7 +1133,7 @@
-The elevation plot is a near inverse of our prediction grid – this agrees with our intuition that there are lower temperatures at higher elevations. If we include a covariate Z corresponding to elevation, then we can produce even more accurate predictions (a sneak-peek is shown below).
+The elevation plot is a near inverse of our prediction grid -- this agrees with our intuition that there are lower temperatures at higher elevations. If we include a covariate Z corresponding to elevation, then we can produce even more accurate predictions (a sneak-peek is shown below).
spatialProcess
spatialProcess represents one-stop shopping for fitting a spatial model in fields. It was designed so that users are able to quickly fit models and visualize them. We also add one caution: it hides several default model choices from the user. After a covariance model is chosen, a grid search is performed to find the optimal values of the parameters theta, rho, and sigma2, and then the spatial model is computed with these estimated parameters. Any other covariance parameters (e.g. the smoothness) need to be specified. The default call uses the covariance function stationary.cov, specifically with the Matern and smoothness nu=1. The optimization is done by MLESpatialProcess.
A good way to think about the spatial model functions in fields is in the context of engines and wrappers. The two engines in fields which perform the Kriging computations are Krig and mKrig. Krig is a more numerically stable algorithm: if a covariance matrix is close to singular, mKrig will fail whereas Krig may still be able to perform the calculations. On the other hand, mKrig is set up to handle large data sets with sparse covariance matrices by using the SPAM package.
A wrapper is an easy-to-use function that is based on an underlying function or engine. Often default parameter values are set to make the function accessible to the user. The function Tps is a wrapper on Krig. Tps performs thin plate spline regression, which is a special case of Kriging, so it makes sense to use the Krig engine. On the other hand, the functions fastTps and spatialProcess are wrappers for mKrig. See the chapters and examples on these functions for specifics.
There is always a hazard in providing a simple-to-use method that makes many default choices for the spatial model. As in any analysis, be aware of these choices and try alternative models and parameter values to assess the robustness of your conclusions. Also examine the residuals to check the adequacy of the fit!
@@ -1116,9 +1181,9 @@ spatialProcess
\({\large{\mathbf{Basic \> Usage}}}\)
-spatialProcess(x, y, Z = NULL, cov.function = “stationary.cov”,
-\(\> \> \> \quad\) cov.args = list(Covariance = “Matern”, smoothness = 1), theta = NULL)
-Krig(x, Y, theta, Covariance=“Matern”, smoothness, Distance=“rdist”)
+spatialProcess(x, y, Z = NULL, cov.function = "stationary.cov",
+\(\> \> \> \quad\) cov.args = list(Covariance = "Matern", smoothness = 1), theta = NULL)
+Krig(x, Y, theta, Covariance="Matern", smoothness, Distance="rdist")
\({\large{\mathbf{Value}}}\)
spatialProcess returns an object of classes mKrig and spatialProcess. The main difference from mKrig is an extra component MLEInfo that has the results of the grid evaluation over theta (maximizing lambda), joint maximization over theta and lambda, and a grid evaluation over lambda with theta fixed at its MLE. The Krig function produces an object of class Krig.
The summary shows the value of the profile log likelihood function, the estimates for the parameters, and the number of evaluations needed in the optimization.
-We can use plot to view diagnostic plots of the fit. The first three plots are the same as sreg and mKrig objects. Plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2 / \rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized.
+We can use plot to view diagnostic plots of the fit. The first three plots are the same as for sreg and mKrig objects. Plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2 / \rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized.
One of the main features of spatialProcess is the ability to optimize over theta as well as the usual lambda/rho/sigma combination. For this reason, plot 4 shows the profile likelihood versus values of theta instead of a histogram of the standard residuals displayed in the plots of other class objects.
set.panel(2,2)
plot(fit)
@@ -1186,7 +1251,7 @@
fit <- spatialProcess(x, y)
surface(fit,col=topo.colors(100))
Note that surface displays the full model, except Z covariates if they are included; i.e. by default Z is dropped. To additionally include the Z covariates, one can use predict or predictSurface. In these functions, the default is to retain Z in the computation.
We can use our spatial model to predict on a grid using predict, or predictSurface which is more convenient for plotting.
grid.list <- list(x=seq(0,17), y=seq(0,24) )
pred.grid <- predict(fit, grid=grid.list)
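Equivalently, a hedged alternative: predictSurface returns an object that is ready for plotting directly.
pred.surf <- predictSurface( fit, grid.list = grid.list)
surface( pred.surf)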
@@ -1237,7 +1302,7 @@ 6.2 Analysis of Coal Ash
ozone2
-Now we consider the ozone2 dataset in fields, which consists of CO2 observations over 89 days at a set of 153 locations. We’ll restrict ourselves to day 16 (June 18, 1987) and remove all NA values. We fit several covariance models: Matern, exponential, and Wendland. We first visually inspect the different surfaces, then we compare the parameters of the three models using summary.
+Now we consider the ozone2 dataset in fields, which consists of ozone observations over 89 days at a set of 153 locations. We'll restrict ourselves to day 16 (June 18, 1987) and remove all NA values. We fit several covariance models: Matern, exponential, and Wendland. We first visually inspect the different surfaces, then we compare the parameters of the three models using summary.
data( ozone2)
x<- ozone2$lon.lat
y<- c(ozone2$y[16,])
@@ -1261,7 +1326,7 @@ 6.3 ozone2
cov.args = list(Covariance = "Wendland",
dimension = 2, k = 2) )
rbind(obj$summary,obj2$summary,obj3$summary)
-Since the exponential is a Matern covariance with \(\nu = 0.5\), the first two fits can be compared in terms of their likelihoods. The REML value is slightly higher for obj verses obj2 (\(598.4 > 596.7\)). These are the negative log likelihoods so this suggests a preference for the exponential model. But… does it really matter in terms of spatial prediction?
+Since the exponential is a Matern covariance with \(\nu = 0.5\), the first two fits can be compared in terms of their likelihoods. The REML value is slightly higher for obj versus obj2 (\(598.4 > 596.7\)). These are the negative log likelihoods so this suggests a preference for the exponential model. But... does it really matter in terms of spatial prediction?
library(sp) #for colors
set.panel(1,3)
surface(obj, col=heat.colors(100),smallplot= c(.88,.9,0.2,.8))
@@ -1305,7 +1370,7 @@ 6.3 ozone2
COmonthlyMet
-We’ll return to the dataset COmonthlyMet. When using other fields functions for Kriging such as Krig and mKrig, the user has to pass in a range theta, which can be initally guessed for a quick fit by looking at a variogram.
+We'll return to the dataset COmonthlyMet. When using other fields functions for Kriging such as Krig and mKrig, the user has to pass in a range theta, which can be initially guessed for a quick fit by looking at a variogram.
Note that the estimated theta without including the Z covariate CO.elev will probably change if we do include Z.
Ultimately, using a variogram to determine spatial parameters for our covariance is not a reliable method. A more rigorous way to determine parameters is via Maximum Likelihood, which is performed within spatialProcess. To be precise, the parameters theta, rho, and sigma2 are estimated using Maximum Likelihood, and the user only needs to input x, y, and a type of covariance. (Note: If a Matern covariance is used, then smoothness must be supplied. See below how one might select between different smoothness values.)
Of course, spatialProcess runs slower than the functions would with a user-supplied theta. One must take caution when using spatialProcess with large datasets. Below, we run through this climate example using spatialProcess to construct our model.
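A hedged sketch of how fits at two Matern smoothness values, such as the out0.5 and out1 objects used just below, might be constructed (the argument choices are assumptions, not the vignette's original calls):
out0.5 <- spatialProcess( CO.loc, CO.tmin.MAM.climate, Z = CO.elev,
                          cov.args = list( Covariance = "Matern", smoothness = 0.5))
out1   <- spatialProcess( CO.loc, CO.tmin.MAM.climate, Z = CO.elev,
                          cov.args = list( Covariance = "Matern", smoothness = 1.0))
rbind( out0.5$summary, out1$summary)   # compare parameter estimates and log-likelihoods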
-Observe the largely different values for parameters, but similar values for log-likelihood. Note that there is an interplay between smoothness and range, so directly comparing theta is not appropriate. Will the underlying spatial process be much different if we vary the smoothness? We’ll use the drop.Z command to make the differences in the spatial process clear.
+Observe the largely different values for parameters, but similar values for log-likelihood. Note that there is an interplay between smoothness and range, so directly comparing theta is not appropriate. Will the underlying spatial process be much different if we vary the smoothness? We'll use the drop.Z command to make the differences in the spatial process clear.
pred0.5 <- predictSurface.Krig( out0.5,grid.list=CO.Grid,ZGrid=CO.elevGrid,drop.Z=TRUE)
pred1 <- predictSurface.Krig( out1,grid.list=CO.Grid,ZGrid=CO.elevGrid,drop.Z=TRUE)
@@ -1473,7 +1538,7 @@ 7.3 grid.list,
\({\large{\mathbf{Basic \> Usage}}}\)
make.surface.grid(grid.list)
-as.surface(obj, z, order.variables=“xy”)
+as.surface(obj, z, order.variables="xy")
\({\large{\mathbf{Value}}}\)
The result will be a list with x, y, and z components suitable for plotting with functions such as persp, image, surface, contour, drape.plot.
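A hedged sketch of this grid.list workflow from start to finish (the grid limits are made up, and fit is assumed to be a previously created spatial model for this region):
grid.list <- list( x = seq( -111, -99, length.out = 80), y = seq( 35, 45, length.out = 80))
full.grid <- make.surface.grid( grid.list)     # matrix of all grid locations
z.hat <- predict( fit, full.grid)              # Kriging estimates at the grid locations
look.surface <- as.surface( grid.list, z.hat)  # back to an x, y, z list for plotting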
@@ -1512,12 +1577,12 @@ 7.3 grid.list,
Using surface combines image.plot with contour as its default.
surface(look.surface)

-There is a graphics function persp which can produce 3-d plots. drape.plot is a wrapper function in fields that automatically adds color based on how it’s done in image.plot/surface.
+There is a graphics function persp which can produce 3-d plots. drape.plot is a wrapper function in fields that automatically adds color based on how it's done in image.plot/surface.
set.panel(1,2)
persp(look.surface)
drape.plot(look.surface)

-In summary, we fit a spatialProcess object from our data, created a “grid.list”, constructed a full set of gridded locations using the “grid.list,” found Kriging estimates at the gridded locations, converted back to a “grid.list”, and then plotted with surface. Compare this with the convenient surface.mKrig.
+In summary, we fit a spatialProcess object from our data, created a "grid.list", constructed a full set of gridded locations using the "grid.list," found Kriging estimates at the gridded locations, converted back to a "grid.list", and then plotted with surface. Compare this with the convenient surface.mKrig.
fit <- spatialProcess(RMprecip$x, RMprecip$y, theta=20)
surface(fit)

@@ -1530,7 +1595,7 @@ 7.4 surface
\({\large{\mathbf{Basic \> Usage}}}\)
-surface( object, grid.list = NULL, extrap = FALSE, type=“C”)
+surface( object, grid.list = NULL, extrap = FALSE, type="C")
@@ -1674,8 +1739,8 @@ 8.3 drape.plot and <
\({\large{\mathbf{Basic \> Usage}}}\)
drape.plot(x, y, z, z2=NULL, col = tim.colors(64), zlim = range(z, na.rm=TRUE), zlim2 = NULL)
-drape.color(z, col = tim.colors(64), zlim = NULL,breaks,transparent.color = “white”)
-pushpin( x,y,z,p.out, height=.05,col=“black”,text=NULL,adj=-.1,cex=1.0,…)
+drape.color(z, col = tim.colors(64), zlim = NULL,breaks,transparent.color = "white")
+pushpin( x,y,z,p.out, height=.05,col="black",text=NULL,adj=-.1,cex=1.0,...)
\({\large{\mathbf{Value}}}\)
If an assignment is made, the projection matrix from persp is returned. This information can be used to add additional 3-d features to the plot, such as pushpin. See the persp help file for an example of how to add additional points and lines using the trans3d function, and also the example below.
@@ -1723,17 +1788,17 @@ 8.4 Color palettes
tim.colors gives a very nice spectrum that is close to the jet color scale in MATLAB® and is the default for image.plot and the like.
larry.colors is particularly useful for visualizing fields of climate variables.
two.colors is really about three different colors. For other colors try fields.color.picker to view possible choices. start="darkgreen", end="azure4" are the options used to get a nice color scale for rendering aerial photos of ski trails. (http://www.image.ucar.edu/Data/MJProject)
-designer.colors is the master function for two.colors and tim.colors. It can be useful if one wants to customize the color table to match quantiles of a distribution. e.g. if the median of the data is at .3 with respect to the range then set x equal to c(0,.3,1) and specify three colors to provide a transtion that matches the median value. In fields language, this function interpolates between a set of colors at locations x. While you can be creative about choosing these colors, just using another color scale as the basis is easy. For example, designer.color( 256, rainbow(4), x= c( 0,.2,.8,1.0)) leaves the choice of the colors to Dr. R after a thunderstorm.
+designer.colors is the master function for two.colors and tim.colors. It can be useful if one wants to customize the color table to match quantiles of a distribution. e.g. if the median of the data is at .3 with respect to the range then set x equal to c(0,.3,1) and specify three colors to provide a transition that matches the median value. In fields language, this function interpolates between a set of colors at locations x. While you can be creative about choosing these colors, just using another color scale as the basis is easy. For example, designer.colors( 256, rainbow(4), x= c( 0,.2,.8,1.0)) leaves the choice of the colors to Dr. R after a thunderstorm.
color.scale assigns colors to a numerical vector in the same way as the image function. This is useful to keep the assignment of colors consistent across several vectors by specifying a common zlim range. Finally, plotColorScale is a simple function to plot a vector of colors to examine their values.
-Also, don’t forget the built in heat.colors, topo.colors, terrain.colors, grey.colors, etc.
+Also, don't forget the built in heat.colors, topo.colors, terrain.colors, grey.colors, etc.
\({\large{\mathbf{Basic \> Usage}}}\)
tim.colors(n = 64, alpha=1.0)
larry.colors()
-two.colors(n=256, start=“darkgreen”, end=“red”, middle=“white”,alpha=1.0)
-designer.colors( n=256, col= c(“darkgreen”, “white”, “darkred”), x=seq(0,1,, length(col)) ,alpha=1.0)
+two.colors(n=256, start="darkgreen", end="red", middle="white",alpha=1.0)
+designer.colors( n=256, col= c("darkgreen", "white", "darkred"), x=seq(0,1,, length(col)) ,alpha=1.0)
\({\large{\mathbf{Value}}}\)
A vector giving the colors in a hexadecimal format; two extra hex digits are added for the alpha channel.
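A small hedged example of using these palettes with image.plot (the matrix is random data purely for illustration):
z <- matrix( rnorm( 40 * 40), 40, 40)
image.plot( z, col = tim.colors( 128))
image.plot( z, col = designer.colors( 128, col = c( "blue", "white", "red"), x = c( 0, .3, 1)))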
@@ -1766,7 +1831,7 @@ 8.5 interp.surface
We recommend this method for fast visualization and presentation of images. However, if the actual interpolated values will be used for analysis, use a statistical method such as Tps or fastTps to get an interpolation that is based on first principles.
Here is a brief explanation of the interpolation: Suppose that the location (locx, locy) lies in between the first two grid points in both x and y. That is, locx is between x1 and x2 and locy is between y1 and y2. Let \(e_x= (l_1-x_1)/(x_2-x_1)\) and \(e_y= (l_2-y_1)/(y_2-y_1)\). The interpolant is
\[(1-e_x)(1-e_y)z_{11} + (1-e_x)(e_y)z_{12} + (e_x)(1-e_y)z_{21} + (e_x)(e_y)z_{22}\]
-where the z’s are the corresponding elements of the Z matrix. See also the akima package for fast interpolation from irregular locations.
+where the z's are the corresponding elements of the Z matrix. See also the akima package for fast interpolation from irregular locations.
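A tiny hedged example of the bilinear formula in action via interp.surface (the image and locations are made up):
obj <- list( x = 1:5, y = 1:5, z = outer( 1:5, 1:5))   # a small toy image
loc <- cbind( c( 1.5, 3.2), c( 2.5, 4.7))              # off-grid locations
interp.surface( obj, loc)                              # bilinear interpolation at loc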
\({\large{\mathbf{Basic \> Usage}}}\)
@@ -1811,7 +1876,7 @@ 9 Covariance Models Revisited
9.1 Rad.cov, Rad.cov.simple, cubic.cov, and wendland.cov
Radial basis functions are the natural covariance that comes out of the thin plate spline minimization problem (see the section Tps Revisited). The functional form is the following: \[\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x')) = C(r) =c_0(m,d) *\begin{cases} r^{2m-d}, & d \text{ odd} \\ r^{2m-d} \ln(r), & d \text{ even} \end{cases}\] where \(r = \| \mathbf x - \mathbf x' \|\) and \(c_0(m,d)\) is a multiplicative constant depending on \(m\) and \(d\).
-The argument d is inferred from the dimension of the input data. If m isn’t specified by the user, by default it is set to the smallest integer such that \(p = 2m-d\) is positive. Alternatively, the user can specify p.
+The argument d is inferred from the dimension of the input data. If m isn't specified by the user, by default it is set to the smallest integer such that \(p = 2m-d\) is positive. Alternatively, the user can specify p.
Note that these RBFs are generalized covariance functions, which means they are only valid covariances when restricted to a subspace of linear combinations of the field. The function Rad.simple.cov is a straightforward implementation in R code (computes fewer checks on the data). It can be used like Exp.cov to write a new covariance function.
cubic.cov is a specific case of Rad.cov where we set d=1 and m=2.
All of these covariance functions can be equipped to take advantage of sparsity in the covariance matrix by using the SPAM package. The functions Wendland and wendland.cov allow easy use of the Wendland compactly supported polynomials as covariance functions. This will output a sparse covariance matrix, which helps shorten the computation time when working with large datasets. The main argument (besides location) in the Wendland function is theta, which by default is set to 1. This is the taper range, or the radius of support of the function. The Wendland function is identically 0 for distances greater than theta. If the user would like to scale each coordinate independently, they can provide a matrix or the diagonal of a matrix for the argument V, which is the inverse linear transformation of the coordinates before distances are found. Another parameter that can be chosen is k, the order of the Wendland polynomial, which determines the smoothness.
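A hedged illustration of the compact support: the Wendland covariance drops to exactly zero beyond theta (the values here are arbitrary):
d <- seq( 0, 1.5, length.out = 300)
plot( d, Wendland( d, theta = 1, dimension = 2, k = 2), type = "l",
      xlab = "distance", ylab = "covariance")
abline( v = 1, lty = 2)    # identically zero past the taper range theta = 1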
@@ -1827,8 +1892,8 @@ 9.2 stationary.taper.cov
Rad.cov(x1, x2, p = 1, m=NA)
Rad.simple.cov(x1, x2, p=1)
cubic.cov(x1, x2, theta = 1)
-stationary.cov(x1, x2=NULL, Covariance = “Exponential”, Distance = “rdist”,theta = 1, V = NULL)
-stationary.taper.cov(x1, x2, Covariance=“Exponential”,Taper=“Wendland”,theta=1.0,V=NULL)
+stationary.cov(x1, x2=NULL, Covariance = "Exponential", Distance = "rdist",theta = 1, V = NULL)
+stationary.taper.cov(x1, x2, Covariance="Exponential",Taper="Wendland",theta=1.0,V=NULL)
wendland.cov(x1, x2, theta = 1, V=NULL, k = 2)
\({\large{\mathbf{Value}}}\)
@@ -1913,7 +1978,7 @@ 9.4 stationary.taper.cov
10 mKrig
-The mKrig function is both a Kriging/linear algebra engine and also user-level function for fitting spatial models. The mKrig engine is a simple and fast version of Krig. The “m” stands for micro, being a succinct (and we hope readable) function. The function focuses on the same computations as in Krig.engine.fixed done for a fixed lambda parameter, for unique spatial locations and for data without missing values. mKrig is optimized for large data sets, sparse linear algebra, and a clear exposition of the computations. The underlying code in mKrig is more readable and straightforward, but it does fewer checks on the data than Krig. One savings in computation is that Monte Carlo simulations are used to compute the trace of the smoothing matrix \(\operatorname{tr}(A(\lambda))\), which can be used for the effective degrees of freedom of the model.
+The mKrig function is both a Kriging/linear algebra engine and also a user-level function for fitting spatial models. The mKrig engine is a simple and fast version of Krig. The "m" stands for micro, being a succinct (and we hope readable) function. The function focuses on the same computations as in Krig.engine.fixed done for a fixed lambda parameter, for unique spatial locations and for data without missing values. mKrig is optimized for large data sets, sparse linear algebra, and a clear exposition of the computations. The underlying code in mKrig is more readable and straightforward, but it does fewer checks on the data than Krig. One savings in computation is that Monte Carlo simulations are used to compute the trace of the smoothing matrix \(\operatorname{tr}(A(\lambda))\), which can be used for the effective degrees of freedom of the model.
Just as in other functions, we can use cov.function and cov.args to specify which covariance we want to use in the model. mKrig is often used with a compactly supported covariance to take advantage of sparsity in computations. See fastTps as an example of how it uses mKrig.
There are other arguments that can be specified in chol.args for efficiency when dealing with a large data set. For example, nnzR is a guess at how many nonzero entries there are in the Cholesky decomposition of the linear system used to solve for the basis coefficients. Specifying a larger value will help avoid warnings from the spam package.
mKrig.trace is an internal function called by mKrig to estimate the effective degrees of freedom. The Kriging surface estimate at the data locations is a linear function of the data and can be represented as \(A(\lambda)y\).
@@ -1925,7 +1990,7 @@ 10 mKrig
\({\large{\mathbf{Basic \> Usage}}}\)
-mKrig(x, y, Z = NULL, cov.function = “stationary.cov”, lambda = 0, m = 2)
+mKrig(x, y, Z = NULL, cov.function = "stationary.cov", lambda = 0, m = 2)
\({\large{\mathbf{Value}}}\)
Returns an object of class mKrig.
@@ -1972,7 +2037,7 @@ 10.2 COmonthlyMet
10.3 flame
The flame data consists of a 2 column matrix x with different fuel and oxygen flow rates for the burner, and a vector y with the intensity of light at a particular wavelength indicative of Lithium ions. The characteristics of an ionizing flame are varied with the intent of maximizing the intensity of emitted light for lithium in solution. Areas outside of the measurements are where the mixture may explode! Note that the optimum is close to the boundary.
-This example shows the versatility of fields, fitting Krig, spatialProcess, and mKrig models to the data. Note how much faster mKrig is, and how even with a Wendland it is able to reproduce the other fits’ behaviours fairly well.
+This example shows the versatility of fields, fitting Krig, spatialProcess, and mKrig models to the data. Note how much faster mKrig is, and how even with a Wendland it is able to reproduce the other fits' behaviours fairly well.
x <- flame$x
y <- flame$y
look <- as.image(Z=y, x=x)
@@ -1989,7 +2054,7 @@ 10.3 flame
10.4 Krig.replicates with NWSC
-We now consider the NWSC data set. The data we consider are the temperature and relative humidity as our spatial indexing variables, and our response is power usage. The observations aren’t at unique locations, so to use mKrig or spatialProcess, we will have to collapse the observations to unique locations with the function Krig.replicates.
+We now consider the NWSC data set. We use temperature and relative humidity as our spatial indexing variables, and power usage as our response. The observations aren't at unique locations, so to use mKrig or spatialProcess, we will have to collapse the observations to unique locations with the function Krig.replicates.
library(sp) #for colors
load("NWSC.rda")
x <- cbind(NWSC$RH, NWSC$Otemp)
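A hedged sketch of the collapsing step. The response vector y (power usage built from NWSC, not shown here), the covariance settings, and the component names xM, yM, and weightsM are assumptions; check str() of the returned list before relying on them.
rep.info <- Krig.replicates(x=x, y=y)
fit <- mKrig(rep.info$xM, rep.info$yM, weights=rep.info$weightsM,
             Covariance="Matern", theta=10, smoothness=1, lambda=.1)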
@@ -2024,7 +2089,7 @@ 10.5 mKrig and a GCV
x<- ozone2$lon.lat[good,]
mKrig( x,y,cov.function="stationary.taper.cov",theta = 2.0, lambda=.01,
Taper="Wendland", Taper.args=list(theta = 1.5, k=2, dimension=2)) -> out2
-Now, we will compare many lambda’s on a grid using GCV for this small data set. One should really just use Krig or Tps, but this is an example of approximate GCV that will work for much larger data sets using sparse covariances and the Monte Carlo trace estimate.
+Now, we will compare many values of lambda on a grid using GCV for this small data set. One should really just use Krig or Tps, but this is an example of approximate GCV that will work for much larger data sets using sparse covariances and the Monte Carlo trace estimate.
lgrid<- 10**seq(-1,1,,15) # a grid of lambdas
GCV<- matrix( NA, 15,20)
trA<- matrix( NA, 15,20)
@@ -2086,7 +2151,7 @@ 10.6 Collapse fixed effects with
## [8,] -525.6700 0.3666253 24.196703 5.337032 77.69217
## [9,] -529.7651 0.5806189 18.395516 5.868205 59.30883
## [10,] -540.6156 0.3602721 147.142371 7.314397 148.50003
-Names of summary: “lnProfileLike.FULL”, “lambda”, “theta”, “sigmaMLE”, “rhoMLE”, “funEval”, “gradEval”. We left out the last two columns in the previous output. It can be interesting to visualize the results as pairwise scatter plots for individual estimates.
+Names of summary: "lnProfileLike.FULL", "lambda", "theta", "sigmaMLE", "rhoMLE", "funEval", "gradEval". We left out the last two columns in the previous output. It can be interesting to visualize the results as pairwise scatter plots for individual estimates.
plot(O3MLE[,2], O3MLE[,3], log = 'xy', xlab = "lambda", ylab = "theta")
title("theta vs lambda for 10 days")

@@ -2103,7 +2168,7 @@ 10.6 Collapse fixed effects with
mKrig.coef finds the d and c coefficients representing the solution using the previous Cholesky decomposition for a new data vector. This is used in computing the prediction standard error in predictSE.mKrig and can also be used to evaluate the estimate efficiently at new vectors of observations, provided the locations and covariance remain fixed.
-Since this data set has NA’s, we remove any columns (observation locations) with an NA. Beware of blindly removing data like this - we are just illustrating the mKrig.coef functionality.
+Since this data set has NA's, we remove any columns (observation locations) with an NA. Beware of blindly removing data like this - we are just illustrating the mKrig.coef functionality.
goody <- na.omit(t(ozone2$y))
omitted <- attr(goody, "na.action")
x <- ozone2$lon.lat[-omitted,]
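A hedged sketch of reusing one fit's decomposition for another day's observations at the same locations; the covariance settings are placeholders.
fit <- mKrig(x, goody[,1], Covariance="Matern", theta=2, smoothness=1, lambda=.01)
coef.day2 <- mKrig.coef(fit, goody[,2])   # new d and c coefficients, same Cholesky factor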
@@ -2136,7 +2201,7 @@ 10.7 Approximate standard errors
O3.fit<- mKrig( x,y, Covariance="Matern", theta=.5,smoothness=1.0, lambda= .01 )
-Now, we’ll look at some approximate conditional simulations of our random field and the missing data.
+Now, we'll look at some approximate conditional simulations of our random field and the missing data.
set.seed(122)
O3.sim<- sim.mKrig.approx( O3.fit, nx=100, ny=100, gridRefinement=3, M=2 )
@@ -2180,17 +2245,17 @@ 10.8 Finding taper range
## To avoid the iteration, increase 'nearestdistnnz' option to something like
## 'spam.options(nearestdistnnz=c(923230,400))'
## (constructed 1213 lines out of 1500).
-The warning resets the memory allocation for the covariance matrix according the to values spam.options(nearestdistnnz=c(416052,400)). This is inefficient because the preliminary pass failed. The following call completes the computation in “one pass” without a warning and without having to reallocate more memory.
-spam.options(nearestdistnnz=c(4.2E5,400))
+The warning resets the memory allocation for the covariance matrix according to the values spam.options(nearestdistnnz=c(416052,400)). This is inefficient because the preliminary pass failed. The following call completes the computation in "one pass" without a warning and without having to reallocate more memory.
+options(spam.nearestdistnnz=c(4.2E5,400))
look2<- mKrig( x,y, cov.function="wendland.cov",k=2, theta=.9, lambda=1e-2)
-## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' spase distance matrix, I require one more iteration.
+## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' sparse distance matrix, I require one more iteration.
## To avoid the iteration, increase 'nearestdistnnz' option to something like
-## 'spam.options(nearestdistnnz=c(923230,400))'
+## 'options(spam.nearestdistnnz=c(600000,400))'
## (constructed 966 lines out of 1500).
-## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' spase distance matrix, I require one more iteration.
+## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' sparse distance matrix, I require one more iteration.
## To avoid the iteration, increase 'nearestdistnnz' option to something like
-## 'spam.options(nearestdistnnz=c(923230,400))'
+## 'options(spam.nearestdistnnz=c(600000,400))'
## (constructed 966 lines out of 1500).
As a check, notice that print(look2) reports the number of nonzero elements, consistent with the specific allocation increase in spam.options.
Now we generate a new data set of 1500 locations.
@@ -2267,7 +2332,7 @@ 10.9 Using V,
11 Tps and fastTps
11.1 Tps and Krig
-We return to Tps, but this time consider spatial data. The Tps estimate can be thought of as a special case of a spatial process estimate using a particular generalized covariance function. Because of this correspondence the Tps function is a wrapper that determines the polynomial order for the null space and then calls Krig with the radial basis generalized covariance rad.cov. The RBF is the reproducing kernel that generates the reproducing kernel Hilbert space in which the minimization problem is set. See Wahba’s Spline Models for Observational Data.
+We return to Tps, but this time consider spatial data. The Tps estimate can be thought of as a special case of a spatial process estimate using a particular generalized covariance function. Because of this correspondence the Tps function is a wrapper that determines the polynomial order for the null space and then calls Krig with the radial basis generalized covariance rad.cov. The RBF is the reproducing kernel that generates the reproducing kernel Hilbert space in which the minimization problem is set. See Wahba's Spline Models for Observational Data.
In this way Tps requires little extra coding beyond the Krig function itself; in fact, the returned list from Tps is a Krig object, so it works with all the special functions developed for this object format. One downside is some confusion in the summary of the thin plate spline as a spatial process type estimator. For example, the summary refers to a covariance model and maximum likelihood estimates. This is not something usually reported with a spline fit! Similarly, fastTps is a convenient wrapper function that calls mKrig with the Wendland covariance function.
@@ -2277,7 +2342,7 @@ 11.1 Tps and K
fastTps(x, Y, m = NULL, p = NULL, theta, lon.lat=FALSE, lambda=0)
\({\large{\mathbf{Value}}}\)
-Both functions returns a list of class Krig. m determines the degree (m-l) of the polynomial function capturing the spatial drift/trend component of the model. The default is the value such that 2m-d is greater than zero where d is the dimension of x. p is the polynomial power for Wendland radial basis functions. The default is 2m-d. theta is the tapering range that is passed to the Wendland compactly supported covariance. The covariance (i.e. the radial basis function) is zero beyond range theta. The larger theta is, the closer this model will approximate the standard thin plate spline.
+Both functions return a list of class Krig. m determines the degree (m-1) of the polynomial function capturing the spatial drift/trend component of the model. The default is the value such that 2m-d is greater than zero, where d is the dimension of x. p is the polynomial power for the Wendland radial basis functions. The default is 2m-d. theta is the tapering range that is passed to the Wendland compactly supported covariance. The covariance (i.e. the radial basis function) is zero beyond range theta. The larger theta is, the closer this model will approximate the standard thin plate spline.
Some care needs to be exercised in specifying the taper range theta and power p which describes the polynomial behavior of the Wendland at the origin. Note that unlike Tps the locations are not scaled to unit range and this can cause havoc in smoothing problems with variables in very different units. So rescaling the locations x <- scale(x) is a good idea for putting the variables on a common scale for smoothing. This function does have the potential to approximate estimates of Tps for very large spatial data sets. Also, the function predictSurface.fastTps has been made more efficient for the case of k = 2 and m = 2.
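For example, a minimal sketch of rescaling before the fit; the taper range below is illustrative and is now interpreted on the scaled coordinates.
xs <- scale(x)                     # put the coordinates on a common scale
fit <- fastTps(xs, y, theta=1.5)   # theta refers to the scaled units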
@@ -2423,11 +2488,11 @@ 12.2 REML
\ell(\mathbf{d},\boldsymbol{\xi}) \propto -\frac{1}{2} \left( \log( \det(\Sigma)) + (\mathbf Y - \mathbf{Zd})^T \Sigma^{-1} (\mathbf Y - \mathbf{Zd}) \right)
\]
For any fixed \(\boldsymbol \xi\), the value of \(\mathbf d\) that maximizes \(\ell\) is the Generalized Least Squares (GLS) estimate \(\hat{\mathbf d} = (\mathbf{Z}^T \Sigma^{-1} \mathbf{Z})^{-1} \mathbf{Z}^T \Sigma^{-1} \mathbf{Y}\).
-To maximize \(\ell\), one often “profiles” and eliminates \(\mathbf{d}\) in the equation for \(\ell\) by substituting \(\mathbf{d} = \hat {\mathbf d}\). Then \(\ell(\mathbf{d},\boldsymbol \xi)=\ell(\boldsymbol \xi)\) only depends on \(\boldsymbol \xi\). Numerical optimization (nonlinear methods like BFGS) is used to find the remaining parameters, and then the GLS estimate \(\hat {\mathbf d}\) is updated with the optimal parameter values. (See Handbook of Spatial Statistics, p. 46).
+To maximize \(\ell\), one often "profiles" and eliminates \(\mathbf{d}\) in the equation for \(\ell\) by substituting \(\mathbf{d} = \hat {\mathbf d}\). Then \(\ell(\mathbf{d},\boldsymbol \xi)=\ell(\boldsymbol \xi)\) only depends on \(\boldsymbol \xi\). Numerical optimization (nonlinear methods like BFGS) is used to find the remaining parameters, and then the GLS estimate \(\hat {\mathbf d}\) is updated with the optimal parameter values. (See Handbook of Spatial Statistics, p. 46).
However, this process of Maximum Likelihood Estimation (MLE) tends to give biased results, as \[
(\mathbf{Y} - \mathbf{Z} \hat{\mathbf{d}})^T \Sigma^{-1} (\mathbf Y - \mathbf{Z} \hat {\mathbf{d}})
\le (\mathbf{Y}-\mathbf{Zd})^T \Sigma^{-1} (\mathbf Y - \mathbf{Zd}), \quad \forall \mathbf{d}.
-\] Intuitively, this means that \(\mathbf{Z} \hat{\mathbf d}\) is “closer” to \(\mathbf{Y}\) than any other estimate \(\mathbf{Zd}\), even if \(\mathbf{d}\) is the true parameter! As a result, the MLE for \(\boldsymbol \xi\) is biased due to this simultaneous estimation of \(\mathbf{d}\). In particular, the MLE for \(\boldsymbol \xi\) tends to underestimate the process variance.
+\] Intuitively, this means that \(\mathbf{Z} \hat{\mathbf d}\) is "closer" to \(\mathbf{Y}\) than any other estimate \(\mathbf{Zd}\), even if \(\mathbf{d}\) is the true parameter! As a result, the MLE for \(\boldsymbol \xi\) is biased due to this simultaneous estimation of \(\mathbf{d}\). In particular, the MLE for \(\boldsymbol \xi\) tends to underestimate the process variance.
This leads to Restricted Maximum Likelihood (REML), which attempts to reduce this bias when using maximum likelihood methods. We use a linear transformation \(\mathbf Y^* = \mathbf {AY}\) of the data so that the distribution of \(\mathbf{Y}^*\) does not depend on \(\mathbf{d}\). In particular, we use the matrix \(\mathbf A = \mathbf{I} - \mathbf{Z} (\mathbf{Z}^T \mathbf{Z})^{-1} \mathbf{Z}^T\) that projects \(\mathbf{Y}\) to ordinary least squares residuals. The resulting matrix product \(\mathbf{AY}\) is independent of \(\mathbf{d}\). \[
\ell_R(\boldsymbol \xi) = - \frac{1}{2} \left(\log(\det(\Sigma)) + \log( \det( \mathbf{Z}^T \Sigma^{-1} \mathbf{Z})) + \mathbf{Y}^T (\Sigma^{-1} - \Sigma^{-1} \mathbf{Z} (\mathbf{Z}^T \Sigma^{-1} \mathbf{Z})^{-1} \mathbf{Z}^T \Sigma^{-1}) \mathbf{Y} \right)
\] This ends up being the profile log-likelihood with an additional term involving \(\log( \det( \mathbf{Z}^T \Sigma^{-1} \mathbf{Z}))\). Again, this expression must be maximized numerically with respect to \(\boldsymbol \xi\). The procedure concludes by estimating \(\mathbf d\) with GLS, with the estimate \(\hat{\boldsymbol \xi}\) plugged in.
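A minimal sketch of evaluating this restricted log-likelihood directly (up to an additive constant), assuming Y, Z, and Sigma are the data vector, the fixed effect matrix, and the covariance matrix implied by a candidate \(\boldsymbol \xi\):
reml.loglik <- function(Y, Z, Sigma){
  Si  <- solve(Sigma)
  ZSZ <- t(Z) %*% Si %*% Z
  P   <- Si - Si %*% Z %*% solve(ZSZ) %*% t(Z) %*% Si
  ld1 <- as.numeric(determinant(Sigma, logarithm=TRUE)$modulus)
  ld2 <- as.numeric(determinant(ZSZ, logarithm=TRUE)$modulus)
  -0.5*(ld1 + ld2 + drop(t(Y) %*% P %*% Y))
}
In practice this would be maximized numerically over \(\boldsymbol \xi\) (e.g. with optim), and \(\hat{\mathbf d}\) would then be computed by GLS at the optimum.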
@@ -2457,12 +2522,29 @@ 12.3 GCV
diff --git a/fieldsVignette.pdf b/fieldsVignette.pdf
index f57ddb5..b31182f 100644
Binary files a/fieldsVignette.pdf and b/fieldsVignette.pdf differ
diff --git a/vignette/fieldsVignette.html b/vignette/fieldsVignette.html
index 84db09a..1709eb9 100644
--- a/vignette/fieldsVignette.html
+++ b/vignette/fieldsVignette.html
@@ -117,8 +178,8 @@
fields vignette
-Ashton Wiens and Mitchell Krock
-03 August, 2017
+Ashton Wiens, Mitchell Krock, and Emma Lilly
+06 August, 2020
@@ -278,10 +339,10 @@ 1 Introduction
For many of these functions, we present their basic usage in this vignette. We have edited out secondary arguments in the presentation to focus on the essential options.
We have tried to present this vignette as a narrative that introduces spatial statistics, describes how to use the functions in fields, and also includes some relevant theory about splines and Kriging. The goal is to get the interested reader started quickly with many data based examples. Core functions in fields return S3 objects that are compatible with generic functions such as plot, predict, surface, etc. Thus, the analysis flow will seem familiar to other methods such as lm. If the reader is familiar with Kriging, they can skip to Chapters 6-8 to see some examples.
-We begin by showing quick examples using some of fields’ most important functions. This gives the user an idea of what kind of problems can be solved and how to implement the functions in fields to solve them.
+We begin by showing quick examples using some of fields' most important functions. This gives the user an idea of what kind of problems can be solved and how to implement the functions in fields to solve them.
In the third chapter we discuss fitting splines to univariate data. We begin with a simple time series data set, but the discussion applies more generally to data with one predictor variable x and one response variable y. The functions splint and sreg are used to fit cubic splines to this type of data. Then we introduce Tps, a core function in the package that can fit thin plate splines.
-Chapters 4-6 build up to Kriging. We have tried give an intuitive presentation of the fundamental concepts in spatial statistics, and at the same time introduce the corresponding functions in fields. Initially, the Kriging linear algebra is done manually for exposition, and then the package’s most important function spatialProcess is presented in Chapter 7.
-There is a brief interlude to discuss the package’s plotting functions as well as some image processing functions in Chapters 8-9. Many of them will have been used up to this point, but here we give a thorough description of the plotting options to be used with images, surfaces, and the package’s models. See quilt.plot to get a quick plot of raw data to discern spatial patterns. There are also several functions for working with image data specifically. The functions image.smooth and smooth.2d are capable of smoothing images, and interp.surface performs fast bilinear interpolation (often useful when zooming in on images from a grid).
+Chapters 4-6 build up to Kriging. We have tried give an intuitive presentation of the fundamental concepts in spatial statistics, and at the same time introduce the corresponding functions in fields. Initially, the Kriging linear algebra is done manually for exposition, and then the package's most important function spatialProcess is presented in Chapter 7.
+There is a brief interlude to discuss the package's plotting functions as well as some image processing functions in Chapters 8-9. Many of them will have been used up to this point, but here we give a thorough description of the plotting options to be used with images, surfaces, and the package's models. See quilt.plot to get a quick plot of raw data to discern spatial patterns. There are also several functions for working with image data specifically. The functions image.smooth and smooth.2d are capable of smoothing images, and interp.surface performs fast bilinear interpolation (often useful when zooming in on images from a grid).
In Chapter 10, we return to covariances and Kriging in more detail, including all of the covariance options in fields. Then we show how the user can write their own covariance function for use with all of the functionality in fields. This allows use to discuss three important wrappers on Krig in the following chapters.
In Chapter 11, we cover the Kriging optimization workhorse spatialProcess. The user only needs to specify the data and a covariance model, and then spatialProcess will perform a grid search for the optimal values of the rest of the parameters and return the spatial model. Optimization methods include restricted maximum likelihood (REML) and generalized cross validation (GCV). The user should exercise caution, however, when using black box optimization routines. We hope this chapter will help explain the optimization procedure and assist the user with fitting the right model parameters.
In Chapter 12, we introduce the concept of using a compactly supported covariance function to generate a sparse covariance matrix. The function mKrig (microKrig) is set up to take advantage of this sparsity in computation by using the spam package. We return to the Tps function in Chapter 13 with spatial data. The theory of thin plate splines is presented in the framework of Kriging. Finally, we show how the fastTps function is an analogue of mKrig that can similarly take advantage of sparse linear algebra.
@@ -319,7 +380,7 @@ 2.1 Visualizing raw data with
2.2 spatialProcess with a covariate
-fields makes it easy to fit a spatial model from data, predict at arbitrary locations or on a grid, and plot the results. Additional covariates (e.g. elevation) can also be included in the linear trend. We can use predictSurface to predict on a grid and output a surface object, much like predict but more convenient for plotting.
+fields makes it easy to fit a spatial model from data, predict at arbitrary locations or on a grid, and plot the results. Additional covariates (e.g. elevation) can also be included in the linear trend. We can use predictSurface to predict on a grid and output a surface object, much like predict but more convenient for plotting.
data( COmonthlyMet)
# predicting average daily minimum temps for spring in Colorado
obj<- spatialProcess( CO.loc, CO.tmin.MAM.climate, Z= CO.elev)
@@ -337,7 +398,7 @@ 2.2 spatialProcess w
US(add=TRUE, col="grey")
points(CO.loc[,1], CO.loc[,2], col="magenta", pch=21, bg="white")
title("Standard errors for avg Spring daily min. temp in CO")
-Computing standard errors can be computationally expensive. An alternative is to use sim.spatialProcess to conditionally simulate the process, and then use these simulations to approximate the standard errors. In practice, a large M (e.g. 100) should be used to produce many simulations to reduce the variability when finding sample standard deviations in the Monte Carlo sample.
+Computing standard errors can be computationally expensive. An alternative is to use sim.spatialProcess to conditionally simulate the process, and then use these simulations to approximate the standard errors. In practice, a large M (e.g. 100) should be used to produce many simulations to reduce the variability when finding sample standard deviations in the Monte Carlo sample.
sim <- sim.spatialProcess(obj, xp=make.surface.grid(CO.Grid)[1:5,], Z=CO.elevGrid$z[1:5], M=30)
look <- as.surface(CO.Grid, t(sim[1,]))
look2 <- as.surface(CO.Grid, t(sim[2,]))
@@ -426,7 +487,7 @@ 3.1 splint and The splint function returns a vector of values of the interpolating spline evaluated at xgrid, while sreg returns a list of class sreg. Some of the relevant components of the sreg list are the original data x and y, the smoothness parameters lambda and df (same as trace), and the residuals and fitted.values. Note that the identical parameters lam and lambda both appear for convenience.
-First, we’ll use interpolation to fit the control group data. We use the x data coordinates as nodes to interpolate, and then evaluate the model at the grid locations to plot. Interpolation of a set of data points \((x_i,y_i)_{i=1}^n\) means that the interpolating function passes exactly through these points.
+First, we'll use interpolation to fit the control group data. We use the x data coordinates as nodes to interpolate, and then evaluate the model at the grid locations to plot. Interpolation of a set of data points \((x_i,y_i)_{i=1}^n\) means that the interpolating function passes exactly through these points.
grd <- seq(range(x)[1], range(x)[2], length.out = 400)
spl <- splint(x,y,grd)
plot(y~x, pch=".", cex=5, ylab="Median Food Intake", xlab="Days")
@@ -435,7 +496,7 @@ 3.1 splint and

-Often we want to smooth the data rather than interpolate, for example if we assumed possible error in data collection. Loosely, a smoothing function follows the general “trend” of the data, but it may not exactly interpolate the data points.
+Often we want to smooth the data rather than interpolate, for example if we assumed possible error in data collection. Loosely, a smoothing function follows the general "trend" of the data, but it may not exactly interpolate the data points.
The parameters df and lambda both control the smoothing in these functions. Interpolating the data corresponds to \(\lambda=0\) and df= the number of observations. As lambda increases, the spline gets smoother.
In place of lambda, a more useful scale is in terms of the effective number of parameters (degrees of freedom) associated with the estimate. We denote this by EDF (effective degrees of freedom). There is a 1-1 relationship between \(\lambda\) and EDF, and both are useful measures in slightly different contexts: \(\lambda\) for computing with the spatial process model versus EDF for data smoothing. The user should be aware that splint defaults to lambda = 0 (interpolation), while sreg defaults to using GCV to find lambda.
#Note this small lambda almost interpolates, close to splint model fit above
@@ -488,7 +549,7 @@ 3.2 sreg and S3 meth
## lambda trA GCV shat converge
## GCV 11.13 7.446 2.378 1.387 5
## GCV.one 11.13 7.446 2.378 1.387 5
-plot gives a series of four diagnostic plots describing the fit. For sreg, plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2/\rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized. Finally, plot 4 shows a histogram of the standard residuals.
+plot gives a series of four diagnostic plots describing the fit. For sreg, plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2/\rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized. Finally, plot 4 shows a histogram of the standard residuals.
set.panel(2,2)
plot(fit) # four diagnostic plots of fit

@@ -518,7 +579,7 @@ 3.3 Tps
The Tps function is used for fitting curves and surfaces to interpolate or smooth data by thin plate spline regression. This is a very useful technique and is often as informative as more complex methods.
The assumed model for Tps is additive. Namely, \[ Y_i = f(\mathbf{X}_i) + \epsilon_i,\]
where \(f(\mathbf{X})\) is a d dimensional surface, and the object is to fit a thin plate spline surface to irregularly spaced data. \(\epsilon_i\) are uncorrelated random errors with zero mean and variances \(\sigma^2 / w_i\).
-This function also works for just a single dimension and is a special case of a spatial process estimate (Kriging). A “fast” version of this function uses a compactly supported Wendland covariance, computing the estimate for a fixed smoothing parameter.
+This function also works for just a single dimension and is a special case of a spatial process estimate (Kriging). A "fast" version of this function uses a compactly supported Wendland covariance, computing the estimate for a fixed smoothing parameter.
\({\large{\mathbf{Basic \> Usage}}}\)
@@ -530,7 +591,7 @@ 3.3 Tps
Returns a list of class Krig/mKrig (see below), which includes the predicted surface of fitted.values and residuals. Also includes the results of the GCV grid search in gcv.grid. Note that the GCV/REML optimization is done even if lambda or df is given. Please see the documentation on Krig for details of the returned arguments.
-Tps is a “wrapper function”, which is why a Krig object is returned. Moreover, any argument that can be used in a call to Krig can be used in Tps. For example, the user can specify lambda or df just as in sreg. As seen in the minimization problem, lambda determines the weight put on the smoothness condition, giving it the same interpretation. lambda is the most important argument for Tps, which is estimated by GCV if omitted. We can also includes covariates in the linear part of the model by passing the argument Z.
+Tps is a "wrapper function", which is why a Krig object is returned. Moreover, any argument that can be used in a call to Krig can be used in Tps. For example, the user can specify lambda or df just as in sreg. As seen in the minimization problem, lambda determines the weight put on the smoothness condition, giving it the same interpretation. lambda is the most important argument for Tps, which is estimated by GCV if omitted. We can also include covariates in the linear part of the model by passing the argument Z.
Tps and fastTps are special cases of using the Krig and mKrig functions, respectively. The Tps estimator is implemented by passing the right generalized covariance function based on a radial basis function (RBF) to the more general function Krig. One advantage of this implementation is that once a Tps/Krig object is created the estimator can be found rapidly for other data and smoothing parameters provided the locations remain unchanged. This makes simulation within R efficient (see example below).
@@ -596,7 +657,7 @@ 3.5 Confidence intervals with
-Notice how similar the summary is to sreg’s summary. sreg is actually a special case of Tps (as shown below – note that lambda is not equal for the two functions!) The m=2 default for Tps and leaving the data unscaled results in the cubic smoothing spline sreg.
+Notice how similar the summary is to sreg's summary. sreg is actually a special case of Tps (as shown below -- note that lambda is not equal for the two functions!) The m=2 default for Tps and leaving the data unscaled results in the cubic smoothing spline sreg.
# compare sreg and Tps results to show the adjustment to lambda.
predict( fit)-> look
predict( fit.tps, lambda=fit$lambda*fit$N)-> look2
@@ -617,13 +678,14 @@ 3.5 Confidence intervals with
-
+ col=c( 2,2,4,4), lty=2)
+legend("bottomright", legend = c("95 Pct Pointwise", "Simultaneous Bonferroni"), col = c(2,4), lty = 2, cex = 0.8)
+title( "95 Pct Pointwise and Simultaneous Intervals")
+
# or try the more visually honest:
plot( fit.tps$x, fit.tps$y)
lines( fit.tps$predicted, lwd=2)
@@ -636,20 +698,23 @@ 3.5 Confidence intervals with x <- WorldBankCO2[,'Pop.urb']
y <- log10(WorldBankCO2[,'CO2.cap'])
out.fast <- fastTps(x,y,lambda=2, theta=20)
-plot(x,y)
-xgrid<- seq( min(x), max(x),,300)
+
+plot(x,y, xlab = "Urban Percent of Population", ylab = "CO2 Emissions Per Capita")
+
+xgrid<- seq( min(x), max(x), length.out = 300)
fhat.fast <- predict( out.fast,xgrid)
-#se.fast <- predictSE( out.fast)
+se.fast <- predictSE( out.fast, x = xgrid)
+
lines( xgrid, fhat.fast)
-#lines( xgrid, fhat.fast+1.96*se.fast, col=2, lty=2)
-#lines( xgrid, fhat.fast-1.96*se.fast, col=2, lty=2)
-title('fastTps fit')
-
+lines( xgrid, fhat.fast+1.96*se.fast, col=2, lty=2)
+lines( xgrid, fhat.fast-1.96*se.fast, col=2, lty=2)
+title('`fastTps` Fit With 95% Confidence Interval')
+
4 Introduction to Spatial Statistics
-Here we’ll cover the fundamentals of spatial statistics. In particular, we will examine the basic definitions and concepts of the field before giving a geostatistics mini-course. As this is a vignette, we can only scratch the theory behind spatial statistics. A few useful resources are:
+Here we'll cover the fundamentals of spatial statistics. In particular, we will examine the basic definitions and concepts of the field before giving a geostatistics mini-course. As this is a vignette, we can only scratch the surface of the theory behind spatial statistics. A few useful resources are:
Statistics for Spatial Data by Noel Cressie
Handbook of Spatial Statistics by Alan E. Gelfand, Peter J. Diggle, Montserrat Fuentes, and Peter Guttorp
@@ -657,15 +722,15 @@ 4 Introduction to Spatial Statist
4.1 The spatial problem
-Let’s return to a previous quilt.plot. Suppose we want to predict the maximum March/April/May temperature at a new location \((-103,39.5)\) in Colorado, which is shown below as the x.
-
+Let's return to a previous quilt.plot. Suppose we want to predict the maximum March/April/May temperature at a new location \((-103,39.5)\) in Colorado, which is shown below as the x.
+
An initial thought may be to use OLS regression with covariates of \(\texttt{longitude}\) and \(\texttt{latitude}\).
-Recall Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.” We see this reflected in the plot above, as temperatures in one area are similar to the temperature in nearby areas. This spatial autocorrelation violates many typical statistical assumptions – our observations are no longer independent! Fortunately, in the field of spatial statistics, we can relax our assumption of independence and work with observations that are spatially dependent.
-To make a “spatial prediction”, we need some way to quantify the change in our observation variable (temperature) as a function of distance.
+Recall Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things." We see this reflected in the plot above, as temperatures in one area are similar to the temperature in nearby areas. This spatial autocorrelation violates many typical statistical assumptions -- our observations are no longer independent! Fortunately, in the field of spatial statistics, we can relax our assumption of independence and work with observations that are spatially dependent.
+To make a "spatial prediction", we need some way to quantify the change in our observation variable (temperature) as a function of distance.
4.2 Covariance
-A covariance function \(\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x'))\) describes the joint variability between a stochastic process \(Y(\cdot)\) at two locations \(\mathbf{x}\) and \(\mathbf{x}'\). This covariance function is vital in spatial prediction. The fields package includes common parametric covariance families (e.g. exponential and Matern) as well as nonparametric models (e.g. radial and tensor basis functions).
+A covariance function \(\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x'))\) describes the joint variability between a stochastic process \(Y(\cdot)\) at two locations \(\mathbf{x}\) and \(\mathbf{x}'\). This covariance function is vital in spatial prediction. The fields package includes common parametric covariance families (e.g. exponential and Matern) as well as nonparametric models (e.g. radial and tensor basis functions).
When modeling \(\operatorname{Cov}(Y(\mathbf{x}), Y(\mathbf{x}'))\), we are often forced to make simplifying assumptions.
Stationarity assumes we can represent the covariance function as \[
@@ -679,7 +744,7 @@ 4.2 Covariance
Matern. \(\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x')) = C(r) = \rho \left( \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{ r}{\theta} \right)^\nu K_\nu \left( \frac{r}{\theta} \right) \right) + \sigma^2 \mathbf{1}_{ \mathbf x = \mathbf x'}\), where \(K_{\nu}\) is a modified Bessel function of the second kind, of order \(\nu\).
Note that the Matern covariance function depends on parameters \((\rho,\theta,\nu,\sigma^2)\), and the Exponential covariance depends on parameters \((\rho,\theta,\sigma^2)\). The parameters \(\rho,\theta,\sigma^2\) and \(\nu\) respectively denote the marginal variance (or sill), range, nugget effect, and smoothness of our process.
-The range \(\theta\) of the process is the distance at which observations become uncorrelated. The sill \(\rho\) is the marginal variance of the spatial process. The nugget effect \(\sigma^2\) corresponds to small-scale variation such as measurement error. The smoothness \(\nu\) corresponds to how “smooth” our spatial process appears. An illustration in the vgram section shows the visual interpretation of these parameters.
+The range \(\theta\) of the process is the distance at which observations become uncorrelated. The sill \(\rho\) is the marginal variance of the spatial process. The nugget effect \(\sigma^2\) corresponds to small-scale variation such as measurement error. The smoothness \(\nu\) corresponds to how "smooth" our spatial process appears. An illustration in the vgram section shows the visual interpretation of these parameters.
For such isotropic covariances, the fields package relies on the function rdist, which is used to calculate the distance between two sets of locations.
4.2.1 rdist
@@ -718,7 +783,7 @@ 4.2.1 rdist
quilt.plot(grd.CO,y.CO)
US(add=TRUE)
-
+
It is easy to find the distance between two points on the quilt.plot. Here, we use rdist.earth since our vectors consist of longitude/latitude pairs.
three.locations <- grd.CO[1:3,]
#location one is (-109.10, 36.90)
@@ -734,7 +799,7 @@ 4.2.1 rdist
4.2.2 Exponential and Matern covariances
The Exponential and Matern functions produce these isotropic covariances in R. The inputs are a matrix of distances d, either the range parameter range (this is \(\theta\) in our notation) or alpha = 1/range, and finally the marginal variance phi (which is \(\rho\) in our notation), and the smoothness nu for the Matern covariance. Note that the exponential covariance is identical to the Matern covariance with smoothness nu = 0.5.
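A quick check of this equivalence (assuming fields is loaded):
d <- seq(0, 2, length.out=5)
Exponential(d, range=.5)
Matern(d, range=.5, smoothness=.5)   # identical values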
There are other (generalized) covariances in fields, such as RadialBasis and Wendland, that will be discussed later. To set the nugget variance \(\sigma^2\), we use the sigma2 argument in the functions Krig or mKrig that we also discuss later.
-Below are plots that illustrate the shape of these covariance functions. The Exponential, Matern, and Wendland correlations (marginal variances \(\rho=1\)) behave exactly how one would expect. That is, the covariance between two nearby objects is close to 1, indicating a strong positive correlation among the two observations. The appearance of the Radial-Basis correlation may be shocking – we will return to it later.
+Below are plots that illustrate the shape of these covariance functions. The Exponential, Matern, and Wendland correlations (marginal variances \(\rho=1\)) behave exactly how one would expect. That is, the covariance between two nearby objects is close to 1, indicating a strong positive correlation among the two observations. The appearance of the Radial-Basis correlation may be shocking -- we will return to it later.
\({\large{\mathbf{Basic \> Usage}}}\)
@@ -799,7 +864,7 @@ 4.2.4 Exp.cov and \({\large{\mathbf{Basic \> Usage}}}\)
Exp.cov(x1, x2=NULL, theta = 1, p=1)
-stationary.cov(x1, x2=NULL, Covariance = “Exponential”, Distance = “rdist”, theta = 1)
+stationary.cov(x1, x2=NULL, Covariance = "Exponential", Distance = "rdist", theta = 1)
\({\large{\mathbf{Value}}}\)
If the argument C is NULL, then the (cross-)covariance matrix is returned. If C is a vector of length n, then the returned value is the multiplication of the (cross-)covariance matrix with this vector. The C functionality is used internally by all fields Kriging-related functions.
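A small sketch of the default behavior (C left as NULL): the cross-covariance matrix between two sets of locations is returned.
set.seed(1)
x1 <- matrix(runif(10), ncol=2)   # 5 locations
x2 <- matrix(runif(16), ncol=2)   # 8 locations
Sig12 <- stationary.cov(x1, x2, Covariance="Exponential", theta=.3)
dim(Sig12)                        # 5 x 8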
@@ -828,7 +893,7 @@ 4.2.5 Simulating Random Fields (<
Take the Cholesky Decomposition \(\Sigma = L L^T\).
Multiply \(L \boldsymbol{\epsilon}\), where \(\boldsymbol{\epsilon} \sim N(\mathbf 0, I)\).
-The resulting random vector \(L \boldsymbol \epsilon\) is an exact simulation of a Gaussian process. The only problem with this approach is the computations and storage grow rapidly for larger grids. For example, a \(128 \times 128\) image would mean that the dimension of \(\Sigma\) is huge (\(16,000 \times 16,000\)) and effectively prohibit the use of the Cholesky decomposition. For rectangular grids, one can sometimes simulate Gaussian random fields very efficiently using an alternative algorithm called “circulant embedding” that is based on the Fast Fourier Transform. The restrictions are that the covariance function needs to be stationary and the correlation range needs to the small relative to the size of the domain (see Chan and Wood, 1994).
+The resulting random vector \(L \boldsymbol \epsilon\) is an exact simulation of a Gaussian process. The only problem with this approach is that the computations and storage grow rapidly for larger grids. For example, a \(128 \times 128\) image would mean that the dimension of \(\Sigma\) is huge (\(16,000 \times 16,000\)), effectively prohibiting the use of the Cholesky decomposition. For rectangular grids, one can sometimes simulate Gaussian random fields very efficiently using an alternative algorithm called "circulant embedding" that is based on the Fast Fourier Transform. The restrictions are that the covariance function needs to be stationary and the correlation range needs to be small relative to the size of the domain (see Chan and Wood, 1994).
The sim.rf function uses circulant embedding to simulate a stationary Gaussian random field (GRF) on a regular grid with unit marginal variance (i.e. \(\rho=1\)). Note that the marginal variance is readily changed by scaling the resulting GRF.
A limitation when using sim.rf presents itself with Error in sim.rf(obj) : FFT of covariance has negative values. This comes from the circulant embedding method used to create the GRF; in short, it occurs when the correlation range is too large. One fix is to increase the domain size so this correlation is then small relative to the size of the domain.
The input of sim.rf is a list that includes information about the covariance function, its FFT, and the grid for evaluation. Usually this is created by a setup call to one of the image.cov functions (e.g. Exp.image.cov, matern.image.cov, stationary.image.cov).
@@ -838,7 +903,7 @@ 4.2.5 Simulating Random Fields (<
sim.rf(obj)
matern.image.cov(ind1, ind2, Y, cov.obj = NULL, setup = FALSE, grid ,theta= 1.0, smoothness=.5)
-Exp.image.cov(ind1, ind2, Y, cov.obj = NULL, setup = FALSE, grid, …)
+Exp.image.cov(ind1, ind2, Y, cov.obj = NULL, setup = FALSE, grid, ...)
\({\large{\mathbf{Value}}}\)
A matrix with the random field values.
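A minimal sketch of the setup-then-simulate pattern; the grid size and covariance parameters are illustrative.
grid <- list(x=seq(0, 10, length.out=128), y=seq(0, 10, length.out=128))
obj  <- matern.image.cov(grid=grid, theta=.5, smoothness=1.0, setup=TRUE)
sim  <- sim.rf(obj)               # one draw of the mean zero, unit variance GRF
image.plot(grid$x, grid$y, sim)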
@@ -874,7 +939,7 @@ 4.3 Variograms
The definition is readily translated to an empirical variogram. Let \(N(h)\) be the set of pairs of observations \(y_i, y_j\) such that \(\| \mathbf x_i - \mathbf x_j \| = h\). The empirical variogram (vgram) is defined as \[
\widehat{\gamma}(h) = \frac{1}{2 \cdot|N(h)| } \sum_{(i,j) \in N(h)} ( y_i - y_j )^2
\]
-The default “cloud” variogram is done for each distance \(h\), but it is noisy and hard to interpret. Often, we specify a number of bins N in which values for nearby values are averaged. In this way, the choices for a variogram may seem similar to finding a histogram. The breaks argument allows the user to explicitly provide distances at which bins are created.
+The default "cloud" variogram is done for each distance \(h\), but it is noisy and hard to interpret. Often, we specify a number of bins N in which values at nearby distances are averaged. In this way, the choices for a variogram may seem similar to finding a histogram. The breaks argument allows the user to explicitly provide distances at which bins are created.
crossCoVGram is the same as vgram but differences are taken across different variables rather than the same variable. boxplotVGram uses the base R boxplot function to display the variogram neatly.
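A hedged sketch of a binned variogram for the day 16 ozone data used elsewhere in this vignette; the centers and stats components (and the "mean" row of stats) are assumed from the vgram output.
data(ozone2)
good <- !is.na(ozone2$y[16,])
vg <- vgram(ozone2$lon.lat[good,], ozone2$y[16,good], N=30, lon.lat=TRUE)
plot(vg$centers, vg$stats["mean",], xlab="distance (miles)", ylab="binned variogram")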
@@ -903,7 +968,7 @@ 4.3 Variograms
ind: a two column matrix giving the x and y increment used to compute the shifts
vgram.full: the variogram at each of these separations
-vgram.robust: Cressie’s version of a robust variogram statistic.
+vgram.robust: Cressie's version of a robust variogram statistic.
@@ -923,9 +988,9 @@ 4.3 Variograms
4.4 Simple Kriging
The additive spatial model is \[
-Y(\mathbf x) = \mathbf Z(\mathbf x) \mathbf{d} + g(\mathbf x) + \epsilon(\mathbf x),\] where \(Y(\cdot)\) is an observation, \(\mathbf Z \mathbf{d}\) is a deterministic (nonrandom) product of covariates \(\mathbf Z\) with a weight vector \(\mathbf d\) that acts as a mean function, \(g(\cdot) \sim N(\mathbf 0, \rho \mathbf K)\) is spatial process, and \(\epsilon(\cdot) \sim N( \mathbf 0, \sigma^2 \mathbf I)\) is error (i.e. white noise).
+Y(\mathbf x) = \mathbf Z(\mathbf x) \mathbf{d} + g(\mathbf x) + \epsilon(\mathbf x),\] where \(Y(\cdot)\) is an observation, \(\mathbf Z \mathbf{d}\) is a deterministic (nonrandom) product of covariates \(\mathbf Z\) with a weight vector \(\mathbf d\) that acts as a mean function, \(g(\cdot) \sim N(\mathbf 0, \rho \mathbf K)\) is a spatial process, and \(\epsilon(\cdot) \sim N( \mathbf 0, \sigma^2 \mathbf I)\) is error (i.e. white noise).
Thus, we assume that our observations \(Y(\cdot)\) are Gaussian; in particular, \(\mathbf Y \sim N( \mathbf{Zd}, \rho \mathbf K + \sigma^2 \mathbf I)\)
-For now, we’ll assume \(\mathbf Z\mathbf{d} \equiv \mathbf 0\) and focus on the spatial aspect \(g(\mathbf x)\). This is known as simple kriging, where the mean of the stochastic process is known. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\).
+For now, we'll assume \(\mathbf Z\mathbf{d} \equiv \mathbf 0\) and focus on the spatial aspect \(g(\mathbf x)\). This is known as simple kriging, where the mean of the stochastic process is known. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\).
Suppose we want to predict \(Y(\cdot)\) at a new location \(\mathbf x_0\). The kriging estimate \(\hat Y(\mathbf x_0)\) of our mean zero Gaussian process \(Y(\cdot)\) is given by: \[
\hat Y(\mathbf x_0) = \Sigma_0 \Sigma^{-1} \mathbf Y
\]
@@ -935,7 +1000,7 @@ 4.4 Simple Kriging
\(\Sigma = \left( \operatorname{Cov}[ Y(\mathbf x_i),Y(\mathbf x_j) ] \right)_{i,j=1}^n\) is the \(n \times n\) covariance matrix of \(Y(\cdot)\) with \(ij\)-entry equal to \(\operatorname{Cov}[Y(\mathbf x_i),Y(\mathbf x_j)]\).
Briefly, when the covariance parameters are known, the Kriging estimate represents the best prediction at \(\mathbf x_0\) based on a linear combination of the observations. By best we mean in terms of mean squared error.
-In the next section, we see an example of how to perform simple kriging. Other types of kriging (i.e. ordinary kriging and universal kriging) are explained in the appendix.
+In the next section, we see an example of how to perform simple kriging. Other types of kriging (i.e. ordinary kriging and universal kriging) are explained in the appendix.
@@ -956,11 +1021,11 @@ 5.1 Fitting a Variogram
US(add=TRUE)
points(-103,39.5, pch=4,lwd=3,cex=1.25, col='black')

-For now, we’ll assume a zero mean (\(\mathbf Z\mathbf{d} \equiv \mathbf 0\)) and focus on the spatial aspect, which is clearly an incorrect assumption. One can remove the mean trend using least squares, but then this requires universal Kriging instead of simple Kriging. We prefer to start with simple Kriging in this exposition. Details for universal Kriging can be done in the appendix. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\). In this example, \(n=213\) since we have 213 temperature observations.
+For now, we'll assume a zero mean (\(\mathbf Z\mathbf{d} \equiv \mathbf 0\)) and focus on the spatial aspect, even though a zero mean is clearly an incorrect assumption here. One can remove the mean trend using least squares, but then this requires universal Kriging instead of simple Kriging. We prefer to start with simple Kriging in this exposition. Details for universal Kriging can be found in the appendix. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\). In this example, \(n=213\) since we have 213 temperature observations.
Suppose we want to predict the temperature \(Y(\cdot)\) at a new location \(\mathbf x_0\). Recall that the simple Kriging estimate \(\hat Y(\mathbf x_0)\) of our mean zero Gaussian process \(Y(\cdot)\) is given by: \[
\hat Y(\mathbf x_0) = \Sigma_0 \Sigma^{-1} \mathbf Y
\]
-Now, the question is how to determine the covariance function of \(Y(\cdot)\). For this example, we’ll consider the (isotropic) exponential covariance: \(\operatorname{Cov}[ Y(\mathbf x), Y(\mathbf x')] = \rho e^{ - \| \mathbf x - \mathbf x'\|/\theta}\). Here, \(\theta\) is a range parameter that corresponds to the length of spatial autocorrelation of our process, and \(\rho\) is the overall variance of the process.
+Now, the question is how to determine the covariance function of \(Y(\cdot)\). For this example, we'll consider the (isotropic) exponential covariance: \(\operatorname{Cov}[ Y(\mathbf x), Y(\mathbf x')] = \rho e^{ - \| \mathbf x - \mathbf x'\|/\theta}\). Here, \(\theta\) is a range parameter that corresponds to the length of spatial autocorrelation of our process, and \(\rho\) is the overall variance of the process.
The covariance matrix \(\Sigma\) thus has \(i,j\)-th entry \[
\Sigma_{i,j} = \rho e^{ - \| \mathbf x_i - \mathbf x_j\|/\theta} + \sigma^2 \mathbf{1}_{ \{ \mathbf x_i = \mathbf x_j \} },
\] where \(\mathbf{1}_{ \{ \mathbf x_i = \mathbf x_j \} }= \begin{cases} 1 & \mathbf x_i = \mathbf x_j \\ 0 & \text{else} \end{cases}\)
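A minimal sketch of assembling this covariance matrix and the simple Kriging prediction at \(\mathbf x_0 = (-103, 39.5)\). The names loc (the 213 x 2 longitude/latitude matrix) and y, as well as the parameter values, are placeholders rather than the vignette's estimates.
theta <- 100; rho <- 5; sigma2 <- .1                 # illustrative values only
D      <- rdist.earth(loc, loc)                      # great circle distances in miles
Sigma  <- rho*exp(-D/theta) + sigma2*diag(nrow(loc))
Sigma0 <- rho*exp(-rdist.earth(rbind(c(-103, 39.5)), loc)/theta)
yhat0  <- Sigma0 %*% solve(Sigma, y)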
@@ -1068,7 +1133,7 @@ 5.2 Kriging predictor
title("Elevation in CO")
US(add=TRUE,col='grey')

-The elevation plot is a near inverse of our prediction grid – this agrees with our intuition that there are lower temperatures at higher elevations. If we include a covariate Z corresponding to elevation, then we can produce even more accurate predictions (a sneak-peek is shown below).
+The elevation plot is a near inverse of our prediction grid -- this agrees with our intuition that there are lower temperatures at higher elevations. If we include a covariate Z corresponding to elevation, then we can produce even more accurate predictions (a sneak-peek is shown below).

@@ -1108,7 +1173,7 @@ 6 spatialProcess
However, we want to make it clear that the spatial polynomial \(P(\cdot)\) is (by default) included in the kriging estimate, while other covariates \(\mathbf{Z}\) must be specified.
-spatialProcess represents one-stop shopping for fitting a spatial model in fields. It was designed so that users are able to quickly fit models and visualize them. We also add one caution: it hides several default model choices from the user. After a covariance model is chosen, a grid search is performed to find the optimal values of the parameters theta, rho, and sigma2, and then the spatial model is computed with these estimated parameters. Any other covariance parameters (e.g. the smoothness) need to be specified. The default call uses the covariance function stationary.cov, specifically with the Matern and smoothness nu=1. The optimization is done by MLESpatialProcess.
+spatialProcess represents one-stop shopping for fitting a spatial model in fields. It was designed so that users are able to quickly fit models and visualize them. We also add one caution: it hides several default model choices from the user. After a covariance model is chosen, a grid search is performed to find the optimal values of the parameters theta, rho, and sigma2, and then the spatial model is computed with these estimated parameters. Any other covariance parameters (e.g. the smoothness) need to be specified. The default call uses the covariance function stationary.cov, specifically with the Matern and smoothness nu=1. The optimization is done by MLESpatialProcess.
A good way to think about the spatial model functions in fields is in the context of engines and wrappers. The two engines in fields which perform the Kriging computations are Krig and mKrig. Krig is a more numerically stable algorithm: if a covariance matrix is close to singular, mKrig will fail whereas Krig may still be able to perform the calculations. On the other hand, mKrig is set up to handle large data sets with sparse covariance matrices by using the SPAM package.
A wrapper is an easy to use function that is based on an underlying function or engine. Often default parameters values are set to make the function accessible to the user. The function Tps is a wrapper on Krig. Tps performs thin plate spline regression, which is a special case of Kriging, so it makes sense to use the Krig engine. On the other hand, the functions fastTps and spatialProcess are wrappers for mKrig. See the chapters and examples on these functions for specifics.
There is always a hazard in providing a simple-to-use method that makes many default choices for the spatial model. As in any analysis, be aware of these choices and try alternative models and parameter values to assess the robustness of your conclusions. Also examine the residuals to check the adequacy of the fit!
@@ -1116,9 +1181,9 @@ 6 spatialProcess
\({\large{\mathbf{Basic \> Usage}}}\)
-spatialProcess(x, y, Z = NULL, cov.function = “stationary.cov”,
-\(\> \> \> \quad\) cov.args = list(Covariance = “Matern”, smoothness = 1), theta = NULL)
-Krig(x, Y, theta, Covariance=“Matern”, smoothness, Distance=“rdist”)
+spatialProcess(x, y, Z = NULL, cov.function = "stationary.cov",
+\(\> \> \> \quad\) cov.args = list(Covariance = "Matern", smoothness = 1), theta = NULL)
+Krig(x, Y, theta, Covariance="Matern", smoothness, Distance="rdist")
\({\large{\mathbf{Value}}}\)
spatialProcess returns an object of classes mKrig and spatialProcess. The main difference from mKrig is an extra component MLEInfo that has the results of the grid evaluation over theta (maximizing lambda), joint maximization over theta and lambda, and a grid evaluation over lambda with theta fixed at its MLE. The Krig function produces an object of class Krig.
@@ -1168,7 +1233,7 @@ 6.1 Analysis of soil pH
## gradEval
## 1.00000000
The summary shows the value of the profile log likelihood function, the estimates for the parameters, and the number of evaluations needed in the optimization.
-We can use plot to view diagnostic plots of the fit. The first three plots are the same as sreg and mKrig objects. Plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2 / \rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized.
+We can use plot to view diagnostic plots of the fit. The first three plots are the same as sreg and mKrig objects. Plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2 / \rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized.
One of the main features of spatialProcess is the ability to optimize over theta as well as the usual lambda/rho/sigma combination. For this reason, plot 4 shows the profile likelihood versus values of theta instead of a histogram of the standard residuals displayed in the plots of objects of other classes.
set.panel(2,2)
plot(fit)
@@ -1186,7 +1251,7 @@ 6.2 Analysis of Coal Ash
fit <- spatialProcess(x, y)
surface(fit,col=topo.colors(100))

-Note that surface displays the full model, except Z covariates if they are included; i.e. by default Z is dropped. To additionally include the Z covariates, one can use predict or predictSurface. In these functions, the default is to retain Z in the computation.
+Note that surface displays the full model, except Z covariates if they are included; i.e. by default Z is dropped. To additionally include the Z covariates, one can use predict or predictSurface. In these functions, the default is to retain Z in the computation.
We can use our spatial model to predict on a grid using predict, or predictSurface which is more convenient for plotting.
grid.list <- list(x=seq(0,17), y=seq(0,24) )
pred.grid <- predict(fit, grid=grid.list)
@@ -1237,7 +1302,7 @@ 6.2 Analysis of Coal Ash
6.3 ozone2
-Now we consider the ozone2 dataset in fields, which consists of CO2 observations over 89 days at a set of 153 locations. We’ll restrict ourselves to day 16 (June 18, 1987) and remove all NA values. We fit several covariance models: Matern, exponential, and Wendland. We first visually inspect the different surfaces, then we compare the parameters of the three models using summary.
+Now we consider the ozone2 dataset in fields, which consists of ozone observations over 89 days at a set of 153 locations. We'll restrict ourselves to day 16 (June 18, 1987) and remove all NA values. We fit several covariance models: Matern, exponential, and Wendland. We first visually inspect the different surfaces, then we compare the parameters of the three models using summary.
data( ozone2)
x<- ozone2$lon.lat
y<- c(ozone2$y[16,])
@@ -1261,7 +1326,7 @@ 6.3 ozone2
cov.args = list(Covariance = "Wendland",
dimension = 2, k = 2) )
rbind(obj$summary,obj2$summary,obj3$summary)
-Since the exponential is a Matern covariance with \(\nu = 0.5\), the first two fits can be compared in terms of their likelihoods. The REML value is slightly higher for obj verses obj2 (\(598.4 > 596.7\)). These are the negative log likelihoods so this suggests a preference for the exponential model. But… does it really matter in terms of spatial prediction?
+Since the exponential is a Matern covariance with \(\nu = 0.5\), the first two fits can be compared in terms of their likelihoods. The REML value is slightly higher for obj versus obj2 (\(598.4 > 596.7\)). These are the negative log likelihoods, so this suggests a preference for the exponential model. But... does it really matter in terms of spatial prediction?
library(sp) #for colors
set.panel(1,3)
surface(obj, col=heat.colors(100),smallplot= c(.88,.9,0.2,.8))
@@ -1305,7 +1370,7 @@ 6.3 ozone2
6.4 COmonthlyMet
-We’ll return to the dataset COmonthlyMet. When using other fields functions for Kriging such as Krig and mKrig, the user has to pass in a range theta, which can be initally guessed for a quick fit by looking at a variogram.
+We'll return to the dataset COmonthlyMet. When using other fields functions for Kriging such as Krig and mKrig, the user has to pass in a range theta, which can be initially guessed for a quick fit by looking at a variogram.
Note that the estimated theta without including the Z covariate CO.elev will probably change if we do include Z.
Ultimately, using a variogram to determine spatial parameters for our covariance is not a reliable method. A more rigorous way to determine parameters is via Maximum Likelihood, which is performed within spatialProcess. To be precise, the parameters theta, rho, and sigma2 are estimated using Maximum Likelihood, and the user only needs to input x, y, and a type of covariance. (Note: If a Matern covariance is used, then smoothness must be supplied. See below how one might select between different smoothness values.)
Of course, spatialProcess runs slower than the functions would with a user-supplied theta. One must take caution when using spatialProcess with large datasets. Below, we run through this climate example using spatialProcess to construct our model.
@@ -1338,7 +1403,7 @@ 6.4 COmonthlyMet
-Observe the largely different values for parameters, but similar values for log-likelihood. Note that there is an interplay between smoothness and range, so directly comparing theta is not appropriate. Will the underlying spatial process be much different if we vary the smoothness? We’ll use the drop.Z command to make the differences in the spatial process clear.
+Observe the markedly different parameter values, but the similar values for the log-likelihood. Note that there is an interplay between smoothness and range, so directly comparing theta is not appropriate. Will the underlying spatial process be much different if we vary the smoothness? We'll use the drop.Z argument to make the differences in the spatial process clear.
pred0.5 <- predictSurface.Krig( out0.5,grid.list=CO.Grid,ZGrid=CO.elevGrid,drop.Z=TRUE)
pred1 <- predictSurface.Krig( out1,grid.list=CO.Grid,ZGrid=CO.elevGrid,drop.Z=TRUE)
@@ -1473,7 +1538,7 @@ 7.3 grid.list, \({\large{\mathbf{Basic \> Usage}}}\)
make.surface.grid(grid.list)
-as.surface(obj, z, order.variables=“xy”)
+as.surface(obj, z, order.variables="xy")
\({\large{\mathbf{Value}}}\)
The result will be a list with x, y, and z components suitable for plotting with functions such as persp, image, surface, contour, drape.plot.
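As a small self-contained sketch of this round trip (the grid and the values evaluated on it are arbitrary and only for illustration):
gl <- list( x= seq( 0, 10, length.out=25), y= seq( 0, 5, length.out=20))
xg <- make.surface.grid( gl)            # a 500 x 2 matrix of gridded locations
zg <- xg[,1]^2 - xg[,2]                 # any values evaluated at those locations
toy.surface <- as.surface( gl, zg)      # back to an x, y, z list ready for plotting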
@@ -1512,12 +1577,12 @@ 7.3 grid.list, Using surface combines image.plot with contour as its default.
surface(look.surface)

-There is a graphics function persp which can produce 3-d plots. drape.plot is a wrapper function in fields that automatically adds color based on how it’s done in image.plot/surface.
+There is a graphics function persp which can produce 3-d plots. drape.plot is a wrapper function in fields that automatically adds color based on how it's done in image.plot/surface.
set.panel(1,2)
persp(look.surface)
drape.plot(look.surface)

-In summary, we fit a spatialProcess object from our data, created a “grid.list”, constructed a full set of gridded locations using the “grid.list,” found Kriging estimates at the gridded locations, converted back to a “grid.list”, and then plotted with surface. Compare this with the convenient surface.mKrig.
+In summary, we fit a spatialProcess object from our data, created a "grid.list", constructed a full set of gridded locations using the "grid.list," found Kriging estimates at the gridded locations, converted back to a "grid.list", and then plotted with surface. Compare this with the convenient surface.mKrig.
fit <- spatialProcess(RMprecip$x, RMprecip$y, theta=20)
surface(fit)

@@ -1530,7 +1595,7 @@ 7.4 surface
\({\large{\mathbf{Basic \> Usage}}}\)
-surface( object, grid.list = NULL, extrap = FALSE, type=“C”)
+surface( object, grid.list = NULL, extrap = FALSE, type="C")
@@ -1674,8 +1739,8 @@ 8.3 drape.plot and <
\({\large{\mathbf{Basic \> Usage}}}\)
drape.plot(x, y, z, z2=NULL, col = tim.colors(64), zlim = range(z, na.rm=TRUE), zlim2 = NULL)
-drape.color(z, col = tim.colors(64), zlim = NULL,breaks,transparent.color = “white”)
-pushpin( x,y,z,p.out, height=.05,col=“black”,text=NULL,adj=-.1,cex=1.0,…)
+drape.color(z, col = tim.colors(64), zlim = NULL,breaks,transparent.color = "white")
+pushpin( x,y,z,p.out, height=.05,col="black",text=NULL,adj=-.1,cex=1.0,...)
\({\large{\mathbf{Value}}}\)
If an assignment is made, the projection matrix from persp is returned. This information can be used to add additional 3-d features to the plot, such as pushpin. See the persp help file for an example of how to add additional points and lines using the trans3d function, and also see the example below.
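Here is a minimal sketch of this idea; the surface is made up purely for illustration and the pushpin is placed at one of its grid points:
x <- seq( -1, 1, length.out=40)
y <- seq( -1, 1, length.out=40)
z <- outer( x, y, function(u,v) exp( -(u^2 + v^2)))
pm <- drape.plot( x, y, z)                     # the assignment captures the persp projection matrix
pushpin( x[20], y[20], z[20,20], pm, text="near the peak")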
@@ -1723,17 +1788,17 @@ 8.4 Color palettes
tim.colors gives a very nice spectrum that is close to the jet color scale in MATLAB® and is the default for image.plot and the like.
larry.colors is particularly useful for visualizing fields of climate variables.
two.colors is really about three different colors. For other colors try fields.color.picker to view possible choices. start="darkgreen", end="azure4" are the options used to get a nice color scale for rendering aerial photos of ski trails. (http://www.image.ucar.edu/Data/MJProject)
-designer.colors is the master function for two.colors and tim.colors. It can be useful if one wants to customize the color table to match quantiles of a distribution. e.g. if the median of the data is at .3 with respect to the range then set x equal to c(0,.3,1) and specify three colors to provide a transtion that matches the median value. In fields language, this function interpolates between a set of colors at locations x. While you can be creative about choosing these colors, just using another color scale as the basis is easy. For example, designer.color( 256, rainbow(4), x= c( 0,.2,.8,1.0)) leaves the choice of the colors to Dr. R after a thunderstorm.
+designer.colors is the master function for two.colors and tim.colors. It can be useful if one wants to customize the color table to match quantiles of a distribution; e.g., if the median of the data is at .3 with respect to the range, then set x equal to c(0,.3,1) and specify three colors to provide a transition that matches the median value. In fields language, this function interpolates between a set of colors at locations x. While you can be creative about choosing these colors, just using another color scale as the basis is easy. For example, designer.colors( 256, rainbow(4), x= c( 0,.2,.8,1.0)) leaves the choice of the colors to Dr. R after a thunderstorm.
color.scale assigns colors to a numerical vector in the same way as the image function. This is useful to keep the assignment of colors consistent across several vectors by specifying a common zlim range. Finally, plotColorScale is a simple function to plot a vector of colors to examine their values.
-Also, don’t forget the built in heat.colors, topo.colors, terrain.colors, grey.colors, etc.
+Also, don't forget the built in heat.colors, topo.colors, terrain.colors, grey.colors, etc.
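As a small sketch of matching a color table to a skewed variable (the data and the quantile value 0.3 are arbitrary choices for illustration):
z <- rexp(500)                                            # a skewed variable
ctab <- designer.colors( 256, col=c("blue","white","red"), x=c(0, .3, 1))
zcol <- color.scale( z, col=ctab)                         # consistent color assignment for z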
\({\large{\mathbf{Basic \> Usage}}}\)
tim.colors(n = 64, alpha=1.0)
larry.colors()
-two.colors(n=256, start=“darkgreen”, end=“red”, middle=“white”,alpha=1.0)
-designer.colors( n=256, col= c(“darkgreen”, “white”, “darkred”), x=seq(0,1,, length(col)) ,alpha=1.0)
+two.colors(n=256, start="darkgreen", end="red", middle="white",alpha=1.0)
+designer.colors( n=256, col= c("darkgreen", "white", "darkred"), x=seq(0,1,, length(col)) ,alpha=1.0)
\({\large{\mathbf{Value}}}\)
A vector giving the colors in hexadecimal format; two extra hex digits are added for the alpha channel.
@@ -1766,7 +1831,7 @@ 8.5 interp.surface
We recommend this method for fast visualization and presentation of images. However, if the actual interpolated values will be used for analysis, use a statistical method such as Tps or fastTps to get an interpolation that is based on first principles.
Here is a brief explanation of the interpolation: Suppose that the location (locx, locy) lies in between the first two grid points in both x and y. That is, locx is between x1 and x2 and locy is between y1 and y2. Let \(e_x= (\mathrm{locx}-x_1)/(x_2-x_1)\) and \(e_y= (\mathrm{locy}-y_1)/(y_2-y_1)\). The interpolant is
\[(1-e_x)(1-e_y)z_{11} + (1-e_x)(e_y)z_{12} + (e_x)(1-e_y)z_{21} + (e_x)(e_y)z_{22}\]
-where the z’s are the corresponding elements of the Z matrix. See also the akima package for fast interpolation from irregular locations.
+where the z's are the corresponding elements of the Z matrix. See also the akima package for fast interpolation from irregular locations.
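A tiny numerical sketch of this formula (the grid and lookup locations are made up): since the image values below are linear in x and y, the bilinear interpolant recovers them exactly.
obj <- list( x= 1:5, y= 1:5, z= outer(1:5, 1:5, "+"))   # small image with z[i,j] = x[i] + y[j]
loc <- cbind( c(1.5, 2.2), c(1.5, 3.7))                 # two off-grid locations
interp.surface( obj, loc)                               # bilinear interpolation: 3.0 and 5.9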
\({\large{\mathbf{Basic \> Usage}}}\)
@@ -1811,7 +1876,7 @@ 9 Covariance Models Revisited
9.1 Rad.cov, Rad.simple.cov, cubic.cov, and wendland.cov
Radial basis functions are the natural covariance that comes out of the thin plate spline minimization problem (see the section Tps Revisited). The functional form is the following: \[\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x')) = C(r) =c_0(m,d) *\begin{cases} r^{2m-d}, & d \text{ odd} \\ r^{2m-d} \ln(r), & d \text{ even} \end{cases}\] where \(r = \| \mathbf x - \mathbf x' \|\) and \(c_0(m,d)\) is a multiplicative constant depending on \(m\) and \(d\).
-The argument d is inferred from the dimension of the input data. If m isn’t specified by the user, by default it is set to the smallest integer such that \(p = 2m-d\) is positive. Alternatively, the user can specify p.
+The argument d is inferred from the dimension of the input data. If m isn't specified by the user, by default it is set to the smallest integer such that \(p = 2m-d\) is positive. Alternatively, the user can specify p.
Note that these RBFs are generalized covariance functions, which means they are only valid covariances when restricted to a subspace of linear combinations of the field. The function Rad.simple.cov is a straightforward implementation in R code (it performs fewer checks on the data). It can be used, like Exp.cov, as a template for writing a new covariance function.
cubic.cov is a specific case of Rad.cov where we set d=1 and m=1.
All of these covariance functions can be equipped to take advantage of sparsity in the covariance matrix by using the SPAM package. The functions Wendland and wendland.cov allow easy use of the Wendland compactly supported polynomials as covariance functions. This will output a sparse covariance matrix, which helps shorten the computation time when working with large datasets. The main argument (besides location) in the Wendland function is theta, which by default is set to 1. This is the taper range, or the radius of support of the function. The Wendland function is identically 0 for distances greater than theta. If the user would like to scale each coordinate independently, they can provide a matrix or the diagonal of a matrix for the argument V, which is the inverse linear transformation of the coordinates before distances are found. Another parameter that can be chosen is k, the order of the Wendland polynomial, which determines the smoothness.
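A quick sketch of the compact support (the taper range and order used here are just the defaults):
dgrid <- seq( 0, 2, length.out=200)
wend <- Wendland( dgrid, theta=1, dimension=2, k=2)
plot( dgrid, wend, type="l", xlab="distance", ylab="Wendland covariance")   # identically 0 beyond theta = 1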
@@ -1827,8 +1892,8 @@ 9.2 stationary.taper.covRad.cov(x1, x2, p = 1, m=NA)
Rad.simple.cov(x1, x2, p=1)
cubic.cov(x1, x2, theta = 1)
-stationary.cov(x1, x2=NULL, Covariance = “Exponential”, Distance = “rdist”,theta = 1, V = NULL)
-stationary.taper.cov(x1, x2, Covariance=“Exponential”,Taper=“Wendland”,theta=1.0,V=NULL)
+stationary.cov(x1, x2=NULL, Covariance = "Exponential", Distance = "rdist",theta = 1, V = NULL)
+stationary.taper.cov(x1, x2, Covariance="Exponential",Taper="Wendland",theta=1.0,V=NULL)
wendland.cov(x1, x2, theta = 1, V=NULL, k = 2)
\({\large{\mathbf{Value}}}\)
@@ -1913,7 +1978,7 @@ 9.4 stationary.taper.cov
10 mKrig
-The mKrig function is both a Kriging/linear algebra engine and also user-level function for fitting spatial models. The mKrig engine is a simple and fast version of Krig. The “m” stands for micro, being a succinct (and we hope readable) function. The function focuses on the same computations as in Krig.engine.fixed done for a fixed lambda parameter, for unique spatial locations and for data without missing values. mKrig is optimized for large data sets, sparse linear algebra, and a clear exposition of the computations. The underlying code in mKrig is more readable and straightforward, but it does fewer checks on the data than Krig. One savings in computation is that Monte Carlo simulations are used to compute the trace of the smoothing matrix \(\operatorname{tr}(A(\lambda))\), which can be used for the effective degrees of freedom of the model.
+The mKrig function is both a Kriging/linear algebra engine and also a user-level function for fitting spatial models. The mKrig engine is a simple and fast version of Krig. The "m" stands for micro, being a succinct (and we hope readable) function. The function focuses on the same computations as in Krig.engine.fixed done for a fixed lambda parameter, for unique spatial locations and for data without missing values. mKrig is optimized for large data sets, sparse linear algebra, and a clear exposition of the computations. The underlying code in mKrig is more readable and straightforward, but it does fewer checks on the data than Krig. One savings in computation is that Monte Carlo simulations are used to compute the trace of the smoothing matrix \(\operatorname{tr}(A(\lambda))\), which can be used for the effective degrees of freedom of the model.
Just as in other functions, we can use cov.function and cov.args to specify which covariance we want to use in the model. mKrig is often used with a compactly supported covariance to take advantage of sparsity in computations. See fastTps as an example of how it uses mKrig.
There are other arguments that can be specified in chol.args for efficiency when dealing with a large data set. For example, nnzR is a guess at how many nonzero entries there are in the Cholesky decomposition of the linear system used to solve for the basis coefficients. Specifying a larger value will help to avoid warnings from the spam package.
mKrig.trace is an internal function called by mKrig to estimate the effective degrees of freedom. The Kriging surface estimate at the data locations is a linear function of the data and can be represented as \(A(\lambda)y\).
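As a hedged sketch of such a call (x and y stand for generic locations and observations, and the theta, k, and lambda values are arbitrary choices rather than recommendations):
fit <- mKrig( x, y, cov.function="wendland.cov", cov.args=list( k=2, theta=1.5), lambda=.1)
print( fit)    # the summary includes the Monte Carlo estimate of the effective degrees of freedom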
@@ -1925,7 +1990,7 @@ 10 mKrig
\({\large{\mathbf{Basic \> Usage}}}\)
-mKrig(x, y, Z = NULL, cov.function = “stationary.cov”, lambda = 0, m = 2)
+mKrig(x, y, Z = NULL, cov.function = "stationary.cov", lambda = 0, m = 2)
\({\large{\mathbf{Value}}}\)
Returns an object of class mKrig.
@@ -1972,7 +2037,7 @@ 10.2 COmonthlyMet
10.3 flame
The flame data consists of a 2 column matrix x with different fuel and oxygen flow rates for the burner, and vector y with the intensity of light at a particular wavelength indicative of Lithium ions. The characteristics of an ionizing flame are varied with the intent of maximizing the intensity of emitted light for lithium in solution. Areas outside of the measurements are where the mixture may explode! Note that the optimum is close to the boundary.
-This example shows the versatility of fields, fitting Krig, spatialProcess, and mKrig models to the data. Note how much faster mKrig is, and how even with a Wendland it is able to reproduce the other fits’ behaviours fairly well.
+This example shows the versatility of fields, fitting Krig, spatialProcess, and mKrig models to the data. Note how much faster mKrig is, and how even with a Wendland covariance it is able to reproduce the other fits' behaviour fairly well.
x <- flame$x
y <- flame$y
look <- as.image(Z=y, x=x)
@@ -1989,7 +2054,7 @@ 10.3 flame
10.4 Krig.replicates with NWSC
-We now consider the NWSC data set. The data we consider are the temperature and relative humidity as our spatial indexing variables, and our response is power usage. The observations aren’t at unique locations, so to use mKrig or spatialProcess, we will have to collapse the observations to unique locations with the function Krig.replicates.
+We now consider the NWSC data set. We use temperature and relative humidity as our spatial indexing variables, and power usage as the response. The observations aren't at unique locations, so to use mKrig or spatialProcess, we will have to collapse the observations to unique locations with the function Krig.replicates.
library(sp) #for colors
load("NWSC.rda")
x <- cbind(NWSC$RH, NWSC$Otemp)
@@ -2024,7 +2089,7 @@ 10.5 mKrig and a GCV
x<- ozone2$lon.lat[good,]
mKrig( x,y,cov.function="stationary.taper.cov",theta = 2.0, lambda=.01,
Taper="Wendland", Taper.args=list(theta = 1.5, k=2, dimension=2)) -> out2
-Now, we will compare many lambda’s on a grid using GCV for this small data set. One should really just use Krig or Tps, but this is an example of approximate GCV that will work for much larger data sets using sparse covariances and the Monte Carlo trace estimate.
+Now, we will compare many values of lambda on a grid using GCV for this small data set. One should really just use Krig or Tps, but this is an example of approximate GCV that will work for much larger data sets using sparse covariances and the Monte Carlo trace estimate.
lgrid<- 10**seq(-1,1,,15) # a grid of lambdas
GCV<- matrix( NA, 15,20)
trA<- matrix( NA, 15,20)
@@ -2086,7 +2151,7 @@ 10.6 Collapse fixed effects with
## [8,] -525.6700 0.3666253 24.196703 5.337032 77.69217
## [9,] -529.7651 0.5806189 18.395516 5.868205 59.30883
## [10,] -540.6156 0.3602721 147.142371 7.314397 148.50003
-Names of summary: “lnProfileLike.FULL”, “lambda”, “theta”, “sigmaMLE”, “rhoMLE”, “funEval”, “gradEval”. We left out the last two columns in the previous output. It can be interesting to visualize the results as pairwise scatter plots for individual estimates.
+Names of summary: "lnProfileLike.FULL", "lambda", "theta", "sigmaMLE", "rhoMLE", "funEval", "gradEval". We left out the last two columns in the previous output. It can be interesting to visualize the results as pairwise scatter plots for individual estimates.
plot(O3MLE[,2], O3MLE[,3], log = 'xy', xlab = "lambda", ylab = "theta")
title("theta vs lambda for 10 days")

@@ -2103,7 +2168,7 @@ 10.6 Collapse fixed effects with
mKrig.coef finds the d and c coefficients representing the solution using the previous Cholesky decomposition for a new data vector. This is used in computing the prediction standard error in predictSE.mKrig and can also be used to evaluate the estimate efficiently at new vectors of observations provided the locations and covariance remain fixed.
-Since this data set has NA’s, we remove any columns (observation locations) with an NA. Beware of blindly removing data like this - we are just illustrating the mKrig.coef functionality.
+Since this data set has NA's, we remove any columns (observation locations) with an NA. Beware of blindly removing data like this - we are just illustrating the mKrig.coef functionality.
goody <- na.omit(t(ozone2$y))
omitted <- attr(goody, "na.action")
x <- ozone2$lon.lat[-omitted,]
@@ -2136,7 +2201,7 @@ 10.7 Approximate standard errors
O3.fit<- mKrig( x,y, Covariance="Matern", theta=.5,smoothness=1.0, lambda= .01 )
-Now, we’ll look at some approximate conditional simulations of our random field and the missing data.
+Now, we'll look at some approximate conditional simulations of our random field and the missing data.
set.seed(122)
O3.sim<- sim.mKrig.approx( O3.fit, nx=100, ny=100, gridRefinement=3, M=2 )
@@ -2180,17 +2245,17 @@ 10.8 Finding taper range
## To avoid the iteration, increase 'nearestdistnnz' option to something like
## 'spam.options(nearestdistnnz=c(923230,400))'
## (constructed 1213 lines out of 1500).
-The warning resets the memory allocation for the covariance matrix according the to values spam.options(nearestdistnnz=c(416052,400)). This is inefficient because the preliminary pass failed. The following call completes the computation in “one pass” without a warning and without having to reallocate more memory.
-spam.options(nearestdistnnz=c(4.2E5,400))
+The warning resets the memory allocation for the covariance matrix according to the values spam.options(nearestdistnnz=c(416052,400)). This is inefficient because the preliminary pass failed. The following call completes the computation in "one pass" without a warning and without having to reallocate more memory.
+options(spam.nearestdistnnz=c(4.2E5,400))
look2<- mKrig( x,y, cov.function="wendland.cov",k=2, theta=.9, lambda=1e-2)
-## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' spase distance matrix, I require one more iteration.
+## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' sparse distance matrix, I require one more iteration.
## To avoid the iteration, increase 'nearestdistnnz' option to something like
-## 'spam.options(nearestdistnnz=c(923230,400))'
+## 'options(spam.nearestdistnnz=c(600000,400))'
## (constructed 966 lines out of 1500).
-## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' spase distance matrix, I require one more iteration.
+## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' sparse distance matrix, I require one more iteration.
## To avoid the iteration, increase 'nearestdistnnz' option to something like
-## 'spam.options(nearestdistnnz=c(923230,400))'
+## 'options(spam.nearestdistnnz=c(600000,400))'
## (constructed 966 lines out of 1500).
As a check, notice that print(look2) reports the number of nonzero elements consistent with the specific allocation increase in spam.options.
Now we generate a new data set of 1500 locations.
@@ -2267,7 +2332,7 @@ 10.9 Using V,
11 Tps and fastTps
11.1 Tps and Krig
-We return to Tps, but this time consider spatial data. The Tps estimate can be thought of as a special case of a spatial process estimate using a particular generalized covariance function. Because of this correspondence the Tps function is a wrapper that determines the polynomial order for the null space and then calls Krig with the radial basis generalized covariance rad.cov. The RBF is the reproducing kernel that generates the reproducing kernel Hilbert space in which the minimization problem is set. See Wahba’s Spline Models for Observational Data.
+We return to Tps, but this time consider spatial data. The Tps estimate can be thought of as a special case of a spatial process estimate using a particular generalized covariance function. Because of this correspondence the Tps function is a wrapper that determines the polynomial order for the null space and then calls Krig with the radial basis generalized covariance rad.cov. The RBF is the reproducing kernel that generates the reproducing kernel Hilbert space in which the minimization problem is set. See Wahba's Spline Models for Observational Data.
In this way Tps requires little extra coding beyond the Krig function itself, and in fact the returned list from Tps is a Krig object and so it is adapted to all the special functions developed for this object format. One downside is some confusion in the summary of the thin plate spline as a spatial process type estimator. For example, the summary refers to a covariance model and maximum likelihood estimates. This is not something usually reported with a spline fit! Similarly, fastTps is a convenient wrapper function that calls mKrig with the Wendland covariance function.
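As a small illustration of the two wrappers side by side (ChicagoO3 ships with fields; the fastTps taper range is only a rough guess):
fit.tps <- Tps( ChicagoO3$x, ChicagoO3$y)                   # calls Krig with the radial basis generalized covariance
fit.fast <- fastTps( ChicagoO3$x, ChicagoO3$y, theta=100)   # calls mKrig with a Wendland covariance
# both objects work with the usual generic functions, e.g. surface( fit.tps)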
@@ -2277,7 +2342,7 @@ 11.1 Tps and K
fastTps(x, Y, m = NULL, p = NULL, theta, lon.lat=FALSE, lambda=0)
\({\large{\mathbf{Value}}}\)
-Both functions returns a list of class Krig. m determines the degree (m-l) of the polynomial function capturing the spatial drift/trend component of the model. The default is the value such that 2m-d is greater than zero where d is the dimension of x. p is the polynomial power for Wendland radial basis functions. The default is 2m-d. theta is the tapering range that is passed to the Wendland compactly supported covariance. The covariance (i.e. the radial basis function) is zero beyond range theta. The larger theta is, the closer this model will approximate the standard thin plate spline.
+Both functions return a list of class Krig. m determines the degree (m-1) of the polynomial function capturing the spatial drift/trend component of the model. The default is the value such that 2m-d is greater than zero where d is the dimension of x. p is the polynomial power for Wendland radial basis functions. The default is 2m-d. theta is the tapering range that is passed to the Wendland compactly supported covariance. The covariance (i.e. the radial basis function) is zero beyond range theta. The larger theta is, the closer this model will approximate the standard thin plate spline.
Some care needs to be exercised in specifying the taper range theta and the power p, which describes the polynomial behavior of the Wendland at the origin. Note that unlike Tps the locations are not scaled to unit range and this can cause havoc in smoothing problems with variables in very different units. So rescaling the locations x <- scale(x) is a good idea for putting the variables on a common scale for smoothing. This function does have the potential to approximate estimates of Tps for very large spatial data sets. Also, the function predictSurface.fastTps has been made more efficient for the case of k = 2 and m = 2.
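A one-line sketch of the rescaling advice above (x and y stand for generic locations and observations; the taper range on the scaled coordinates is only a guess):
xs <- scale( x)                   # put all coordinates on a common scale
fit <- fastTps( xs, y, theta=1.5)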
@@ -2423,11 +2488,11 @@ 12.2 REML
\ell(\mathbf{d},\boldsymbol{\xi}) \propto -\frac{1}{2} \left( \log( \det(\Sigma)) + (\mathbf Y - \mathbf{Zd})^T \Sigma^{-1} (\mathbf Y - \mathbf{Zd}) \right)
\]
For any fixed \(\boldsymbol \xi\), the value of \(\mathbf d\) that maximizes \(\ell\) is the Generalized Least Squares (GLS) estimate \(\hat{\mathbf d} = (\mathbf{Z}^T \Sigma^{-1} \mathbf{Z})^{-1} \mathbf{Z}^T \Sigma^{-1} \mathbf{Y}\).
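For concreteness, here is a direct (and not numerically optimized) sketch of this estimate, assuming the covariance matrix Sigma, the covariate matrix Z, and the data vector Y have already been formed:
Sigma.inv <- solve( Sigma)
d.hat <- solve( t(Z) %*% Sigma.inv %*% Z, t(Z) %*% Sigma.inv %*% Y)   # GLS estimate of d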
-To maximize \(\ell\), one often “profiles” and eliminates \(\mathbf{d}\) in the equation for \(\ell\) by substituting \(\mathbf{d} = \hat {\mathbf d}\). Then \(\ell(\mathbf{d},\boldsymbol \xi)=\ell(\boldsymbol \xi)\) only depends on \(\boldsymbol \xi\). Numerical optimization (nonlinear methods like BFGS) is used to find the remaining parameters, and then the GLS estimate \(\hat {\mathbf d}\) is updated with the optimal parameter values. (See Handbook of Spatial Statistics, p. 46).
+To maximize \(\ell\), one often "profiles" and eliminates \(\mathbf{d}\) in the equation for \(\ell\) by substituting \(\mathbf{d} = \hat {\mathbf d}\). Then \(\ell(\mathbf{d},\boldsymbol \xi)=\ell(\boldsymbol \xi)\) only depends on \(\boldsymbol \xi\). Numerical optimization (nonlinear methods like BFGS) is used to find the remaining parameters, and then the GLS estimate \(\hat {\mathbf d}\) is updated with the optimal parameter values. (See Handbook of Spatial Statistics, p. 46).
However, this process of Maximum Likelihood Estimation (MLE) tends to give biased results, as \[
(\mathbf{Y} - \mathbf{Z} \hat{\mathbf{d}})^T \Sigma^{-1} (\mathbf Y - \mathbf{Z} \hat {\mathbf{d}})
\le (\mathbf{Y}-\mathbf{Zd})^T \Sigma^{-1} (\mathbf Y - \mathbf{Zd}), \quad \forall \mathbf{d}.
-\] Intuitively, this means that \(\mathbf{Z} \hat{\mathbf d}\) is “closer” to \(\mathbf{Y}\) than any other estimate \(\mathbf{Zd}\), even if \(\mathbf{d}\) is the true parameter! As a result, the MLE for \(\boldsymbol \xi\) is biased due to this simultaneous estimation of \(\mathbf{d}\). In particular, the MLE for \(\boldsymbol \xi\) tends to underestimate the process variance.
+\] Intuitively, this means that \(\mathbf{Z} \hat{\mathbf d}\) is "closer" to \(\mathbf{Y}\) than any other estimate \(\mathbf{Zd}\), even if \(\mathbf{d}\) is the true parameter! As a result, the MLE for \(\boldsymbol \xi\) is biased due to this simultaneous estimation of \(\mathbf{d}\). In particular, the MLE for \(\boldsymbol \xi\) tends to underestimate the process variance.
This leads to Restricted Maximum Likelihood (REML), which attempts to reduce this bias when using maximum likelihood methods. We use a linear transformation \(\mathbf Y^* = \mathbf {AY}\) of the data so that the distribution of \(\mathbf{Y}^*\) does not depend on \(\mathbf{d}\). In particular, we use the matrix \(\mathbf A = \mathbf{I} - \mathbf{Z} (\mathbf{Z}^T \mathbf{Z})^{-1} \mathbf{Z}^T\) that projects \(\mathbf{Y}\) to ordinary least squares residuals. The resulting matrix product \(\mathbf{AY}\) is independent of \(\mathbf{d}\). \[
\ell_R(\boldsymbol \xi) = - \frac{1}{2} \left(\log(\det(\Sigma)) + \log( \det( \mathbf{Z}^T \Sigma^{-1} \mathbf{Z})) + \mathbf{Y}^T (\Sigma^{-1} - \Sigma^{-1} \mathbf{Z} (\mathbf{Z}^T \Sigma^{-1} \mathbf{Z})^{-1} \mathbf{Z}^T \Sigma^{-1}) \mathbf{Y} \right)
\] This ends up being the profile log-likelihood with an additional term involving \(\log( \det( \mathbf{Z}^T \Sigma^{-1} \mathbf{Z}))\). Again, this expression must be maximized numerically with respect to \(\boldsymbol \xi\). The procedure concludes by estimating \(\mathbf d\) with GLS and the estimate \(\hat{\boldsymbol \xi}\) plugged in.
@@ -2457,12 +2522,29 @@ 12.3 GCV
+
+
+
+
+
+
+
diff --git a/vignette/fieldsVignette.pdf b/vignette/fieldsVignette.pdf
new file mode 100644
index 0000000..b31182f
Binary files /dev/null and b/vignette/fieldsVignette.pdf differ
diff --git a/vignette/mKrig.Rmd b/vignette/mKrig.Rmd
index 209aab4..5385c80 100755
--- a/vignette/mKrig.Rmd
+++ b/vignette/mKrig.Rmd
@@ -5,6 +5,7 @@ knitr::opts_chunk$set(echo = TRUE, cache=TRUE)
library(fields)
library(colorRamps)
library(RColorBrewer)
+library(spam)
```
The `mKrig` function is both a Kriging/linear algebra engine and also a user-level function for fitting spatial models. The `mKrig` engine is a simple and fast version of `Krig`. The "m" stands for micro, being a succinct (and we hope readable) function. The function focuses on the same computations as in `Krig.engine.fixed` done for a fixed `lambda` parameter, for unique spatial locations and for data without missing values. `mKrig` is optimized for large data sets, sparse linear algebra, and a clear exposition of the computations. The underlying code in `mKrig` is more readable and straightforward, but it does fewer checks on the data than `Krig`. One savings in computation is that Monte Carlo simulations are used to compute the trace of the smoothing matrix $\operatorname{tr}(A(\lambda))$, which can be used for the effective degrees of freedom of the model.
@@ -340,7 +341,7 @@ look2<- mKrig(x,y, cov.function="wendland.cov",k=2, theta=.9, lambda=.1)
The warning resets the memory allocation for the covariance matrix according to the values `spam.options(nearestdistnnz=c(416052,400))`. This is inefficient because the preliminary pass failed. The following call completes the computation in "one pass" without a warning and without having to reallocate more memory.
```{r}
-spam.options(nearestdistnnz=c(4.2E5,400))
+options(spam.nearestdistnnz=c(4.2E5,400))
look2<- mKrig( x,y, cov.function="wendland.cov",k=2, theta=.9, lambda=1e-2)
```
diff --git a/vignette/masterfile.Rmd b/vignette/masterfile.Rmd
index b0127bb..5751550 100755
--- a/vignette/masterfile.Rmd
+++ b/vignette/masterfile.Rmd
@@ -1,8 +1,10 @@
---
title: '`fields` vignette'
-author: "Ashton Wiens and Mitchell Krock"
+author: "Ashton Wiens, Mitchell Krock, and Emma Lilly"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
+ pdf_document:
+ toc: yes
html_document:
number_sections: yes
toc: yes
@@ -10,7 +12,7 @@ output:
```{r set-options, echo=FALSE, eval=FALSE}
options(width = 400)
-knitr::opts_chunk$set(fig.show = 'hide') #only if you want no figures
+#knitr::opts_chunk$set(fig.show = 'hide') #only if you want no figures
```
-
-
@@ -190,8 +178,8 @@
fields vignette
-Ashton Wiens and Mitchell Krock
-15 January, 2020
+Ashton Wiens, Mitchell Krock, and Emma Lilly
+06 August, 2020
@@ -351,10 +339,10 @@ 1 Introduction
For many of these functions, we present their basic usage in this vignette. We have edited out secondary arguments in the presentation to focus on the essential options.
We have tried to present this vignette as a narrative that introduces spatial statistics, describes how to use the functions in fields, and also includes some relevant theory about splines and Kriging. The goal is to get the interested reader started quickly with many data based examples. Core functions in fields return S3 objects that are compatible with generic functions such as plot, predict, surface, etc. Thus, the analysis flow will seem familiar to other methods such as lm. If the reader is familiar with Kriging, they can skip to Chapters 6-8 to see some examples.
-We begin by showing quick examples using some of fields’ most important functions. This gives the user an idea of what kind of problems can be solved and how to implement the functions in fields to solve them.
+We begin by showing quick examples using some of fields' most important functions. This gives the user an idea of what kind of problems can be solved and how to implement the functions in fields to solve them.
In the third chapter we discuss fitting splines to univariate data. We begin with a simple time series data set, but the discussion applies more generally to data with one predictor variable x and one response variable y. The functions splint and sreg are used to fit cubic splines to this type of data. Then we introduce Tps, a core function in the package that can fit thin plate splines.
-Chapters 4-6 build up to Kriging. We have tried give an intuitive presentation of the fundamental concepts in spatial statistics, and at the same time introduce the corresponding functions in fields. Initially, the Kriging linear algebra is done manually for exposition, and then the package’s most important function spatialProcess is presented in Chapter 7.
-There is a brief interlude to discuss the package’s plotting functions as well as some image processing functions in Chapters 8-9. Many of them will have been used up to this point, but here we give a thorough description of the plotting options to be used with images, surfaces, and the package’s models. See quilt.plot to get a quick plot of raw data to discern spatial patterns. There are also several functions for working with image data specifically. The functions image.smooth and smooth.2d are capable of smoothing images, and interp.surface performs fast bilinear interpolation (often useful when zooming in on images from a grid).
+Chapters 4-6 build up to Kriging. We have tried give an intuitive presentation of the fundamental concepts in spatial statistics, and at the same time introduce the corresponding functions in fields. Initially, the Kriging linear algebra is done manually for exposition, and then the package's most important function spatialProcess is presented in Chapter 7.
+There is a brief interlude to discuss the package's plotting functions as well as some image processing functions in Chapters 8-9. Many of them will have been used up to this point, but here we give a thorough description of the plotting options to be used with images, surfaces, and the package's models. See quilt.plot to get a quick plot of raw data to discern spatial patterns. There are also several functions for working with image data specifically. The functions image.smooth and smooth.2d are capable of smoothing images, and interp.surface performs fast bilinear interpolation (often useful when zooming in on images from a grid).
In Chapter 10, we return to covariances and Kriging in more detail, including all of the covariance options in fields. Then we show how the user can write their own covariance function for use with all of the functionality in fields. This allows us to discuss three important wrappers on Krig in the following chapters.
In Chapter 11, we cover the Kriging optimization workhorse spatialProcess. The user only needs to specify the data and a covariance model, and then spatialProcess will perform a grid search for the optimal values of the rest of the parameters and return the spatial model. Optimization methods include restricted maximum likelihood (REML) and generalized cross validation (GCV). The user should exercise caution, however, when using black box optimization routines. We hope this chapter will help explain the optimization procedure and assist the user with fitting the right model parameters.
In Chapter 12, we introduce the concept of using a compactly supported covariance function to generate a sparse covariance matrix. The function mKrig (microKrig) is set up to take advantage of this sparsity in computation by using the spam package. We return to the Tps function in Chapter 13 with spatial data. The theory of thin plate splines is presented in the framework of Kriging. Finally, we show how the fastTps function is an analogue of mKrig that can similarly take advantage of sparse linear algebra.
@@ -392,7 +380,7 @@ 2.1 Visualizing raw data with
2.2 spatialProcess with a covariate
-fields makes it easy to fit a spatial model from data, predict at arbitrary locations or on a grid, and plot the results. Additional covariates (e.g. elevation) can also be included in the linear trend. We can use predictSurface to predict on a grid and output a surface object, much like predict but more convenient for plotting.
+fields makes it easy to fit a spatial model from data, predict at arbitrary locations or on a grid, and plot the results. Additional covariates (e.g. elevation) can also be included in the linear trend. We can use predictSurface to predict on a grid and output a surface object, much like predict but more convenient for plotting.
data( COmonthlyMet)
# predicting average daily minimum temps for spring in Colorado
obj<- spatialProcess( CO.loc, CO.tmin.MAM.climate, Z= CO.elev)
@@ -410,7 +398,7 @@ 2.2 spatialProcess w
US(add=TRUE, col="grey")
points(CO.loc[,1], CO.loc[,2], col="magenta", pch=21, bg="white")
title("Standard errors for avg Spring daily min. temp in CO")
-Computing standard errors can be computationally expensive. An alternative is to use sim.spatialProcess to conditionally simulate the process, and then use these simulations to approximate the standard errors. In practice, a large M (e.g. 100) should be used to produce many simulations to reduce the variability when finding sample standard deviations in the Monte Carlo sample.
+Computing standard errors can be computationally expensive. An alternative is to use sim.spatialProcess to conditionally simulate the process, and then use these simulations to approximate the standard errors. In practice, a large M (e.g. 100) should be used to produce many simulations to reduce the variability when finding sample standard deviations in the Monte Carlo sample.
sim <- sim.spatialProcess(obj, xp=make.surface.grid(CO.Grid)[1:5,], Z=CO.elevGrid$z[1:5], M=30)
look <- as.surface(CO.Grid, t(sim[1,]))
look2 <- as.surface(CO.Grid, t(sim[2,]))
@@ -499,7 +487,7 @@ 3.1 splint and The splint function returns a vector of values of the interpolating spline evaluated at xgrid, while sreg returns a list of class sreg. Some of the relevant components of the sreg list are the original data x and y, the smoothness parameters lambda and df (same as trace), and the residuals and fitted.values. Note that the identical parameters lam and lambda both appear for convenience.
-First, we’ll use interpolation to fit the control group data. We use the x data coordinates as nodes to interpolate, and then evaluate the model at the grid locations to plot. Interpolation of a set of data points \((x_i,y_i)_{i=1}^n\) means that the interpolating function passes exactly through these points.
+First, we'll use interpolation to fit the control group data. We use the x data coordinates as nodes to interpolate, and then evaluate the model at the grid locations to plot. Interpolation of a set of data points \((x_i,y_i)_{i=1}^n\) means that the interpolating function passes exactly through these points.
grd <- seq(range(x)[1], range(x)[2], length.out = 400)
spl <- splint(x,y,grd)
plot(y~x, pch=".", cex=5, ylab="Median Food Intake", xlab="Days")
@@ -508,7 +496,7 @@ 3.1 splint and

-Often we want to smooth the data rather than interpolate, for example if we assumed possible error in data collection. Loosely, a smoothing function follows the general “trend” of the data, but it may not exactly interpolate the data points.
+Often we want to smooth the data rather than interpolate, for example if we assumed possible error in data collection. Loosely, a smoothing function follows the general "trend" of the data, but it may not exactly interpolate the data points.
The parameters df and lambda both control the smoothing in these functions. Interpolating the data corresponds to \(\lambda=0\) and df= the number of observations. As lambda increases, the spline gets smoother.
In place of lambda, a more useful scale is in terms of the effective number of parameters (degrees of freedom) associated with the estimate. We denote this by EDF (effective degrees of freedom). There is a 1-1 relationship between \(\lambda\) and EDF, and both are useful measures in slightly different contexts: \(\lambda\) for computing with the spatial process model versus EDF for data smoothing. The user should be aware that splint defaults to lambda = 0 (interpolation), while sreg defaults to using GCV to find lambda.
#Note this small lambda almost interpolates, close to splint model fit above
@@ -561,7 +549,7 @@ 3.2 sreg and S3 meth
## lambda trA GCV shat converge
## GCV 11.13 7.446 2.378 1.387 5
## GCV.one 11.13 7.446 2.378 1.387 5
-plot gives a series of four diagnostic plots describing the fit. For sreg, plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2/\rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized. Finally, plot 4 shows a histogram of the standard residuals.
+plot gives a series of four diagnostic plots describing the fit. For sreg, plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2/\rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized. Finally, plot 4 shows a histogram of the standard residuals.
set.panel(2,2)
plot(fit) # four diagnostic plots of fit

@@ -591,7 +579,7 @@ 3.3 Tps
The Tps function is used for fitting curves and surfaces to interpolate or smooth data by thin plate spline regression. This is a very useful technique and is often as informative as more complex methods.
The assumed model for Tps is additive. Namely, \[ Y_i = f(\mathbf{X}_i) + \epsilon_i,\]
where \(f(\mathbf{X})\) is a d dimensional surface, and the object is to fit a thin plate spline surface to irregularly spaced data. \(\epsilon_i\) are uncorrelated random errors with zero mean and variances \(\sigma^2 / w_i\).
-This function also works for just a single dimension and is a special case of a spatial process estimate (Kriging). A “fast” version of this function uses a compactly supported Wendland covariance, computing the estimate for a fixed smoothing parameter.
+This function also works for just a single dimension and is a special case of a spatial process estimate (Kriging). A "fast" version of this function uses a compactly supported Wendland covariance, computing the estimate for a fixed smoothing parameter.
\({\large{\mathbf{Basic \> Usage}}}\)
@@ -603,7 +591,7 @@ 3.3 Tps
Returns a list of class Krig/mKrig (see below), which includes the predicted surface of fitted.values and residuals. Also includes the results of the GCV grid search in gcv.grid. Note that the GCV/REML optimization is done even if lambda or df is given. Please see the documentation on Krig for details of the returned arguments.
-Tps is a “wrapper function”, which is why a Krig object is returned. Moreover, any argument that can be used in a call to Krig can be used in Tps. For example, the user can specify lambda or df just as in sreg. As seen in the minimization problem, lambda determines the weight put on the smoothness condition, giving it the same interpretation. lambda is the most important argument for Tps, which is estimated by GCV if omitted. We can also includes covariates in the linear part of the model by passing the argument Z.
+Tps is a "wrapper function", which is why a Krig object is returned. Moreover, any argument that can be used in a call to Krig can be used in Tps. For example, the user can specify lambda or df just as in sreg. As seen in the minimization problem, lambda determines the weight put on the smoothness condition, giving it the same interpretation. lambda is the most important argument for Tps, which is estimated by GCV if omitted. We can also include covariates in the linear part of the model by passing the argument Z.
Tps and fastTps are special cases of using the Krig and mKrig functions, respectively. The Tps estimator is implemented by passing the right generalized covariance function based on a radial basis function (RBF) to the more general function Krig. One advantage of this implementation is that once a Tps/Krig object is created the estimator can be found rapidly for other data and smoothing parameters provided the locations remain unchanged. This makes simulation within R efficient (see example below).
@@ -669,7 +657,7 @@ 3.5 Confidence intervals with
-Notice how similar the summary is to sreg’s summary. sreg is actually a special case of Tps (as shown below – note that lambda is not equal for the two functions!) The m=2 default for Tps and leaving the data unscaled results in the cubic smoothing spline sreg.
+Notice how similar the summary is to sreg's summary. sreg is actually a special case of Tps (as shown below -- note that lambda is not equal for the two functions!). The m=2 default for Tps and leaving the data unscaled results in the cubic smoothing spline sreg.
# compare sreg and Tps results to show the adjustment to lambda.
predict( fit)-> look
predict( fit.tps, lambda=fit$lambda*fit$N)-> look2
@@ -690,13 +678,14 @@ 3.5 Confidence intervals with
-
+ col=c( 2,2,4,4), lty=2)
+legend("bottomright", legend = c("95 Pct Pointwise", "Simultaneous Bonferroni"), col = c(2,4), lty = 2, cex = 0.8)
+title( "95 Pct Pointwise and Simultaneous Intervals")
+
# or try the more visually honest:
plot( fit.tps$x, fit.tps$y)
lines( fit.tps$predicted, lwd=2)
@@ -709,20 +698,23 @@ 3.5 Confidence intervals with x <- WorldBankCO2[,'Pop.urb']
y <- log10(WorldBankCO2[,'CO2.cap'])
out.fast <- fastTps(x,y,lambda=2, theta=20)
-plot(x,y)
-xgrid<- seq( min(x), max(x),,300)
+
+plot(x,y, xlab = "Urban Percent of Population", ylab = "CO2 Emissions Per Capita")
+
+xgrid<- seq( min(x), max(x), length.out = 300)
fhat.fast <- predict( out.fast,xgrid)
-#se.fast <- predictSE( out.fast)
+se.fast <- predictSE( out.fast, x = xgrid)
+
lines( xgrid, fhat.fast)
-#lines( xgrid, fhat.fast+1.96*se.fast, col=2, lty=2)
-#lines( xgrid, fhat.fast-1.96*se.fast, col=2, lty=2)
-title('fastTps fit')
-
+lines( xgrid, fhat.fast+1.96*se.fast, col=2, lty=2)
+lines( xgrid, fhat.fast-1.96*se.fast, col=2, lty=2)
+title('`fastTps` Fit With 95% Confidence Interval')
+
4 Introduction to Spatial Statistics
-Here we’ll cover the fundamentals of spatial statistics. In particular, we will examine the basic definitions and concepts of the field before giving a geostatistics mini-course. As this is a vignette, we can only scratch the theory behind spatial statistics. A few useful resources are:
+Here we'll cover the fundamentals of spatial statistics. In particular, we will examine the basic definitions and concepts of the field before giving a geostatistics mini-course. As this is a vignette, we can only scratch the surface of the theory behind spatial statistics. A few useful resources are:
Statistics for Spatial Data by Noel Cressie
Handbook of Spatial Statistics by Alan E. Gelfand, Peter J. Diggle, Montserrat Fuentes, and Peter Guttorp
@@ -730,15 +722,15 @@ 4 Introduction to Spatial Statist
4.1 The spatial problem
-Let’s return to a previous quilt.plot. Suppose we want to predict the maximum March/April/May temperature at a new location \((-103,39.5)\) in Colorado, which is shown below as the x.
-
+Let's return to a previous quilt.plot. Suppose we want to predict the maximum March/April/May temperature at a new location \((-103,39.5)\) in Colorado, which is shown below as the x.
+
An initial thought may be to use OLS regression with covariates of \(\texttt{longitude}\) and \(\texttt{latitude}\).
-Recall Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.” We see this reflected in the plot above, as temperatures in one area are similar to the temperature in nearby areas. This spatial autocorrelation violates many typical statistical assumptions – our observations are no longer independent! Fortunately, in the field of spatial statistics, we can relax our assumption of independence and work with observations that are spatially dependent.
-To make a “spatial prediction”, we need some way to quantify the change in our observation variable (temperature) as a function of distance.
+Recall Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things." We see this reflected in the plot above, as temperatures in one area are similar to the temperature in nearby areas. This spatial autocorrelation violates many typical statistical assumptions -- our observations are no longer independent! Fortunately, in the field of spatial statistics, we can relax our assumption of independence and work with observations that are spatially dependent.
+To make a "spatial prediction", we need some way to quantify the change in our observation variable (temperature) as a function of distance.
4.2 Covariance
-A covariance function \(\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x'))\) describes the joint variability between a stochastic process \(Y(\cdot)\) at two locations \(\mathbf{x}\) and \(\mathbf{x}'\). This covariance function is vital in spatial prediction. The fields package includes common parametric covariance families (e.g. exponential and Matern) as well as nonparametric models (e.g. radial and tensor basis functions).
+A covariance function \(\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x'))\) describes the joint variability between a stochastic process \(Y(\cdot)\) at two locations \(\mathbf{x}\) and \(\mathbf{x}'\). This covariance function is vital in spatial prediction. The fields package includes common parametric covariance families (e.g. exponential and Matern) as well as nonparametric models (e.g. radial and tensor basis functions).
When modeling \(\operatorname{Cov}(Y(\mathbf{x}), Y(\mathbf{x}'))\), we are often forced to make simplifying assumptions.
Stationarity assumes we can represent the covariance function as \[
@@ -752,7 +744,7 @@ 4.2 Covariance
Matern. \(\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x')) = C(r) = \rho \left( \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{ r}{\theta} \right)^\nu K_\nu \left( \frac{r}{\theta} \right) \right) + \sigma^2 \mathbf{1}_{ \mathbf x = \mathbf x'}\), where \(K_{\nu}\) is a modified Bessel function of the second kind, of order \(\nu\).
Note that the Matern covariance function depends on parameters \((\rho,\theta,\nu,\sigma^2)\), and the Exponential covariance depends on parameters \((\rho,\theta,\sigma^2)\). The parameters \(\rho,\theta,\sigma^2\) and \(\nu\) respectively denote the marginal variance (or sill), range, nugget effect, and smoothness of our process.
-The range \(\theta\) of the process is the distance at which observations become uncorrelated. The sill \(\rho\) is the marginal variance of the spatial process. The nugget effect \(\sigma^2\) corresponds to small-scale variation such as measurement error. The smoothness \(\nu\) corresponds to how “smooth” our spatial process appears. An illustration in the vgram section shows the visual interpretation of these parameters.
+The range \(\theta\) of the process is the distance at which observations become uncorrelated. The sill \(\rho\) is the marginal variance of the spatial process. The nugget effect \(\sigma^2\) corresponds to small-scale variation such as measurement error. The smoothness \(\nu\) corresponds to how "smooth" our spatial process appears. An illustration in the vgram section shows the visual interpretation of these parameters.
For such isotropic covariances, the fields package relies on the function rdist, which is used to calculate the distance between two sets of locations.
4.2.1 rdist
@@ -791,7 +783,7 @@ 4.2.1 rdist
quilt.plot(grd.CO,y.CO)
US(add=TRUE)
-
+
It is easy to find the distance between two points on the quilt.plot. Here, we use rdist.earth since our vectors consist of longitude/latitude pairs.
three.locations <- grd.CO[1:3,]
#location one is (-109.10, 36.90)
@@ -807,7 +799,7 @@ 4.2.1 rdist
4.2.2 Exponential and Matern covariances
The Exponential and Matern functions produce these isotropic covariances in R. The inputs are a matrix of distances d, either the range parameter range (this is \(\theta\) in our notation) or alpha = 1/range, and finally the marginal variance phi (which is \(\rho\) in our notation), and the smoothness nu for the Matern covariance. Note that the exponential covariance is identical to the Matern covariance with smoothness nu = 0.5.
There are other (generalized) covariances in fields, such as RadialBasis and Wendland, that will be discussed later. To set the nugget variance \(\sigma^2\), we use the sigma2 argument in the functions Krig or mKrig that we also discuss later.
-Below are plots that illustrate the shape of these covariance functions. The Exponential, Matern, and Wendland correlations (marginal variances \(\rho=1\)) behave exactly how one would expect. That is, the covariance between two nearby objects is close to 1, indicating a strong positive correlation among the two observations. The appearance of the Radial-Basis correlation may be shocking – we will return to it later.
+Below are plots that illustrate the shape of these covariance functions. The Exponential, Matern, and Wendland correlations (marginal variances \(\rho=1\)) behave exactly how one would expect. That is, the covariance between two nearby objects is close to 1, indicating a strong positive correlation among the two observations. The appearance of the Radial-Basis correlation may be shocking -- we will return to it later.
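A quick sketch of evaluating these correlation functions over distance (the range and smoothness values here are arbitrary):
dgrid <- seq( 0, 3, length.out=200)
plot( dgrid, Exponential( dgrid, range=1), type="l", xlab="distance", ylab="correlation")
lines( dgrid, Matern( dgrid, range=1, smoothness=1.5), lty=2)   # smoother near the origin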
\({\large{\mathbf{Basic \> Usage}}}\)
@@ -872,7 +864,7 @@ 4.2.4 Exp.cov and \({\large{\mathbf{Basic \> Usage}}}\)
Exp.cov(x1, x2=NULL, theta = 1, p=1)
-stationary.cov(x1, x2=NULL, Covariance = “Exponential”, Distance = “rdist”, theta = 1)
+stationary.cov(x1, x2=NULL, Covariance = "Exponential", Distance = "rdist", theta = 1)
\({\large{\mathbf{Value}}}\)
If the argument C is NULL, then the (cross-)covariance matrix is returned. If C is a vector of length n, then returned value is the multiplication of the (cross-)covariance matrix with this vector. The C functionality is used internally by all fields Kriging-related functions.
@@ -901,7 +893,7 @@ 4.2.5 Simulating Random Fields (<
Take the Cholesky Decomposition \(\Sigma = L L^T\).
Multiply \(L \boldsymbol{\epsilon}\), where \(\boldsymbol{\epsilon} \sim N(\mathbf 0, I)\).
-The resulting random vector \(L \boldsymbol \epsilon\) is an exact simulation of a Gaussian process. The only problem with this approach is the computations and storage grow rapidly for larger grids. For example, a \(128 \times 128\) image would mean that the dimension of \(\Sigma\) is huge (\(16,000 \times 16,000\)) and effectively prohibit the use of the Cholesky decomposition. For rectangular grids, one can sometimes simulate Gaussian random fields very efficiently using an alternative algorithm called “circulant embedding” that is based on the Fast Fourier Transform. The restrictions are that the covariance function needs to be stationary and the correlation range needs to the small relative to the size of the domain (see Chan and Wood, 1994).
+The resulting random vector \(L \boldsymbol \epsilon\) is an exact simulation of a Gaussian process. The only problem with this approach is that the computations and storage grow rapidly for larger grids. For example, a \(128 \times 128\) image would mean that the dimension of \(\Sigma\) is huge (\(16,000 \times 16,000\)) and effectively prohibits the use of the Cholesky decomposition. For rectangular grids, one can sometimes simulate Gaussian random fields very efficiently using an alternative algorithm called "circulant embedding" that is based on the Fast Fourier Transform. The restrictions are that the covariance function needs to be stationary and the correlation range needs to be small relative to the size of the domain (see Chan and Wood, 1994).
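Here is a direct Cholesky sketch for a modest number of locations (the locations and covariance parameters are arbitrary; this approach is only feasible when n is small):
set.seed(123)
xs <- matrix( runif(2*200), ncol=2)                     # 200 random locations in the unit square
Sigma <- Matern( rdist( xs, xs), range=.3, smoothness=1.0)
L <- t( chol( Sigma))                                   # chol returns the upper triangle, so transpose
g <- L %*% rnorm( 200)                                  # one exact draw from N(0, Sigma)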
The sim.rf function uses circulant embedding to simulate a stationary Gaussian random field (GRF) on a regular grid with unit marginal variance (i.e. \(\rho=1\)). Note that the marginal variance is readily changed by scaling the resulting GRF.
A limitation when using sim.rf presents itself with Error in sim.rf(obj) : FFT of covariance has negative values. This comes from the circulant embedding method used to create the GRF; in short, it occurs when the correlation range is too large. One fix is to increase the domain size so this correlation is then small relative to the size of the domain.
The input of sim.rf is a list that includes information about the covariance function, its FFT, and the grid for evaluation. Usually this is created by a setup call to an image.cov function (i.e. Exp.image.cov, matern.image.cov, stationary.image.cov).
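As a sketch of the setup call and simulation (the grid size and covariance parameters are illustrative):
grid <- list( x= seq( 0, 5, length.out=100), y= seq( 0, 5, length.out=100))
obj <- matern.image.cov( grid=grid, theta=.5, smoothness=1.0, setup=TRUE)
look <- sim.rf( obj)                        # a 100 x 100 draw with unit marginal variance
image.plot( grid$x, grid$y, look)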
@@ -911,7 +903,7 @@ 4.2.5 Simulating Random Fields (<
sim.rf(obj)
matern.image.cov(ind1, ind2, Y, cov.obj = NULL, setup = FALSE, grid ,theta= 1.0, smoothness=.5)
-Exp.image.cov(ind1, ind2, Y, cov.obj = NULL, setup = FALSE, grid, …)
+Exp.image.cov(ind1, ind2, Y, cov.obj = NULL, setup = FALSE, grid, ...)
\({\large{\mathbf{Value}}}\)
A matrix with the random field values.
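For example, a minimal sketch using the exponential setup call (the grid and theta are arbitrary choices; matern.image.cov and stationary.image.cov follow the same pattern):
library(fields)
grid <- list(x=seq(0,5,,64), y=seq(0,5,,64))
obj  <- Exp.image.cov(grid=grid, theta=.5, setup=TRUE)   # covariance/FFT setup object
look <- sim.rf(obj)                                      # one simulated GRF on the grid
image.plot(grid$x, grid$y, look)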
@@ -947,7 +939,7 @@ 4.3 Variograms
The definition is readily translated to an empirical variogram. Let \(N(h)\) be the set of pairs of observations \(y_i, y_j\) such that \(\| \mathbf x_i - \mathbf x_j \| = h\). The empirical variogram (vgram) is defined as \[
\widehat{\gamma}(h) = \frac{1}{2 \cdot|N(h)| } \sum_{(i,j) \in N(h)} ( y_i - y_j )^2
\]
-The default “cloud” variogram is done for each distance \(h\), but it is noisy and hard to interpret. Often, we specify a number of bins N in which values for nearby values are averaged. In this way, the choices for a variogram may seem similar to finding a histogram. The breaks argument allows the user to explicitly provide distances at which bins are created.
+The default "cloud" variogram is computed for each pairwise distance \(h\), but it is noisy and hard to interpret. Often, we specify a number of bins N in which squared differences at nearby distances are averaged. In this way, the choices for a variogram may seem similar to those for a histogram. The breaks argument allows the user to explicitly provide the distances at which bins are created.
crossCoVGram is the same as vgram but differences are taken across different variables rather than the same variable. boxplotVGram uses the base R boxplot function to display the variogram neatly.
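As an illustration, here is a minimal sketch using one day of the ozone2 data (the choice of 15 bins is arbitrary):
library(fields)
data(ozone2)
good <- !is.na(ozone2$y[16,])
x    <- ozone2$lon.lat[good,]
y    <- ozone2$y[16, good]
look <- vgram(x, y, N=15, lon.lat=TRUE)      # 15 bins, great-circle distances
plot(look$centers, look$stats["mean",], xlab="distance (miles)", ylab="binned variogram")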
@@ -976,7 +968,7 @@ 4.3 Variograms
ind: a two column matrix giving the x and y increment used to compute the shifts
vgram.full: the variogram at each of these separations
-vgram.robust: Cressie’s version of a robust variogram statistic.
+vgram.robust: Cressie's version of a robust variogram statistic.
@@ -996,9 +988,9 @@ 4.3 Variograms
4.4 Simple Kriging
The additive spatial model is \[
-Y(\mathbf x) = \mathbf Z(\mathbf x) \mathbf{d} + g(\mathbf x) + \epsilon(\mathbf x),\] where \(Y(\cdot)\) is an observation, \(\mathbf Z \mathbf{d}\) is a deterministic (nonrandom) product of covariates \(\mathbf Z\) with a weight vector \(\mathbf d\) that acts as a mean function, \(g(\cdot) \sim N(\mathbf 0, \rho \mathbf K)\) is spatial process, and \(\epsilon(\cdot) \sim N( \mathbf 0, \sigma^2 \mathbf I)\) is error (i.e. white noise).
+Y(\mathbf x) = \mathbf Z(\mathbf x) \mathbf{d} + g(\mathbf x) + \epsilon(\mathbf x),\] where \(Y(\cdot)\) is an observation, \(\mathbf Z \mathbf{d}\) is a deterministic (nonrandom) product of covariates \(\mathbf Z\) with a weight vector \(\mathbf d\) that acts as a mean function, \(g(\cdot) \sim N(\mathbf 0, \rho \mathbf K)\) is a spatial process, and \(\epsilon(\cdot) \sim N( \mathbf 0, \sigma^2 \mathbf I)\) is measurement error (i.e. white noise).
Thus, we assume that our observations \(Y(\cdot)\) are Gaussian; in particular, \(\mathbf Y \sim N( \mathbf{Zd}, \rho \mathbf K + \sigma^2 \mathbf I)\).
-For now, we’ll assume \(\mathbf Z\mathbf{d} \equiv \mathbf 0\) and focus on the spatial aspect \(g(\mathbf x)\). This is known as simple kriging, where the mean of the stochastic process is known. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\).
+For now, we'll assume \(\mathbf Z\mathbf{d} \equiv \mathbf 0\) and focus on the spatial aspect \(g(\mathbf x)\). This is known as simple kriging, where the mean of the stochastic process is known. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\).
Suppose we want to predict \(Y(\cdot)\) at a new location \(\mathbf x_0\). The kriging estimate \(\hat Y(\mathbf x_0)\) of our mean zero Gaussian process \(Y(\cdot)\) is given by: \[
\hat Y(\mathbf x_0) = \Sigma_0 \Sigma^{-1} \mathbf Y
\]
@@ -1008,7 +1000,7 @@ 4.4 Simple Kriging
\(\Sigma = \left( \operatorname{Cov}[ Y(\mathbf x_i),Y(\mathbf x_j) ] \right)_{i,j=1}^n\) is the \(n \times n\) covariance matrix of \(Y(\cdot)\) with \(ij\)-entry equal to \(\operatorname{Cov}[Y(\mathbf x_i),Y(\mathbf x_j)]\).
Briefly, when the covariance parameters are known, the Kriging estimate represents the best prediction at \(\mathbf x_0\) based on a linear combination of the observations. By best we mean in terms of mean squared error.
-In the next section, we see an example of how to perform simple kriging. Other types of kriging (i.e. ordinary kriging and universal kriging) are explained in the appendix.
+In the next section, we see an example of how to perform simple kriging. Other types of kriging (i.e. ordinary kriging and universal kriging) are explained in the appendix.
@@ -1029,11 +1021,11 @@ 5.1 Fitting a Variogram
US(add=TRUE)
points(-103,39.5, pch=4,lwd=3,cex=1.25, col='black')

-For now, we’ll assume a zero mean (\(\mathbf Z\mathbf{d} \equiv \mathbf 0\)) and focus on the spatial aspect, which is clearly an incorrect assumption. One can remove the mean trend using least squares, but then this requires universal Kriging instead of simple Kriging. We prefer to start with simple Kriging in this exposition. Details for universal Kriging can be done in the appendix. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\). In this example, \(n=213\) since we have 213 temperature observations.
+For now, we'll assume a zero mean (\(\mathbf Z\mathbf{d} \equiv \mathbf 0\)), although this assumption is clearly incorrect for these data, and focus on the spatial aspect. One can remove the mean trend using least squares, but then this requires universal Kriging instead of simple Kriging. We prefer to start with simple Kriging in this exposition; details for universal Kriging are given in the appendix. Denote our locations of observed points as \(\mathbf x_1,\dots,\mathbf x_n\). In this example, \(n=213\) since we have 213 temperature observations.
Suppose we want to predict the temperature \(Y(\cdot)\) at a new location \(\mathbf x_0\). Recall that the simple Kriging estimate \(\hat Y(\mathbf x_0)\) of our mean zero Gaussian process \(Y(\cdot)\) is given by: \[
\hat Y(\mathbf x_0) = \Sigma_0 \Sigma^{-1} \mathbf Y
\]
-Now, the question is how to determine the covariance function of \(Y(\cdot)\). For this example, we’ll consider the (isotropic) exponential covariance: \(\operatorname{Cov}[ Y(\mathbf x), Y(\mathbf x')] = \rho e^{ - \| \mathbf x - \mathbf x'\|/\theta}\). Here, \(\theta\) is a range parameter that corresponds to the length of spatial autocorrelation of our process, and \(\rho\) is the overall variance of the process.
+Now, the question is how to determine the covariance function of \(Y(\cdot)\). For this example, we'll consider the (isotropic) exponential covariance: \(\operatorname{Cov}[ Y(\mathbf x), Y(\mathbf x')] = \rho e^{ - \| \mathbf x - \mathbf x'\|/\theta}\). Here, \(\theta\) is a range parameter that corresponds to the length of spatial autocorrelation of our process, and \(\rho\) is the overall variance of the process.
The covariance matrix \(\Sigma\) thus has \(i,j\)-th entry \[
\Sigma_{i,j} = \rho e^{ - \| \mathbf x_i - \mathbf x_j\|/\theta} + \sigma^2 \mathbf{1}_{ \{ \mathbf x_i = \mathbf x_j \} },
\] where \(\mathbf{1}_{ \{ \mathbf x_i = \mathbf x_j \} }= \begin{cases} 1 & \mathbf x_i = \mathbf x_j \\ 0 & \text{else} \end{cases}\)
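To make the linear algebra concrete, here is a compressed sketch of this computation (the values of theta, rho, and sigma2 below are placeholders; in this section they are estimated from the variogram):
library(fields)
data(COmonthlyMet)
good   <- !is.na(CO.tmean.MAM.climate)
y      <- CO.tmean.MAM.climate[good]
x      <- as.matrix(CO.loc[good,])
theta  <- 70; rho <- 5; sigma2 <- 1          # placeholder covariance parameters
bigD   <- rdist.earth(x, x)                  # great-circle distances (miles)
Sigma  <- rho*exp(-bigD/theta) + sigma2*diag(length(y))
x0     <- rbind(c(-103, 39.5))               # the location marked on the map above
Sigma0 <- rho*exp(-rdist.earth(x0, x)/theta)
Sigma0 %*% solve(Sigma, y)                   # simple Kriging estimate at x0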
@@ -1141,7 +1133,7 @@ 5.2 Kriging predictor
title("Elevation in CO")
US(add=TRUE,col='grey')

-The elevation plot is a near inverse of our prediction grid – this agrees with our intuition that there are lower temperatures at higher elevations. If we include a covariate Z corresponding to elevation, then we can produce even more accurate predictions (a sneak-peek is shown below).
+The elevation plot is nearly the inverse of our predicted temperature surface -- this agrees with our intuition that there are lower temperatures at higher elevations. If we include a covariate Z corresponding to elevation, then we can produce even more accurate predictions (a sneak peek is shown below).

@@ -1181,7 +1173,7 @@ 6 spatialProcess
However, we want to make it clear that the spatial polynomial \(P(\cdot)\) is (by default) included in the kriging estimate, while other covariates \(\mathbf{Z}\) must be specified.
-spatialProcess represents one-stop shopping for fitting a spatial model in fields. It was designed so that users are able to quickly fit models and visualize them. We also add one caution: it hides several default model choices from the user. After a covariance model is chosen, a grid search is performed to find the optimal values of the parameters theta, rho, and sigma2, and then the spatial model is computed with these estimated parameters. Any other covariance parameters (e.g. the smoothness) need to be specified. The default call uses the covariance function stationary.cov, specifically with the Matern and smoothness nu=1. The optimization is done by MLESpatialProcess.
+spatialProcess represents one-stop shopping for fitting a spatial model in fields. It was designed so that users are able to quickly fit models and visualize them. We also add one caution: it hides several default model choices from the user. After a covariance model is chosen, a grid search is performed to find the optimal values of the parameters theta, rho, and sigma2, and then the spatial model is computed with these estimated parameters. Any other covariance parameters (e.g. the smoothness) need to be specified. The default call uses the covariance function stationary.cov, specifically with the Matern and smoothness nu=1. The optimization is done by MLESpatialProcess.
A good way to think about the spatial model functions in fields is in the context of engines and wrappers. The two engines in fields which perform the Kriging computations are Krig and mKrig. Krig is the more numerically stable algorithm: if a covariance matrix is close to singular, mKrig will fail whereas Krig may still be able to perform the calculations. On the other hand, mKrig is set up to handle large data sets with sparse covariance matrices by using the SPAM package.
A wrapper is an easy-to-use function that is based on an underlying function or engine. Often default parameter values are set to make the function accessible to the user. The function Tps is a wrapper for Krig. Tps performs thin plate spline regression, which is a special case of Kriging, so it makes sense to use the Krig engine. On the other hand, the functions fastTps and spatialProcess are wrappers for mKrig. See the chapters and examples on these functions for specifics.
There is always a hazard in providing a simple-to-use method that makes many default choices for the spatial model. As in any analysis, be aware of these choices and try alternative models and parameter values to assess the robustness of your conclusions. Also examine the residuals to check the adequacy of the fit!
@@ -1189,9 +1181,9 @@ 6 spatialProcess
\({\large{\mathbf{Basic \> Usage}}}\)
-spatialProcess(x, y, Z = NULL, cov.function = “stationary.cov”,
-\(\> \> \> \quad\) cov.args = list(Covariance = “Matern”, smoothness = 1), theta = NULL)
-Krig(x, Y, theta, Covariance=“Matern”, smoothness, Distance=“rdist”)
+spatialProcess(x, y, Z = NULL, cov.function = "stationary.cov",
+\(\> \> \> \quad\) cov.args = list(Covariance = "Matern", smoothness = 1), theta = NULL)
+Krig(x, Y, theta, Covariance="Matern", smoothness, Distance="rdist")
\({\large{\mathbf{Value}}}\)
spatialProcess returns an object of classes mKrig and spatialProcess. The main difference from mKrig is an extra component MLEInfo that has the results of the grid evaluation over theta (maximizing over lambda), the joint maximization over theta and lambda, and a grid evaluation over lambda with theta fixed at its MLE. The Krig function produces an object of class Krig.
@@ -1241,7 +1233,7 @@ 6.1 Analysis of soil pH
## gradEval
## 1.00000000
The summary shows the value of the profile log likelihood function, the estimates for the parameters, and the number of evaluations needed in the optimization.
-We can use plot to view diagnostic plots of the fit. The first three plots are the same as sreg and mKrig objects. Plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria to select the smoothing parameter \(\lambda = \sigma^2 / \rho\). The x axis has transformed \(\lambda\) in terms of effective degrees of freedom to make it easier to interpret. Note that here the GCV function is minimized while the REML is maximized.
+We can use plot to view diagnostic plots of the fit. The first three plots are the same as for sreg and mKrig objects. Plot 1 shows data vs. predicted values, and plot 2 shows predicted values vs. residuals. Plot 3 shows the criteria used to select the smoothing parameter \(\lambda = \sigma^2 / \rho\). The x-axis shows \(\lambda\) transformed to the effective degrees of freedom, which is easier to interpret. Note that here the GCV function is minimized while the REML is maximized.
One of the main features of spatialProcess is the ability to optimize over theta as well as the usual lambda/rho/sigma combination. For this reason, plot 4 shows the profile likelihood versus values of theta instead of the histogram of the standard residuals displayed in the plots for objects of other classes.
set.panel(2,2)
plot(fit)
@@ -1259,7 +1251,7 @@ 6.2 Analysis of Coal Ash
fit <- spatialProcess(x, y)
surface(fit,col=topo.colors(100))

-Note that surface displays the full model, except Z covariates if they are included; i.e. by default Z is dropped. To additionally include the Z covariates, one can use predict or predictSurface. In these functions, the default is to retain Z in the computation.
+Note that surface displays the full model, except Z covariates if they are included; i.e. by default Z is dropped. To additionally include the Z covariates, one can use predict or predictSurface. In these functions, the default is to retain Z in the computation.
We can use our spatial model to predict on a grid using predict, or predictSurface which is more convenient for plotting.
grid.list <- list(x=seq(0,17), y=seq(0,24) )
pred.grid <- predict(fit, grid=grid.list)
@@ -1310,7 +1302,7 @@ 6.2 Analysis of Coal Ash
6.3 ozone2
-Now we consider the ozone2 dataset in fields, which consists of CO2 observations over 89 days at a set of 153 locations. We’ll restrict ourselves to day 16 (June 18, 1987) and remove all NA values. We fit several covariance models: Matern, exponential, and Wendland. We first visually inspect the different surfaces, then we compare the parameters of the three models using summary.
+Now we consider the ozone2 dataset in fields, which consists of daily 8-hour average surface ozone observations over 89 days at a set of 153 locations. We'll restrict ourselves to day 16 (June 18, 1987) and remove all NA values. We fit several covariance models: Matern, exponential, and Wendland. We first visually inspect the different surfaces, then we compare the parameters of the three models using summary.
data( ozone2)
x<- ozone2$lon.lat
y<- c(ozone2$y[16,])
@@ -1334,7 +1326,7 @@ 6.3 ozone2
cov.args = list(Covariance = "Wendland",
dimension = 2, k = 2) )
rbind(obj$summary,obj2$summary,obj3$summary)
-Since the exponential is a Matern covariance with \(\nu = 0.5\), the first two fits can be compared in terms of their likelihoods. The REML value is slightly higher for obj verses obj2 (\(598.4 > 596.7\)). These are the negative log likelihoods so this suggests a preference for the exponential model. But… does it really matter in terms of spatial prediction?
+Since the exponential is a Matern covariance with \(\nu = 0.5\), the first two fits can be compared in terms of their likelihoods. The REML value is slightly higher for obj versus obj2 (\(598.4 > 596.7\)). These are the negative log likelihoods, so this suggests a preference for the exponential model. But... does it really matter in terms of spatial prediction?
library(sp) #for colors
set.panel(1,3)
surface(obj, col=heat.colors(100),smallplot= c(.88,.9,0.2,.8))
@@ -1378,7 +1370,7 @@ 6.3 ozone2
6.4 COmonthlyMet
-We’ll return to the dataset COmonthlyMet. When using other fields functions for Kriging such as Krig and mKrig, the user has to pass in a range theta, which can be initally guessed for a quick fit by looking at a variogram.
+We'll return to the dataset COmonthlyMet. When using other fields functions for Kriging such as Krig and mKrig, the user has to pass in a range theta, which can be initially guessed for a quick fit by looking at a variogram.
Note that the estimated theta without including the Z covariate CO.elev will probably change if we do include Z.
Ultimately, using a variogram to determine spatial parameters for our covariance is not a reliable method. A more rigorous way to determine parameters is via Maximum Likelihood, which is performed within spatialProcess. To be precise, the parameters theta, rho, and sigma2 are estimated using Maximum Likelihood, and the user only needs to input x, y, and a type of covariance. (Note: If a Matern covariance is used, then smoothness must be supplied. See below how one might select between different smoothness values.)
Of course, spatialProcess runs more slowly than these functions do with a user-supplied theta, so one must take caution when using spatialProcess with large datasets. Below, we run through this climate example using spatialProcess to construct our model.
@@ -1411,7 +1403,7 @@ 6.4 COmonthlyMet
-Observe the largely different values for parameters, but similar values for log-likelihood. Note that there is an interplay between smoothness and range, so directly comparing theta is not appropriate. Will the underlying spatial process be much different if we vary the smoothness? We’ll use the drop.Z command to make the differences in the spatial process clear.
+Observe the very different values for the parameters but the similar values for the log-likelihood. Note that there is an interplay between smoothness and range, so directly comparing theta across fits is not appropriate. Will the underlying spatial process be much different if we vary the smoothness? We'll use the drop.Z argument to make the differences in the spatial process clear.
pred0.5 <- predictSurface.Krig( out0.5,grid.list=CO.Grid,ZGrid=CO.elevGrid,drop.Z=TRUE)
pred1 <- predictSurface.Krig( out1,grid.list=CO.Grid,ZGrid=CO.elevGrid,drop.Z=TRUE)
@@ -1546,7 +1538,7 @@ 7.3 grid.list, \({\large{\mathbf{Basic \> Usage}}}\)
make.surface.grid(grid.list)
-as.surface(obj, z, order.variables=“xy”)
+as.surface(obj, z, order.variables="xy")
\({\large{\mathbf{Value}}}\)
The result will be a list with x, y, and z components suitable for plotting with functions such as persp, image, surface, contour, drape.plot.
@@ -1585,12 +1577,12 @@ 7.3 grid.list, Using surface combines image.plot with contour as its default.
surface(look.surface)

-There is a graphics function persp which can produce 3-d plots. drape.plot is a wrapper function in fields that automatically adds color based on how it’s done in image.plot/surface.
+The base graphics function persp can produce 3-d perspective plots. drape.plot is a wrapper function in fields that automatically adds color in the same way as image.plot/surface.
set.panel(1,2)
persp(look.surface)
drape.plot(look.surface)

-In summary, we fit a spatialProcess object from our data, created a “grid.list”, constructed a full set of gridded locations using the “grid.list,” found Kriging estimates at the gridded locations, converted back to a “grid.list”, and then plotted with surface. Compare this with the convenient surface.mKrig.
+In summary, we fit a spatialProcess object from our data, created a "grid.list", constructed a full set of gridded locations using the "grid.list," found Kriging estimates at the gridded locations, converted back to a "grid.list", and then plotted with surface. Compare this with the convenient surface.mKrig.
fit <- spatialProcess(RMprecip$x, RMprecip$y, theta=20)
surface(fit)

@@ -1603,7 +1595,7 @@ 7.4 surface
\({\large{\mathbf{Basic \> Usage}}}\)
-surface( object, grid.list = NULL, extrap = FALSE, type=“C”)
+surface( object, grid.list = NULL, extrap = FALSE, type="C")
@@ -1747,8 +1739,8 @@ 8.3 drape.plot and <
\({\large{\mathbf{Basic \> Usage}}}\)
drape.plot(x, y, z, z2=NULL, col = tim.colors(64), zlim = range(z, na.rm=TRUE), zlim2 = NULL)
-drape.color(z, col = tim.colors(64), zlim = NULL,breaks,transparent.color = “white”)
-pushpin( x,y,z,p.out, height=.05,col=“black”,text=NULL,adj=-.1,cex=1.0,…)
+drape.color(z, col = tim.colors(64), zlim = NULL,breaks,transparent.color = "white")
+pushpin( x,y,z,p.out, height=.05,col="black",text=NULL,adj=-.1,cex=1.0,...)
\({\large{\mathbf{Value}}}\)
If an assignment is made, the projection matrix from persp is returned. This information can be used to add additional 3-d features to the plot, such as pushpin. See the persp help file for an example how to add additional points and lines using the trans3d function and also the example below.
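For instance, here is a minimal sketch on a synthetic surface (the surface, pushpin location, and label are arbitrary choices):
library(fields)
x  <- seq(-2, 2,, 40)
y  <- seq(-2, 2,, 40)
z  <- outer(x, y, function(u,v) exp(-(u^2 + v^2)))   # a smooth bump
pm <- drape.plot(x, y, z, border=NA)                 # assignment captures the persp projection matrix
pushpin(0, 0, 1, pm, text="peak")                    # mark the maximum in 3-d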
@@ -1796,17 +1788,17 @@ 8.4 Color palettes
tim.colors gives a very nice spectrum that is close to the jet color scale in MATLAB® and is the default for image.plot and the like.
larry.colors is particularly useful for visualizing fields of climate variables.
two.colors interpolates among three colors (a start, middle, and end color). For other colors try fields.color.picker to view possible choices. start="darkgreen", end="azure4" are the options used to get a nice color scale for rendering aerial photos of ski trails. (http://www.image.ucar.edu/Data/MJProject)
-designer.colors is the master function for two.colors and tim.colors. It can be useful if one wants to customize the color table to match quantiles of a distribution. e.g. if the median of the data is at .3 with respect to the range then set x equal to c(0,.3,1) and specify three colors to provide a transtion that matches the median value. In fields language, this function interpolates between a set of colors at locations x. While you can be creative about choosing these colors, just using another color scale as the basis is easy. For example, designer.color( 256, rainbow(4), x= c( 0,.2,.8,1.0)) leaves the choice of the colors to Dr. R after a thunderstorm.
+designer.colors is the master function for two.colors and tim.colors. It can be useful if one wants to customize the color table to match quantiles of a distribution, e.g. if the median of the data is at .3 with respect to the range, then set x equal to c(0,.3,1) and specify three colors to provide a transition that matches the median value. In fields language, this function interpolates between a set of colors at locations x. While you can be creative about choosing these colors, just using another color scale as the basis is easy. For example, designer.colors( 256, rainbow(4), x= c( 0,.2,.8,1.0)) leaves the choice of the colors to Dr. R after a thunderstorm.
color.scale assigns colors to a numerical vector in the same way as the image function. This is useful to keep the assignment of colors consistent across several vectors by specifying a common zlim range. Finally, plotColorScale is a simple function to plot a vector of colors to examine their values.
-Also, don’t forget the built in heat.colors, topo.colors, terrain.colors, grey.colors, etc.
+Also, don't forget the built in heat.colors, topo.colors, terrain.colors, grey.colors, etc.
\({\large{\mathbf{Basic \> Usage}}}\)
tim.colors(n = 64, alpha=1.0)
larry.colors()
-two.colors(n=256, start=“darkgreen”, end=“red”, middle=“white”,alpha=1.0)
-designer.colors( n=256, col= c(“darkgreen”, “white”, “darkred”), x=seq(0,1,, length(col)) ,alpha=1.0)
+two.colors(n=256, start="darkgreen", end="red", middle="white",alpha=1.0)
+designer.colors( n=256, col= c("darkgreen", "white", "darkred"), x=seq(0,1,, length(col)) ,alpha=1.0)
\({\large{\mathbf{Value}}}\)
A vector giving the colors in a hexadecimal format; two extra hex digits are added for the alpha channel.
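For example, a short sketch of keeping the color assignment consistent across two vectors by fixing a common zlim (the data are synthetic):
library(fields)
z1 <- rnorm(50)
z2 <- rnorm(50, mean=2)
zr <- range(c(z1, z2))
col1 <- color.scale(z1, col=tim.colors(64), zlim=zr)   # same color mapping for both vectors
col2 <- color.scale(z2, col=tim.colors(64), zlim=zr)
plot(z1, col=col1, pch=16, ylim=zr)
points(z2, col=col2, pch=17)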
@@ -1839,7 +1831,7 @@ 8.5 interp.surface
We recommend this method for fast visualization and presentation of images. However, if the actual interpolated values will be used for analysis, use a statistical method such as Tps or fastTps to get an interpolation that is based on first principles.
Here is a brief explanation of the interpolation: Suppose that the location (locx, locy) lies in between the first two grid points in both x and y. That is, locx is between \(x_1\) and \(x_2\) and locy is between \(y_1\) and \(y_2\). Let \(e_x= (\text{locx}-x_1)/(x_2-x_1)\) and \(e_y= (\text{locy}-y_1)/(y_2-y_1)\). The interpolant is
\[(1-e_x)(1-e_y)z_{11} + (1-e_x)(e_y)z_{12} + (e_x)(1-e_y)z_{21} + (e_x)(e_y)z_{22}\]
-where the z’s are the corresponding elements of the Z matrix. See also the akima package for fast interpolation from irregular locations.
+where the z's are the corresponding elements of the Z matrix. See also the akima package for fast interpolation from irregular locations.
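As a quick check of this formula, here is a minimal sketch on a tiny synthetic image (the off-grid locations are arbitrary):
library(fields)
obj <- list(x=1:5, y=1:5, z=outer(1:5, 1:5, "+"))    # a small image with z = x + y
loc <- rbind(c(1.5, 1.5), c(2.25, 3.75))             # off-grid locations
interp.surface(obj, loc)                             # bilinear interpolation at loc
# interp.surface.grid(obj, list(x=seq(1,5,.5), y=seq(1,5,.5))) interpolates to a whole grid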
\({\large{\mathbf{Basic \> Usage}}}\)
@@ -1884,7 +1876,7 @@ 9 Covariance Models Revisited
9.1 Rad.cov, Rad.simple.cov, cubic.cov, and wendland.cov
Radial basis functions are the natural covariance that comes out of the thin plate spline minimization problem (see the section Tps Revisited). The functional form is the following: \[\operatorname{Cov}(Y(\mathbf x), Y(\mathbf x')) = C(r) =c_0(m,d) *\begin{cases} r^{2m-d}, & d \text{ odd} \\ r^{2m-d} \ln(r), & d \text{ even} \end{cases}\] where \(r = \| \mathbf x - \mathbf x' \|\) and \(c_0(m,d)\) is a multiplicative constant depending on \(m\) and \(d\).
-The argument d is inferred from the dimension of the input data. If m isn’t specified by the user, by default it is set to the smallest integer such that \(p = 2m-d\) is positive. Alternatively, the user can specify p.
+The argument d is inferred from the dimension of the input data. If m isn't specified by the user, by default it is set to the smallest integer such that \(p = 2m-d\) is positive. Alternatively, the user can specify p.
Note that these RBFs are generalized covariance functions, which means they are only valid covariances when restricted to a subspace of linear combinations of the field. The function Rad.simple.cov is a straightforward implementation in R code (it performs fewer checks on the data). It can be used, like Exp.cov, as a template for writing a new covariance function.
cubic.cov is a specific case of Rad.cov where we set d=1 and m=1.
All of these covariance functions can be equipped to take advantage of sparsity in the covariance matrix by using the SPAM package. The functions Wendland and wendland.cov allow easy use of the Wendland compactly supported polynomials as covariance functions. This will output a sparse covariance matrix, which helps shorten the computation time when working with large datasets. The main argument (besides location) to the Wendland function is theta, which by default is set to 1. This is the taper range, or the radius of support of the function. The Wendland function is identically 0 for distances greater than theta. If the user would like to scale each coordinate independently, they can provide a matrix or the diagonal of a matrix for the argument V, which is the inverse linear transformation of the coordinates before distances are found. Another parameter that can be chosen is k, the order of the Wendland polynomial, which determines the smoothness.
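A brief sketch of the compact support (theta and the number of locations are arbitrary; as noted above, the result is stored in sparse format via spam):
library(fields)
set.seed(1)
x   <- matrix(runif(400), ncol=2)                # 200 random 2-d locations
Sig <- wendland.cov(x, x, theta=.2, k=2)         # entries are exactly zero beyond distance theta
class(Sig)                                       # sparse matrix class from spam (see the note above)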
@@ -1900,8 +1892,8 @@ 9.2 stationary.taper.covRad.cov(x1, x2, p = 1, m=NA)
Rad.simple.cov(x1, x2, p=1)
cubic.cov(x1, x2, theta = 1)
-stationary.cov(x1, x2=NULL, Covariance = “Exponential”, Distance = “rdist”,theta = 1, V = NULL)
-stationary.taper.cov(x1, x2, Covariance=“Exponential”,Taper=“Wendland”,theta=1.0,V=NULL)
+stationary.cov(x1, x2=NULL, Covariance = "Exponential", Distance = "rdist",theta = 1, V = NULL)
+stationary.taper.cov(x1, x2, Covariance="Exponential",Taper="Wendland",theta=1.0,V=NULL)
wendland.cov(x1, x2, theta = 1, V=NULL, k = 2)
\({\large{\mathbf{Value}}}\)
@@ -1986,7 +1978,7 @@ 9.4 stationary.taper.cov
10 mKrig
-The mKrig function is both a Kriging/linear algebra engine and also user-level function for fitting spatial models. The mKrig engine is a simple and fast version of Krig. The “m” stands for micro, being a succinct (and we hope readable) function. The function focuses on the same computations as in Krig.engine.fixed done for a fixed lambda parameter, for unique spatial locations and for data without missing values. mKrig is optimized for large data sets, sparse linear algebra, and a clear exposition of the computations. The underlying code in mKrig is more readable and straightforward, but it does fewer checks on the data than Krig. One savings in computation is that Monte Carlo simulations are used to compute the trace of the smoothing matrix \(\operatorname{tr}(A(\lambda))\), which can be used for the effective degrees of freedom of the model.
+The mKrig function is both a Kriging/linear algebra engine and also a user-level function for fitting spatial models. The mKrig engine is a simple and fast version of Krig. The "m" stands for micro, being a succinct (and we hope readable) function. The function focuses on the same computations as in Krig.engine.fixed done for a fixed lambda parameter, for unique spatial locations, and for data without missing values. mKrig is optimized for large data sets, sparse linear algebra, and a clear exposition of the computations. The underlying code in mKrig is more readable and straightforward, but it does fewer checks on the data than Krig. One computational saving is that Monte Carlo simulation is used to compute the trace of the smoothing matrix \(\operatorname{tr}(A(\lambda))\), which can be used for the effective degrees of freedom of the model.
Just as in other functions, we can use cov.function and cov.args to specify which covariance we want to use in the model. mKrig is often used with a compactly supported covariance to take advantage of sparsity in computations. See fastTps as an example of how it uses mKrig.
There are other arguments that can be specified in chol.args for efficiency when dealing with a large data set. For example, nnzR is a guess at how many nonzero entries there are in the Cholesky decomposition of the linear system used to solve for the basis coefficients. Increasing this value helps to avoid warnings from the spam package.
mKrig.trace is an internal function called by mKrig to estimate the effective degrees of freedom. The Kriging surface estimate at the data locations is a linear function of the data and can be represented as \(A(\lambda)y\).
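The idea behind the Monte Carlo trace estimate can be sketched with a generic matrix. This is a Hutchinson-type estimator, not the fields internals: for a vector u with iid N(0,1) entries, E[u^T A u] = tr(A).
set.seed(2)
A   <- crossprod(matrix(rnorm(25), 5, 5))        # an arbitrary 5 x 5 matrix standing in for A(lambda)
M   <- 1000                                      # number of random probe vectors
est <- mean(replicate(M, {u <- rnorm(5); sum(u*(A %*% u))}))
c(monteCarlo=est, exact=sum(diag(A)))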
@@ -1998,7 +1990,7 @@ 10 mKrig
\({\large{\mathbf{Basic \> Usage}}}\)
-mKrig(x, y, Z = NULL, cov.function = “stationary.cov”, lambda = 0, m = 2)
+mKrig(x, y, Z = NULL, cov.function = "stationary.cov", lambda = 0, m = 2)
\({\large{\mathbf{Value}}}\)
Returns an object of class mKrig.
@@ -2045,7 +2037,7 @@ 10.2 COmonthlyMet
10.3 flame
The flame data consists of a 2-column matrix x with different fuel and oxygen flow rates for the burner, and a vector y with the intensity of light at a particular wavelength indicative of Lithium ions. The characteristics of an ionizing flame are varied with the intent of maximizing the intensity of emitted light for lithium in solution. Areas outside of the measurements are where the mixture may explode! Note that the optimum is close to the boundary.
-This example shows the versatility of fields, fitting Krig, spatialProcess, and mKrig models to the data. Note how much faster mKrig is, and how even with a Wendland it is able to reproduce the other fits’ behaviours fairly well.
+This example shows the versatility of fields, fitting Krig, spatialProcess, and mKrig models to the data. Note how much faster mKrig is, and how even with a Wendland it is able to reproduce the other fits' behaviours fairly well.
x <- flame$x
y <- flame$y
look <- as.image(Z=y, x=x)
@@ -2062,7 +2054,7 @@ 10.3 flame
10.4 Krig.replicates with NWSC
-We now consider the NWSC data set. The data we consider are the temperature and relative humidity as our spatial indexing variables, and our response is power usage. The observations aren’t at unique locations, so to use mKrig or spatialProcess, we will have to collapse the observations to unique locations with the function Krig.replicates.
+We now consider the NWSC data set. The data we consider are the temperature and relative humidity as our spatial indexing variables, and our response is power usage. The observations aren't at unique locations, so to use mKrig or spatialProcess, we will have to collapse the observations to unique locations with the function Krig.replicates.
library(sp) #for colors
load("NWSC.rda")
x <- cbind(NWSC$RH, NWSC$Otemp)
@@ -2097,7 +2089,7 @@ 10.5 mKrig and a GCV
x<- ozone2$lon.lat[good,]
mKrig( x,y,cov.function="stationary.taper.cov",theta = 2.0, lambda=.01,
Taper="Wendland", Taper.args=list(theta = 1.5, k=2, dimension=2)) -> out2
-Now, we will compare many lambda’s on a grid using GCV for this small data set. One should really just use Krig or Tps, but this is an example of approximate GCV that will work for much larger data sets using sparse covariances and the Monte Carlo trace estimate.
+Now, we will compare many values of lambda on a grid using GCV for this small data set. One should really just use Krig or Tps, but this is an example of approximate GCV that will work for much larger data sets using sparse covariances and the Monte Carlo trace estimate.
lgrid<- 10**seq(-1,1,,15) # a grid of lambdas
GCV<- matrix( NA, 15,20)
trA<- matrix( NA, 15,20)
@@ -2159,7 +2151,7 @@ 10.6 Collapse fixed effects with
## [8,] -525.6700 0.3666253 24.196703 5.337032 77.69217
## [9,] -529.7651 0.5806189 18.395516 5.868205 59.30883
## [10,] -540.6156 0.3602721 147.142371 7.314397 148.50003
-Names of summary: “lnProfileLike.FULL”, “lambda”, “theta”, “sigmaMLE”, “rhoMLE”, “funEval”, “gradEval”. We left out the last two columns in the previous output. It can be interesting to visualize the results as pairwise scatter plots for individual estimates.
+Names of summary: "lnProfileLike.FULL", "lambda", "theta", "sigmaMLE", "rhoMLE", "funEval", "gradEval". We left out the last two columns in the previous output. It can be interesting to visualize the results as pairwise scatter plots for individual estimates.
plot(O3MLE[,2], O3MLE[,3], log = 'xy', xlab = "lambda", ylab = "theta")
title("theta vs lambda for 10 days")

@@ -2176,7 +2168,7 @@ 10.6 Collapse fixed effects with
mKrig.coef finds the d and c coefficients representing the solution using the previous Cholesky decomposition for a new data vector. This is used in computing the prediction standard error in predictSE.mKrig and can also be used to evaluate the estimate efficiently at new vectors of observations, provided the locations and covariance remain fixed.
-Since this data set has NA’s, we remove any columns (observation locations) with an NA. Beware of blindly removing data like this - we are just illustrating the mKrig.coef functionality.
+Since this data set has NA's, we remove any columns (observation locations) with an NA. Beware of blindly removing data like this - we are just illustrating the mKrig.coef functionality.
goody <- na.omit(t(ozone2$y))
omitted <- attr(goody, "na.action")
x <- ozone2$lon.lat[-omitted,]
@@ -2209,7 +2201,7 @@ 10.7 Approximate standard errors
O3.fit<- mKrig( x,y, Covariance="Matern", theta=.5,smoothness=1.0, lambda= .01 )
-Now, we’ll look at some approximate conditional simulations of our random field and the missing data.
+Now, we'll look at some approximate conditional simulations of our random field and the missing data.
set.seed(122)
O3.sim<- sim.mKrig.approx( O3.fit, nx=100, ny=100, gridRefinement=3, M=2 )
@@ -2253,17 +2245,17 @@ 10.8 Finding taper range
## To avoid the iteration, increase 'nearestdistnnz' option to something like
## 'spam.options(nearestdistnnz=c(923230,400))'
## (constructed 1213 lines out of 1500).
-The warning resets the memory allocation for the covariance matrix according the to values spam.options(nearestdistnnz=c(416052,400)). This is inefficient because the preliminary pass failed. The following call completes the computation in “one pass” without a warning and without having to reallocate more memory.
-spam.options(nearestdistnnz=c(4.2E5,400))
+The warning resets the memory allocation for the covariance matrix according to the values spam.options(nearestdistnnz=c(416052,400)). This is inefficient because the preliminary pass failed. The following call completes the computation in "one pass" without a warning and without having to reallocate more memory.
+options(spam.nearestdistnnz=c(4.2E5,400))
look2<- mKrig( x,y, cov.function="wendland.cov",k=2, theta=.9, lambda=1e-2)
-## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' spase distance matrix, I require one more iteration.
+## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' sparse distance matrix, I require one more iteration.
## To avoid the iteration, increase 'nearestdistnnz' option to something like
-## 'spam.options(nearestdistnnz=c(923230,400))'
+## 'options(spam.nearestdistnnz=c(600000,400))'
## (constructed 966 lines out of 1500).
-## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' spase distance matrix, I require one more iteration.
+## Warning in nearest.dist(structure(c(0.963543639518321, 0.0860745231620967, : You ask for a 'dense' sparse distance matrix, I require one more iteration.
## To avoid the iteration, increase 'nearestdistnnz' option to something like
-## 'spam.options(nearestdistnnz=c(923230,400))'
+## 'options(spam.nearestdistnnz=c(600000,400))'
## (constructed 966 lines out of 1500).
As a check, notice that print(look2) reports a number of nonzero elements consistent with the specific allocation increase in spam.options.
Now we generate a new data set of 1500 locations.
@@ -2340,7 +2332,7 @@ 10.9 Using V,
11 Tps and fastTps
11.1 Tps and Krig
-We return to Tps, but this time consider spatial data. The Tps estimate can be thought of as a special case of a spatial process estimate using a particular generalized covariance function. Because of this correspondence the Tps function is a wrapper that determines the polynomial order for the null space and then calls Krig with the radial basis generalized covariance rad.cov. The RBF is the reproducing kernel that generates the reproducing kernel Hilbert space in which the minimization problem is set. See Wahba’s Spline Models for Observational Data.
+We return to Tps, but this time consider spatial data. The Tps estimate can be thought of as a special case of a spatial process estimate using a particular generalized covariance function. Because of this correspondence, the Tps function is a wrapper that determines the polynomial order for the null space and then calls Krig with the radial basis generalized covariance Rad.cov. The RBF is the reproducing kernel that generates the reproducing kernel Hilbert space in which the minimization problem is set. See Wahba's Spline Models for Observational Data.
In this way Tps requires little extra coding beyond the Krig function itself, and in fact the returned list from Tps is a Krig object and so it is adapted to all the special functions developed for this object format. One downside is some confusion in the summary of the thin plate spline as a spatial process type estimator. For example, the summary refers to a covariance model and maximum likelihood estimates. This is not something usually reported with a spline fit! Similarly, fastTps is a convenient wrapper function that calls mKrig with the Wendland covariance function.
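For a side-by-side look, here is a minimal sketch on one day of the ozone2 data (theta=1.5 is an arbitrary taper range; larger values track Tps more closely):
library(fields)
data(ozone2)
good <- !is.na(ozone2$y[16,])
x <- ozone2$lon.lat[good,]
y <- ozone2$y[16, good]
fit.tps  <- Tps(x, y)                      # thin plate spline via the Krig engine
fit.fast <- fastTps(x, y, theta=1.5)       # Wendland-based approximation via the mKrig engine
set.panel(1,2)
surface(fit.tps); title("Tps")
surface(fit.fast); title("fastTps, theta=1.5")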
@@ -2350,7 +2342,7 @@ 11.1 Tps and K
fastTps(x, Y, m = NULL, p = NULL, theta, lon.lat=FALSE, lambda=0)
\({\large{\mathbf{Value}}}\)
-Both functions returns a list of class Krig. m determines the degree (m-l) of the polynomial function capturing the spatial drift/trend component of the model. The default is the value such that 2m-d is greater than zero where d is the dimension of x. p is the polynomial power for Wendland radial basis functions. The default is 2m-d. theta is the tapering range that is passed to the Wendland compactly supported covariance. The covariance (i.e. the radial basis function) is zero beyond range theta. The larger theta is, the closer this model will approximate the standard thin plate spline.
+Both functions return a list of class Krig. m determines the degree (m-1) of the polynomial function capturing the spatial drift/trend component of the model. The default is the smallest value such that 2m-d is greater than zero, where d is the dimension of x. p is the polynomial power for the Wendland radial basis functions; the default is 2m-d. theta is the tapering range that is passed to the Wendland compactly supported covariance. The covariance (i.e. the radial basis function) is zero beyond range theta. The larger theta is, the closer this model will approximate the standard thin plate spline.
Some care needs to be exercised in specifying the taper range theta and the power p, which describes the polynomial behavior of the Wendland at the origin. Note that unlike Tps the locations are not scaled to unit range, and this can cause havoc in smoothing problems with variables in very different units. So rescaling the locations x <- scale(x) is a good idea for putting the variables on a common scale for smoothing. This function does have the potential to approximate estimates of Tps for very large spatial data sets. Also, the function predictSurface.fastTps has been made more efficient for the case of k = 2 and m = 2.
@@ -2496,11 +2488,11 @@ 12.2 REML
\ell(\mathbf{d},\boldsymbol{\xi}) \propto -\frac{1}{2} \left( \log( \det(\Sigma)) + (\mathbf Y - \mathbf{Zd})^T \Sigma^{-1} (\mathbf Y - \mathbf{Zd}) \right)
\]
For any fixed \(\boldsymbol \xi\), the value of \(\mathbf d\) that maximizes \(\ell\) is the Generalized Least Squares (GLS) estimate \(\hat{\mathbf d} = (\mathbf{Z}^T \Sigma^{-1} \mathbf{Z})^{-1} \mathbf{Z}^T \Sigma^{-1} \mathbf{Y}\).
-To maximize \(\ell\), one often “profiles” and eliminates \(\mathbf{d}\) in the equation for \(\ell\) by substituting \(\mathbf{d} = \hat {\mathbf d}\). Then \(\ell(\mathbf{d},\boldsymbol \xi)=\ell(\boldsymbol \xi)\) only depends on \(\boldsymbol \xi\). Numerical optimization (nonlinear methods like BFGS) is used to find the remaining parameters, and then the GLS estimate \(\hat {\mathbf d}\) is updated with the optimal parameter values. (See Handbook of Spatial Statistics, p. 46).
+To maximize \(\ell\), one often "profiles" and eliminates \(\mathbf{d}\) in the equation for \(\ell\) by substituting \(\mathbf{d} = \hat {\mathbf d}\). Then \(\ell(\mathbf{d},\boldsymbol \xi)=\ell(\boldsymbol \xi)\) only depends on \(\boldsymbol \xi\). Numerical optimization (nonlinear methods like BFGS) is used to find the remaining parameters, and then the GLS estimate \(\hat {\mathbf d}\) is updated with the optimal parameter values. (See Handbook of Spatial Statistics, p. 46).
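In code, the GLS step amounts to two linear solves; here is a generic sketch (Z, Sigma, and Y are placeholders for the quantities above, not the fields internals):
# d-hat = (Z' Sigma^{-1} Z)^{-1} Z' Sigma^{-1} Y for fixed covariance parameters
glsDhat <- function(Z, Sigma, Y) {
  SiZ <- solve(Sigma, Z)                        # Sigma^{-1} Z
  solve(crossprod(Z, SiZ), crossprod(SiZ, Y))
}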
However, this process of Maximum Likelihood Estimation (MLE) tends to give biased results, as \[
(\mathbf{Y} - \mathbf{Z} \hat{\mathbf{d}})^T \Sigma^{-1} (\mathbf Y - \mathbf{Z} \hat {\mathbf{d}})
\le (\mathbf{Y}-\mathbf{Zd})^T \Sigma^{-1} (\mathbf Y - \mathbf{Zd}), \quad \forall \mathbf{d}.
-\] Intuitively, this means that \(\mathbf{Z} \hat{\mathbf d}\) is “closer” to \(\mathbf{Y}\) than any other estimate \(\mathbf{Zd}\), even if \(\mathbf{d}\) is the true parameter! As a result, the MLE for \(\boldsymbol \xi\) is biased due to this simultaneous estimation of \(\mathbf{d}\). In particular, the MLE for \(\boldsymbol \xi\) tends to underestimate the process variance.
+\] Intuitively, this means that \(\mathbf{Z} \hat{\mathbf d}\) is "closer" to \(\mathbf{Y}\) than any other estimate \(\mathbf{Zd}\), even if \(\mathbf{d}\) is the true parameter! As a result, the MLE for \(\boldsymbol \xi\) is biased due to this simultaneous estimation of \(\mathbf{d}\). In particular, the MLE for \(\boldsymbol \xi\) tends to underestimate the process variance.
This leads to Restricted Maximum Likelihood (REML), which attempts to reduce this bias when using maximum likelihood methods. We use a linear transformation \(\mathbf Y^* = \mathbf {AY}\) of the data so that the distribution of \(\mathbf{Y}^*\) does not depend on \(\mathbf{d}\). In particular, we use the matrix \(\mathbf A = \mathbf{I} - \mathbf{Z} (\mathbf{Z}^T \mathbf{Z})^{-1} \mathbf{Z}^T\) that projects \(\mathbf{Y}\) to ordinary least squares residuals. The resulting matrix product \(\mathbf{AY}\) is independent of \(\mathbf{d}\). \[
\ell_R(\boldsymbol \xi) = - \frac{1}{2} \left(\log(\det(\Sigma)) + \log( \det( \mathbf{Z}^T \Sigma^{-1} \mathbf{Z})) + \mathbf{Y}^T (\Sigma^{-1} - \Sigma^{-1} \mathbf{Z} (\mathbf{Z}^T \Sigma^{-1} \mathbf{Z})^{-1} \mathbf{Z}^T \Sigma^{-1}) \mathbf{Y} \right)
\] This ends up being the profile log-likelihood with an additional term involving \(\log( \det( \mathbf{Z}^T \Sigma^{-1} \mathbf{Z}))\). Again, this expression must be maximized numerically with respect to \(\boldsymbol \xi\). The procedure concludes by estimating \(\mathbf d\) with GLS and the estimate \(\hat{\boldsymbol \xi}\) plugged in.
@@ -2530,6 +2522,23 @@ 12.3 GCV
+
+
+
+
+
+
+