This lesson outlines many of the key statistical concepts needed in geomatics networks applications.

Random variables

If an event has several possible outcomes, we associate it with a random variable (or variate), which we’ll refer to as y. Such an event could be the result of rolling dice, counting defective products, or measuring the distance between two points in a survey. In each case, the idea is that we seek to understand the behavior of a system or situation by characterizing the behavior of a subset, or sample, of observations drawn from the entire system, or population. This is done through mathematical modeling based on as large a random sample as we can reasonably get our hands on, for reasons you will have studied before.

Probability distribution functions

A key mathematical model is the probability distribution function, which describes the probabilities associated with the possible values of a random variable. The probability distribution function, PDF(y), of a random variable, y, is the function whose integral gives the probability P(a,b) that y lies in the range from a to b:

[Figure: a probability distribution function PDF(y), with the region between y = a and y = b shaded to show the probability P(a,b)]

Note that in this figure:

  • P(a,b) is the probability corresponding to the ratio of the yellow area to the total area under the curve
  • Not all of the white region is shown here – it’s cut off at either end by the edges of the image
  • PDF(y) is shown here to look like a normal distribution, with which you are likely already familiar, but it could take any shape (and the shape it takes will depend on the nature of the underlying phenomena)

And the following properties hold:

1. P(-\infty,\infty)=\int_{-\infty}^{\infty} PDF(y)dy = 1, i.e. the total area under a probability distribution function is 1 – put another way, a probability of 1 means an event is certain to occur (see the numerical sketch just after this list).

2. PDF(y) \geq 0 for all y.

3. PDF(y) is continuous and describes the whole population through a set of quantities known as parameters, which characterize it completely. Values drawn from a sample, by contrast, are used to compute quantities known as statistics, which together do their best to estimate the population parameters – think, for example, of a population mean vs. a calculated sample mean.
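
To make this concrete, here is a small numerical sketch in Python (using NumPy, with an invented normal PDF whose mean and standard deviation are purely illustrative) that checks property 1 and evaluates P(a,b) by simple trapezoidal integration:

    import numpy as np

    # An illustrative normal PDF for a distance observation: mean 100.000 m,
    # standard deviation 0.005 m (values invented for this example).
    mu, sigma = 100.000, 0.005

    def pdf(y):
        return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    # Evaluate the PDF on a fine grid wide enough to capture essentially all the area.
    y = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 200_001)
    f = pdf(y)
    dy = y[1] - y[0]

    # Property 1: the total area under PDF(y) is 1 (trapezoidal approximation).
    total_area = np.sum((f[:-1] + f[1:]) / 2.0) * dy
    print(f"total area = {total_area:.6f}")        # ~1.000000

    # P(a,b): the area under PDF(y) between a and b.
    a, b = 99.995, 100.005                         # i.e. mu +/- 1 sigma
    mask = (y >= a) & (y <= b)
    P_ab = np.sum((f[mask][:-1] + f[mask][1:]) / 2.0) * dy
    print(f"P({a}, {b}) = {P_ab:.4f}")             # ~0.6827 for +/- 1 sigma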

When dealing with more than one random variable – which we will do often in geomatics – we get something like the following (for two independent random variables y_1 and y_2, whose joint PDF is then just the product of their individual PDFs):

    \begin{equation*} P(a_1<y_1<b_1; a_2<y_2<b_2)=\int_{a_1}^{b_1}\int_{a_2}^{b_2} PDF_1(y_1)PDF_2(y_2) dy_1 dy_2 \end{equation*}

which can be drawn as the volume under a surface between the vertical planes y_1=a_1, y_1=b_1, y_2=a_2, and y_2=b_2.

And in the n-dimensional case we collect the variables in a random vector:

    \begin{equation*} \mathbf{y}=\begin{bmatrix}y_1, y_2, y_3, ..., y_n\end{bmatrix}^T \end{equation*}

and the probability that the random vector \mathbf{y} takes on values in the range between \mathbf{a} and \mathbf{b} is given by:

    \[ \begin{IEEEeqnarray*}{rCl} \IEEEeqnarraymulticol{2}{l}{ P(a_1<y_1<b_1; a_2<y_2<b_2; ...; a_n<y_n<b_n) } \\[2ex] & =\int_{a_1}^{b_1}\int_{a_2}^{b_2}...\int_{a_n}^{b_n} PDF_1(y_1)PDF_2(y_2)...PDF_n(y_n) dy_1 dy_2 ... dy_n \\[2ex] \end{IEEEeqnarray*} \]

or, more simply:

    \begin{equation*} P(\mathbf{a}<\mathbf{y}<\mathbf{b}) = \int_{\mathbf{a}}^{\mathbf{b}}PDF(\mathbf{y})d\mathbf{y} \end{equation*}
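
As a quick check on the factored form above, here is a short Python sketch (standard library only; the means, standard deviations, and integration limits are invented for illustration) that computes the joint probability for two independent normal random variables as the product of two one-dimensional probabilities:

    import math

    # Standard-normal cumulative distribution function via the error function.
    def Phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # Two hypothetical independent random variables: y1 ~ N(mu1, s1^2), y2 ~ N(mu2, s2^2).
    mu1, s1 = 50.0, 0.010
    mu2, s2 = 75.0, 0.020

    # P(a1 < y1 < b1) and P(a2 < y2 < b2) from the one-dimensional marginal PDFs.
    a1, b1 = 49.99, 50.01     # mu1 +/- 1 sigma
    a2, b2 = 74.96, 75.04     # mu2 +/- 2 sigma
    P1 = Phi((b1 - mu1) / s1) - Phi((a1 - mu1) / s1)
    P2 = Phi((b2 - mu2) / s2) - Phi((a2 - mu2) / s2)

    # Because the variables are independent, the joint PDF factors into
    # PDF_1(y1) * PDF_2(y2), so the double integral is just the product of the
    # two single-variable probabilities.
    print(f"P1 = {P1:.4f}, P2 = {P2:.4f}, joint probability = {P1 * P2:.4f}")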

Expected value and mean

As we have said, the distributions and density functions of random variables are characterized by parameters that define their behavior. The first of these is referred to as expectation, expected value, mean, or average.

The expected value of a function, f(y), is the average of f(y) weighted by the probability distribution function of y, i.e.:

    \begin{equation*} E\begin{bmatrix}f(y)\end{bmatrix}=\int_{-\infty}^{\infty}f(y)PDF(y)dy \end{equation*}

so the mean, \mu_y, is the expected value of the random variable itself:

    \begin{equation*} \mu_y= E\begin{bmatrix}y\end{bmatrix}=\int_{-\infty}^{\infty}yPDF(y)dy \end{equation*}

which makes sense if you recall that \int_{-\infty}^{\infty} PDF(y)dy = 1.
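
As a quick numerical illustration (again with an invented normal PDF, in Python/NumPy), evaluating \int y\,PDF(y)\,dy on a grid recovers the mean:

    import numpy as np

    # Illustrative normal PDF: mean 100.000, standard deviation 0.005 (invented values).
    mu_y, sigma = 100.000, 0.005
    y = np.linspace(mu_y - 8 * sigma, mu_y + 8 * sigma, 200_001)
    pdf = np.exp(-0.5 * ((y - mu_y) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    dy = y[1] - y[0]

    # E[y] = integral of y * PDF(y) dy (trapezoidal approximation).
    integrand = y * pdf
    E_y = np.sum((integrand[:-1] + integrand[1:]) / 2.0) * dy
    print(f"E[y] = {E_y:.6f}")   # recovers mu_y = 100.000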

And for the multivariate case:

    \begin{equation*} E\begin{bmatrix}f(\mathbf{y})\end{bmatrix}=\int_{-\infty}^{\infty}f(\mathbf{y})PDF(\mathbf{y})d\mathbf{y} \end{equation*}

and the mean is as follows:

    \begin{equation*} \boldsymbol{\mu}= \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \\ \end{bmatrix}= \begin{bmatrix} E\begin{bmatrix}y_1\end{bmatrix} \\ E\begin{bmatrix}y_2\end{bmatrix} \\ \vdots \\ E\begin{bmatrix}y_n\end{bmatrix} \\ \end{bmatrix}= E\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \\ \end{bmatrix}= E\begin{bmatrix}\mathbf{y}\end{bmatrix} \end{equation*}

The variance-covariance matrix

The variance-covariance matrix of y is \mathbf{C}_y and it is defined as follows:

    \begin{equation*} \mathbf{C}_y=E\begin{bmatrix}(\mathbf{y}-\boldsymbol{\mu})(\mathbf{y}-\boldsymbol{\mu})^T\end{bmatrix} \end{equation*}

which is written as follows:

(1)   \begin{align*} \mathbf{C}_y & = E( \begin{bmatrix} y_1-\mu_1 \\ y_2-\mu_2 \\ \vdots \\ y_n-\mu_n \end{bmatrix} \begin{bmatrix} y_1-\mu_1,y_2-\mu_2, \cdots, y_n-\mu_n \end{bmatrix}) \\ & = \begin{bmatrix} \sigma_{y_1}^2 & \sigma_{y_1y_2} & \cdots &\sigma_{y_1y_n} \\ \sigma_{y_2y_1} & \sigma_{y_2}^2 & \cdots &\sigma_{y_2y_n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{y_ny_1} & \sigma_{y_ny_2} & \cdots &\sigma_{y_n}^2 \\ \end{bmatrix} \end{align*}
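
To see this definition in action, here is a small Python/NumPy sketch (with an invented 2-by-2 population covariance matrix) in which averaging the outer products (\mathbf{y}-\boldsymbol{\mu})(\mathbf{y}-\boldsymbol{\mu})^T over a large simulated sample recovers \mathbf{C}_y:

    import numpy as np

    rng = np.random.default_rng(42)

    # A hypothetical 2-D population with known mean vector and covariance matrix
    # (values invented for illustration).
    mu = np.array([10.0, 20.0])
    C_y = np.array([[0.04, 0.01],
                    [0.01, 0.09]])

    # One row per realization of the random vector y.
    samples = rng.multivariate_normal(mu, C_y, size=200_000)

    # C_y = E[(y - mu)(y - mu)^T], approximated by averaging the outer products
    # over the sample (using the known population mean for simplicity).
    d = samples - mu
    C_est = (d.T @ d) / samples.shape[0]
    print(np.round(C_est, 4))   # close to C_y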

Key parameters

From the above we define the following key parameters:

\sigma_{y_i}^2 = E\begin{bmatrix}(y_i-\mu_i)^2\end{bmatrix} is called the variance of y_i – an expression of the variation of the distribution

\sigma_{y_iy_j} = E\begin{bmatrix}(y_i-\mu_i)(y_j-\mu_j)\end{bmatrix} is called the covariance of y_i and y_j – an expression of the mutual variation of the two random variables, reflecting their interrelationship or mutual correlation

\sigma_{y_i} = \sqrt{\sigma_{y_i}^2}, the square root of the variance, is called the standard error or standard deviation of y_i – an expression of the spread of the distribution in the same units as y_i

And, finally, we define the coefficient of correlation between y_i and y_j to be:

    \begin{equation*} \rho_{ij}=\dfrac{\sigma_{y_iy_j}}{\sigma_{y_i}\sigma_{y_j}} \end{equation*}

about which we note that:

-1 \leq \rho_{ij} \leq 1

and that:

If the correlation coefficient is close to 1, the variables are positively linearly related and a scatter plot of them falls almost along a straight line with positive slope. If it is close to -1, the variables are negatively linearly related and the scatter plot falls almost along a straight line with negative slope. And if it is close to zero, there is little or no linear relationship between the variables.

and that:

Correlation between two random variables describes their (linear) interdependence, but it does not establish stochastic dependence (or independence), which would have to come from a conditional distribution.
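
Continuing the small Python sketch from above (the 2-by-2 covariance matrix is the same invented one), the standard deviations and the coefficient of correlation fall straight out of the variance-covariance matrix:

    import numpy as np

    # The same invented 2 x 2 variance-covariance matrix as above.
    C_y = np.array([[0.04, 0.01],
                    [0.01, 0.09]])

    sigma_1 = np.sqrt(C_y[0, 0])                 # standard deviation of y_1
    sigma_2 = np.sqrt(C_y[1, 1])                 # standard deviation of y_2
    rho_12 = C_y[0, 1] / (sigma_1 * sigma_2)     # coefficient of correlation

    # rho_12 always lies between -1 and 1.
    print(f"sigma_1 = {sigma_1:.2f}, sigma_2 = {sigma_2:.2f}, rho_12 = {rho_12:.4f}")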

Propagation of covariances

It is incredibly useful to be able to propagate the parameters that define a stochastic model. For variances and covariances we’ll use the general law of propagation of covariances, which says that if a second random vector \mathbf{z} is related to our random vector \mathbf{y} as follows:

    \begin{equation*} \mathbf{z}=\mathbf{J}\mathbf{y} \end{equation*}

where \mathbf{J} is a deterministic (non-stochastic) matrix, then:

    \begin{align*} \boxed{\mathbf{C}_\mathbf{z}=\mathbf{J}\mathbf{C}_{\mathbf{y}}\mathbf{J}^T} \end{align*}

This is incredibly powerful.

Example 1: Error propagation for simple linear relationships

For example, if you have a simple linear model that relates your observations, \mathbf{l}_{measured}, to your desired parameters, \mathbf{x}, e.g.:

    \begin{equation*} \mathbf{x} =\mathbf{J}\mathbf{l}_{measured} \end{equation*}

then the covariance matrix of the parameters can be propagated from that of the observations as follows:

    \begin{equation*} \mathbf{C}_\mathbf{x} =\mathbf{J}\mathbf{C}_\mathbf{l}\mathbf{J}^T \end{equation*}
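
A minimal Python/NumPy sketch of this example (the matrix \mathbf{J} and the observation standard deviations below are invented purely for illustration) looks like this:

    import numpy as np

    # A hypothetical linear model x = J l: two parameters derived from three observations.
    J = np.array([[1.0, 0.5,  0.0],
                  [0.0, 1.0, -1.0]])

    # Variance-covariance matrix of the observations (uncorrelated here; sigmas in metres).
    C_l = np.diag([0.005**2, 0.010**2, 0.008**2])

    # Law of propagation of covariances: C_x = J C_l J^T
    C_x = J @ C_l @ J.T
    print(C_x)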

Example 2: Error propagation for general nonlinear relationships

As another example, think of the observation equations we’ve been using, and of our simplified observation model in vector form:

    \begin{equation*} \mathbf{l}_{measured}=\mathbf{l}_{true}+\mathbf{n} \end{equation*}

where in the parametric case we know that \mathbf{l}_{true}=\mathbf{F}(\mathbf{x}) which means that:

    \begin{equation*} \mathbf{l}_{measured}=\mathbf{F}(\mathbf{x})+\mathbf{n} \end{equation*}

In other words, our familiar observation equations relate \mathbf{l}_{measured} to \mathbf{x} in much the same way that \mathbf{z}=\mathbf{J}\mathbf{y} does above, which suggests that the law of propagation of covariances can be applied. Of course, as we have seen, \mathbf{F}(\mathbf{x}) is not a simple linear relationship.

However, if you linearize and follow through, you can show that the following holds, which should be familiar to you by now:

    \begin{equation*} \mathbf{C}_\mathbf{l} = \mathbf{A}\mathbf{C}_\mathbf{x}\mathbf{A}^T \end{equation*}

This means that you can propagate the errors if you know the design matrix \mathbf{A}.
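
As a sketch of how that linearization might look in practice (the nonlinear function \mathbf{F}, its expansion point, and the parameter covariances are all invented, and a simple numerical Jacobian stands in for the analytically derived design matrix \mathbf{A}):

    import numpy as np

    # An invented nonlinear F: the "observations" are the distance and the angle
    # from the origin to a point whose parameters are x = (E, N).
    def F(x):
        E, N = x
        return np.array([np.hypot(E, N),       # distance
                         np.arctan2(E, N)])    # angle (radians)

    def jacobian(F, x, h=1.0e-6):
        """Numerical Jacobian A = dF/dx by central differences (a simple sketch)."""
        x = np.asarray(x, dtype=float)
        A = np.zeros((F(x).size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x)
            dx[j] = h
            A[:, j] = (F(x + dx) - F(x - dx)) / (2.0 * h)
        return A

    x0 = np.array([400.0, 300.0])           # expansion point (invented coordinates)
    C_x = np.diag([0.010**2, 0.010**2])     # invented parameter covariances

    A = jacobian(F, x0)
    C_l = A @ C_x @ A.T                     # propagated observation covariances
    print(np.round(C_l, 10))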

Example 3: Preanalysis by propagating the variance-covariance matrix

And this can also be used to show that the following is true:

    \begin{equation*} \mathbf{C}_{\hat{\mathbf{x}}} = \begin{bmatrix}\mathbf{A}^T\mathbf{C}_\mathbf{l}^{-1}\mathbf{A}\end{bmatrix}^{-1} \end{equation*}

which you should recognize as one of our parametric least squares estimation equations.

Again, you don’t need anything more than the matrix relating them to each other – the design matrix \mathbf{A} – in order to relate the errors in the measurements to those in the estimated parameters.

I hope you can see the power of this!

If you can’t yet, you will when you get to Lab 3 in which you’ll do a full preanalysis of a network without ever setting foot in the field. In other words, you’ll be able to use the above equation to calculate the stochastic model of the estimated parameters, \mathbf{C}_{\hat{\mathbf{x}}}, based only on the stochastic model of the measurements, and the design matrix \mathbf{A}. No measurements needed!
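
A minimal preanalysis sketch along these lines, in Python/NumPy (the design matrix and the measurement standard deviations below are invented purely for illustration), would be:

    import numpy as np

    # Invented design matrix A: four observations, two unknown parameters.
    A = np.array([[1.0,  0.0],
                  [0.0,  1.0],
                  [1.0,  1.0],
                  [1.0, -1.0]])

    # Invented measurement covariance matrix (uncorrelated observations; sigmas in metres).
    C_l = np.diag([0.004**2, 0.004**2, 0.006**2, 0.006**2])

    # C_x_hat = (A^T C_l^-1 A)^-1  -- no actual measurements required.
    N = A.T @ np.linalg.inv(C_l) @ A
    C_x_hat = np.linalg.inv(N)
    print(C_x_hat)
    print("standard deviations of the estimated parameters:", np.sqrt(np.diag(C_x_hat)))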

Propagation of variances

The law of propagation of variances follows from the above in the case where the variables are independent, i.e. where the off-diagonal elements of their variance-covariance matrix are zero.

For example, for

    \begin{equation*} \mathbf{C}_\mathbf{l} = \mathbf{A}\mathbf{C}_\mathbf{x}\mathbf{A}^T \end{equation*}

where the off-diagonal elements of \mathbf{C}_\mathbf{x} are zero and where the relationship \mathbf{l}_{measured}=\mathbf{F}(\mathbf{x}) is of the form

    \begin{equation*} l = f(x_1, x_2, ..., x_u) = a_1x_1 \pm a_2x_2 \pm ... \pm a_ux_u \end{equation*}

it’s easy to show that:

    \begin{equation*} \boxed{\sigma_l^2=\left(\dfrac{\partial f}{\partial x_1}\right)^2\sigma_{x_1}^2+\left(\dfrac{\partial f}{\partial x_2}\right)^2\sigma_{x_2}^2 + ... +\left(\dfrac{\partial f}{\partial x_u}\right)^2\sigma_{x_u}^2} \end{equation*}

which we often shorten to the following:

    \begin{equation*} \sigma_l^2=a_1^2\sigma_1^2+a_2^2\sigma_2^2 + ... +a_u^2\sigma_u^2 \end{equation*}
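
For example, here is a short Python sketch for an invented case where l is formed as l = x_1 + x_2 - x_3 from three independent quantities (i.e. a_1 = a_2 = 1 and a_3 = -1, with illustrative standard deviations), showing both the shorthand form and its agreement with the full matrix form:

    import numpy as np

    # Invented coefficients and standard deviations (in metres) for l = x1 + x2 - x3.
    a = np.array([1.0, 1.0, -1.0])
    sigma_x = np.array([0.002, 0.003, 0.002])

    # Shorthand form: sigma_l^2 = a1^2 sigma_1^2 + a2^2 sigma_2^2 + ... + au^2 sigma_u^2
    sigma_l2 = np.sum(a**2 * sigma_x**2)
    print(f"sigma_l = {np.sqrt(sigma_l2) * 1000:.2f} mm")

    # The same result via the full matrix form C_l = A C_x A^T with a diagonal C_x.
    A = a.reshape(1, -1)
    C_x = np.diag(sigma_x**2)
    print((A @ C_x @ A.T).item())   # equals sigma_l2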

In the next lesson we will look at the basic equations for calculating the key population parameters and the sample statistics.