Now let’s do it for our own model

In the last lesson we learned that as long as we can get our hands on a decent approximation to what we’re estimating, then we can turn our nonlinear functional models into linear approximations. Let’s put this to work for our situation, where \mathbf{F}(\mathbf{x},\mathbf{l}_{true}) = \mathbf{0}.

But how can we get approximate values of \mathbf{x} and \mathbf{l}_{true}?

Just as in the general case (in the lesson How do we linearize?) we wrote:

    \begin{equation*} x = x_0 + \Delta x \end{equation*}

we will now write the following for our desired unknown parameter, \mathbf{x}:

    \begin{equation*} \mathbf{x} = \mathbf{x_0} + \boldsymbol{\delta} \end{equation*}

where:

\mathbf{x_0} is an approximation to \mathbf{x}

\boldsymbol{\delta} is an unknown correction to \mathbf{x_0}, or the difference between it and \mathbf{x}

And for our observed quantities, \mathbf{l}_{true}, we will just use the measurements themselves and our earlier equation:

    \begin{equation*} \mathbf{l}_{true} = \mathbf{l}_{measured} - \mathbf{e} \end{equation*}

where:

our measurements \mathbf{l}_{measured} provide an approximation to \mathbf{l}_{true}

\mathbf{e} are our statistical errors, which should be familiar to you from our first lesson (What are errors and residuals? (And some other sanity checks out of the gate)) as the difference between the measurement and the actual unknown observed quantity

Now we have the required approximations of the type x = x_0 + \Delta x, but relevant to our situation. Using them, we can write the following:

    \begin{equation*} \mathbf{F}(\mathbf{x},\mathbf{l}_{true}) = \mathbf{F}(\mathbf{x_0} +\boldsymbol{\delta},\mathbf{l}_{measured} -\mathbf{e}) = \mathbf{0} \end{equation*}

Or (and this is the really awesome bit), we can write it as follows using what we now know about Taylor’s Theorem:

    \begin{equation*} \boxed{ \mathbf{F}(\mathbf{x},\mathbf{l}_{true}) \approx \mathbf{F}(\mathbf{x_0},\mathbf{l}_{measured}) + \left.\frac{d\mathbf{F}}{d\mathbf{x}}\right|_{\mathbf{x}_0}\boldsymbol{\delta} - \left.\frac{d\mathbf{F}}{d\mathbf{l}}\right|_{\mathbf{l}_{measured}}\mathbf{e} } \end{equation*}

So we’ve taken our general functional model \mathbf{F}(\mathbf{x},\mathbf{l}_{true}) = \mathbf{0} and expressed it in linear form!
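To see this at work numerically, here’s a minimal sketch in Python. The one-equation model F(x, l) = x^2 - l and all of the values in it are made up purely for illustration; the point is that the first-order expansion tracks the exact value of \mathbf{F} whenever \boldsymbol{\delta} and \mathbf{e} are small:

    import numpy as np

    # A made-up single-equation model: F(x, l) = x**2 - l = 0.
    # We check that F(x0 + delta, l_meas - e) is well approximated by
    # F(x0, l_meas) + dF/dx|x0 * delta - dF/dl|l_meas * e.
    def F(x, l):
        return x**2 - l

    x0, l_meas = 3.0, 9.5     # an approximate parameter value and a measurement
    delta, e = 0.04, -0.02    # small (in practice unknown) corrections

    dFdx = 2 * x0             # dF/dx evaluated at x0
    dFdl = -1.0               # dF/dl (constant for this model)

    exact = F(x0 + delta, l_meas - e)
    linear = F(x0, l_meas) + dFdx * delta - dFdl * e
    print(exact, linear)      # -0.2784 vs -0.28: they agree to first order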

It’s worth noting again that the terms \boldsymbol{\delta} and \mathbf{e} are the true (and unknown) corrections to the approximate values \mathbf{x}_0 and \mathbf{l}_{measured} that we used. (Just as \Delta x was the true unknown correction to x_0 in Taylor’s Theorem.)

And it’s worth having a look at what each of the terms really means, which we will do next.

Let’s have a look at this thing

Now we’ve got our linearized model. So let’s have a look at each of its components in turn.

\mathbf{F}(\mathbf{x_0},\mathbf{l}_{measured}) is a vector containing the values of the r functions in \mathbf{F}(\mathbf{x},\mathbf{l}_{true}), evaluated at the approximate values \mathbf{x}_0 and the measured values \mathbf{l}_{measured}. We will denote this as:

    \begin{align*} \underset{r\times 1}{\mathbf{w} } & = \mathbf{F}(\mathbf{x}_0,\mathbf{l}_{measured}) \\ &= \begin{bmatrix} f_1(\mathbf{x}_0,\mathbf{l}_{measured}) \\[2ex] f_2(\mathbf{x}_0,\mathbf{l}_{measured}) \\[2ex] \vdots \\[2ex] f_r(\mathbf{x}_0,\mathbf{l}_{measured}) \\ \end{bmatrix} \end{align*}

and call it the misclosure vector.
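If it helps to see this in code, here’s a minimal sketch using a hypothetical combined model: five points measured on a circle, with parameters \mathbf{x} = (a, b, r) and equations f_i = (x_i - a)^2 + (y_i - b)^2 - r^2 = 0. (The measured values and approximate parameters below are invented for illustration.)

    import numpy as np

    # Hypothetical combined model: points measured on a circle, where
    # f_i(x, l) = (x_i - a)^2 + (y_i - b)^2 - r^2 = 0 for each point.
    def F(params, l):
        a, b, r = params
        xs, ys = l[0::2], l[1::2]   # observations packed as (x1, y1, x2, y2, ...)
        return (xs - a)**2 + (ys - b)**2 - r**2

    # five measured points (n = 10 observations) and approximate parameters (u = 3)
    l_measured = np.array([1.01, 0.02, 0.00, 0.98, -0.99, 0.03,
                           0.02, -1.02, 0.71, 0.70])
    x0 = np.array([0.0, 0.0, 1.0])   # approximate centre (a, b) and radius r

    w = F(x0, l_measured)            # the misclosure vector (r = 5 equations)
    print(w)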

And \left.\dfrac{d\mathbf{F}}{d\mathbf{x}}\right|_{\mathbf{x}_0} is a matrix of size r x u that we will denote with the letter \mathbf{A}, where the j^{th} row contains the partial derivatives of f_j(\mathbf{x},\mathbf{l}_{true}) with respect to x_1, x_2, x_3, …, x_u:

    \begin{align*} \underset{r\times u}{\mathbf{A} } & = \left.\frac{d\mathbf{F}}{d\mathbf{x}}\right|_0 \\ & = \begin{bmatrix} \left.\dfrac{df_1}{dx_1}\right|_0 & \left.\dfrac{df_1}{dx_2}\right|_0 & \cdots &\left.\dfrac{df_1}{dx_u}\right|_0\\[2ex] \left.\dfrac{df_2}{dx_1}\right|_0 & \left.\dfrac{df_2}{dx_2}\right|_0 & \cdots &\left.\dfrac{df_2}{dx_u}\right|_0\\[2ex] \vdots & \vdots & & \vdots \\[2ex] \left.\dfrac{df_r}{dx_1}\right|_0 & \left.\dfrac{df_r}{dx_2}\right|_0 & \cdots &\left.\dfrac{df_r}{dx_u}\right|_0\\ \end{bmatrix} \end{align*}

where we’ve introduced the following slight shorthand for convenience of notation:

    \begin{equation*} \left.\dfrac{d\mathbf{F}}{d\mathbf{x}}\right|_{0} = \left.\dfrac{d\mathbf{F}}{d\mathbf{x}}\right|_{\mathbf{x}_0} \end{equation*}

to indicate that the derivatives are evaluated at the approximate values, \mathbf{x}_0.
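Continuing the hypothetical circle sketch from above, the rows of \mathbf{A} come from the partial derivatives \dfrac{df_i}{da} = -2(x_i - a), \dfrac{df_i}{db} = -2(y_i - b) and \dfrac{df_i}{dr} = -2r, evaluated at \mathbf{x}_0:

    # Continues the circle sketch above (uses its np, x0 and l_measured).
    def make_A(params, l):
        a, b, r = params
        xs, ys = l[0::2], l[1::2]
        return np.column_stack([
            -2 * (xs - a),               # df_i/da
            -2 * (ys - b),               # df_i/db
            np.full(xs.size, -2 * r),    # df_i/dr
        ])

    A = make_A(x0, l_measured)           # shape (r, u) = (5, 3)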

And, finally, \left.\dfrac{d\mathbf{F}}{d\mathbf{l}}\right|_{\mathbf{l}_{measured}} is a matrix of size r x n that we will denote with the letter \mathbf{B}, where the rows contain the partial derivatives of f_j(\mathbf{x},\mathbf{l}_{true}) with respect to l_1, l_2, …, l_n:

    \begin{align*} \underset{r\times n}{\mathbf{B} } & =\left.\dfrac{d\mathbf{F}}{d\mathbf{l}}\right|_{\mathbf{l}_{m}} \\ & = \begin{bmatrix} \left.\dfrac{df_1}{dl_1}\right|_{\mathbf{l}_{m}} & \left.\dfrac{df_1}{dl_2}\right|_{\mathbf{l}_{m}} & \cdots &\left.\dfrac{df_1}{dl_n}\right|_{\mathbf{l}_{m}}\\[2ex] \left.\dfrac{df_2}{dl_1}\right|_{\mathbf{l}_{m}} & \left.\dfrac{df_2}{dl_2}\right|_{\mathbf{l}_{m}} & \cdots &\left.\dfrac{df_2}{dl_n}\right|_{\mathbf{l}_{m}}\\[2ex] \vdots & \vdots & & \vdots \\[2ex] \left.\dfrac{df_r}{dl_1}\right|_{\mathbf{l}_{m}} & \left.\dfrac{df_r}{dl_2}\right|_{\mathbf{l}_{m}} & \cdots &\left.\dfrac{df_r}{dl_n}\right|_{\mathbf{l}_{m}}\\ \end{bmatrix} \end{align*}

where we’ve introduced the following slight shorthand for convenience of notation:

    \begin{equation*} \left.\dfrac{d\mathbf{F}}{d\mathbf{l}}\right|_{\mathbf{l}_{m}} =\left.\dfrac{d\mathbf{F}}{d\mathbf{l}}\right|_{\mathbf{l}_{measured}} \end{equation*}

to indicate that the derivatives are evaluated at the measured values, \mathbf{l}_{measured}.
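And again for the circle sketch: each equation involves only its own point’s coordinates, so \mathbf{B} comes out block structured, with \dfrac{df_i}{dx_i} = 2(x_i - a) and \dfrac{df_i}{dy_i} = 2(y_i - b) evaluated at the measured values:

    # Continues the circle sketch above (uses its np, x0 and l_measured).
    def make_B(params, l):
        a, b, r = params
        xs, ys = l[0::2], l[1::2]
        B = np.zeros((xs.size, l.size))        # shape (r, n) = (5, 10)
        for i in range(xs.size):
            B[i, 2 * i] = 2 * (xs[i] - a)      # df_i/dx_i
            B[i, 2 * i + 1] = 2 * (ys[i] - b)  # df_i/dy_i
        return B

    B = make_B(x0, l_measured)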

Summary of the general linearized model equation

We refer to the matrices \mathbf{A} and \mathbf{B} as design matrices. And with them in hand, we can write the linearized form of our functional model:

    \begin{equation*} \boxed{\mathbf{F}(\mathbf{x},\mathbf{l}_{true}) \approx \mathbf{A}\boldsymbol{\delta} - \mathbf{B}\mathbf{e} + \mathbf{w} = \mathbf{0}} \end{equation*}

where the misclosure vector is given by:

    \begin{equation*} \underset{r\times 1}{\mathbf{w}} = \mathbf{F}(\mathbf{x}_0,\mathbf{l}_{measured}) \end{equation*}

and the design matrices are given by:

    \begin{equation*} \underset{r\times u}{\mathbf{A}} = \left.\frac{d\mathbf{F}}{d\mathbf{x}}\right|_{\mathbf{x}_0} \end{equation*}

    \begin{equation*} \underset{r\times n}{\mathbf{B}} = \left.\dfrac{d\mathbf{F}}{d\mathbf{l}}\right|_{\mathbf{l}_{m}} \end{equation*}
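Putting the pieces of the hypothetical circle sketch together, we can check numerically that \mathbf{A}\boldsymbol{\delta} - \mathbf{B}\mathbf{e} + \mathbf{w} tracks the exact value of \mathbf{F}(\mathbf{x}_0 + \boldsymbol{\delta}, \mathbf{l}_{measured} - \mathbf{e}) for small corrections (the particular \boldsymbol{\delta} and \mathbf{e} below are arbitrary):

    # Continues the circle sketch above (uses its F, x0, l_measured, A, B and w).
    rng = np.random.default_rng(1)
    delta = np.array([0.01, -0.02, 0.005])           # a small correction to x0
    e = 0.01 * rng.standard_normal(l_measured.size)  # small observation errors

    exact = F(x0 + delta, l_measured - e)
    linear = A @ delta - B @ e + w
    print(np.max(np.abs(exact - linear)))  # tiny: the neglected terms are second order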

The special cases

The linearized combined model

In the lesson Let’s linearize our general functional model, we arrived at the linearized model:

    \begin{equation*} \boxed{ \mathbf{A}\boldsymbol{\delta} - \mathbf{B}\mathbf{e} + \mathbf{w} =\mathbf{0}} \end{equation*}

This is referred to as the linearized combined model, which we know has the nonlinear form \mathbf{F}(\mathbf{x}, \mathbf{l}_{true}) =\mathbf{0}.

There are two special cases of this model, one for each of the other types of equation we saw in So, to summarize functional modeling.

The linearized parametric model

For the parametric model, which has the form \mathbf{l}_{true} - \mathbf{F}(\mathbf{x}) =\mathbf{0}, it can be shown that the term

    \begin{equation*} \mathbf{B} =\left.\dfrac{d\mathbf{F}}{d\mathbf{l}}\right|_{\mathbf{l}_{m}} = \mathbf{I} \end{equation*}

is just the identity matrix (i.e. \dfrac{df_i}{dl_i} = 1, with all of the other partial derivatives with respect to the observations being zero)

so we get the following linearized form:

    \begin{equation*} \boxed{ \mathbf{A}\boldsymbol{\delta} - \mathbf{e} + \mathbf{w} =\mathbf{0}} \end{equation*}
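If you want to convince yourself of this numerically, here’s a minimal sketch (the function g and all of the values are made up): for any parametric model written as \mathbf{F}(\mathbf{x}, \mathbf{l}) = \mathbf{l} - \mathbf{g}(\mathbf{x}), the Jacobian with respect to the observations comes out as the identity.

    import numpy as np

    def g(params):   # an arbitrary made-up parametric model l = g(x)
        return np.array([params[0] + params[1],
                         params[0] * params[1],
                         params[0]**2])

    def F_param(params, l):
        return l - g(params)

    # numerical Jacobian of F_param with respect to l, at arbitrary points
    p0, l0, h = np.array([0.5, 1.5]), np.array([1.0, 2.0, 3.0]), 1e-6
    B = np.column_stack([(F_param(p0, l0 + h * ei) - F_param(p0, l0)) / h
                         for ei in np.eye(3)])
    print(B)   # the 3 x 3 identity matrix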

The linearized condition model

And for the condition model, which has the form \mathbf{F}(\mathbf{l}_{true}) =\mathbf{0}, it can be shown that the term

    \begin{equation*} \mathbf{A} =\left.\dfrac{d\mathbf{F}}{d\mathbf{x}}\right|_{\mathbf{x}_0} = \mathbf{0} \end{equation*}

(which makes sense, since there are no parameters!)

so we get the following linearized form:

    \begin{equation*} \boxed{ -\mathbf{B}\mathbf{e} + \mathbf{w} =\mathbf{0}} \end{equation*}
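A classic example of this case (my own illustration, with made-up numbers) is the condition that the three measured angles of a plane triangle must sum to 180 degrees: F(\mathbf{l}) = l_1 + l_2 + l_3 - 180 = 0. Since this model happens to be linear already, the linearization is exact:

    import numpy as np

    l_measured = np.array([59.98, 60.03, 60.02])  # measured angles, in degrees
    B = np.array([[1.0, 1.0, 1.0]])               # dF/dl: each partial is 1
    w = np.array([l_measured.sum() - 180.0])      # misclosure: 0.03 degrees here

    # there is no A matrix (no parameters), so the model reads -B e + w = 0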

In summary

The following table summarizes the linearized functional models we’ve been dealing with.

|  | Combined equations | Observation equations (Parametric model) | Condition equations |
| --- | --- | --- | --- |
| Nonlinear functional model | \mathbf{F}(\mathbf{x}, \mathbf{l}_{true}) =\mathbf{0} | \mathbf{l}_{true} - \mathbf{F}(\mathbf{x}) =\mathbf{0} | \mathbf{F}(\mathbf{l}_{true}) =\mathbf{0} |
| number of equations | r | n | n - u (see note 1) |
| number of observations | n | n | n |
| number of parameters | u | u | 0 (see note 1) |
| degrees of freedom | r - u | r - u = n - u | n - u |
| linearized model (see note 2) | \mathbf{A}\boldsymbol{\delta} - \mathbf{B}\mathbf{e} + \mathbf{w} =\mathbf{0} | \mathbf{A}\boldsymbol{\delta} - \mathbf{e} + \mathbf{w} =\mathbf{0} | -\mathbf{B}\mathbf{e} + \mathbf{w} =\mathbf{0} |

Note 1: It might seem strange to say that there are no parameters in the condition equations (i.e. that u = 0) and then say that the number of equations is n - u. In this case, u is the number of parameters that would be used if the problem were solved using observation equations. And, as such, the degrees of freedom is n - u in that case.

Note 2: I have used the error term \mathbf{e} to stay consistent with earlier lessons, e.g. right back to What are errors and residuals? (And some other sanity checks out of the gate), because I find there’s often a fundamental misunderstanding about what \mathbf{e} is and what it isn’t. That said, it’s not uncommon to see the symbols \mathbf{v} or -\mathbf{r} used in its place. These are matters of preference and convention, and the meaning doesn’t change in either case.