The following are some recommended practical self-assessment questions for the lesson called *A deeper understanding of data, statistics, probability, and estimation*. They’re intended for you to work through to test your own __ability to apply the key concepts__ we covered there.

## Question 1

To calibrate a newly manufactured electronic distance measuring (EDM) instrument before sale, the following set of data was collected with it over a known baseline, i.e. over a known distance. That distance has been subtracted from these data and all scale factor and gross errors have been removed, meaning that the data contain only an instrument bias and random (stochastic) errors.

- EDM calibration data set » (all measurements in meters)

a) Use Excel to create a scatter plot of your data in mm. Do this by selecting the cells containing those data (and the headers) and choosing Insert > Scatter. Label the axes appropriately, include a screenshot or image of your scatter plot in the file you submit, and comment on what you see.

b) Use Excel to create a histogram of your data in mm. Do this by selecting the cells containing those data (and the headers) and choosing Insert > Histogram. Label the axes appropriately, include a screenshot or image of your histogram in the file you submit, and comment on its shape.

c) Estimate the instrument bias in mm using the most appropriate measure of central tendency.

d) Provide a measure of how precise your estimate of the bias is, also in mm.

e) Estimate the variability in the measurements that the instrument provides. For this, the customers expect to be provided with the standard deviation in mm of the random (stochastic) noise so they can apply it in their adjustments.

f) Provide a measure of how precise your estimate of the standard deviation is, also in mm.

I get the following numerical values here: c) 2.465 mm; d) 0.107 mm; e) 0.923 mm; f) 0.075 mm.
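For anyone who wants to cross-check their Excel work on c) to f), here is a minimal Python sketch. The readings are made-up placeholders, not the calibration data set. The code assumes the mean is the measure chosen in c), and the standard-error formulas (s/√n for the mean, s/√(2n) for the standard deviation) are assumed large-sample forms; check them against the lesson before relying on them.

```python
import math
import statistics

# Hypothetical baseline residuals in meters (placeholders, NOT the course data).
readings_m = [0.0021, 0.0034, 0.0018, 0.0029, 0.0025, 0.0031, 0.0012, 0.0027]

# Convert to millimeters, the unit the question asks for.
readings_mm = [r * 1000.0 for r in readings_m]
n = len(readings_mm)

bias_mm = statistics.mean(readings_mm)   # c) assuming the mean is the chosen measure
s_mm = statistics.stdev(readings_mm)     # e) sample standard deviation (n - 1 divisor)
se_bias_mm = s_mm / math.sqrt(n)         # d) assumed formula: s / sqrt(n)
se_s_mm = s_mm / math.sqrt(2 * n)        # f) assumed formula: s / sqrt(2n)

print(f"bias = {bias_mm:.3f} mm, SE = {se_bias_mm:.3f} mm")
print(f"std dev = {s_mm:.3f} mm, SE = {se_s_mm:.3f} mm")
```

Converting to mm before computing anything keeps every reported value in the unit the question asks for.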

## Question 2

Imagine that a teacher has been teaching a course for long enough to have had a total of 2000 students take it by now, including the 99 students currently taking it. Further imagine that the 99 students currently in her class write a quiz and get the following scores out of a possible 24 points:

a) Use Excel to create a scatter plot of your data. Label the axes appropriately, include a screenshot or image of your scatter plot in the file you submit, and comment on what you see.

b) Use Excel to create a histogram of your data. Label the axes appropriately, include a screenshot or image of your histogram in the file you submit, and comment on its shape.

c) Calculate the following descriptive measures of the central tendency of the data:

- Mean
- Median
- Mode
- Midrange

d) Calculate the Standard Error of the following and comment on which provides the most precise measure of central tendency:

- Mean
- Median
- Midrange

e) Calculate the following descriptive measures of the variation in the data:

- Standard deviation
- Variance
- Range
- Coefficient of variation
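The descriptive measures listed in c) and e) can also be cross-checked outside Excel. The sketch below uses Python's standard `statistics` module on made-up placeholder scores, not the class data. It assumes sample (n − 1) forms for the standard deviation and variance, and it expresses the coefficient of variation as a percentage, which is consistent with the value quoted in the solutions.

```python
import statistics

# Hypothetical quiz scores out of 24 (placeholders, NOT the class data).
scores = [12, 15, 15, 18, 19, 20, 22, 22, 22, 24]

# Measures of central tendency (all in points).
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)                # most frequent value
midrange = (min(scores) + max(scores)) / 2    # halfway between the extremes

# Measures of variation.
stdev = statistics.stdev(scores)              # sample standard deviation, points
variance = statistics.variance(scores)        # sample variance, points^2
value_range = max(scores) - min(scores)       # points
cv_percent = 100 * stdev / mean               # unitless, expressed in percent

print(mean, median, mode, midrange)
print(stdev, variance, value_range, cv_percent)
```

Note how the units differ: the variance is in points squared and the coefficient of variation has no units at all.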

f) Calculate the Standard Error of the following:

- Standard deviation
- Variance
- Range

In all of the above, be sure to provide the correct units.

g) Which measure of central tendency would you trust most, and why? Which measure of variation would you trust least, and why?

I get the following numerical values here:

c) mean = 17.9696967 points; median = 19 points; mode = 22 points; midrange = 13 points.

d) SE_{mean} = 0.44773291 points; SE_{median} = 0.561149986 points; SE_{midrange} = 6.300160495 points

e) standard deviation = 4.45488621 points; variance = 19.8460111 points^{2}; range = 22 points; coefficient of variation = 24.7911037 %

f) SE_{stddev} = 0.316594977 points; SE_{variance} = 2.835144447 points^{2}; SE_{range} = 6.300160495 points

Note: An earlier version of the solutions for d) and f) inadvertently applied the finite population correction (totally my mistake). It should not have been applied because n/N = 99/2000 = 0.0495 < 0.05. I am sorry for the added confusion there. Thanks to those of you who corrected me!
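The arithmetic behind this note can be sketched in Python. The standard-error formulas below are assumptions inferred from the quoted numbers (they reproduce the values in d) and f) to within rounding) and may not be worded the same way in the lesson. The correction factor is the usual √((N − n)/(N − 1)) form. The standard errors for the range and midrange are not reproduced here because the formula behind them is not stated in this self-assessment.

```python
import math

n, N = 99, 2000          # current class size and all students to date
s = 4.45488621           # standard deviation quoted in e), in points

# Sampling fraction and the 5 % rule of thumb for the finite population correction.
sampling_fraction = n / N
apply_fpc = sampling_fraction >= 0.05

# Factor that would multiply each standard error if the correction were applied.
fpc = math.sqrt((N - n) / (N - 1))

# Uncorrected standard errors (assumed formulas, see the paragraph above).
se_mean = s / math.sqrt(n)
se_median = math.sqrt(math.pi / 2) * se_mean
se_stddev = s / math.sqrt(2 * n)
se_variance = s ** 2 * math.sqrt(2 / (n - 1))

print(f"n/N = {sampling_fraction:.4f}, apply correction: {apply_fpc}")
print(se_mean, se_median, se_stddev, se_variance)
```

Because the sampling fraction is below 0.05, `apply_fpc` is `False` and the uncorrected values are the ones to report.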

## Question 3

### Context and introduction

This problem is a simple and really cool application of descriptive statistics. First, check out my brief introduction. Next, check out the inventor and his prototype. (Personally I can’t understand the language, but I was able to find out more by Googling “airbag case for phone”.)

### Data

As soon as I saw this product prototype I got my phone out and started dropping it! I wanted to see what the data coming from the sensors looked like. And indeed, it’s possible to identify when your phone is falling just based on the data from the accelerometers – and using the descriptive measures we’ve been studying!

Right away I collected the following short data sets with my phone:

1. One when the phone was sitting flat on my kitchen table
2. One when I was standing and holding the phone as I would when looking at it normally
3. One when I was lowering the phone from standing height to the ground, as if to put it in my bag
4. One when I dropped my phone as the inventor does in the video above

For the questions below, you should download the following Excel file which contains those four data sets in different tabs of the spreadsheet (all measurements are in m/s^{2}):

Each tab of this spreadsheet contains the accelerometer data measured in each of the three perpendicular axes of my phone over a period of a few seconds. Each column corresponds to a different axis and each row corresponds to a different instant in time (they are roughly 0.02 seconds apart).

Accelerometers measure specific force (which can be thought of as the acceleration due to gravity being experienced by the phone *plus* the acceleration due to any motion the phone might be undergoing). So what you have in that file is the total acceleration (gravity + motion) being experienced in each axis of the phone. This is measured every 0.02 seconds for around three seconds for each of the cases described in 1, 2, 3, and 4 above.

(You *don’t* need to collect your own data for this problem but you can play around with your own phone using an app like this one » if you want to.)

### Questions

a) Use Excel to create a scatter plot of each set of data – i.e. create a total of four plots, with each plot containing the accelerometer data in each of the three axes. Label your plot appropriately and include a screenshot or image of your plot in the file you submit.

b) Identify which plot corresponds to which of the above data sets, i.e. which of your four plots from a) corresponds to each of the cases 1, 2, 3, and 4? You should be able to do this by thinking through the motion implied by each case I’ve described and by looking at the accelerometer data itself in the scatter plots.

c) Add a new fourth column to each of the four data sets by computing the magnitude of the specific force (total acceleration) vector at each epoch (i.e. at each moment in time). As you will have learned in applied math, the magnitude of a vector is computed as the square root of the sum of the squares of the values in each component.
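As a cross-check on the Excel column, the magnitude calculation described in c) can be sketched in Python. The three sample rows are made-up placeholders, not values from the spreadsheet, and the variable names are only illustrative.

```python
import math

# Hypothetical (x, y, z) specific-force samples in m/s^2 (placeholders, NOT the course data).
samples = [
    (0.12, -0.05, 9.81),
    (3.00, 4.00, 0.00),
    (1.00, 2.00, 2.00),
]

# Magnitude at each epoch: square root of the sum of the squared components.
magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]

for (x, y, z), m in zip(samples, magnitudes):
    print(f"({x:6.2f}, {y:6.2f}, {z:6.2f}) -> {m:.4f} m/s^2")
```

In Excel the same calculation would look something like `=SQRT(A2^2 + B2^2 + C2^2)`, assuming the three axes sit in columns A to C; adjust the cell references to match the actual layout of each tab.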

d) For the data set where the phone was sitting on the table:

i) Create a scatter plot of its magnitude value – the one you calculated in c).

ii) When thinking about the measurement model, what errors are you likely seeing here if the true magnitude of gravity at the location of my table is 9.83245 m/s^{2}? (It might help to subtract this value from the data before plotting it.)

e) For the data set where I dropped my phone:

i) Create a scatter plot of its magnitude value – the one you calculated in c). What do you see?

ii) Briefly explain the physics going on here and how it shows up in the measurements being taken.

iii) Also explain what key characteristic or event in the data identifies the time at which a phone is dropped.

f) Tell me which one of the following descriptive statistics would be most useful for identifying a dropped phone from the magnitude data, and why:

- Mean
- Standard deviation
- Range

Feel free to reinforce your answer to this with values computed from the magnitude data.

g) Fill in the blanks in the following sentence so that it would serve as a condition statement in an algorithm for triggering the springs that protect a phone, e.g.:

If [this descriptive statistic] calculated for [this type or column of data] [meets this condition] in any [#-second] period of time, then the user likely dropped their phone, so release the springs right away in order to protect the phone!
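To see how such a sentence could translate into an algorithm, here is a minimal Python sketch of a sliding-window check. Everything in it is hypothetical: the function name `window_triggers`, its parameters, and the default 0.02-second sample interval (taken from the data description above). The statistic and the condition are deliberately left as arguments, so the sketch does not answer f) or g) for you.

```python
from typing import Callable, Sequence


def window_triggers(
    values: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    condition: Callable[[float], bool],
    window_seconds: float,
    sample_interval: float = 0.02,
) -> bool:
    """Return True if condition(statistic(window)) holds for any full window."""
    # Number of samples that make up one window of the requested duration.
    window_len = max(1, round(window_seconds / sample_interval))
    # Slide the window one sample at a time over the whole series.
    for start in range(0, len(values) - window_len + 1):
        window = values[start:start + window_len]
        if condition(statistic(window)):
            return True
    return False


# Arbitrary placeholders: `max` and the threshold 4 are NOT the intended answer.
demo = [1, 1, 1, 5, 5, 5, 1]
print(window_triggers(demo, max, lambda v: v > 4, window_seconds=0.06))
```

A real trigger would run on streaming data rather than a stored list, but the structure (a statistic, a condition, and a window length) mirrors the blanks in the sentence.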
