Homework #2 Using Excel - Woznicki

How to Calculate Joint, Marginal, and Conditional Distributions Using Microsoft Excel
Weather Forecasts and Associated Observations

BackgroundWeather forecasting is inherently flawed, as meteorologists do not have perfect knowledge in which to make their best judgments (and sufficient computer power does not exist for perfect numerical weather prediction). Therefore, verification of these forecasts provides insight as to where the forecaster can improve in the future.  Murphy and Winkler (1987) and Murphy (1993) introduce joint, marginal, and conditional distributions as means to quantify errors in forecasting.  *Note: The calculations and plots generated in this tutorial were accomplished using Excel for Mac.  Excel for Windows may have slightly different steps.

*When I mention selecting a "range of cells", this is performed by clicking and holding the mouse on the first cell and dragging through all the cells you want included.

Step 1: Attain a Dataset with Forecasts and Observations

Temperature and Probability of Precipitation (POP) Forecasts:  Forecasted temperatures (high temperatures for a certain day were used in this analysis) and forecasted probability of precipitation should be loaded into an Excel spreadsheet with their associated observations for the specified time period/date.  The example data   An example of this type of table is shown below with forecasts and observations for Indianapolis International Airport (KIND).  To later determine the probabilities, it is helpful to have all of the temperature forecasts, temperature observations, POP forecasts, and POP observations in their own separate column.    

Step 2: Create a Scatter Diagram to Represent the Joint Distribution

Temperature Scatter Diagram:  The temperature scatter diagram (shown below on the left) was created using Excel's graphing wizard.  The following steps demonstrate how it is performed:

1.  Select the entire temperature observation column (column C in the table above), and then hold down the command key and select the temperature forecast column (column F) as well.

2.  With those two columns selected, click the Insert tab at the top of the window -> select graph.

3.  Select the X-Y scatter chart type from the green chart toolbar located above the spreadsheet (this should create the plot without the title, axis labels, etc.)

4.  With the chart selected, click the View tab at the top of the Window and click Formatting Palette.  The Formatting Palette will appear on the left-hand side of the screen, and will easily allow you to insert a chart title, chart axis labels, manipulate gridlines, assign colors/designs, etc.

5.  Individual tick values can be manipulated by selecting the axis you want to format, right clicking the axis (or command click for Mac), and selecting format axis.

POP Scatter Diagram:  The POP scatter diagram (shown below on the right) is generated using the same techniques as the temperature plot, except the two POP columns (observations and forecasts) will be selected instead.  *Note: Since the POP observations can only be zero or 100 (%), a scatter diagram representing the joint distribution is not particularly helpful. 


Step 3:  Create a Categorical Contingency Table

Temperature Contingency Table:  A categorical contingency table can be created easily using Excel.  For temperatures, bins for forecasts (y-axis) and observations (x-axis) are set up for certain ranges of temperatures.  The table below shows an example of the contingency tables with counts in each category.  The different forecast and observation bins below are 10-19, 20-29, 20-39, 40-49, 50-59, 60-69, 70-79, 80-89, and 90-99 (ºF).  For example, if someone forecasted a high temperature of 38ºF and the observed high was 41ºF, then that would be added as a count in the intersection of the 40-49 observation bin and the 30-39 forecast bin.  Excel can determine the counts in each bin using the countifs function.  The countifs function counts a value if it meets a set of multiple criteria.  The countifs syntax for three example bins are below:

10-19 forecast bin and 20-29 observation bin: =countifs(range of obs temps,"<30", range of obs temps,">19", range of fcst temps,"<20",range of fcst temps,">9")

30-39 forecast bin and 20-29 observation bin:=countifs(range of obs temps,"<30", range of obs temps,">19", range of fcst temps,"<40",range of fcst temps,">29")

60-69 forecast bin and 70-79 observation bin:=countifs(range of obs temps,"<80", range of obs temps,">69", range of fcst temps,"<70",range of fcst temps,">59")

*Note:  The ranges in the formulas are just placeholders for an actual range of cell references.

This process is completed for all cells to complete the table.  Notice in the table below that the counts are summed together at the end of each column and row.  The total number of forecasts is also summed in the bottom, right-hand corner.

POP Contingency Table:  The POP categorical contingency table is calculated in a similar way, except only two observation categories are possible. Zero and 100 (%) are the only observation categories, as either measurable precipitation was observed or it wasn't.  The following is an example contingency table for POPs:

Step 4: Determine the Joint and Marginal Probability Distributions Using the Contingency Table

Joint and Marginal Probability Distributions:  The joint probabilities in the table below are in the same places as the counts in the previous temperature forecast contingency table.  The joint probabilities are calculated by dividing the counts in the contingency table by the total number of forecasts.  The Excel syntax is as follows:

Joint Probability of a single cell p(f,x) =cell of interest/bottom right cell (total number of forecasts).

The marginal probability p(f) is determined by adding up all the values in a forecast bin (row).

The marginal probability p(x) is determined by adding up all the values in an observation bin (column).

*The sum function is used to add a range of cells together. ex) =sum(range of values)

The same is the case for the POP joint and marginal probability tables.

Step 5: Factor the Joint Distribution and Compute the Conditional Probabilities

Conditional Probability p(f|x):  The conditional probability p(f|x) is calculated from the joint probability distribution table above.  The joint probability p(f|x) is divided by the marginal probability p(x).  The conditional probability p(f|x) is below.  

The Excel syntax for each cell is: =(p(f|x)/p(x)) *Note: The p(f|x) and p(x) are just placeholders for the actual cell reference.

Conditional Probability p(x|f):  The conditional probability p(x|f) is also calculated from the joint distribution table.  The joint probability p(f|x) is divided by the marginal probability p(f).  The conditional probability p(x|f) is below.

Step 6: Forecast Bias

Forecast Bias:  Forecast bias (for either temperature or POPs) can be easily calculated using Excel.  The following Excel formula will calculate the forecast bias by using the original data separated into separate columns:

=(average(range of forecast cells))-(average(range of observed cells))