Snyder - Using gnuplot and Excel

The purpose of this assignment is to evaluate a set of forecasts using distributions-based verification methods described in Murphy and Winkler (1987), namely joint, conditional and marginal distributions. You are given a .csv file containing a series of MOS temperature and probability of precipitation (POP) forecasts for Indianapolis and the corresponding observations. 

First, you are first asked to plot temperature (POP) forecasts (f) and observations (x) to examine the joint distribution. The joint distribution, p(f,x), gives information about the forecasts, the observations and their relationship. Using gnuplot, you can plot f vs. x easily. Gnuplot ( is a free, python-based GUI that allows users to plot 2D and 3D data quickly. To plot f vs. x using gnuplot, first save the forecasts and observations in two columns as a .txt or .data file.

Next use the command line to title the graph and label axes.

        Version 4.6 patchlevel 0    last modified 2012-03-04 
        Build System: MS-Windows 32 bit 

        Copyright (C) 1986-1993, 1998, 2004, 2007-2012
        Thomas Williams, Colin Kelley and many others

        gnuplot home:
        faq, bugs, etc:   type "help FAQ"
        immediate help:   type "help"  (plot window: hit 'h')

Terminal type set to 'wxt'
gnuplot> cd 'C:\Users\owner\Documents\gnuplot'
gnuplot> set title "KIND Temperature Observations vs. Forecasts"
gnuplot> set xlabel "Forecasts (°F)"                            
gnuplot> set ylabel "Observations (°F)"

Finally, you can plot f vs. x using this command. Gnuplot allows you to change the colors and symbols plotted, but in this case the defaults work fine.

plot "" using 1:2 no title with points

For plotting temperature forecasts vs. observations, your figure should look like this:

Now you have a visual representation of the joint distribution of temperature forecasts and observations.

Next you are asked to create contingency tables to further examine the distributions of forecasts and observations. The first thing you want to do is to create bins for your temperature and precipitation data so they can be put into a contingency table easily. For tempertures you are asked to use bins of 10 degrees for forecasts and observations (e.g. 10-19 degrees, 20-29, etc.). For precipitation, the observations are simply 100 and 0, while you are asked to use 20% bins for POP forecasts (e.g. 0-20%, 21-40%, etc.) For the temperature distributions, you will create a 9x9 table, like below:

10s 20s 30s 40s 50s 60s 70s 80s 90s
10s Your        Values             Go        Here



p(x) 40s






Using Excel, I used the rounddown() function to bin precipitation forecasts and temperature forecasts and observations into easy to count bins. I found this was quicker than using a series of if() statements.

Now that I have my forecasts and observations organized into bins, you can now create the contingency table. In Excel, I used if() and countif() statements to find the occurrences of each value in the table. 

Dividing the values of the tables by the total number of forecast/observation pairs yields the joint probablitly distribution p(f,x). Summing the rows and columns of the table and dividing by the total number of forecast/observations yields the marginal distributions p(f) and p(x)

Now you are ready to factor the joint distributions and compute the conditional probablities p(f|x) and p(x|f)

The calibration-refinement factorization is the distribution of the observations given the forecasts and is calculated by p(f,x) = p(x|f)p(f).
The likelihood-base rate factorization is the distribution of the forecasts given the observations and is calculated by p(f,x) = p(f|x)p(x).

Since you have already calculated p(f,x), p(f) and p(x), you can easily solve for both of these distributions using Excel and simple algebra.

Finally, you are asked to compute a measure of bias by subtracting the average observed value from the average forecast value. This can be easily done with Excel as well.