Burgin - Joint, Marginal and Conditional Probabilities Using Excel

When comparing forecast to observed values, one method of comparision is using joint, conditional and marginal distributions. This method is
especially used to when dealing with categorical forecasts such as yes/no for events such as precipitation or frost, or categorical predictions
such as "temperatures in the lower 70's".
The first step in this type of verification technique is to define the categories or bins that you will be sorting your forecasts and observations into. For probability of precipitation (POP), for example, the observations would have two categories, one for days with precipitation and another for days with precipitation. The forecasts will also have to be categorized into categories or bins you determine. For example you could take [0-10],[11-20],...[91-100] to be your categories or you could just as easily use [0-20],[21-40]...[81-100] and have half as many bins to deal with. What you are trying to learn from the analysis should determine the bins you use.
The goal is to make a table of the overlap of observations and measurements like this:
Table A


 observed precipitation

 no observed precipitation

forecast precipitation

# of times precip was correctly predicted

# of times that precipitation was forecast but did not occur

forecast no precipitation

 # of times precip occurred but was not predicted

 # of times that there was no precipitation forecasted or recieved

This can be achieved in excell using the IF() and COUNTIF() functions. Starting with collumns of your observation and forecast data create parallel collumns for each forecast category. Using IF statements that are true when the forecast value in that row falls within that category, display the observed value. Your sheet will look something like this:
 Next you will use the COUNTIF function to complete the table. Apply it to the entire collumn you just created, and condition it upon the forecast category. This will count the cells that apply have the condtion. In this example using POP, the conditions are simply 0 for one row of the table and 100 for the other collumn of the table.
You have now created an occurence table. You can now divide this by the total number of observations and forecasts to create a joint probability table. The marginal probabilities are simply the sums of the collumns and rows. Marginal probabilities represent the probability that an occurence falls into a category such as "forecast between 10 and 19" or "observed 100". For the simple 2X2 example, it should look something like this:

 observation category 1
 observation category 2
 forecast category 1
 b a+b
 forecast category 2
 c d c+d
 marginal a+c b+d 1
Table B
Conditional probabilities are probably the trickiest to wrap one's head around conceptually, but once you understand it, calculating them is not difficult. A conditional probability is the probability of an event given that another event has occurred. For example, if you wanted to look at how often it rains/not rains when rain is forecast, this is a conditional probability problem. Using x for observed values and f for forecast values, you would denote these conditional probabilities as P(x|f), specifically P(x=1|f). Conditional probabilities can be found by taking your joint probability of an event (intersection between two categories and dividing by the probability of the given category. In the example mentioned, refering to Table B, The conditional probability of P(x=1|f=1) would be a/(a+b) and  P(x=1|f=0) would be b/(a+b). Conditional probabilities can also be taken assuming a certain observation looking at what was forecast. These would be noted as P(f|x).
Though most the examples used here are using 2x2 tables NxN tables can also be created expanding on these concepts. For joint probabilities in an NxN table, cells falling along the main diagonal of the table would be correct forecasts. Forecasts differ further and further from observations as you look further from the main diagonal.