**I need to provide a link to the best web page that I know of that contains forecast verification methodology information:**

**I have used some of the equations from that page (since the Google docs equation editor doesn't translate to Google sites)**

`# basically, the stuff in this courier font is code `

`# you should be able to copy and paste it into the python shell`

`# lines that begin with # are comments and are not executed`

**python**

what is python?

http://pyaos.johnny-lin.com/

open a terminal and start python interactively

**in the computer lab (linux) type:**

`/opt/EPD/bin/python`

also open a terminal to run python scripts

edit a script, keep the editor open, save and run the script

`print 'hello world'`

it is always good to know how to get out of a program...try:

`quit()`

try these to see how python handles mathematical expressions and integer data types

`print 5*4`

`print 6/2`

`print 13+5+7`

`print 6**2`

`a=5/2`

`print a`

`dir(a)`

variables, math, logicals

`a=3.5`

`b=-2.1`

`c=3`

`d=4`

`a*b`

`b+c`

`a/c`

`c/d`

`d**c`

`a>b`

`a<=b`

`a==b`

`a!=b`

strings

`s1='hello'`

`s2='world'`

`s1 + s2`

`s1 + ' ' + s2 + "!"`

lists

`a=[2, 3.2, 'hello', 'eeee']`

`len(a)`

`a[0]`

`a[1:3]`

`a[1]='goodbye'`

tuples (lists that can’t be changed)

`b=(2, 3.2, 'goodbye', 'aaaa')`

`b[0]`
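to see the "can't be changed" part in action, here's a minimal sketch: assigning to a tuple element raises a TypeError

```python
# tuples are immutable: item assignment raises a TypeError
b = (2, 3.2, 'goodbye', 'aaaa')
failed = False
try:
    b[0] = 99
except TypeError:
    failed = True
# failed is now True, and b is unchanged
```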

**continuous variables**

specific values, such as temperature, wind speed, probability of precipitation

following the Murphy & Winkler general framework - joint distribution approach

modules and packages

a package is a collection of modules; I'll probably use the terms interchangeably

`import module_name`

`import numpy`

`p=numpy.pi`

`print p`

`numpy.sin(p/2)`

`import numpy as np`

`print np.pi`

we’re going to generate some fake observations and forecasts and do some quick calculations

`mu=50`

`sigma=20`

`obs=np.random.normal(mu,sigma,1000)`

`help(np.random.normal)`

`dir(np.random)`

`np.max(obs)`

`np.min(obs)`

`np.mean(obs)`

`np.median(obs)`

`randerr=np.random.normal(-1,10,1000)`

`syserr=10.0-0.2*obs`

`fcst=obs+randerr+syserr`

`np.max(fcst)`

`np.min(fcst)`

`np.mean(fcst)`

`np.median(fcst)`

mean error (bias)

`error=fcst-obs`

`np.mean(fcst-obs)`

`np.mean(error)`

positive: forecast is on average above the obs (too warm)

negative: forecast is on average below the obs (too cold)

mean absolute error

`np.mean(np.abs(error))`

typical magnitude of the error

mean square error

`np.mean(error*error)`

`np.mean(error**2)`

more sensitive to large errors (outliers)

a single bad forecast (or bad ob) will make this large
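a toy example with made-up numbers shows this sensitivity: one busted forecast out of 100 barely moves the MAE but dominates the MSE

```python
import numpy as np

# 100 forecasts: 99 perfect, one busted forecast with an error of 50
err = np.zeros(100)
err[0] = 50.0

mae = np.mean(np.abs(err))   # = 0.5, the outlier barely moves the MAE
mse = np.mean(err**2)        # = 25.0, the outlier dominates the MSE
```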

root mean squared error

same units as forecast, obs

another typical magnitude of error (sensitive to large errors)

`mse=np.mean(error**2)`

`rmse=np.sqrt(mse)`

`print rmse`

let’s use joint distribution approach and make a scatter plot

forecast values vs. observed values (f,x)

to be consistent with contingency table, obs=x coord, fcst=y coord

need to import another package that has nice plotting modules

`import matplotlib.pyplot as plt`

matplotlib is a Matlab-style plotting package, since I’m familiar with Matlab, it’s easy for me to learn

`plt.figure()`

`plt.plot(obs,fcst,'.')`

`plt.hold(True)`

`plt.xlabel('obs')`

`plt.ylabel('fcst')`

`plt.show()`

we’re looking at p(f,x) from the sample

could do some kernel density estimation to estimate p(f,x) in between points

or turn this into a categorical forecast verification and use contingency tables

can also plot:

errors vs forecast values

`plt.plot(fcst,fcst-obs,'.')`

`plt.show()`

errors vs observed values

`plt.plot(obs,fcst-obs,'.')`

`plt.show()`

**this can reveal relationships between the errors and particular predicted or observed values**

sort of a conditional distribution

do we underforecast stronger observed winds?

do we overforecast cold observed temperatures?

do errors increase in magnitude for rare events?

also useful to examine the distribution of forecast and observed values

**let's look at histograms, which are empirical estimates of the PDF**

`plt.hist(obs)`

`plt.show()`

`plt.hist(fcst)`

`plt.show()`

`plt.hist([obs,fcst])`

`plt.show()`

another useful way to look at these distributions is by plotting the empirical cumulative distribution function (CDF)

**CDF: p(x) = Pr{X <= x} = integral of PDF(x) from -inf to x**

to do this:

1. sort the data (x) from low to high

`fcstsort=np.sort(fcst)`

`obssort=np.sort(obs)`

**2. estimate CDF=p(xi)**

p(x_{i})=(i-a)/(n+1-2a)

Tukey a=1/3

Gumbel a=1

`a=1./3.`

`n=1000.`

`i=np.arange(1.0,1001.0,1.0)`

`px=(i-a)/(n+1.-2.*a)`

3. plot sorted data (x) vs p(x)

`plt.plot(fcstsort,px,'r',obssort,px,'b')`

`plt.show()`

for a Normal distribution, this will look a bit like the side of a hill

**starts at zero, ramps up slowly at first as we move along the left tail of the distribution, then ramps up quickly as we move across the main part of the distribution, then ramps up slowly again towards 1.0 as we move along the right tail of the distribution**

**for a uniform distribution, the CDF will be a straight line**

Linear Error in Probability Space (LEPS)

basically look up the CDF of the obs value and compare that to the CDF of the forecast value (using the same observed distribution to define those CDFs)...take the absolute value of the difference, and average all of those

**when the underlying observed distribution is uniform, the CDF is linear, this score is basically the same as MAE**

**HOWEVER, when the underlying distribution has tails or skewness, like precipitation for example, large differences in actual values will result in small differences in LEPS when you are comparing values out in the "tail" of a particular distribution.**

On the other hand, **when you compare values near the "bulge" of the distribution, small differences in actual values will result in large differences in LEPS.**

**this is similar to saying that a forecast for 50mm when 25mm is observed is still relatively "good", while a forecast for 1mm when 10mm is observed is relatively bad, even though the actual difference in the first case is larger than in the second...a forecast of a rare event should be "graded easier" than a forecast for a common event**
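a quick sketch of this effect, using a hypothetical skewed "precip-like" distribution (exponential, mean 5 mm) and the same plotting-position/interpolation machinery as above:

```python
import numpy as np

np.random.seed(0)
# a skewed, precip-like observed distribution (hypothetical: exponential, mean 5 mm)
x = np.sort(np.random.exponential(5.0, 100000))
n = len(x)
a = 1.0/3.0
p = (np.arange(1, n + 1) - a) / (n + 1.0 - 2.0*a)   # Tukey plotting positions

def cdf(v):
    # empirical CDF lookup by linear interpolation
    return np.interp(v, x, p)

# out in the tail: 50 mm vs 25 mm observed -> big value gap, small CDF gap
tail_diff = abs(cdf(50.0) - cdf(25.0))
# near the bulge: 1 mm vs 10 mm observed -> smaller value gap, big CDF gap
bulge_diff = abs(cdf(1.0) - cdf(10.0))
```

so in probability space the "rare event" forecast is penalized far less than the "common event" bust, exactly as described above.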

`cdfobs=px`

`cdffcst=np.interp(fcstsort,obssort,px)`

here we are using numpy's interpolation function to linearly interpolate the CDF values of the obs distribution to the forecast values. np.interp does not extrapolate by default: any forecast value greater than the observed maximum is assigned the largest CDF value (~0.999 in this example with sample size = 1000), and any forecast value less than the observed minimum is assigned the smallest CDF value (~0.0007), not zero.
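a tiny demonstration of np.interp's clamping behavior, with made-up numbers:

```python
import numpy as np

xp = np.array([0.0, 1.0, 2.0])
fp = np.array([10.0, 20.0, 30.0])

# points outside the range of xp are clamped to the end values of fp,
# not extrapolated; the in-range point is linearly interpolated
vals = np.interp([-5.0, 0.5, 99.0], xp, fp)
# vals -> [10., 15., 30.]
```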

**now we can calculate the LEPS**

`leps=np.mean(np.abs(cdffcst-cdfobs))`

`print leps`

a scatter plot of these two sets of CDF values is known as a Q-Q plot (quantile-quantile plot), an exploratory data analysis tool for checking whether two samples come from the same distribution; if they do, the points fall along the main diagonal. this is similar to the forecast performance we would expect if we verified after recalibrating the forecasts to remove bias

`plt.plot(cdfobs,cdffcst,'.')`

`plt.show()`

skill score

for most of these error measures, the score for a perfect forecast = 0, for example, for MSE

SS_mse = 1 - MSE_forecast/MSE_reference

for a reference forecast = climatology:

MSE_reference = mean((O_bar - o)^{2})

where O_bar is the mean observed value, often called "climatology"

questions:

should this be the mean observation from the verification sample?

should this be a long-term climatology?

should this be the overall/global mean observation, or should this vary spatially/temporally?

`error=fcst-obs`

`mse=np.mean(error**2)`

`anomaly=np.mean(obs)-obs`

`mseclimo=np.mean(anomaly**2)`

`ssmse=1-mse/mseclimo`

`print ssmse`

`print mse`

`print mseclimo`

correlation coefficient can also be used as a measure of "association" (aspect of forecast quality as defined by Murphy 1993)

correlation is not sensitive to bias, a biased forecast could have perfect correlation, so correlation by itself should not be used as a single measure of quality
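a minimal sketch of that point: a badly biased forecast that is a perfect linear function of the obs still has correlation 1

```python
import numpy as np

np.random.seed(1)
obs = np.random.normal(50, 20, 1000)

# a badly biased forecast that is still a perfect linear function of the obs
fcst_biased = 2.0*obs + 30.0

r = np.corrcoef(fcst_biased, obs)[0, 1]
bias = np.mean(fcst_biased - obs)
# r is 1 (to rounding) even though the mean error is huge
```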

`A=np.corrcoef(fcst,obs)`

`r=A[0,1]`

`print A`

`print r`

this returns a correlation matrix; the values on the diagonal should = 1, indicating that the forecasts are perfectly correlated with themselves and the observations are perfectly correlated with themselves. A is a 2x2 matrix; we can refer to the i,j element of A either with

`print A[1,0]`

`print A[1][0]`

in general A=np.corrcoef(x_{0},x_{1}) returns: A_{1,0} element = A_{0,1} element = the correlation between x_{0} (the forecasts) and x_{1} (the observed values)

as a reminder, the correlation coefficient is defined as the covariance between two variables (X and Y) divided by the product of their standard deviations:

r_{XY} = cov(X,Y)/(s_{X} s_{Y})

Murphy (1988) showed how the MSE can be rewritten in terms of the means, standard deviations, and correlation of the forecasts and observations:

MSE = (f_bar - x_bar)^{2} + s_{f}^{2} + s_{x}^{2} - 2 s_{f} s_{x} r_{fx}

where f_bar, x_bar are the sample means, s_{f}, s_{x} are the sample standard deviations of the forecasts and observations, respectively, and r_{fx} is the correlation between the forecasts and observations

- first term is the bias (squared): the larger the bias, the larger the MSE
- second term is the variance of the forecasts. one might assume that a lower MSE could be obtained by simply issuing constant forecasts (all equal to the mean, giving zero variance, s_{f}^{2}=0), however s_{f} also appears in the fourth term, which is negatively oriented, so low forecast variance also makes the fourth term small. ideally we want our forecast system to have a variance similar to that of the observations
- third term is the variance of the observations, obviously outside the control of the forecaster
- fourth term is the product of the standard deviations of both forecasts and observations with the correlation between the two...forecasts that are more highly correlated with the observations give a lower MSE

note that for a perfect forecast, s_{f} = s_{x} and r_{fx} = 1, therefore the fourth term would cancel out the second and third terms

this is telling us: to reduce your MSE...reduce your bias and increase your correlation with the observations

this may seem quite obvious, but here we have the math to explain how this works
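we can check the decomposition numerically using fake forecasts and obs generated the same way as earlier (note that np.std defaults to the population standard deviation, ddof=0, which is what makes the identity exact):

```python
import numpy as np

np.random.seed(2)
obs = np.random.normal(50, 20, 1000)
fcst = obs + np.random.normal(-1, 10, 1000) + (10.0 - 0.2*obs)

mse = np.mean((fcst - obs)**2)

fbar, xbar = np.mean(fcst), np.mean(obs)
sf, sx = np.std(fcst), np.std(obs)      # population std (ddof=0)
r = np.corrcoef(fcst, obs)[0, 1]

# Murphy (1988): MSE = (fbar-xbar)^2 + sf^2 + sx^2 - 2*sf*sx*r
decomp = (fbar - xbar)**2 + sf**2 + sx**2 - 2.0*sf*sx*r
# mse and decomp agree to rounding error
```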

in a similar vein, we can decompose the skill score based on MSE:

SS_{mse} = r_{fx}^{2} - (r_{fx} - s_{f}/s_{x})^{2} - ((f_bar - x_bar)/s_{x})^{2}

the first term is the square of the linear association between the forecasts and observations; a good forecast will have a high value for this term. the second term is related to reliability/conditional bias; it is negatively oriented, so a good forecast will have a small magnitude for this term

- ideally s_{f} = s_{x}, so the ratio s_{f}/s_{x} would = 1...also recall that for simple linear regression, the slope of the regression line is related to the correlation between the two variables, so this term compares the slope of a regression line to the ideal slope
- if there is conditional bias, the forecast system may overforecast small values and underforecast large values; the slope of the linear regression between forecasts and observations would then be less than 1 even if the overall bias = zero
- this term will increase in magnitude if the forecast system is unreliable

the third term is related to the overall bias of the forecast system, a good forecast will have a small number for this term

`sx = np.std(obs)`

`sf = np.std(fcst)`

`fbar=np.mean(fcst)`

`xbar=np.mean(obs)`

`print r**2`

`print (r-sf/sx)**2`

`print ((fbar-xbar)/sx)**2`

`ssmse2=r**2-(r-sf/sx)**2-((fbar-xbar)/sx)**2`

`print ssmse2`

let's try to see how these numbers will change if we examine a biased set of forecasts and a set of forecasts that have larger conditional bias

## Probability forecasts

Probability forecasts can be considered a subset of continuous forecasts since the forecast can have any value between 0 and 1 (or 0-100%).

Typically this is the probability of a specific event, such as precipitation greater than or equal to 0.01" (the probability of precipitation, POP)

therefore, the observations can only have two values: 0 or 1

`obspop = np.random.binomial(1,0.3,1000)`

this will return 1000 numbers, either 0 or 1, drawn from a binomial distribution with a 0.3 probability of returning 1 for each

let's generate totally random forecasts, from a uniform distribution

say that our forecasts are not very sharp, they only vary between 0.1 and 0.5

`fcstpop=np.random.uniform(0.1,0.5,1000)`

check the mean values to see if they are both close to 0.3

`np.mean(obspop)`

`np.mean(fcstpop)`

also, as we have already noted, the forecasts are often categorized by placing the continuous numbers into discrete bins, we have used categorical methods to verify these kinds of forecasts already

MSE of probability forecast is known as the Brier Score (BS)

for n pairs of forecast/observed events:

BS = (1/n) * sum_{k=1,n} (f_{k} - x_{k})^{2}

where x_{k} = 1 when the observed event occurs, x_{k} = 0 when the observed event does not occur

pretty easy to compute in python

`error=fcstpop-obspop`

`BS=np.mean(error**2)`

`print BS`

this is an error, so for a perfect forecast: BS = 0 (*no "B.S."*)

Brier Skill Score (BSS):

BSS = 1 - BS/BS_{ref}

BS_{ref} = usually climatology
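a minimal sketch of the BSS using the sample climatology as the reference forecast (fake data as before):

```python
import numpy as np

np.random.seed(3)
obspop = np.random.binomial(1, 0.3, 1000)
fcstpop = np.random.uniform(0.1, 0.5, 1000)

BS = np.mean((fcstpop - obspop)**2)

# reference: a constant forecast of the sample climatology
xbar = np.mean(obspop)
BSref = np.mean((xbar - obspop)**2)

BSS = 1.0 - BS/BSref
# purely random forecasts should show little or no skill (BSS near or below zero)
```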

BS can be decomposed into terms related to:

- reliability
- resolution
- uncertainty

calibration-refinement factorization from Murphy and Winkler (1987)

let's look at the first 5 numbers in each forecast and observation array

`fcstpop[:5]`

`obspop[:5]`

let's say the forecast values are indicated by i=1,M categories

for example, let's say the i=1 category is for 0 <= f < 0.2

total number of events = n

`n=len(obspop)`

let N_{i} be the number of times each categorical f_{i} was issued

let's look at some logical tests on our first 5 forecast values

`fcstpop[:5]<0.2`

this will return a "boolean" array of True and False for each value in the fcstpop array after testing if they are <0.2

we can take advantage of this array of True and False to do conditional calculations, a simple sum treats True =1 and False = 0

`np.sum(fcstpop[:5]<0.2)`

since most categories are greater than or equal to a lower limit __and__ less than an upper limit, we use the numpy logical_and function to do this:

`np.logical_and(fcstpop[:5]>=0, fcstpop[:5]<0.2)`

`np.sum(np.logical_and(fcstpop[:5]>=0, fcstpop[:5]<0.2))`

(should give the same answer as before since fcstpop are all non-negative)

take away the array index that was limiting things to the first 5 numbers to see the number of forecasts that were issued in this category:

`Ni=np.sum(np.logical_and(fcstpop>=0, fcstpop<0.2))`

`print Ni`

the marginal distribution of forecast values is given by the relative frequency of each forecast category:

p(f_{i}) = N_{i}/n

`pfi = Ni*1.0/n`

`print pfi`

here I multiplied Ni by 1.0 to convert it to a real number, otherwise python will do integer division

we can also list the forecast values only where the forecasts fall into our category:

`fcstpop[:5][np.logical_and(fcstpop[:5]>=0., fcstpop[:5]<0.2)]`

the nice thing about this is that we can also list the *observed* values that correspond to those specific forecast values, only where the forecasts fall into our category, simply by indexing the *observations* by the logical test on the *forecasts*:

`obspop[:5][np.logical_and(fcstpop[:5]>=0., fcstpop[:5]<0.2)]`

now let's look at the conditional probability that x=1 given f=f_{i}, which is:

x_bar_{i} = p(x=1 | f_{i}) = (1/N_{i}) * sum of x_{k} over the cases where f_{i} was issued

sum/average the observed values for only those occasions where f_{i} is issued

conditional average observation, expected value of x given f_{i}

using numpy, this is simply:

`np.mean(obspop[:5][np.logical_and(fcstpop[:5]>=0., fcstpop[:5]<0.2)])`

`# now do the entire array`

`np.mean(obspop[np.logical_and(fcstpop>=0., fcstpop<0.2)])`

recall that this is getting at reliability: we want the frequency of observation = yes to be equal to the forecast category for a reliable forecast system


unconditional sample climatology:

x_bar = (1/n) * sum_{k=1,n} x_{k}

decomposed Brier score:

BS = (1/n) * sum_{i=1,M} N_{i} (f_{i} - x_bar_{i})^{2} - (1/n) * sum_{i=1,M} N_{i} (x_bar_{i} - x_bar)^{2} + x_bar (1 - x_bar)

BS = "reliability" - "resolution" + "uncertainty"

let's set up a script to calculate the components of the Brier score using our POPs and forecast categories every 20%

edit a file called "bspop.py", you can use "gedit bspop.py" or vi or emacs if you are comfortable with those editors

the first line will execute our version of python

everything following will be python code that will be executed just like it would be if we were to type it into the terminal

let's start with a 'hello world' (of course)

`#!/usr/bin/env /opt/EPD/bin/python`

`print 'hello world'`

`quit()`

save this file, then make sure that the permissions are set so you can execute the file:

`chmod 777 bspop.py`

`./bspop.py`

now we can edit our script, save it, then execute it

this will allow us to build up our python code piece-by-piece

let's play around with looping, python can do looping with the "for <index> in <list>" command

add this to your script:

`import numpy as np`

`for k in [7, -3.3, np.pi, 'hello']:`

`    print k`

`a = [7, -3.3, np.pi, 'hello']`

`for i in np.arange(4):`

`    print i, a[i]`

`print np.arange(4)`

save this, and execute it to see what you get

here is where some python syntax comes into play: when you use a "for" loop (or an "if" test), the statement must end with a colon ":", and the block of code that follows must be indented to indicate that it is what gets executed in the loop. the indentation must be consistent within that block, and the block is terminated by the next line of python that is NOT indented

in the first loop, you see how a for loop works: it simply steps through the provided list sequentially, executing the indented block of code that follows. the list can contain any type of data; here we've provided an integer, a float, a numpy object, and a string

the second loop uses the same list, but uses a numpy array (np.arange) to set up a simple array of sequential numbers; the for loop steps through those numbers, in this case 0 to 3 (four numbers in total). the block of code prints the loop index, then refers to array "a" by that index

a numpy "array" is a special kind of list that contains only one type of variable, say "short" integers or double-precision floating point numbers. using arrays is faster than regular lists: since all of the members are of the same type, there is less overhead in python when operating on arrays than on regular lists

you can define a list to be a numpy array using:

`mylist = np.array([2, 3, 5], dtype = 'i')`

the first argument is the list, the second argument is the data type, use:

- 'd' for double-precision floating point
- 'f' for single-precision floating point
- 'i' for short integer
- 'l' for long integer

I will need to double check this, but I believe double-precision floats are 64-bit, and long integers are 32-bit, this probably depends on the machine you're using
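rather than guessing, numpy can report the size of each dtype directly via itemsize (bytes per element):

```python
import numpy as np

# itemsize gives bytes per element, so the sizes needn't be guessed at
nbytes_d = np.dtype('d').itemsize   # double-precision float: 8 bytes (64-bit)
nbytes_f = np.dtype('f').itemsize   # single-precision float: 4 bytes (32-bit)
nbytes_i = np.dtype('i').itemsize   # C int, typically 4 bytes
nbytes_l = np.dtype('l').itemsize   # C long: platform-dependent (8 on most 64-bit Linux)
```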

it is usually a good idea to "initialize" an array that you will be working with to set the values to all zeros, using the numpy.zeros function

`a = np.zeros(1000, dtype='d')`

where the first argument is the shape of the array

try:

`b = np.zeros((3,2), dtype='d')`

`np.shape(b)`

`np.rank(b)`

`np.size(b)`

`b.dtype.char`

let's try some other common array manipulation

`a = np.arange(6)`

`print a`

`b= np.reshape(a,(3,2))`

`print b`

`c=np.transpose(b)`

`print c`

`a = np.append(a,[7,8])`

`print a`

`aplus=np.concatenate((a,[10,11]))`

`print aplus`

`aaa=np.repeat(a,3)`

`print aaa`

let's try to use the looping and arrays to calculate the components of the BS decomposition, try the following python code, make sure that you understand how each line works

`#!/usr/bin/env /opt/EPD/bin/python`

`import numpy as np`

`#set up obs as random draw from binomial distribution, 0.3 frequency of yes`

`obspop = np.random.binomial(1,0.3,1000)`

`#set up fcst as random draw from uniform distribution, half between 0.1 and 0.5`

`# the rest between 0 and 1`

`fcstpop=np.random.uniform(0.1,0.5,500)`

`fcstpop = np.append(fcstpop,np.random.uniform(0.,1.,500))`

`#set up arrays for lower and upper bounds on categories`

`lower = np.arange(0.,1.,.1)`

`upper = np.arange(.1, 1.1, .1)`

`n = len(obspop)*1.0`

`print 'lower',lower`

`print 'upper',upper`

`#initialize variables with zeroes`

`# N is the total number of forecast in each category`

`N = np.zeros(len(lower))`

`# pf is the marginal distribution of forecasts`

`pf = np.zeros(len(N),dtype='d')`

`# xbari is the average observation given the corresponding forecast was in a category`

`xbari = np.zeros(len(N),dtype='d')`

`# fi is the average forecast given that forecast was in a category`

`fi = np.zeros(len(N),dtype='d')`

`# loop over all of the catgories, find N, pf, xbari, fi`

`for i in np.arange(len(lower)):`

` N[i]=np.sum(np.logical_and(fcstpop>=lower[i], fcstpop<upper[i]))`

` pf[i]=N[i]/n`

` xbari[i]=np.mean(obspop[np.logical_and(fcstpop>=lower[i], fcstpop<upper[i])])`

` fi[i]=np.mean(fcstpop[np.logical_and(fcstpop>=lower[i], fcstpop<upper[i])])`

`#overall unconditional mean obs`

`xbar = np.mean(obspop)`

`#components of BS`

`reliability = np.sum(N*(fi-xbari)**2)/n`

`resolution = np.sum(N*(xbari-xbar)**2)/n`

`uncertainty = xbar*(1-xbar)`

`# print stuff`

`print 'N',N`

`print 'pf',pf`

`print 'xbari',xbari`

`print 'fi',fi`

`print 'reli',reliability`

`print 'reso',resolution`

`print 'uncer',uncertainty`

`print 'BSdecomp =',reliability - resolution + uncertainty`

`print 'BS =',np.mean((fcstpop-obspop)**2)`

`quit()`

a graphical way of looking at the conditional distributions uses the likelihood-base rate factorization: plot POD vs POFD for a fixed observed event, varying the forecast threshold across all possible values

## 2 cones

let's plot ROC curves for "2 cones" that we can control, as in Baldwin and Kain (2006)

first set up the forecast "cone"

`#!/usr/bin/env /opt/EPD/bin/python`

`import numpy as np`

`import matplotlib.pyplot as plt`

`# set up x,y coords`

`# x and y will be 2-D arrays that vary between 0 and 1000`

`# x[i,j] holds the x coordinate`

`# y[i,j] holds the y coordinate`

`x,y=np.mgrid[0.:1001.,0.:1001.]`

`# n is the total number of points in the grid`

`n=1001.*1001.`

`# fcst circle characteristics`

`# x, y coords of center of cone`

`xf0=250.`

`yf0=500.`

`# amplitude of forecast cone`

`Af=10.`

`# radius where value of forecast hits zero`

`Rf=250.`

`# rr is the radial distance from the center coordinates`

`rr=np.sqrt((x-xf0)**2+(y-yf0)**2)`

`# fcst is a 2-D array that contains fcst values`

`fcst=Af*(Rf-rr)/Rf`

`#fcst[rr>Rf]=0.`

`# set all negative values of fcst to zero, optional`

`# plot contours so we can see what this forecast looks like`

`plt.contour(x,y,fcst)`

`plt.show()`

`# obs circle, similar characteristics as fcst, keeping y coord of center the same`

`# but shifting the x coordinate, similar to Baldwin and Kain (2006)`

`xo0=500.`

`yo0=500.`

`Ao=10.`

`Ro=250.`

`# observed event threshold, this must be held fixed`

`#`

`To=0.`

`#`

`# compute obs values`

`rr=np.sqrt((x-xo0)**2+(y-yo0)**2)`

`obs=Ao*(Ro-rr)/Ro`

`#obs[rr>Ro]=0.`

`#plot the discrimination diagram, two conditional fcst distributions given obs=yes, obs=no`

`#plt.hist([fcst[obs>To],fcst[obs<=To]])`

`#plot the forecast errors`

`#plt.contour(x,y,fcst-obs)`

`#plt.show()`

`# max min values of fcsts`

`fmin=np.min(fcst)`

`fmax=np.max(fcst)`

`print fmin,fmax`

`# set up an array of forecast thresholds that vary from the min to max, with ~100 pts in between`

`# Tf=thresholds for converting continuous fcst to yes/no`

`step=(fmax-fmin)*0.01`

`Tf=np.arange(fmin,fmax,step)`

`# set up arrays of POD and POFD with zeroes`

`POD=np.zeros(len(Tf))`

`POFD=np.zeros(len(Tf))`

`# loop across the forecast thresholds, find regions in Venn diagram to determine `

`# 2x2 contingency table components for each forecast threshold, then find POD, POFD`

`for i in np.arange(len(Tf)):`

` a=np.logical_and(fcst>Tf[i],obs>To)`

` b=np.logical_and(fcst>Tf[i],obs<=To)`

` c=np.logical_and(fcst<=Tf[i],obs>To)`

` d=np.logical_and(fcst<=Tf[i],obs<=To)`

` aa=np.sum(a)/n`

` bb=np.sum(b)/n`

` cc=np.sum(c)/n`

` dd=np.sum(d)/n`

` POD[i]=aa/(aa+cc)`

` POFD[i]=bb/(bb+dd)`

`# plot ROC curve`

`plt.plot(POFD,POD,'+-')`

`plt.show()`
