**I need to provide a link to the best web page that I know of that contains forecast verification methodology information:**

**I have used some of the equations from that page (since the Google docs equation editor doesn't translate to Google sites)**

`# basically, the stuff in this courier font is code `

`# you should be able to copy and paste it into the python shell`

`# lines that begin with # are comments and are not executed`

**python**

what is python?

http://pyaos.johnny-lin.com/

open a terminal and start python interactively

**in the computer lab (linux) type:**

`/opt/EPD/bin/python`

also open a terminal to run python scripts

edit a script, keep the editor open, save and run the script

`print 'hello world'`

it is always good to know how to get out of a program...try:

`quit()`

try these to see how python handles mathematical expressions and integer data types

`print 5*4`

`print 6/2`

`print 13+5+7`

`print 6**2`

`a=5/2`

`print a`

`dir(a)`

variables, math, logicals

`a=3.5`

`b=-2.1`

`c=3`

`d=4`

`a*b`

`b+c`

`a/c`

`c/d`

`d**c`

`a>b`

`a<=b`

`a==b`

`a!=b`

strings

`s1='hello'`

`s2='world'`

`s1 + s2`

`s1 + ' ' + s2 + "!"`

lists

`a=[2, 3.2, 'hello', 'eeee']`

`len(a)`

`a[0]`

`a[1:3]`

`a[1]='goodbye'`

tuples (lists that can’t be changed)

`b=(2, 3.2, 'goodbye', 'aaaa')`

`b[0]`
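to see the "can't be changed" part in action, here's a minimal sketch: assigning to a tuple element raises a TypeError

```python
# tuples are immutable: item assignment raises a TypeError
b = (2, 3.2, 'goodbye', 'aaaa')
failed = False
try:
    b[0] = 99
except TypeError:
    failed = True
# failed is now True, and b is unchanged
```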

**continuous variables**

specific values, such as temperature, wind speed, probability of precipitation

following the Murphy & Winkler general framework - joint distribution approach

modules and packages

a package is a collection of modules; I'll probably use the terms interchangeably

`import module_name`

`import numpy`

`p=numpy.pi`

`print p`

`numpy.sin(p/2)`

`import numpy as np`

`print np.pi`

we’re going to generate some fake observations and forecasts and do some quick calculations

`mu=50`

`sigma=20`

`obs=np.random.normal(mu,sigma,1000)`

`help(np.random.normal)`

`dir(np.random)`

`np.max(obs)`

`np.min(obs)`

`np.mean(obs)`

`np.median(obs)`

`randerr=np.random.normal(-1,10,1000)`

`syserr=10.0-0.2*obs`

`fcst=obs+randerr+syserr`

`np.max(fcst)`

`np.min(fcst)`

`np.mean(fcst)`

`np.median(fcst)`

mean error (bias)

`error=fcst-obs`

`np.mean(fcst-obs)`

`np.mean(error)`

positive: forecast is on average above the obs (too warm)

negative: forecast is on average below the obs (too cold)

mean absolute error

`np.mean(np.abs(error))`

typical magnitude of the error

mean square error

`np.mean(error*error)`

`np.mean(error**2)`

more sensitive to large errors (outliers)

a single bad forecast (or bad ob) will make this large
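a toy example with made-up numbers shows this sensitivity: one busted forecast out of 100 barely moves the MAE but dominates the MSE

```python
import numpy as np

# 100 forecasts: 99 perfect, one busted forecast with an error of 50
err = np.zeros(100)
err[0] = 50.0

mae = np.mean(np.abs(err))   # = 0.5, the outlier barely moves the MAE
mse = np.mean(err**2)        # = 25.0, the outlier dominates the MSE
```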

root mean squared error

same units as forecast, obs

another typical magnitude of error (sensitive to large errors)

`mse=np.mean(error**2)`

`rmse=np.sqrt(mse)`

`print rmse`

let’s use joint distribution approach and make a scatter plot

forecast values vs. observed values (f,x)

to be consistent with contingency table, obs=x coord, fcst=y coord

need to import another package that has nice plotting modules

`import matplotlib.pyplot as plt`

matplotlib is a Matlab-style plotting package, since I’m familiar with Matlab, it’s easy for me to learn

`plt.figure()`

`plt.plot(obs,fcst,'.')`

`plt.hold(True)`

`plt.xlabel('obs')`

`plt.ylabel('fcst')`

`plt.show()`

we’re looking at p(f,x) from the sample

could do some kernel density estimation to estimate p(f,x) in between points

or turn this into a categorical forecast verification and use contingency tables

can also plot:

errors vs forecast values

`plt.plot(fcst,fcst-obs,'.')`

`plt.show()`

errors vs observed values

`plt.plot(obs,fcst-obs,'.')`

`plt.show()`

**this can reveal relationships between the errors and particular predicted or observed values**

sort of a conditional distribution

do we underforecast stronger observed winds?

do we overforecast cold observed temperatures?

do errors increase in magnitude for rare events?

also useful to examine the distribution of forecast and observed values

**let's look at histograms, which are empirical estimates of the PDF**

`plt.hist(obs)`

`plt.show()`

`plt.hist(fcst)`

`plt.show()`

`plt.hist([obs,fcst])`

`plt.show()`

another useful way to look at these distributions is by plotting the empirical cumulative distribution function (CDF)

**CDF: p(x) = Pr{X <= x} = integral of PDF(x) from -inf to x**

to do this:

1. sort the data (x) from low to high

`fcstsort=np.sort(fcst)`

`obssort=np.sort(obs)`

**2. estimate CDF=p(xi)**

p(x_{i})=(i-a)/(n+1-2a)

Tukey a=1/3

Gumbel a=1

`a=1./3.`

`n=1000.`

`i=np.arange(1.0,1001.0,1.0)`

`px=(i-a)/(n+1.-2.*a)`

3. plot sorted data (x) vs p(x)

`plt.plot(fcstsort,px,'r',obssort,px,'b')`

`plt.show()`

for a Normal distribution, this will look a bit like the side of a hill

**starts at zero, ramps up slowly at first as we move along the left tail of the distribution, then ramps up quickly as we move across the main part of the distribution, then ramps up slowly again towards 1.0 as we move along the right tail of the distribution**

**for a uniform distribution, the CDF will be a straight line**

Linear Error in Probability Space (LEPS)

basically look up the CDF of the obs value and compare that to the CDF of the forecast value (using the same observed distribution to define those CDFs)...take the absolute value of the difference, and average all of those

**when the underlying observed distribution is uniform, the CDF is linear, this score is basically the same as MAE**

**HOWEVER, when the underlying distribution has tails or skewness, like precipitation for example, large differences in actual values will result in small differences in LEPS when you are comparing values out in the "tail" of a particular distribution.**

On the other hand, **when you compare values near the "bulge" of the distribution, small differences in actual values will result in large differences in LEPS.**

**this is similar to saying that a forecast for 50mm when 25mm is observed is still relatively "good", while a forecast for 1mm when 10mm is observed is relatively bad, even though the actual difference in the first case is larger than in the second...a forecast of a rare event should be "graded easier" than a forecast for a common event**
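a quick sketch of this effect, using a hypothetical skewed "precip-like" distribution (exponential, mean 5 mm) and the same plotting-position/interpolation machinery as above:

```python
import numpy as np

np.random.seed(0)
# a skewed, precip-like observed distribution (hypothetical: exponential, mean 5 mm)
x = np.sort(np.random.exponential(5.0, 100000))
n = len(x)
a = 1.0/3.0
p = (np.arange(1, n + 1) - a) / (n + 1.0 - 2.0*a)   # Tukey plotting positions

def cdf(v):
    # empirical CDF lookup by linear interpolation
    return np.interp(v, x, p)

# out in the tail: 50 mm vs 25 mm observed -> big value gap, small CDF gap
tail_diff = abs(cdf(50.0) - cdf(25.0))
# near the bulge: 1 mm vs 10 mm observed -> smaller value gap, big CDF gap
bulge_diff = abs(cdf(1.0) - cdf(10.0))
```

so in probability space the "rare event" forecast is penalized far less than the "common event" bust, exactly as described above.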

`cdfobs=px`

`cdffcst=np.interp(fcstsort,obssort,px)`

here we are using numpy's interpolation function to linearly interpolate the CDF values of the obs distribution to the forecast values. np.interp does not extrapolate by default: any forecast value greater than the observed maximum is assigned the largest CDF value (~0.999 in this example with sample size = 1000), and any forecast value less than the observed minimum is assigned the smallest CDF value (~0.0007), not zero.
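a tiny demonstration of np.interp's clamping behavior, with made-up numbers:

```python
import numpy as np

xp = np.array([0.0, 1.0, 2.0])
fp = np.array([10.0, 20.0, 30.0])

# points outside the range of xp are clamped to the end values of fp,
# not extrapolated; the in-range point is linearly interpolated
vals = np.interp([-5.0, 0.5, 99.0], xp, fp)
# vals -> [10., 15., 30.]
```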

**now we can calculate the LEPS**

`leps=np.mean(np.abs(cdffcst-cdfobs))`

`print leps`

a scatter plot of these two sets of CDF values is known as a Q-Q plot (quantile-quantile plot), an exploratory data analysis tool for checking whether two samples come from the same distribution; if they do, the points fall along the main diagonal. this is similar to the forecast performance we would expect if we verified after recalibrating the forecasts to remove bias

`plt.plot(cdfobs,cdffcst,'.')`

`plt.show()`

skill score

for most of these error measures, the score for a perfect forecast = 0, for example, for MSE

SS_mse = 1 - MSE_forecast/MSE_reference

for a reference forecast = climatology:

MSE_reference = mean((O_bar - o)^{2})

where O_bar is the mean observed value, often called "climatology"

questions:

should this be the mean observation from the verification sample?

should this be a long-term climatology?

should this be the overall/global mean observation, or should this vary spatially/temporally?

`error=fcst-obs`

`mse=np.mean(error**2)`

`anomaly=np.mean(obs)-obs`

`mseclimo=np.mean(anomaly**2)`

`ssmse=1-mse/mseclimo`

`print ssmse`

`print mse`

`print mseclimo`

correlation coefficient can also be used as a measure of "association" (aspect of forecast quality as defined by Murphy 1993)

correlation is not sensitive to bias, a biased forecast could have perfect correlation, so correlation by itself should not be used as a single measure of quality
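a minimal sketch of that point: a badly biased forecast that is a perfect linear function of the obs still has correlation 1

```python
import numpy as np

np.random.seed(1)
obs = np.random.normal(50, 20, 1000)

# a badly biased forecast that is still a perfect linear function of the obs
fcst_biased = 2.0*obs + 30.0

r = np.corrcoef(fcst_biased, obs)[0, 1]
bias = np.mean(fcst_biased - obs)
# r is 1 (to rounding) even though the mean error is huge
```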

`A=np.corrcoef(fcst,obs)`

`r=A[0,1]`

`print A`

`print r`

this returns a correlation matrix; the values on the diagonal should = 1, indicating that the forecasts are perfectly correlated with themselves and the observations are perfectly correlated with themselves. A is a 2x2 matrix; we can refer to the i,j element of A either with

`print A[1,0]`

`print A[1][0]`

in general A=np.corrcoef(x_{0},x_{1}) returns: A_{1,0} element = A_{0,1} element = the correlation between x_{0} (the forecasts) and x_{1} (the observed values)

as a reminder, the correlation coefficient is defined as the covariance between two variables (X and Y) divided by the product of their standard deviations:

r_{XY} = cov(X,Y)/(s_{X} s_{Y})

Murphy (1988) showed how the MSE can be rewritten in terms of the means, standard deviations, and correlation of the forecasts and observations:

MSE = (f_bar - x_bar)^{2} + s_{f}^{2} + s_{x}^{2} - 2 s_{f} s_{x} r_{fx}

where f_bar, x_bar are the sample means, s_{f}, s_{x} are the sample standard deviations of the forecasts and observations, respectively, and r_{fx} is the correlation between the forecasts and observations

- first term is the bias (squared): the larger the bias, the larger the MSE
- second term is the variance of the forecasts. one might assume that a lower MSE could be obtained by simply issuing constant forecasts (all equal to the mean, giving zero variance, s_{f}^{2}=0), however s_{f} also appears in the fourth term, which is negatively oriented, so low forecast variance also makes the fourth term small. ideally we want our forecast system to have a variance similar to that of the observations
- third term is the variance of the observations, obviously outside the control of the forecaster
- fourth term is the product of the standard deviations of both forecasts and observations with the correlation between the two...forecasts that are more highly correlated with the observations give a lower MSE

note that for a perfect forecast, s_{f} = s_{x} and r_{fx} = 1, therefore the fourth term would cancel out the second and third terms

this is telling us: to reduce your MSE...reduce your bias and increase your correlation with the observations

this may seem quite obvious, but here we have the math to explain how this works
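we can check the decomposition numerically using fake forecasts and obs generated the same way as earlier (note that np.std defaults to the population standard deviation, ddof=0, which is what makes the identity exact):

```python
import numpy as np

np.random.seed(2)
obs = np.random.normal(50, 20, 1000)
fcst = obs + np.random.normal(-1, 10, 1000) + (10.0 - 0.2*obs)

mse = np.mean((fcst - obs)**2)

fbar, xbar = np.mean(fcst), np.mean(obs)
sf, sx = np.std(fcst), np.std(obs)      # population std (ddof=0)
r = np.corrcoef(fcst, obs)[0, 1]

# Murphy (1988): MSE = (fbar-xbar)^2 + sf^2 + sx^2 - 2*sf*sx*r
decomp = (fbar - xbar)**2 + sf**2 + sx**2 - 2.0*sf*sx*r
# mse and decomp agree to rounding error
```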

in a similar vein, we can decompose the skill score based on MSE:

SS_{mse} = r_{fx}^{2} - (r_{fx} - s_{f}/s_{x})^{2} - ((f_bar - x_bar)/s_{x})^{2}

the first term is the square of the linear association between the forecasts and observations; a good forecast will have a high value for this term. the second term is related to reliability/conditional bias; it is negatively oriented, so a good forecast will have a small magnitude for this term

- ideally s_{f} = s_{x}, so the ratio s_{f}/s_{x} would = 1...also recall that for simple linear regression, the slope of the regression line is related to the correlation between the two variables, so this term compares the slope of a regression line to the ideal slope
- if there is conditional bias, the forecast system may overforecast small values and underforecast large values; the slope of the linear regression between forecasts and observations would then be less than 1 even if the overall bias = zero
- this term will increase in magnitude if the forecast system is unreliable

the third term is related to the overall bias of the forecast system, a good forecast will have a small number for this term

`sx = np.std(obs)`

`sf = np.std(fcst)`

`fbar=np.mean(fcst)`

`xbar=np.mean(obs)`

`print r**2`

`print (r-sf/sx)**2`

`print ((fbar-xbar)/sx)**2`

`ssmse2=r**2-(r-sf/sx)**2-((fbar-xbar)/sx)**2`

`print ssmse2`

let's try to see how these numbers will change if we examine a biased set of forecasts and a set of forecasts that have larger conditional bias

## Probability forecasts

Probability forecasts can be considered a subset of continuous forecasts since the forecast can have any value between 0 and 1 (or 0-100%).

Typically this is the probability of a specific event, such as precipitation greater than or equal to 0.01" (the probability of precipitation, POP)

therefore, the observations can only have two values: 0 or 1

`obspop = np.random.binomial(1,0.3,1000)`

this will return 1000 numbers, either 0 or 1, drawn from a binomial distribution with a 0.3 probability of returning 1 for each

let's generate totally random forecasts, from a uniform distribution

say that our forecasts are not very sharp, they only vary between 0.1 and 0.5

`fcstpop=np.random.uniform(0.1,0.5,1000)`

check the mean values to see if they are both close to 0.3

`np.mean(obspop)`

`np.mean(fcstpop)`

also, as we have already noted, the forecasts are often categorized by placing the continuous numbers into discrete bins, we have used categorical methods to verify these kinds of forecasts already

MSE of probability forecast is known as the Brier Score (BS)

for n pairs of forecast/observed events:

BS = (1/n) * sum_{k=1,n} (f_{k} - x_{k})^{2}

where x_{k} = 1 when the observed event occurs, x_{k} = 0 when the observed event does not occur

pretty easy to compute in python

`error=fcstpop-obspop`

`BS=np.mean(error**2)`

`print BS`

this is an error, so for a perfect forecast: BS = 0 (*no "B.S."*)

Brier Skill Score (BSS):

BSS = 1 - BS/BS_{ref}

BS_{ref} = usually climatology
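a minimal sketch of the BSS using the sample climatology as the reference forecast (fake data as before):

```python
import numpy as np

np.random.seed(3)
obspop = np.random.binomial(1, 0.3, 1000)
fcstpop = np.random.uniform(0.1, 0.5, 1000)

BS = np.mean((fcstpop - obspop)**2)

# reference: a constant forecast of the sample climatology
xbar = np.mean(obspop)
BSref = np.mean((xbar - obspop)**2)

BSS = 1.0 - BS/BSref
# purely random forecasts should show little or no skill (BSS near or below zero)
```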

BS can be decomposed into terms related to:

- reliability
- resolution
- uncertainty

calibration-refinement factorization from Murphy and Winkler (1987)

let's look at the first 5 numbers in each forecast and observation array

`fcstpop[:5]`

`obspop[:5]`

let's say the forecast values are indicated by i=1,M categories

for example, let's say the i=1 category is for 0 <= f < 0.2

total number of events = n

`n=len(obspop)`

let N_{i} be the number of times each categorical f_{i} was issued

let's look at some logical tests on our first 5 forecast values

`fcstpop[:5]<0.2`

this will return a "boolean" array of True and False for each value in the fcstpop array after testing if they are <0.2

we can take advantage of this array of True and False to do conditional calculations, a simple sum treats True =1 and False = 0

`np.sum(fcstpop[:5]<0.2)`

since most categories are greater than or equal to a lower limit __and__ less than an upper limit, we use the numpy logical_and function to do this:

`np.logical_and(fcstpop[:5]>=0, fcstpop[:5]<0.2)`

`np.sum(np.logical_and(fcstpop[:5]>=0, fcstpop[:5]<0.2))`

(should give the same answer as before since fcstpop are all non-negative)

take away the array index that was limiting things to the first 5 numbers to see the number of forecasts that were issued in this category:

`Ni=np.sum(np.logical_and(fcstpop>=0, fcstpop<0.2))`

`print Ni`

the marginal distribution of forecast values is given by the relative frequency of each forecast category:

p(f_{i}) = N_{i}/n

`pfi = Ni*1.0/n`

`print pfi`

here I multiplied Ni by 1.0 to convert it to a real number, otherwise python will do integer division

we can also list the forecast values only where the forecasts fall into our category:

`fcstpop[:5][np.logical_and(fcstpop[:5]>=0., fcstpop[:5]<0.2)]`

the nice thing about this is that we can also list the *observed* values that correspond to those specific forecast values, only where the forecasts fall into our category, simply by indexing the *observations* by the logical test on the *forecasts*:

`obspop[:5][np.logical_and(fcstpop[:5]>=0., fcstpop[:5]<0.2)]`

now let's look at the conditional probability that x=1 given f=f_{i}, which is:

x_bar_{i} = p(x=1 | f_{i}) = (1/N_{i}) * sum of x_{k} over the cases where f_{i} was issued

sum/average the observed values for only those occasions where f_{i} is issued

conditional average observation, expected value of x given f_{i}

using numpy, this is simply:

`np.mean(obspop[:5][np.logical_and(fcstpop[:5]>=0., fcstpop[:5]<0.2)])`

`# now do the entire array`

`np.mean(obspop[np.logical_and(fcstpop>=0., fcstpop<0.2)])`

recall that this is getting at reliability: we want the frequency of observation = yes to be equal to the forecast category for a reliable forecast system


unconditional sample climatology:

x_bar = (1/n) * sum_{k=1,n} x_{k}

decomposed Brier score:

BS = (1/n) * sum_{i=1,M} N_{i} (f_{i} - x_bar_{i})^{2} - (1/n) * sum_{i=1,M} N_{i} (x_bar_{i} - x_bar)^{2} + x_bar (1 - x_bar)

BS = "reliability" - "resolution" + "uncertainty"

let's set up a script to calculate the components of the Brier score using our POPs and forecast categories every 20%

edit a file called "bspop.py", you can use "gedit bspop.py" or vi or emacs if you are comfortable with those editors

the first line will execute our version of python

everything following will be python code that will be executed just like it would be if we were to type it into the terminal

let's start with a 'hello world' (of course)

`#!/usr/bin/env /opt/EPD/bin/python`

`print 'hello world'`

`quit()`

save this file, then make sure that the permissions are set so you can execute the file:

`chmod 777 bspop.py`

`./bspop.py`

now we can edit our script, save it, then execute it

this will allow us to build up our python code piece-by-piece

let's play around with looping, python can do looping with the "for <index> in <list>" command

add this to your script:

`import numpy as np`

`for k in [7, -3.3, np.pi, 'hello']:`

`    print k`

`a = [7, -3.3, np.pi, 'hello']`

`for i in np.arange(4):`

`    print i, a[i]`

`print np.arange(4)`

save this, and execute it to see what you get

here is where some python syntax comes into play: when you use a "for" loop (or an "if" test), the statement must end with a colon ":", and the block of code that follows must be indented to indicate that it is what gets executed in the loop. the indentation must be consistent within that block, and the block is terminated by the next line of python that is NOT indented

in the first loop, you see how a for loop works: it simply steps through the provided list sequentially, executing the indented block of code that follows. the list can contain any type of data; here we've provided an integer, a float, a numpy object, and a string

the second loop uses the same list, but uses a numpy array (np.arange) to set up a simple array of sequential numbers; the for loop steps through those numbers, in this case 0 to 3 (four numbers in total). the block of code prints the loop index, then refers to array "a" by that index

a numpy "array" is a special kind of list that contains only one type of variable, say "short" integers or double-precision floating point numbers. using arrays is faster than regular lists: since all of the members are of the same type, there is less overhead in python when operating on arrays than on regular lists

you can define a list to be a numpy array using:

`mylist = np.array([2, 3, 5], dtype = 'i')`

the first argument is the list, the second argument is the data type, use:

- 'd' for double-precision floating point
- 'f' for single-precision floating point
- 'i' for short integer
- 'l' for long integer

I will need to double check this, but I believe double-precision floats are 64-bit, and long integers are 32-bit, this probably depends on the machine you're using
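rather than guessing, numpy can report the size of each dtype directly via itemsize (bytes per element):

```python
import numpy as np

# itemsize gives bytes per element, so the sizes needn't be guessed at
nbytes_d = np.dtype('d').itemsize   # double-precision float: 8 bytes (64-bit)
nbytes_f = np.dtype('f').itemsize   # single-precision float: 4 bytes (32-bit)
nbytes_i = np.dtype('i').itemsize   # C int, typically 4 bytes
nbytes_l = np.dtype('l').itemsize   # C long: platform-dependent (8 on most 64-bit Linux)
```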

it is usually a good idea to "initialize" an array that you will be working with to set the values to all zeros, using the numpy.zeros function

`a = np.zeros(1000, dtype='d')`

where the first argument is the shape of the array

try:

`b = np.zeros((3,2), dtype='d')`

`np.shape(b)`

`np.rank(b)`

`np.size(b)`

`b.dtype.char`

let's try some other common array manipulation

`a = np.arange(6)`

`print a`

`b= np.reshape(a,(3,2))`

`print b`

`c=np.transpose(b)`

`print c`

`a = np.append(a,[7,8])`

`print a`

`aplus=np.concatenate((a,[10,11]))`

`print aplus`

`aaa=np.repeat(a,3)`

`print aaa`

let's try to use the looping and arrays to calculate the components of the BS decomposition, try the following python code, make sure that you understand how each line works

`#!/usr/bin/env /opt/EPD/bin/python`

`import numpy as np`

`#set up obs as random draw from binomial distribution, 0.3 frequency of yes`

`obspop = np.random.binomial(1,0.3,1000)`

`#set up fcst as random draw from uniform distribution, half between 0.1 and 0.5`

`# the rest between 0 and 1`

`fcstpop=np.random.uniform(0.1,0.5,500)`

`fcstpop = np.append(fcstpop,np.random.uniform(0.,1.,500))`

`#set up arrays for lower and upper bounds on categories`

`lower = np.arange(0.,1.,.1)`

`upper = np.arange(.1, 1.1, .1)`

`n = len(obspop)*1.0`

`print 'lower',lower`

`print 'upper',upper`

`#initialize variables with zeroes`

`# N is the total number of forecast in each category`

`N = np.zeros(len(lower))`

`# pf is the marginal distribution of forecasts`

`pf = np.zeros(len(N),dtype='d')`

`# xbari is the average observation given the corresponding forecast was in a category`

`xbari = np.zeros(len(N),dtype='d')`

`# fi is the average forecast given that forecast was in a category`

`fi = np.zeros(len(N),dtype='d')`

`# loop over all of the catgories, find N, pf, xbari, fi`

`for i in np.arange(len(lower)):`

` N[i]=np.sum(np.logical_and(fcstpop>=lower[i], fcstpop<upper[i]))`

` pf[i]=N[i]/n`

` xbari[i]=np.mean(obspop[np.logical_and(fcstpop>=lower[i], fcstpop<upper[i])])`

` fi[i]=np.mean(fcstpop[np.logical_and(fcstpop>=lower[i], fcstpop<upper[i])])`

`#overall unconditional mean obs`

`xbar = np.mean(obspop)`

`#components of BS`

`reliability = np.sum(N*(fi-xbari)**2)/n`

`resolution = np.sum(N*(xbari-xbar)**2)/n`

`uncertainty = xbar*(1-xbar)`

`# print stuff`

`print 'N',N`

`print 'pf',pf`

`print 'xbari',xbari`

`print 'fi',fi`

`print 'reli',reliability`

`print 'reso',resolution`

`print 'uncer',uncertainty`

`print 'BSdecomp =',reliability - resolution + uncertainty`

`print 'BS =',np.mean((fcstpop-obspop)**2)`

`quit()`

a graphical way of looking at the conditional distributions uses the likelihood-base rate factorization: plot POD vs POFD for a fixed observed event, varying the forecast threshold across all possible values

## 2 cones

let's plot ROC curves for "2 cones" that we can control, as in Baldwin and Kain (2006)

first set up the forecast "cone"

`#!/usr/bin/env /opt/EPD/bin/python`

`import numpy as np`

`import matplotlib.pyplot as plt`

`# set up x,y coords`

`# x and y will be 2-D arrays that vary between 0 and 1000`

`# x[i,j] holds the x coordinate`

`# y[i,j] holds the y coordinate`

`x,y=np.mgrid[0.:1001.,0.:1001.]`

`# n is the total number of points in the grid`

`n=1001.*1001.`

`# fcst circle characteristics`

`# x, y coords of center of cone`

`xf0=250.`

`yf0=500.`

`# amplitude of forecast cone`

`Af=10.`

`# radius where value of forecast hits zero`

`Rf=250.`

`# rr is the radial distance from the center coordinates`

`rr=np.sqrt((x-xf0)**2+(y-yf0)**2)`

`# fcst is a 2-D array that contains fcst values`

`fcst=Af*(Rf-rr)/Rf`

`#fcst[rr>Rf]=0.`

`# set all negative values of fcst to zero, optional`

`# plot contours so we can see what this forecast looks like`

`plt.contour(x,y,fcst)`

`plt.show()`

`# obs circle, similar characteristics as fcst, keeping y coord of center the same`

`# but shifting the x coordinate, similar to Baldwin and Kain (2006)`

`xo0=500.`

`yo0=500.`

`Ao=10.`

`Ro=250.`

`# observed event threshold, this must be held fixed`

`#`

`To=0.`

`#`

`# compute obs values`

`rr=np.sqrt((x-xo0)**2+(y-yo0)**2)`

`obs=Ao*(Ro-rr)/Ro`

`#obs[rr>Ro]=0.`

`#plot the discrimination diagram, two conditional fcst distributions given obs=yes, obs=no`

`#plt.hist([fcst[obs>To],fcst[obs<=To]])`

`#plot the forecast errors`

`#plt.contour(x,y,fcst-obs)`

`#plt.show()`

`# max min values of fcsts`

`fmin=np.min(fcst)`

`fmax=np.max(fcst)`

`print fmin,fmax`

`# set up an array of forecast thresholds that vary from the min to max, with ~100 pts in between`

`# Tf=thresholds for converting continuous fcst to yes/no`

`step=(fmax-fmin)*0.01`

`Tf=np.arange(fmin,fmax,step)`

`# set up arrays of POD and POFD with zeroes`

`POD=np.zeros(len(Tf))`

`POFD=np.zeros(len(Tf))`

`# loop across the forecast thresholds, find regions in Venn diagram to determine `

`# 2x2 contingency table components for each forecast threshold, then find POD, POFD`

`for i in np.arange(len(Tf)):`

` a=np.logical_and(fcst>Tf[i],obs>To)`

` b=np.logical_and(fcst>Tf[i],obs<=To)`

` c=np.logical_and(fcst<=Tf[i],obs>To)`

` d=np.logical_and(fcst<=Tf[i],obs<=To)`

` aa=np.sum(a)/n`

` bb=np.sum(b)/n`

` cc=np.sum(c)/n`

` dd=np.sum(d)/n`

` POD[i]=aa/(aa+cc)`

` POFD[i]=bb/(bb+dd)`

`# plot ROC curve`

`plt.plot(POFD,POD,'+-')`

`plt.show()`
