Spatial verification: Cluster Analysis (CA) method
1. Introduction
Verification of spatial forecasts over a domain cannot be done simply by a grid-to-grid comparison of the forecast field and the observation field. For space-to-space comparison, three techniques are typically employed: (1) cluster analysis, (2) variogram/correlation comparison and (3) optical flow. To some extent, all of them rely on overall statistical characteristics of the forecast and observation fields rather than on the values of the variable at each individual grid point. Here I focus on the first method, Cluster Analysis (CA) [Marzban and Sandgathe 2006, 2008; Marzban et al., 2008], and apply it to a data set of forecast and observed accumulated precipitation over the US.
The CA method aims to identify distinct objects (events) within a given field. Each identified object corresponds to a cluster. The method is designed so that distances within each cluster are minimized while distances between different clusters are maximized. CA is applied separately to the forecast field and the observation field. The two fields are then compared through the dissimilarity (distance) between the objects found in each.
2. Methodology
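The distance between two objects (clusters) can be defined in three ways.
(1) Averaged distance
$$D_{\text{avg}} = \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} D_{i,j}$$
Here $n_1$ is the number of grid points in one object and $n_2$ is the number of grid points in the other object. $D_{i,j}$ is the Euclidean distance between point $i$ in the first object and point $j$ in the second. Averaging over all $n_1 n_2$ pairs gives a representative distance between the two objects.
(2) Minimal distance
$$D_{\min} = \min_{i,j} D_{i,j}$$
Notation is the same as in (1). The smallest pairwise distance is taken as the distance between the two objects.
(3) Maximal distance
$$D_{\max} = \max_{i,j} D_{i,j}$$
Notation is the same as in (1). The largest pairwise distance is taken as the distance between the two objects.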
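In the clustering literature, the three object distances above are known as average, single and complete linkage. The following is a minimal Python sketch of them. It also includes a naive hierarchical agglomerative clustering loop that starts from one cluster per grid point and repeatedly merges the closest pair. Building the objects this way is an assumption for illustration, because the procedure is not spelled out here. The names `average_distance`, `minimal_distance`, `maximal_distance` and `agglomerate` are hypothetical. Points are assumed to be normalized (x, y, p) tuples, and `math.dist` needs Python 3.8 or later.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# A point in (normalized) feature space, e.g. (x, y, p); any dimension works.
Point = Tuple[float, ...]
Cluster = List[Point]


def average_distance(a: Cluster, b: Cluster) -> float:
    """(1) Mean of D_ij over all n1 * n2 point pairs (average linkage)."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def minimal_distance(a: Cluster, b: Cluster) -> float:
    """(2) Smallest D_ij over all point pairs (single linkage)."""
    return min(math.dist(p, q) for p in a for q in b)


def maximal_distance(a: Cluster, b: Cluster) -> float:
    """(3) Largest D_ij over all point pairs (complete linkage)."""
    return max(math.dist(p, q) for p in a for q in b)


LINKAGES: Dict[str, Callable[[Cluster, Cluster], float]] = {
    "average": average_distance,
    "minimal": minimal_distance,
    "maximal": maximal_distance,
}


def agglomerate(points: List[Point], n_clusters: int, linkage: str = "average") -> List[Cluster]:
    """Naive hierarchical agglomerative clustering (hypothetical helper).

    Starts with one cluster per point and merges the closest pair of
    clusters, under the chosen linkage, until n_clusters remain.
    """
    dist = LINKAGES[linkage]
    clusters: List[Cluster] = [[p] for p in points]
    while len(clusters) > n_clusters:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Two well-separated pairs of points end up in two clusters.
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(agglomerate(pts, n_clusters=2, linkage="minimal"))
    # [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
```

At every merge, the loop recomputes the distance between every pair of clusters. That cost grows quickly with the number of grid points, which is why a coarse grid is needed for this kind of analysis.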
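Each coordinate is normalized by subtracting its mean and dividing by its standard deviation:
$$x' = \frac{x - \bar{x}}{\sigma_x}, \quad y' = \frac{y - \bar{y}}{\sigma_y}, \quad p' = \frac{p - \bar{p}}{\sigma_p}$$
Here x, y and p are vectors over all grid points. For a real data set, x can be longitude, y latitude and p precipitation amount. After normalization, x, y and p each have zero mean and unit standard deviation, although they do not necessarily follow a normal distribution. Location and precipitation amount therefore contribute on comparable scales to the Euclidean distance.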
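The normalization takes only a few lines of standard-library Python. The text does not say whether the population or the sample standard deviation was used. This sketch assumes the population form (`statistics.pstdev`), and `standardize` and `normalize_xyp` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def standardize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


def normalize_xyp(
    x: Sequence[float], y: Sequence[float], p: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Standardize longitude, latitude and precipitation separately,
    then combine them into one (x', y', p') point per grid cell."""
    return list(zip(standardize(x), standardize(y), standardize(p)))


if __name__ == "__main__":
    lon = [-100.0, -99.0, -98.0]
    lat = [35.0, 35.0, 36.0]
    pcp = [0.0, 5.0, 25.0]
    for point in normalize_xyp(lon, lat, pcp):
        print(point)
```

The transform is linear, so each standardized variable keeps the shape of its original histogram. For example, a skewed precipitation field stays skewed. Only its location and scale change.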
The matching threshold is obtained in three steps: (1) calculate all pairwise distances between the objects of the two fields; (2) draw a histogram of these distances, which is typically roughly bell-shaped; (3) pick a threshold value from the left tail of the histogram. An observed object and a forecast object whose distance falls below the threshold are considered a match.
3. Data
The data set consists of NCEP observed accumulated precipitation and WRF forecast accumulated precipitation over the US. The following two plots show the data for 20070501. The original data set has 881 by 1121 grid points. The CA method requires the pairwise distance between every pair of grid points, so I resampled the data to 50 by 50 grid points to reduce the computational burden. The main precipitation objects are preserved after resampling.
The NCEP accumulated precipitation map has large, intense precipitation cells in the southern part of the central US, along the West Coast and in the northeastern US. The WRF forecast map has no precipitation along the West Coast. It also has a precipitation cell in southeastern Florida that is not present in the NCEP data.
4. Results
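Threshold steps (1) to (3) from Section 2 can be sketched as follows. Step (2), drawing the histogram, is replaced here by reading a low percentile of the pairwise distances. This stands in for picking a value from the left tail by eye. The 10th-percentile default and the names `pairwise_object_distances` and `left_tail_threshold` are assumptions for illustration. The `distance` argument can be any of the three object distances from Section 2.

```python
import math
from statistics import quantiles
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Obj = List[Point]


def pairwise_object_distances(
    obs_objects: Sequence[Obj],
    fcst_objects: Sequence[Obj],
    distance: Callable[[Obj, Obj], float],
) -> List[List[float]]:
    """Step (1): table[i][j] = distance(observed object i, forecast object j)."""
    return [[distance(o, f) for f in fcst_objects] for o in obs_objects]


def left_tail_threshold(table: List[List[float]], percent: int = 10) -> float:
    """Step (3), automated (assumption): the given percentile of all pairwise distances."""
    if not 1 <= percent <= 99:
        raise ValueError("percent must be between 1 and 99")
    flat = [d for row in table for d in row]
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99;
    # method="inclusive" interpolates within the observed range of distances.
    cuts = quantiles(flat, n=100, method="inclusive")
    return cuts[percent - 1]


if __name__ == "__main__":
    # Hypothetical objects compared with the minimal distance from Section 2.
    def minimal(a: Obj, b: Obj) -> float:
        return min(math.dist(p, q) for p in a for q in b)

    obs = [[(0.0, 0.0)], [(5.0, 5.0)]]
    fcst = [[(0.5, 0.0)], [(9.0, 9.0)], [(5.0, 4.0)]]
    table = pairwise_object_distances(obs, fcst, minimal)
    print(table)
    print(left_tail_threshold(table, percent=20))
```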
The following table summarizes the distances between the six objects identified in the observation field (A to F) and the six objects identified in the forecast field (A to F). Matched pairs are highlighted in red.
Object C in the observation field matches object A in the forecast field.
Object B in the observation field matches object C in the forecast field.
Object D in the observation field matches object E in the forecast field.
Objects A, E and F in the observation field are missed by the forecast.
Objects B, D and F in the forecast field are false alarms.
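The matching above can also be automated from a distance table and a threshold. The following is a minimal sketch that also computes the contingency scores defined in the next paragraph. It assumes a greedy one-to-one rule, which the text does not specify: the closest remaining observed/forecast pair below the threshold is matched first. `match_objects` and `contingency_scores` are hypothetical names.

```python
from typing import List, Tuple


def match_objects(
    table: List[List[float]], threshold: float
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Greedy one-to-one matching of observed (rows) and forecast (columns) objects.

    Returns (matches, missed observed indices, false-alarm forecast indices).
    """
    n_obs = len(table)
    n_fcst = len(table[0]) if table else 0
    # All candidate pairs closer than the threshold, closest first.
    candidates = sorted(
        (d, i, j)
        for i, row in enumerate(table)
        for j, d in enumerate(row)
        if d < threshold
    )
    used_obs, used_fcst = set(), set()
    matches: List[Tuple[int, int]] = []
    for _, i, j in candidates:
        # Each object may take part in at most one match.
        if i not in used_obs and j not in used_fcst:
            matches.append((i, j))
            used_obs.add(i)
            used_fcst.add(j)
    misses = [i for i in range(n_obs) if i not in used_obs]
    false_alarms = [j for j in range(n_fcst) if j not in used_fcst]
    return matches, misses, false_alarms


def contingency_scores(a: int, b: int, c: int) -> Tuple[float, float, float]:
    """a = hits, b = false alarms, c = misses -> (FAR, miss ratio, TS)."""
    far = b / (a + b)
    missed = c / (a + c)
    ts = a / (a + b + c)
    return far, missed, ts


if __name__ == "__main__":
    # Counts from the results above: 3 matches, 3 false alarms, 3 misses.
    print(contingency_scores(3, 3, 3))  # (0.5, 0.5, 0.3333333333333333)
```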
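The results give a = 3 hits (matched pairs), b = 3 false alarms and c = 3 misses. From these counts, the False Alarm Ratio (FAR), the miss ratio and the Threat Score (TS) are:
FAR = b/(a+b) = 3/6 = 0.5
Missed = c/(a+c) = 3/6 = 0.5
TS = a/(a+b+c) = 3/9 = 0.33
5. Future works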
6. References

Purdue High Impact Weather Laboratory > Archive > Forecast Verification (EAS 591, Fall 2012) > Final Exam