By Kim Hoogewind

I used Python to compute the joint,
conditional, and marginal distributions of the given forecast and
observations. I will now attempt a short crash course in Python and how
it may be used to solve the problems presented in the assignment.

__Python Makes Programming... FUN!__

__Python Makes Programming... FUN!__

**that goes through some of the basics. Also, I**

*here***recommend taking a look at Johnny Lin's new intro book to Python in the Atmospheric and Oceanic Sciences (you can get a free e-copy**

*highly**).*

**here**__Python Script__

__Python Script__

`./mypythonscript.py) rather than running the script by "python mypythonscript.py".`

`With Python there are many different libraries or modules that you may import, many of which come standard. The major package that I use in this example (and in nearly all of my work) is NumPy. NumPy is an excellent package for scientific computing, and in particular for working with arrays. To import a module, you simply type "import" and then the module name. Here, you have the option to import the module as a different name, which is quite helpful if the module name is very long, such that you do not have to retype it later on when you call a function or an instance from it later on.`

**Example:**

`This is what I have at the very top of my script. While it is not necessary to import all of your required modules at the top of your code (you can do it anywhere really), I find it to be good practice.`

`#!/usr/bin/env python`

`import numpy as np`

__Reading in the data__

Example:

Example:

**This is the section of my code where I was reading in the observed and forecasted temperatures from the provided csv file. Here, I am going to read in the csv file (it's comma delimited, so delimeter=','), and since the first row is simply the column headers, I would like to skip that (skiprows=1). Since the observed temperature are in column 3 and the forecasted temperatures in column 6, those are two columns I would like to read in. With the usecols argument, I can do this. Unlike the skiprows specification though, you must remember that 0 (zero) is an index, and here would represent the first column. Thus, specifying 2 and 5 would refer to columns 3 and 6, respectively. Lastly, I set unpack to True such that I can assign column 3 to the obs array and column 5 to the fcst array. As a final consideration, it is always good practice to close the file that you opened once you are done with it.**

`f1 = open('/path/to/data/HW_2.csv','r')`

`obs,fcst = np.loadtxt(f1,delimiter=',',skiprows=1,usecols=(2,5),unpack=True)`

`f1.close()`

__Calculate Joint Probability Distribution__- What is really great about Python, NumPy in particular, is that array operations do not require you to loop through all of the array indices when performing calculations. Operations will automatically be applied to each location within the array.

- To calculate the Joint Probability Distribution, I made use of NumPy's
**2d histogram**function - This function requires you to specify the 2 datasets you wish to use as well as the number of bins in each dimension
- For this example, I set up the bins to be a 1d array (or list) of numbers from 10 to 100 in intervals of 10
- This was accomplished using the numpy
function, which returns evenly spaced values for the given interval (see reference #2)*arange*

- Alas, the joint probability distribution is simply the resulting 2d histogram array divided by sum of the elements of that array
- To accomplish this, I simply made use of the
function in NumPy which will sum up all of the elements of the array*sum* - This function also has the ability to sum rows and/or columns (axis specification), which will come in handy when calculating marginal probabilities

**Example:**

Below is the * 2 lines* (!) of code that created the joint probability distribution for the temperature forecasts using 10ºF bins. The output array of the 2d histogram is stored as "hist10" while the other two variables simply represent an array(list) of the bin edges in each dimension. Finally, I simply divided hist10 by the sum of the elements of hist10 to produce the final joint probability distribution, which here is named p10.

** Note: **the order of the two data sets depends upon which one you would like to be represented by columns or rows. To be consistent with Murphy and Winkler's (1987) example (e.g. their Table 1), the forecast will be the columns and the observations will be the rows. Because Python is row-major, the observations are then specified first within the histogram function.

`hist10, o10_bin_edges, f10_bin_edges = np.histogram2d(obs,fcst,bins=np.arange(10,101,10))`

`p10 = hist10/(np.sum(hist10))`

`hist10.sum()`

`hist10, o10_bin_edges, f10_bin_edges = np.histogram2d(obs,fcst,bins=[2,`

`np.arange(10,101,10)`

`])`

__Calculate Marginal probabilities__- To calculate the marginal probabilities, we simply want to sum the rows/columns of the joint probability distribution
- As aforementioned, this may be accomplished using the NumPy sum function, where we can specify which dimension, or axis, of the array we would like to sum
- To sum each row, axis =1
- To sum each column , axis =0

**Example:**

**which simply adds a dimension/axis. For example, to calculate p(x), I sum over rows fore each of the columns and store it in array p_x. When I include [:,np.newaxis], it essentially structures p_x such that it has the same number of rows as the joint probability distribution array (p10) but is one column. Essentially, this step is equivalent to reshaping the array to the desired dimensionality, which may also be accomplished using NumPy**

*newaxis*

*reshape*

`p_x = np.sum(p10,axis=1)[:,np.newaxis]`

`p_f = np.sum(p10,axis=0)[np.newaxis,:]`

`p_x = np.sum(p10,axis=1)`

`p_x = np.reshape(p_x,(len(p_x),1))`

`p_f = np.sum(p10,axis=0)`

`p_f = np.reshape(p_f,(1,len(p_f)))`

**function which will return the length, or number of items, in a list/array/tuple.**

*len*__Calculate Conditional Probabilities__

- The likelihood base rate and calibration refinement may be calculated simply by dividing the joint probability distribution by the appropriate marginal probability.
- This process will divide each row/column of p10 element-wise by the respective marginal probability array--this uses some behind the scenes
such that the computation may be made between arrays of different shape.**broadcasting**

`LBR = p10/p_x`

`CR = p10/p_f`

__Writing out the arrays to a text file__`Though there area few work clunky workarounds in Matplotlib, there is no real good way to create a table in Python (one of only a few negative things I can say about Python). As such, I decided to simply write out the resulting arrays into a delimited text file. Before writing out the Joint Probability distribution, I wanted to append the marginal probabilities to it, so I could write out one big array for easier table creation.`

- Use NumPy
to stack the joint probability distribution array and the marginal probability*vstack**p(f)*vertically (row-wise) to create one array

`p10 = np.vstack((p10,p_f))`

` `

- We would like to do the same thing but horizontally (column-wise) this time to join the new p10 array and the marginal probability
*p(x)* - However, since the joint probability distribution array now has an extra row, the dimensions will not line up
- So, I appended a 1.0 to the end of p(x) (using NumPy
)*append* - Append will flatten the array, so it needs to be reshaped it to its original dimensionality before it can be stacked with p10
- From there, NumPy
can be used add the column*hstack*

`p_`

`x = np.append(p_x,1.0)`

`p_x = np.reshape(p_x,(len(p_x),1))`

`p10 = np.hstack((p10,p_x))`

- To write out data, I simply used the
function from NumPy.*savetxt* - Instead of a writing out the data separated by commas (you would set delimiter=','), I decided to write the data out to a tab delimited file (delimiter='\t') for easier readability when I open it using a text editor. I also formatted the output such that it was rounded to four decimal places to enhance readability.

`np.savetxt("contingency_10bin.csv",p10,fmt="%.4f",delimiter='\t')`

`np.savetxt("LBR_10bin.csv",LBR,fmt="%.4f",delimiter='\t')`

`np.savetxt("CR_10bin.csv",CR,fmt="%.4f",delimiter='\t')`

__Calculating Forecast Bias__**function and basic python subtraction.**

*mean*

`fcst_mean = np.mean(fcst)`

`obs_mean = np.mean(obs)`

`bias = fcst_mean - obs_mean`