Dawson - Matlab How To

For the assignment of calculating various probability distributions with given forecast and observed data, I wrote a script with Matlab to handle the process of making counts from the data and calculating probabilities based on those counts. This initially required the forecast and observed data to be added as arrays in Matlab using the “Import Data” tool. After these data were added, the steps listed below were taken.

The blocks of code show examples of how this work could be done with Matlab. These examples assume one set of forecasts and corresponding observations for a dichotomous (yes or no) event. Therefore, a 2x2 contingency table is used to calculate the respective probability distributions.

1. Make a scatter plot of the joint distribution of forecasts and observations

  • The figure command simply creates a new figure, rather than overwriting an existing one
  • Use the scatter function to make the scatter plot
    • The '.' defines the type of marker on the scatter plot
  • Use the title, xlabel, and ylabel functions to define the title and axes labels for the plot
title('Joint Distribution of Forecasts and Observations')

2. Create 2-D array to represent the contingency table (R-forecasts, C-observations)

  • Use 'for loop' to parse through rows of forecast and observation data and establish counts for each element of the contingency table
  • Use ‘if statements’ to determine the row for the forecast and the column for the observation
  • Within the if statements, define row (i) and column (j) index for the appropriate forecast and observation categories
  • After each iteration of the for loop, add 1 to the value in the appropriate contingency table cell
contingency_table(1:2,1:2) = 0;

for k=1:length(forecasts);
    if forecasts(k) == 1
        i = 1;
        i = 2;
    if observations(k) == 1
        j = 1;
        j = 2;
    contingency_table(i,j) = contingency_table(i,j) + 1;

3. Calculate joint distribution

  • Create joint distribution array
    • Should have same dimensions as contingency table
  • Divide all elements in the contingency table by N
    • N = number of forecast-observation pairs
joint(1:2,1:2) = 0;
joint(:,:) = contingency_table(:,:)/N;

4. Calculate marginal distributions

  • Create forecast and observation count arrays
    • Should have dimensions of number of rows for forecast and columns for observations
  • Use for loop to sum up the rows and columns of contingency table to make count arrays
  • Create marginal distribution arrays
    • Should have same dimensions as count arrays
  • Use for loop to parse through count arrays and divide each count by the total number of forecasts/observations
fcount(1:2,1) = 0;
xcount(1,1:2) = 0;

for i=1:length(contingency_table)
    fcount(i,:) = sum(contingency_table(i,:));
    xcount(:,i) = sum(contingency_table(:,i));

pf(1:2,1) = 0;
px(1,1:2) = 0;

for i=1:length(pf);
    pf(i,1) = fcount(i,1)/length(forecasts);
    px(1,i) = xcount(1,i)/length(observations);

5. Calculate conditional distributions

  • Create conditional distribution arrays
    • Should have same dimensions as contingency table/joint distribution
  • Use nested for loops to count through marginal distributions of forecasts and observations
  • Use if statement to account for the possibility of dividing by a joint probability of zero
  • When p(x) and/or p(f) is not equal to zero, divide each element of the joint distribution by the appropriate marginal distribution
pfx(1:2,1:2) = 0;
pxf(1:2,1:2) = 0;

for i=1:length(pf);
    for j=1:length(px);
        if px(j) == 0
            pfx(i,j) = NaN;
        elseif pf(i) == 0
            pxf(i,j) = NaN;
            pfx(i,j) = joint(i,j)/px(j);
            pxf(i,j) = joint(i,j)/pf(i);

6. Calculate forecast bias

  • Calculate average value of all forecasts using mean function
  • Calculate average value of all observations using mean function
  • Subtract average observed value from average forecast value
avg_f = mean(forecasts(:));
avg_o = mean(observations(:));
bias = avg_f - avg_o;