cprnc README
------------

cprnc is a generic tool for analyzing a netcdf file or comparing
two netcdf files.


Quick Start Guide:
------------------

On cime supported systems you can generate a Macros file using
cime/machines/configure -mach $machinename -compiler $compilername
then run make to build cprnc and put the resulting executable in 
CCSM_CPRNC as defined in config_machines.xml   
You can also build cprnc using cmake. 

 Usage: cprnc  [-v] [-d dimname:start[:count]] file1 [file2]
 -m: Compare each time sample. Default is false, i.e. match "time" 
     coordinate values before comparing
 -v: Verbose output
 -d dimname:start[:count]
     Print variable values for the specified dimname index subrange.


Users Guide:
------------

cprnc is a Fortran-90 application. It relies on netcdf version 3 or
later and uses the f90 netcdf interfaces.  It requires a netcdf include
file and a netcdf library.  

cprnc generates an ascii output file via standard out.  It initially
summarizes some characteristics of the input file[s].  A compare file is
generally 132 characters wide and an analyze file is less than 80 
characters wide.

In analyze mode, the output for a field looks like

                      (   lon,   lat,  time, -----)
              259200  (   587,   134,     1) (   269,    59,     1)
 FX1           96369   8.273160400390625E+02  0.000000000000000E+00
            avg abs field values:   9.052845920820910E+01

and a guide to this information is printed at the top of the file

                      (  dim1,  dim2,  dim3,  dim4)
              ARRSIZ1 ( indx1, indx2, indx3) file 1
 FIELD        NVALID           MAX                   MIN


The first 10 characters of the field name are identified in the first
  dozen columns of the third line.  
The first line summarizes the names of the dimensions of the field
The second line summarizes the indices of the maximum and minimum value 
  of the field for the first three dimensions.  If the fourth dimension 
  exists, it's always assumed to be time.  Time is handled separately.
The third line summarizes the number of valid values in the array
  and the maximum and minimum value over those valid values.  Invalid
  values are values that are identified to be "fill" value.
The last line summarizes some overall statistics including the average 
  absolute value of the valid values of the field.

In comparison mode, the output (132 chars wide) for a field looks like

               96369  (   lon,   lat,  time)
              259200  (   422,   198,     1) (   203,   186,     1)           (    47,   169,     1)         (   224,   171,     1)
 FIRA          96369   1.466549530029297E+02 -3.922052764892578E+01   1.4E+02 -3.037954139709473E+01 1.0E+00 -3.979958057403564E+00
               96369   1.321966247558594E+02 -1.603044700622559E+01            1.084177169799805E+02          3.982142448425293E+00
              259200  (   156,    31,     1) (   573,   178,     1)           (
          avg abs field values:    6.778244097051392E+01    rms diff: 1.4E+01   avg rel diff(npos):  4.6E-02
                                   5.960437961084186E+01                        avg decimal digits(ndif):  1.2 worst:  0.0

and a guide to this information is printed at the top of the file

              NDIFFS  (  dim1,  dim2,  dim3,  dim4, ... )
              ARRSIZ1 ( indx1, indx2, indx3, ... ) file 1
 FIELD        NVALID1          MAX1                  MIN1            DIFFMAX  VALUES                RDIFMAX  VALUES
              NVALID2          MAX2                  MIN2
              ARRSIZ2 ( indx1, indx2, indx3, ...) file 2

The information content is identical to the information in analyze
mode with the following additions.  Two additional lines are added
in the main body.  Lines 4 and 5 are identical to line 3 and 2
respectively but are associated with file 2 instead of file 1.
In addition, the right hand side of lines 2, 3, and 4 contain
information about the maximum difference, the location and values
of the maximum difference, the relative difference and the location
and values of the maximum relative difference.  The last two line
summarize some overall statistics including average absolute values
of the field on the two files, rms difference, average relative 
difference, average number of digits that match, and the worst
case for the number of digits that match.

"avg rel diff" gives the average relative difference (sum of relative
differences normalized by the number of indices where both variables
have valid values). The denominator for each relative difference is
the MAX of the two values.

"avg decimal digits" is determined by: For each diff, determine the
number of digits that match (as -log10(rdiff(i)); add this to a
running sum; then normalize by the number of diffs (ignoring places
where the two variables are the same). For example, if there are 10
values, 8 of which match, one has a relative difference of 1e-3 and
one has a relative difference of 1e-5, then the avg decimal digits
will be 4.

"worst decimal digits" is simply log10(1/rdmax), where rdmax is the
max relative difference (in the above example, this would give 3).

At the end of the output file, a summary is presented that looks like

SUMMARY of cprnc:
 A total number of    119 fields were compared
          of which     83 had non-zero differences
               and     17 had differences in fill patterns
 A total number of     10 fields could not be analyzed
 A total number of      0 fields on file 1 were not found on file2
  diff_test: the two files seem to be DIFFERENT 


This summarizes:
- the number of fields that were compared
- the number of fields that differed (not counting fields that differed
  only in the fill pattern)
- the number of fields with differences in fill patterns
- the number of fields that could not be analyzed
- the number of fields that could not be found on the second file
- whether the files are IDENTICAL or DIFFERENT
  Files are considered DIFFERENT if there are differences either in the
  values or in the fill patterns of any variable.


Developers Guide:
-----------------

The tool works as follows.

Fields can be analyzed if they are int, float or double and 
have between 0 and n dimensions 

In general, fields that appear on both files are 
compared.  If they are sizes, no difference
statistics are computed and only a summary of the fields on
the files are presented.  If  fields only appear
on one file, those fields are analyzed.

The unlimited dimension is treated uniquely.  In general, for files
that have a dimension named "time", the time axes are compared
and matching time values on the two files are compared one
timestep at a time.  Time values that don't match are skipped.
To override the matching behaviour, use cprnc -m.  In this mode, 
timestamps are compared in indexical space.  In analyze mode,
the fields are analyzed one timestamp at a time.  In general,
if there is a "time" axis, it will be the outer-most loop in
the output analysis.  In compare mode, fields with a time axis
and a timestamp that are not common between the two files are
ignored.

It is also possible to compare files that don't have an unlimited
dimension; in this case, the '-m' flag must be given.
