Known Bugs in CLM4.0.54 in CESM1.1.0                            Nov/08/2012

====================================================================================
Bug Number: 1543
large-file format does NOT work in latest clm 

CLM has the NetCDF large-file format hardwired to TRUE. You can NOT use the namelist
variable outnc_large_files to change them to Classic format.

====================================================================================
Bug Number: 1398
clm and mksurfdata_map needs to check map file

If the map files sent to CLM are inconsistent with the datafiles, CLM does NOT
nicely abort, but aborts with the following types of problem....


====================================================================================
Bug Number: 1397
c2l_scale_type not specified for many history fields 

Bill Sacks reports the following problem...

Many history fields do not have a c2l_scale_type parameter (in histFldsMod),
but it seems they should. For example, there is a set of water flux variables,
starting with QFLX_RAIN_GRND and ending with QFLX_DEW_SNOW, most of which do
not have a c2l_scale_type. From talking with Keith Oleson, it seems that at
least some and maybe all of these should have c2l_scale_type='urbanf', by
analogy with similar fluxes that do have a c2l_scale_type specified.

From talking with Keith Oleson: it sounds like most fluxes should have
c2l_scale_type='urbanf', but this isn't necessarily always true. So this will
require more investigation to determine the appropriate scale type for each
history field. 

Most (all?) of the fields that do not have a c2l_scale_type are ones that were
added after the urban model came in - for example, fields that were added when
the CN code came in. So my guess is that whoever added these fields didn't
realize that a c2l_scale_type was required.

After these fields are fixed, perhaps scale_type_c2l should be made a required
argument to hist_addfld1d and hist_addfld2d to prevent this problem from
arising again in the future.

====================================================================================
Bug Number: 1377
CLM1PT mode is in extend mode by default, so fails going beyond one data cycle

PLM1PT mode puts atm forcing data in extend mode so if you go beyond one data
cycle it gives screwy results. I ran the spinup series with PTCLM and since it
was running over multiple years after it ran one cycle it used the last
time-step for the future years.

Here's one way to fix this issue after a case is built...

   sed "s/'extend','cycle'/'cycle','cycle'/g" Buildconf/datm.buildnml.csh > datm.bld
   mv datm.bld Buildconf/datm.buildnml.csh

See bug 1368 below for a case where this causes a major problem with a 
simulation.

====================================================================================
Bug Number: 1368
PTCLM for US-UMB spins up with zero GPP

I ran a spinup with PTCLM (to test the procedure and to get initial conditions
for US-UMB, and be able to work the transient issue). For the final_spinup
phase GPP was identically zero.

This was fixed by setting taxmode='cycle'. See bug 1377 above.
 
====================================================================================
Bug Number: 1360
Can't do a ncdump on the US-UMB single point data for PTCLM

I get the following error doing an ncdump on the US-UMB data...

ncdump -h 1999-01.nc 
ncdump: name begins with space or control-character: 1


I can view the file with ncks, or ncl, or ncview.

It turns out the problem is that the filename -- doesn't start with an
alphabetical letter it starts with a number. So there isn't really a way around
this other than renaming the files and changing the streams. To do a ncdump,
you can simply rename the file to something else and then do the ncdump on it.

====================================================================================
Bug Number: 1348
Restart test with crop on shows differences in landmask field

The following test is failing on edinburgh with the lahey compiler with
clm4_0_32.

006 erTZ4 TER.sh 21p_cncrpsc_ds clm_stdIgnYr^nl_crop 20020401:3600 10x15 USGS
-3+-7 cold ........FAIL! rc= 13


The difference is in landmask field where some values are set to 1.95379e+09
looks like over ocean.

====================================================================================
Bug Number: 1345
Irrigation dataset is upside down! 

The irrigation dataset used to create surface datasets by mksurfdata is upside
down (latitude goes from 90 to -90 rather) relative to other files used to
create surface datasets.

The filename is:

     $CSMDATA/lnd/clm2/rawdata/mksrf_irrig_2160x4320_simyr2000.c090320.nc

This is the same problem as in the VOC dataset in bug number 1044 below.

Even though the file looks incorrect the datasets it creates are fine.

====================================================================================
Bug Number: 1339
Limit on number of files when running with 155 years of MOAR data

In order to run with 155 years of MOAR data the file limit in shr_stream needs
to increase from 1000 to 2000 (technically 1860 would be sufficient, but might
as well bump it to 2000).

Index: shr_stream_mod.F90
===================================================================
--- shr_stream_mod.F90    (revision 28396)
+++ shr_stream_mod.F90    (working copy)
@@ -101,7 +101,7 @@
    end type shr_stream_fileType

    !--- hard-coded array dims ~ could allocate these at run time ---
-   integer(SHR_KIND_IN),parameter :: nFileMax = 1000  ! max number of files
+   integer(SHR_KIND_IN),parameter :: nFileMax = 2000  ! max number of files

    type shr_stream_streamType
       !private                                    ! no public access to internal components

We didn't put this change in as it causes a compiler error on bluefire for AIX
when compiling the dlnd component...

/ptmp/mvertens/SMS.f45_g37.A.bluefire.rel06_d/lnd/obj/Depends
mpxlf90_r -c -I. -I/usr/include -I/usr/local/include -I/usr/local/include -I.
-I/ptmp/mvertens/SMS.f45_g37.A.bluefire.rel06_d/SourceMods/src.dlnd
-I/glade/proj3/cseg/people/mver
tens/src/cesm/cesm1_0_rel06/models/lnd/dlnd
-I/glade/proj3/cseg/people/mvertens/src/cesm/cesm1_0_rel06/models/lnd/dlnd/cpl_mct
-I/ptmp/mvertens/SMS.f45_g37.A.bluefire.rel06_d/li
b/include  -WF,-DMCT_INTERFACE -WF,-DHAVE_MPI -WF,-DAIX -WF,-DSEQ_ -WF,-DFORTRAN_SAME
-q64 -g -qfullpath -qmaxmem=-1 -qarch=auto -qsigtrap=xl__trcedump  -qsclk=micro -O2
-qstri
ct -Q -qsuffix=f=f90:cpp=F90
/glade/proj3/cseg/people/mvertens/src/cesm/cesm1_0_rel06/models/lnd/dlnd/dlnd_comp_mod.F90
touch /ptmp/mvertens/SMS.f45_g37.A.bluefire.rel06_d/lnd/obj/Filepath
    1517-009: (U) Error in compiler runtime system; compilation ended.
xlf90_r: 1501-230 (S) Internal compiler error; please contact your Service
Representative. For more information visit:
http://www.ibm.com/support/docview.wss?uid=swg21110810
1501-511  Compilation failed for file dlnd_comp_mod.F90.
gmake: *** [dlnd_comp_mod.o] Error 40

====================================================================================
Bug Number: 1326
Running with both crop AND irrigation fail with a balance check error

Running tests that have both crop AND irrigation fail with a balance check
error. This is for starting up with arbitrary initial conditions and running
with either CN and/or CNDV.

====================================================================================
Bug Number: 1325
Writing out GDDHARV cause abort when written out to history file

The variables: GDDHARV cause the model to abort when
adding it to the history file and DEBUG mode is on. It aborts in one of the
pft averaging functions in subgridAveMod with a multiply by a NaN. This is on
bluefire. The variables are initialized to spval, so I'm not sure why this
happens.

Here's what the abort looks like...

(seq_mct_drv) : Model initialization complete


  Signal received: SIGTRAP - Trace trap
    Signal generated for floating-point exception:
      FP invalid operation

  Instruction that generated the exception:
    fmul fr02,fr01,fr02
    Source Operand values:
      fr01 =                   NaNS
      fr02 =   1.00000000000000e+00

  Traceback:
    Offset 0x00002104 in procedure __subgridavemod_NMOD_p2g_1d, near line 796
in file /fis/cgd/home/erik/clm_cropbr/models/lnd/clm/src/main/subgridAveMod.F90
    Offset 0x00000540 in procedure
*__subgridavemod_NMOD_p2g_1d_stub_in___histfilemod_NMOD_hist_update_hbuf_field_1d
    Offset 0x00000670 in procedure
__histfilemod_NMOD_hist_update_hbuf_field_1d, near line 1172 in file
/fis/cgd/home/erik/clm_cropbr/models/lnd/clm/src/main/histFileMod.F90
    Offset 0x000000fc in procedure __histfilemod_NMOD_hist_update_hbuf@OL@1
    Location 0x09000000015f2d4c
    Location 0x09000000015eb758
    Offset 0x000000dc in procedure _pthread_body
    --- End of call chain ---

====================================================================================
Bug Number: 1310
Some indices are different for differing number of threads

Some of the 1d indices are different on the history files when differing number of
threads is used.

034 erL83 TER.sh _nrsc_do clm_std^nl_urb 20020115:3600 5x5_amazon navy -5+-5
arb_ic .............FAIL! rc= 13

Everything's bit-for-bit up to...

CLM_compare.sh: comparing clmrun.clm2.h1.2002-01-20-00000.nc
                with     
/ptmp/erik/test-driver.888958/TSM._nrsc_do.clm_std^nl_urb.20020115:3600.5x5_amazon.navy.-10.arb_ic/clmrun.clm2.h1.2002
-01-20-00000.nc
CLM_compare.sh: files are NOT b4b

 RMS land1d_g         7
 RMS cols1d_g         7
 RMS cols1d_l         8
 RMS pfts1d_g         7
 RMS pfts1d_l         8
 RMS pfts1d_c        12

We got around this by removing these fields from the history files.

====================================================================================
Bug Number: 1289
Problem reading in single-point CO2 stream file on franklin

We verified this is a problem on franklin, but NOT other machines such as bluefire
for example.

Zack Subin

Reports on the following problem on Franklin. The problem is a subscript out of
range in a MCT subroutine being used by PIO. Hence why I've added Jim E. and
Rob J. to the list of people.

He's the running the following case documented in the CLM Users Guide.

http://www.cesm.ucar.edu/models/cesm1.0/clm/models/lnd/clm/doc/UsersGuide/x2920.html#AEN2948

I think the thing that's unique here is that the CO2 file only has one
datapoint. There might be an assumption that you are reading more datapoints
and something isn't dimensioned right in MCT, PIO or in datm? Not sure which...

I ran the same case on bluefire and it runs both with DEBUG on and off (as I
say below). But, possibly bluefire is more forgiving on this subscript overflow
than Franklin. Since Franklin is a pretty standard machine it would probably
show up on other platforms as well.

Here's Zack's message, with my previous message to him to give me some data to
file the bug report with.

I'm running I_1850-2000_CN, 1.9x2.5 deg, on Franklin with clm4_0_24.  It
is out of the box with the instructions for passing CO2 except for the
location of the forcing files and the initial conditions file, and the
number of tasks in drv_in:ccsm_pes: *_ntasks.

The end of the standard output log reads:
0: Subscript out of range for array compbuf (rearrange.F90: 300)
   subscript=1, lower bound=1, upper bound=0, dimension=1

The end of the cpl.log reads:
(seq_mct_drv) : Initialize each component: atm, lnd, ocn, and ice
(seq_mct_drv) : Initialize atm component

The end of the datm.log reads:
(shr_dmodel_readLBUB) reading file:
/global/homes/z/zmsubin/Scratch/clmdata/fco2_datm_1765-2007_c100309.nc  
  85

There is no lnd.log.  It does not produce a core file.

When I run with DEBUG off it runs normally, and the PCO2 is identical to
what I get from a run in clm4_0_16 except that the point at (1, 1) has a
nonzero PCO2 whereas it is 0 in clm4_0_16.

When I follow your instructions for setting the number of atm pio tasks
to 1, it still has the same error.

--Zack

I was able to replicate this problem on lynx:

/glade/proj2/fis/cgd/home/erik/clm4_0_24/scripts/DATM_CO2_TSERIES

 cat /ptmp/erik/DATM_CO2_TSERIES/run/ccsm.log.110225-170538

ock size conversion in bytes is          4086.02
8 MB memory   alloc in MB is             8.00
8 MB memory dealloc in MB is             0.00     
Memory block size conversion in bytes is          4086.02
8 MB memory   alloc in MB is             8.00     
8 MB memory dealloc in MB is             0.00
8 MB memory dealloc in MB is             0.00
Memory block size conversion in bytes is          4086.02
Memory block size conversion in bytes is          4086.02
.
.
.
.
0: Subscript out of range for array compbuf (rearrange.F90: 300)
    subscript=1, lower bound=1, upper bound=0, dimension=1
0: Subscript out of range for array compbuf (rearrange.F90: 300)
    subscript=1, lower bound=1, upper bound=0, dimension=1
[NID 00073] 2011-02-25 17:06:52 Apid 136614: initiated application termination
Application 136614 exit codes: 127
Application 136614 exit signals: Killed
Application 136614 resources: utime 0, stime 0

atm.log file ends with...

(shr_dmodel_readLBUB) reading file:
/glade/proj2/fis/cgd/cseg/csm/inputdata/atm/datm7/CO2/fco2_datm_1765-2007_c100614.nc
     85

====================================================================================
Bug Number: 1282
Trouble running datm8 to the last time-step for datasets with missing data

The urban single-point sites all have only a portion of a complete year of atm
forcing data. Hence, all of them abort with a dtlimit error when you try to run
until the last time-step. This is because it reads in the data for the next
time-step (thinking it needs to do a time-interpolation) and finds the
difference in time-step is large (since it's over the part of the year with
missing data). This is for the 1x1_mexicocityMEX, 1x1_vancouverCAN, and
1x1_urbanc_alpha sites, but would be the case for other datasets with missing
time-periods.

The fix is to change the datm namelist to add settings for tintalgo and dtlimit
in the datm namelist as follows...

 &shr_strdata_nml
.
.
.
   tintalgo       = 'nearest','linear'
   dtlimit        = 25000.,1.5
  /

Thus it will use the nearest point in time, and won't die with a dtlimit error,
as we are setting the dtlimit to a very high value.

====================================================================================
Bug Number: 1164
Restart trouble for CNC13 with INTEL, PGI and LAHEY compilers

017 erR53 TER.sh 17p_cnc13sc_do clm_std^nl_urb 20020115:NONE:1800 10x15
USGS@1850 10+38 cold ....FAIL! rc= 13

Answers differ and gradually diverge in time. This could be a restart issue or a
multi-processing or threading issue.

====================================================================================
Bug Number: 1163
CN finidat files have a bunch of fields with NaN's on it.

For example on:

$CSMDATA/ccsm4_init/I2000CN_f09_g16_c100503/0001-01-01/ \
I2000CN_f09_g16_c100503.clm2.r.0001-01-01-00000.nc

the fields: mlaidiff, flx_absdv, flx_absdn, flx_absiv, flx_absin, and
qflx_snofrz_lyr all have NaN's, with mlaidiff being completely full of NaN's
(since mlaidiff is only defined for CLMSP or if drydep is on).

====================================================================================
Bug Number: 1127
interpinic not tested for CNDV, yet; expected not to work 

Interpinic has not worked for the old dgvm since probably before clm3.5.
Interpinic has not been tested, yet, for CNDV. Therefore, we assume that it
does not work.

====================================================================================
Bug Number: 1124
Reported energy for grid-cell is not quite right for pftdyn

The amount of water is conserved correctly in pftdyn mode, but the energy isn't
reported quite accurately.

====================================================================================
Bug Number: 1101
suplnitro=ALL mode is over-productive

suplnitro=ALL mode is over-productive. This is because it provides unlimited
Nitrogen. Fixing it requires using fnitr from the pft-physiology file, a different 
pft-physiology file with fnitr scaled appropriately and some code modifications 
to get this all to work.

====================================================================================
Bug Number: 1063
Problems restarting for CESM spinup data mode

Exact restarts for the 1850 CN spinup compset fail on bluefire...

ERS.f09_g16.I1850SPINUPCN.bluefire

also the ERB test fails, and the ERB_D test fails with optimization set to
zero.

(note ERS for the I1850CN compset passes, it's just the SPINUP case that fails)

In the coupler log file there's a single field that is different...

The good thing is that it's a single field from the land model that's causing
trouble...

Comparing initial log file with second log file
Error:
/ptmp/erik/ERS.f09_g16.I1850SPINUPCN.bluefire.124426/run/cpl.log.091029-130401
and
/ptmp/erik/ERS.f09_g16.I1850SPINUPCN.bluefire.124426/run/cpl.log.091029-130648
are different.
>comm_diag xxx sorr   1 4.5555498624818352000E+16 recv lnd Sl_t

<comm_diag xxx sorr   1 4.5555508855413304000E+16 recv lnd Sl_t


But, there are many clm history fields that are different.

====================================================================================
Bug Number: 1044
VOC input raw data file has reverse coordinates and hence upside down LANDMASK

The file

$CSMDATA/lnd/clm2/rawdata/mksrf_vocef.c060502.nc

produces reasonable results for VOC emission fields on surfdata files. But, the
LANDMASK when viewed with ncview is upside down and shifted from what's
expected. I think this is because the latitude coordinates are reversed from
the other files (N to S instead of S to N).

====================================================================================
Bug number: 669
Y10K problem for clm

CESM can't use negative years or years > 9999. Having dates of Y10K or more
is sometimes useful for paleo simulations.
For clm to get past the Y10K barrier -- it needs the subroutines

set_hist_filename
restFile_filename
set_dgvm_filename

changed to allow 5 or 6 digit years rather than just 4-digit ones.

scripts, drv, and csm_share also have problems with Y10K as well.

====================================================================================
