Generalize Calendar supported by Analysis¶
Xylar Asay-Davis
date: 2017/02/09
Summary
Currently, the time variable in xarray
data sets within MPAS-Analysis has two
major shortcomings, inherited from xarray
(through pandas
and numpy.datetime64
).
First, only the Gregorian calendar is supported. Second, there is not support
for dates outside the years 1678 to 2262. The analysis needs to support both
the Gregorian (‘gregorian’) and the 365-day (‘gregorian_noleap’) calendars. It also needs to
support, at a minimum, years between 0001 and 9999, and preferably arbitrary
years both positive and negative.
A major challenge is that it seems that xarray cannot easily be forced to
use an alternative representation of dates to the troublesome
numpy.datetime64
type (see, for example,
pydata/xarray#1084).
The most obvious alternative, datetime.datetime
,
seemingly cannot be used directly in xarray
because objects of this type
are converted to numpy.datetime64
objects at various stages when using
features from pandas, raising errors when dates are out of range. While an
alternative date class (e.g. netcdftime.DatetimNoLeap
) might be used to
represent dates on the ‘gregorian_noleap’ calendar, there is no such
preexisting alternative for the ‘gregorian’ calendar.
The solution proposed herein is to store time as floating-point days since the
reference date 0001-01-01 and to convert dates in this format to
datetime.datetime
and MpasRelativeDelta
objects whenever mathematical
manipulation of dates is required.
A successful implementation would produce essentially identical analysis to
what is currently produced, but making use of the dates from the MPAS calendar
(whether Gregorian or 365-day) without the need for artifical offsets (e.g.
yearOffset
used in the current code. Plots of horizontal fields would remain
unchanged while plots of time series would have a time axis with the simulation
date instead of the offset date.
Requirements
Requirement: The 'Time' coordinate of xarray data sets must be consistent
with the MPAS calendar
Date last modified: 2017/02/09
Contributors: Xylar Asay-Davis
For all data sets used in the analysis, the ‘Time’ coordinate must represent dates on the appropriate MPAS calendar, either ‘gregorian’ or ‘gregorian_noleap’, depending on the namelist option ‘config_calendar_type’. There must be ways of mathematically manipulating times (e.g. adding/subtracting offsets and figuring out the amount of time between two dates) and of making plots that are consistent with these calendars.
Requirement: The 'Time' coordinate of xarray data sets must support at least years
0001 and 9999, and preferably any conceivable value
Date last modified: 2017/02/16
Contributors: Xylar Asay-Davis
For all data sets used in the analysis, the ‘Time’ coordinate must, at a minimum,
support years between 0001 and 9999 (the range of datetime.datetime
) and preferably
a broader range.
Algorithmic Formulations (optional)
Design solution: The 'Time' coordinate of xarray data sets must be consistent
with the MPAS calendar
Date last modified: 2017/02/11
Contributors: Xylar Asay-Davis, Phillip J. Wolfram
The proposed solution represents time in xarray.DataSet
objects as the number of
days since the reference date 0001-01-01.
This is reasonable because the smallest unit of time output in MPAS components is
seconds (and unlikely to ever be shorter than ms). We note that a date specified
as a 64-bit float has a precision high enough to represent seconds for dates up
to +/- 100 million years:
>>> import sys
>>> 1./(sys.float_info.epsilon*365*24*60*60)
142808207.36207813
We should have no trouble representing any number we might want (including paleo timescales) with this system.
For purposes of performing mathematical operations and plotting dates, these
values will be converted to datetime.datetime
objects (via the proposed
days_to_datetime
utility function) and back (via the proposed
datetime_to_days
).
The conversion operations within datetime_to_days
and days_to_datetime
will be
performed with the calendar-aware functions netCDF4.date2num
and
netCDF4.num2date
, respectively. Both functions will support lists/arrays of dates
(for efficiency and simplicity of calling code) in addition to single values.
Curve ploting can be supported with matplotlib.pyplot.plot_date
, which takes a date
of exactly the format used here (days since 0001-01-01). The compatibility with plot_date
was part of the reason for choosing this format for the date.
Design solution: The 'Time' coordinate of xarray data sets must support at least years
0001 and 9999, and preferably any conceivable value
Date last modified: 2017/02/09
Contributors: Xylar Asay-Davis
Same as above. In theory, the use of days since 0001-01-01 would allow any year
to be supported, not just the range from 0001 to 9999. However, the conversions
to datetime.datetime
objects for mathematical manipulation will constrain
the dates to be between datetime.min
(0001-01-01) and datetime.max
(9999-12-31).
Design and Implementation
Implementation: The 'Time' coordinate of xarray data sets must be consistent
with the MPAS calendar
Date last modified: 2017/02/16
Contributors: Xylar Asay-Davis
The proposed implementation is on the branch xylar/generalize_calendar
A helper funciton, mpas_xarray._parse_dataset_time
, computes times as days since
0001-01-01, and serves as a replacement for mpas_xarray._get_datetimes
.
Note: the current implementation breaks the convention that mpas_xarray
remains
separate from the rest of MPAS-Analyis by using 3 functions from timekeeping.utility
in mpas_xarray
:
from ..timekeeping.utility import string_to_days_since_date, \
days_to_datetime, datetime_to_days
This violates the first requirement in the
Design Document: Moving variable mapping out of mpas_xarray.
I am open to alternative solutions for keeping mpas_xarray
separate from the rest
of analysis but these 3 functions do not conceptually belong in mpas_xarray
. The
problem is exacerbated by the fact that there are analysis-specific functions in
timekeeping
, meaning that this cannot easily be made a submodule of mpas_xarray
(nor would this make very much logical sense). Having 2 timekeeping
modules, one
for mpas_xarray
and one for MPAS-Analysis, seems unnecessarily confunsing.
The functions generalized_reader.open_multifile_dataset
and
mpas_xarray.open_multifile_dataset
have been updated to use this method for parsing
times. This involves removing the year_offset
argument and adding an optional
simulation_start_time
argument for supplying a date to use to convert variables
like daysSinceStartOfSim
to days since 0001-01-01.
An example of opening a data set and manipulating times withe the new approach in the OHC script is:
from ..shared.timekeeping.utility import get_simulation_start_time, \
date_to_days, days_to_datetime, string_to_datetime
...
def ohc_timeseries(config, streamMap=None, variableMap=None):
...
simulationStartTime = get_simulation_start_time(streams)
...
ds = open_multifile_dataset(file_names=file_names,
calendar=calendar,
simulation_start_time=simulation_start_time,
time_variable_name='Time',
variable_list=variable_list,
variable_map=variableMap,
start_date=startDate,
end_date=endDate)
timeStart = string_to_datetime(startDate)
timeEnd = string_to_datetime(endDate)
# Select year-1 data and average it (for later computing anomalies)
timeStartFirstYear = string_to_datetime(simulation_start_time)
if timeStartFirstYear < timeStart:
startDateFirstYear = simulation_start_time
firstYear = int(startDateFirstYear[0:4])
endDateFirstYear = '{:04d}-12-31_23:59:59'.format(firstYear)
filesFirstYear = streams.readpath(streamName,
startDate=startDateFirstYear,
endDate=endDateFirstYear,
calendar=calendar)
dsFirstYear = open_multifile_dataset(
file_names=filesFirstYear,
calendar=calendar,
simulation_start_time=simulation_start_time,
time_variable_name='Time',
variable_list=variable_list,
variable_map=variableMap,
start_date=startDateFirstYear,
end_date=endDateFirstYear)
else:
dsFirstYear = ds
firstYear = timeStart.year
timeStartFirstYear = date_to_days(year=firstYear, month=1, day=1,
calendar=calendar)
timeEndFirstYear = date_to_days(year=firstYear, month=12, day=31,
hour=23, minute=59, second=59,
calendar=calendar)
dsFirstYear = dsFirstYear.sel(Time=slice(timeStartFirstYear,
timeEndFirstYear))
meanFirstYear = dsFirstYear.mean('Time')
...
yearStart = days_to_datetime(ds.Time.min()).year
yearEnd = days_to_datetime(ds.Time.max()).year
timeStart = date_to_days(year=yearStart, month=1, day=1,
calendar=calendar)
timeEnd = date_to_days(year=yearEnd, month=12, day=31,
calendar=calendar)
if preprocessedReferenceRunName != 'None':
print ' Load in OHC from preprocessed reference run...'
inFilesPreprocessed = '{}/OHC.{}.year*.nc'.format(
preprocessedInputDirectory, preprocessedReferenceRunName)
dsPreprocessed = open_multifile_dataset(
file_names=inFilesPreprocessed,
calendar=calendar,
simulation_start_time=simulation_start_time,
time_variable_name='xtime')
yearEndPreprocessed = days_to_datetime(dsPreprocessed.Time.max()).year
...
The replicate_cycles
function in sea_ice.timeseries
has been a particular
challenge with the existing calendar. Here is that function with the new ‘Time’
coordinate:
def replicate_cycle(ds, dsToReplicate, calendar):
dsStartTime = days_to_datetime(ds.Time.min(), calendar=calendar)
dsEndTime = days_to_datetime(ds.Time.max(), calendar=calendar)
repStartTime = days_to_datetime(dsToReplicate.Time.min(),
calendar=calendar)
repEndTime = days_to_datetime(dsToReplicate.Time.max(),
calendar=calendar)
repSecondTime = days_to_datetime(dsToReplicate.Time.isel(Time=1),
calendar=calendar)
period = (MpasRelativeDelta(repEndTime, repStartTime) +
MpasRelativeDelta(repSecondTime, repStartTime))
startIndex = 0
while(dsStartTime > repStartTime + (startIndex+1)*period):
startIndex += 1
endIndex = 0
while(dsEndTime > repEndTime + (endIndex+1)*period):
endIndex += 1
dsShift = dsToReplicate.copy()
times = days_to_datetime(dsShift.Time, calendar=calendar)
dsShift.coords['Time'] = ('Time',
datetime_to_days(times + startIndex*period,
calendar=calendar))
# replicate cycle:
for cycleIndex in range(startIndex, endIndex):
dsNew = dsToReplicate.copy()
dsNew.coords['Time'] = ('Time',
datetime_to_days(times + (cycleIndex+1)*period,
calendar=calendar))
dsShift = xr.concat([dsShift, dsNew], dim='Time')
return dsShift
Implementation: The 'Time' coordinate of xarray data sets must support at least years
0001 and 9999, and preferably any conceivable value
Date last modified: 2017/02/09
Contributors: Xylar Asay-Davis
Same as above.
Testing
Testing and Validation: The 'Time' coordinate of xarray data sets must be consistent
with the MPAS calendar
Date last modified: 2017/02/11
Contributors: Xylar Asay-Davis
In [xylar/generalize_calendar](https://github.com/xylar/MPAS-Analysis/tree/generalize_calendar),
unit testing has been added for `timekeeping` and `mpas_xarray` that checks both the `gregorian`
and `gregorian_noleap` calendars under simple test conditions. However, we have no data sets
that test `gregorian`, so we have a somewhat limited ability to test this calendar option.
Fortunately, there are also no immediate plans to run with `gregorian`.I will make sure all tests with config files in the configs/lanl
and configs/edison
directories produce bit-for-bit results with the current develop
.
Testing and Validation: The 'Time' coordinate of xarray data sets must support at least years
0001 and 9999, and preferably any conceivable value
Date last modified: 2017/02/11
Contributors: Xylar Asay-Davis
Unit tests have been added to ensure that dates both close to 0001-01-01 and typical calendar dates (e.g. 2017-01-01) function as expected.
@akturner’s MPAS-SeaIce run with real dates (mentioned in #81) has been successfully run with the proposed approach. This run started in 1958, and had presented a problem for MPAS-Analysis with the previous calendar.