2025-09-16 08:01:59,108 - root - INFO - --------------- Versions --------------- 2025-09-16 08:01:59,226 - root - INFO - git branch: b'* multicheckpoint' 2025-09-16 08:01:59,247 - root - INFO - git hash: b'4b7dcddea9084b60c41957440a9f3e14f7d2567d' 2025-09-16 08:01:59,247 - root - INFO - Torch: 2.2.0a0+6a974be 2025-09-16 08:01:59,247 - root - INFO - ---------------------------------------- 2025-09-16 08:01:59,247 - root - INFO - ------------------ Configuration ------------------ 2025-09-16 08:01:59,247 - root - INFO - Configuration file: /global/u2/a/amahesh/ms_finetune/modulus-makani-fork/config/sfnonet.yaml 2025-09-16 08:01:59,247 - root - INFO - Configuration name: multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2 2025-09-16 08:01:59,247 - root - INFO - wandb_group multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2-0.1.0 2025-09-16 08:01:59,247 - root - INFO - scheduler CosineAnnealingLR 2025-09-16 08:01:59,247 - root - INFO - max_epochs 20 2025-09-16 08:01:59,247 - root - INFO - scheduler_T_max 20 2025-09-16 08:01:59,247 - root - INFO - lr 0.0001 2025-09-16 08:01:59,247 - root - INFO - load_counters False 2025-09-16 08:01:59,248 - root - INFO - load_optimizer False 2025-09-16 08:01:59,248 - root - INFO - load_scheduler False 2025-09-16 08:01:59,248 - root - INFO - finetune True 2025-09-16 08:01:59,248 - root - INFO - pretrained_checkpoint_path /pscratch/sd/a/amahesh/earth2mip_prod_registry/sfno_linear_74chq_sc2_layers8_edim620_wstgl2-epoch70_seed102/training_checkpoints/best_ckpt_mp0.tar 2025-09-16 08:01:59,248 - root - INFO - embed_dim 620 2025-09-16 08:01:59,248 - root - INFO - num_layers 8 2025-09-16 08:01:59,248 - root - INFO - scale_factor 2 2025-09-16 08:01:59,248 - root - INFO - hard_thresholding_fraction 1.0 2025-09-16 08:01:59,248 - root - INFO - loss weighted squared temp-std geometric l2 2025-09-16 08:01:59,248 - root - INFO - valid_autoreg_steps 1 2025-09-16 08:01:59,248 - root - INFO - metadata_json_path 
/pscratch/sd/p/pharring/74var-6hourly/staging/data.json 2025-09-16 08:01:59,248 - root - INFO - train_data_path /pscratch/sd/p/pharring/74var-6hourly/staging/train 2025-09-16 08:01:59,248 - root - INFO - valid_data_path /pscratch/sd/p/pharring/74var-6hourly/staging/valid 2025-09-16 08:01:59,248 - root - INFO - exp_dir /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/ 2025-09-16 08:01:59,248 - root - INFO - n_years 1 2025-09-16 08:01:59,248 - root - INFO - img_shape_x 721 2025-09-16 08:01:59,248 - root - INFO - img_shape_y 1440 2025-09-16 08:01:59,248 - root - INFO - min_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats/mins.npy 2025-09-16 08:01:59,248 - root - INFO - max_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats/maxs.npy 2025-09-16 08:01:59,248 - root - INFO - time_means_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/time_means.npy 2025-09-16 08:01:59,248 - root - INFO - global_means_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/global_means.npy 2025-09-16 08:01:59,249 - root - INFO - global_stds_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/global_stds.npy 2025-09-16 08:01:59,249 - root - INFO - time_diff_means_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/time_diff_means.npy 2025-09-16 08:01:59,249 - root - INFO - time_diff_stds_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/time_diff_stds.npy 2025-09-16 08:01:59,249 - root - INFO - nettype SFNO 2025-09-16 08:01:59,249 - root - INFO - model_grid_type equiangular 2025-09-16 08:01:59,249 - root - INFO - sht_grid_type legendre-gauss 2025-09-16 08:01:59,249 - root - INFO - filter_type linear 2025-09-16 08:01:59,249 - root - INFO - complex_activation real 2025-09-16 08:01:59,249 - root - INFO - normalization_layer instance_norm 2025-09-16 08:01:59,249 - root - INFO - use_mlp True 2025-09-16 08:01:59,249 - root - INFO - mlp_mode serial 2025-09-16 08:01:59,249 - root - INFO - 
mlp_ratio 2 2025-09-16 08:01:59,249 - root - INFO - separable False 2025-09-16 08:01:59,249 - root - INFO - operator_type dhconv 2025-09-16 08:01:59,249 - root - INFO - activation_function gelu 2025-09-16 08:01:59,249 - root - INFO - pos_embed none 2025-09-16 08:01:59,249 - root - INFO - channel_weights auto 2025-09-16 08:01:59,249 - root - INFO - n_eval_samples 8760 2025-09-16 08:01:59,249 - root - INFO - batch_size 1 2025-09-16 08:01:59,249 - root - INFO - weight_decay 0.0 2025-09-16 08:01:59,249 - root - INFO - scheduler_factor 0.1 2025-09-16 08:01:59,249 - root - INFO - scheduler_patience 10 2025-09-16 08:01:59,250 - root - INFO - scheduler_step_size 100 2025-09-16 08:01:59,250 - root - INFO - scheduler_gamma 0.5 2025-09-16 08:01:59,250 - root - INFO - lr_warmup_steps 0 2025-09-16 08:01:59,250 - root - INFO - verbose False 2025-09-16 08:01:59,250 - root - INFO - wireup_info mpi 2025-09-16 08:01:59,250 - root - INFO - wireup_store tcp 2025-09-16 08:01:59,250 - root - INFO - num_data_workers 2 2025-09-16 08:01:59,250 - root - INFO - num_visualization_workers 2 2025-09-16 08:01:59,250 - root - INFO - dt 1 2025-09-16 08:01:59,250 - root - INFO - n_history 0 2025-09-16 08:01:59,250 - root - INFO - prediction_type iterative 2025-09-16 08:01:59,250 - root - INFO - prediction_length 35 2025-09-16 08:01:59,250 - root - INFO - n_initial_conditions 5 2025-09-16 08:01:59,250 - root - INFO - n_train_samples_per_epoch 54000 2025-09-16 08:01:59,250 - root - INFO - ics_type specify_number 2025-09-16 08:01:59,250 - root - INFO - save_raw_forecasts True 2025-09-16 08:01:59,250 - root - INFO - save_channel False 2025-09-16 08:01:59,250 - root - INFO - masked_acc False 2025-09-16 08:01:59,250 - root - INFO - maskpath None 2025-09-16 08:01:59,250 - root - INFO - perturb False 2025-09-16 08:01:59,250 - root - INFO - add_noise False 2025-09-16 08:01:59,251 - root - INFO - noise_std 0.0 2025-09-16 08:01:59,251 - root - INFO - target default 2025-09-16 08:01:59,251 - root - INFO - 
normalize_residual False 2025-09-16 08:01:59,251 - root - INFO - channel_names ['u10m', 'v10m', 'u100m', 'v100m', 't2m', 'sp', 'msl', 'tcwv', 'd2m', 'u50', 'u100', 'u150', 'u200', 'u250', 'u300', 'u400', 'u500', 'u600', 'u700', 'u850', 'u925', 'u1000', 'v50', 'v100', 'v150', 'v200', 'v250', 'v300', 'v400', 'v500', 'v600', 'v700', 'v850', 'v925', 'v1000', 'z50', 'z100', 'z150', 'z200', 'z250', 'z300', 'z400', 'z500', 'z600', 'z700', 'z850', 'z925', 'z1000', 't50', 't100', 't150', 't200', 't250', 't300', 't400', 't500', 't600', 't700', 't850', 't925', 't1000', 'q50', 'q100', 'q150', 'q200', 'q250', 'q300', 'q400', 'q500', 'q600', 'q700', 'q850', 'q925', 'q1000'] 2025-09-16 08:01:59,251 - root - INFO - normalization zscore 2025-09-16 08:01:59,251 - root - INFO - add_grid True 2025-09-16 08:01:59,251 - root - INFO - gridtype sinusoidal 2025-09-16 08:01:59,251 - root - INFO - grid_num_frequencies 16 2025-09-16 08:01:59,251 - root - INFO - roll False 2025-09-16 08:01:59,251 - root - INFO - add_zenith True 2025-09-16 08:01:59,251 - root - INFO - add_orography True 2025-09-16 08:01:59,251 - root - INFO - orography_path /global/cfs/cdirs/m3522/cmip6/ERA5/e5.oper.invariant/197901/e5.oper.invariant.128_129_z.ll025sc.1979010100_1979010100.nc 2025-09-16 08:01:59,251 - root - INFO - add_landmask True 2025-09-16 08:01:59,251 - root - INFO - landmask_path /global/cfs/cdirs/m3522/cmip6/ERA5/e5.oper.invariant/197901/e5.oper.invariant.128_172_lsm.ll025sc.1979010100_1979010100.nc 2025-09-16 08:01:59,251 - root - INFO - log_to_screen True 2025-09-16 08:01:59,251 - root - INFO - log_to_wandb True 2025-09-16 08:01:59,251 - root - INFO - log_video 20 2025-09-16 08:01:59,251 - root - INFO - save_checkpoint legacy 2025-09-16 08:01:59,251 - root - INFO - optimizer_type AdamW 2025-09-16 08:01:59,251 - root - INFO - optimizer_beta1 0.9 2025-09-16 08:01:59,251 - root - INFO - optimizer_beta2 0.95 2025-09-16 08:01:59,251 - root - INFO - optimizer_max_grad_norm 32 2025-09-16 08:01:59,252 - root - 
INFO - crop_size_x None 2025-09-16 08:01:59,252 - root - INFO - crop_size_y None 2025-09-16 08:01:59,252 - root - INFO - inf_data_path /pscratch/sd/p/pharring/74var-6hourly/staging/out_of_sample 2025-09-16 08:01:59,252 - root - INFO - wandb_name None 2025-09-16 08:01:59,252 - root - INFO - wandb_project ERA5_sfno 2025-09-16 08:01:59,252 - root - INFO - wandb_entity weatherbenching 2025-09-16 08:01:59,252 - root - INFO - pos_drop_rate 0.1 2025-09-16 08:01:59,252 - root - INFO - initialization_seed None 2025-09-16 08:01:59,252 - root - INFO - epsilon_factor 0 2025-09-16 08:01:59,252 - root - INFO - fin_parallel_size 1 2025-09-16 08:01:59,252 - root - INFO - fout_parallel_size 1 2025-09-16 08:01:59,252 - root - INFO - h_parallel_size 4 2025-09-16 08:01:59,252 - root - INFO - w_parallel_size 1 2025-09-16 08:01:59,252 - root - INFO - model_parallel_sizes [4, 1, 1, 1] 2025-09-16 08:01:59,252 - root - INFO - model_parallel_names ['h', 'w', 'fin', 'fout'] 2025-09-16 08:01:59,252 - root - INFO - parameters_reduction_buffer_count 1 2025-09-16 08:01:59,252 - root - INFO - load_checkpoint flexible 2025-09-16 08:01:59,252 - root - INFO - world_size 16 2025-09-16 08:01:59,252 - root - INFO - global_batch_size 4 2025-09-16 08:01:59,252 - root - INFO - experiment_dir /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102 2025-09-16 08:01:59,252 - root - INFO - checkpoint_path /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102/training_checkpoints/ckpt_mp{mp_rank}.tar 2025-09-16 08:01:59,253 - root - INFO - best_checkpoint_path /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102/training_checkpoints/best_ckpt_mp{mp_rank}.tar 2025-09-16 08:01:59,253 - root - INFO - resuming False 2025-09-16 
08:01:59,253 - root - INFO - amp_mode bf16 2025-09-16 08:01:59,253 - root - INFO - jit_mode none 2025-09-16 08:01:59,253 - root - INFO - cuda_graph_mode none 2025-09-16 08:01:59,253 - root - INFO - skip_validation False 2025-09-16 08:01:59,253 - root - INFO - enable_odirect False 2025-09-16 08:01:59,253 - root - INFO - checkpointing 0 2025-09-16 08:01:59,253 - root - INFO - enable_synthetic_data False 2025-09-16 08:01:59,253 - root - INFO - split_data_channels False 2025-09-16 08:01:59,253 - root - INFO - print_timings_frequency -1 2025-09-16 08:01:59,253 - root - INFO - multistep_count 2 2025-09-16 08:01:59,253 - root - INFO - n_future 1 2025-09-16 08:01:59,253 - root - INFO - enable_benchy False 2025-09-16 08:01:59,253 - root - INFO - disable_ddp False 2025-09-16 08:01:59,253 - root - INFO - enable_grad_anomaly_detection False 2025-09-16 08:01:59,253 - root - INFO - wandb_dir /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102 2025-09-16 08:01:59,253 - root - INFO - _yaml_filename /global/u2/a/amahesh/ms_finetune/modulus-makani-fork/config/sfnonet.yaml 2025-09-16 08:01:59,253 - root - INFO - _config_name multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2 2025-09-16 08:01:59,253 - root - INFO - --------------------------------------------------- 2025-09-16 08:02:03,109 - root - INFO - Enabling automatic mixed precision in bf16. 
2025-09-16 08:02:35,367 - root - INFO - Using channel names: ['u10m', 'v10m', 'u100m', 'v100m', 't2m', 'sp', 'msl', 'tcwv', 'd2m', 'u50', 'u100', 'u150', 'u200', 'u250', 'u300', 'u400', 'u500', 'u600', 'u700', 'u850', 'u925', 'u1000', 'v50', 'v100', 'v150', 'v200', 'v250', 'v300', 'v400', 'v500', 'v600', 'v700', 'v850', 'v925', 'v1000', 'z50', 'z100', 'z150', 'z200', 'z250', 'z300', 'z400', 'z500', 'z600', 'z700', 'z850', 'z925', 'z1000', 't50', 't100', 't150', 't200', 't250', 't300', 't400', 't500', 't600', 't700', 't850', 't925', 't1000', 'q50', 'q100', 'q150', 'q200', 'q250', 'q300', 'q400', 'q500', 'q600', 'q700', 'q850', 'q925', 'q1000'] 2025-09-16 08:02:35,368 - root - INFO - initializing data loader 2025-09-16 08:02:37,130 - root - INFO - Getting file stats from /pscratch/sd/p/pharring/74var-6hourly/staging/train/1979.h5 2025-09-16 08:02:37,149 - root - INFO - Average number of samples per year: 1461.0 2025-09-16 08:02:37,150 - root - INFO - Found data at path ['/pscratch/sd/p/pharring/74var-6hourly/staging/train']. Number of examples: 54056. Full image Shape: 721 x 1440 x 74. Read Shape: 181 x 1440 x 74 2025-09-16 08:02:37,150 - root - INFO - Using 54056 from the total number of available samples with 54000 samples per epoch (corresponds to 13500 steps for 4 shards with local batch size 1) 2025-09-16 08:02:37,150 - root - INFO - Delta t: 6 hours 2025-09-16 08:02:37,150 - root - INFO - Including 6 hours of past history in training at a frequency of 6 hours 2025-09-16 08:02:37,151 - root - INFO - Including 12 hours of future targets in training at a frequency of 6 hours 2025-09-16 08:03:03,511 - root - INFO - Getting file stats from /pscratch/sd/p/pharring/74var-6hourly/staging/valid/2016.h5 2025-09-16 08:03:03,514 - root - INFO - Average number of samples per year: 1462.0 2025-09-16 08:03:03,514 - root - INFO - Found data at path ['/pscratch/sd/p/pharring/74var-6hourly/staging/valid']. Number of examples: 2924. Full image Shape: 721 x 1440 x 74. 
Read Shape: 181 x 1440 x 74 2025-09-16 08:03:03,514 - root - INFO - Using 2924 from the total number of available samples with 2924 samples per epoch (corresponds to 731 steps for 4 shards with local batch size 1) 2025-09-16 08:03:03,514 - root - INFO - Delta t: 6 hours 2025-09-16 08:03:03,515 - root - INFO - Including 6 hours of past history in training at a frequency of 6 hours 2025-09-16 08:03:03,515 - root - INFO - Including 12 hours of future targets in training at a frequency of 6 hours 2025-09-16 08:03:29,472 - root - INFO - data loader initialized 2025-09-16 08:04:32,304 - root - INFO - Auxiliary channel names: ['xzen', 'xgrlat', 'xgrlon', 'xoro', 'xlsml', 'xlsms'] 2025-09-16 08:04:34,227 - root - INFO - MultiStepWrapper( (preprocessor): Preprocessor2D() (model): SphericalFourierNeuralOperatorNet( (trans_down): DistributedRealSHT( nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True ) (itrans_up): DistributedInverseRealSHT( nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True ) (trans): DistributedRealSHT( nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True ) (itrans): DistributedInverseRealSHT( nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True ) (encoder): EncoderDecoder( (fwd): Sequential( (0): Conv2d(110, 620, kernel_size=(1, 1), stride=(1, 1)) (1): GELU(approximate='none') (2): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False) ) ) (pos_drop): Dropout(p=0.1, inplace=False) (blocks): ModuleList( (0): FourierNeuralOperatorBlock( (norm0): DistributedInstanceNorm2d() (filter): SpectralFilterLayer( (filter): SpectralConv( (forward_transform): DistributedRealSHT( nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True ) (inverse_transform): DistributedInverseRealSHT( nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True ) ) ) (act_layer0): GELU(approximate='none') (norm1): DistributedInstanceNorm2d() (outer_skip): Conv2d(620, 
620, kernel_size=(1, 1), stride=(1, 1), bias=False) (mlp): MLP( (fwd): Sequential( (0): Conv2d(620, 1240, kernel_size=(1, 1), stride=(1, 1)) (1): GELU(approximate='none') (2): Identity() (3): Conv2d(1240, 620, kernel_size=(1, 1), stride=(1, 1)) (4): Identity() ) ) (drop_path): Identity() ) (1-6): 6 x FourierNeuralOperatorBlock( (norm0): DistributedInstanceNorm2d() (filter): SpectralFilterLayer( (filter): SpectralConv( (forward_transform): DistributedRealSHT( nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True ) (inverse_transform): DistributedInverseRealSHT( nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True ) ) ) (act_layer0): GELU(approximate='none') (norm1): DistributedInstanceNorm2d() (outer_skip): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False) (mlp): MLP( (fwd): Sequential( (0): Conv2d(620, 1240, kernel_size=(1, 1), stride=(1, 1)) (1): GELU(approximate='none') (2): Identity() (3): Conv2d(1240, 620, kernel_size=(1, 1), stride=(1, 1)) (4): Identity() ) ) (drop_path): Identity() ) (7): FourierNeuralOperatorBlock( (norm0): DistributedInstanceNorm2d() (filter): SpectralFilterLayer( (filter): SpectralConv( (forward_transform): DistributedRealSHT( nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True ) (inverse_transform): DistributedInverseRealSHT( nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True ) ) ) (act_layer0): GELU(approximate='none') (norm1): DistributedInstanceNorm2d() (outer_skip): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False) (mlp): MLP( (fwd): Sequential( (0): Conv2d(620, 1240, kernel_size=(1, 1), stride=(1, 1)) (1): GELU(approximate='none') (2): Identity() (3): Conv2d(1240, 620, kernel_size=(1, 1), stride=(1, 1)) (4): Identity() ) ) (drop_path): Identity() ) ) (decoder): EncoderDecoder( (fwd): Sequential( (0): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1)) (1): GELU(approximate='none') (2): Conv2d(620, 74, kernel_size=(1, 1), 
stride=(1, 1), bias=False) ) ) (residual_transform): Conv2d(110, 74, kernel_size=(1, 1), stride=(1, 1), bias=False) ) ) 2025-09-16 08:04:35,071 - root - INFO - using AdamW 2025-09-16 08:04:36,013 - root - INFO - Loading checkpoint /pscratch/sd/a/amahesh/earth2mip_prod_registry/sfno_linear_74chq_sc2_layers8_edim620_wstgl2-epoch70_seed102/training_checkpoints/best_ckpt_mp0.tar in flexible mode 2025-09-16 08:04:43,168 - root - WARNING - missing module.model.encoder.fwd.0.weight 2025-09-16 08:04:43,169 - root - WARNING - missing module.model.encoder.fwd.0.bias 2025-09-16 08:04:43,169 - root - WARNING - missing module.model.encoder.fwd.2.weight 2025-09-16 08:04:43,169 - root - WARNING - missing module.model.blocks.0.norm0.weight 2025-09-16 08:04:43,170 - root - WARNING - missing module.model.blocks.0.norm0.bias 2025-09-16 08:04:43,170 - root - WARNING - missing module.model.blocks.0.filter.filter.weight 2025-09-16 08:04:43,170 - root - WARNING - missing module.model.blocks.0.norm1.weight 2025-09-16 08:04:43,170 - root - WARNING - missing module.model.blocks.0.norm1.bias 2025-09-16 08:04:43,171 - root - WARNING - missing module.model.blocks.0.outer_skip.weight 2025-09-16 08:04:43,171 - root - WARNING - missing module.model.blocks.0.mlp.fwd.0.weight 2025-09-16 08:04:43,171 - root - WARNING - missing module.model.blocks.0.mlp.fwd.0.bias 2025-09-16 08:04:43,171 - root - WARNING - missing module.model.blocks.0.mlp.fwd.3.weight 2025-09-16 08:04:43,171 - root - WARNING - missing module.model.blocks.0.mlp.fwd.3.bias 2025-09-16 08:04:43,172 - root - WARNING - missing module.model.blocks.1.norm0.weight 2025-09-16 08:04:43,172 - root - WARNING - missing module.model.blocks.1.norm0.bias 2025-09-16 08:04:43,172 - root - WARNING - missing module.model.blocks.1.filter.filter.weight 2025-09-16 08:04:43,172 - root - WARNING - missing module.model.blocks.1.norm1.weight 2025-09-16 08:04:43,172 - root - WARNING - missing module.model.blocks.1.norm1.bias 2025-09-16 08:04:43,173 - root - 
WARNING - missing module.model.blocks.1.outer_skip.weight 2025-09-16 08:04:43,173 - root - WARNING - missing module.model.blocks.1.mlp.fwd.0.weight 2025-09-16 08:04:43,173 - root - WARNING - missing module.model.blocks.1.mlp.fwd.0.bias 2025-09-16 08:04:43,173 - root - WARNING - missing module.model.blocks.1.mlp.fwd.3.weight 2025-09-16 08:04:43,173 - root - WARNING - missing module.model.blocks.1.mlp.fwd.3.bias 2025-09-16 08:04:43,173 - root - WARNING - missing module.model.blocks.2.norm0.weight 2025-09-16 08:04:43,174 - root - WARNING - missing module.model.blocks.2.norm0.bias 2025-09-16 08:04:43,174 - root - WARNING - missing module.model.blocks.2.filter.filter.weight 2025-09-16 08:04:43,174 - root - WARNING - missing module.model.blocks.2.norm1.weight 2025-09-16 08:04:43,174 - root - WARNING - missing module.model.blocks.2.norm1.bias 2025-09-16 08:04:43,174 - root - WARNING - missing module.model.blocks.2.outer_skip.weight 2025-09-16 08:04:43,175 - root - WARNING - missing module.model.blocks.2.mlp.fwd.0.weight 2025-09-16 08:04:43,175 - root - WARNING - missing module.model.blocks.2.mlp.fwd.0.bias 2025-09-16 08:04:43,175 - root - WARNING - missing module.model.blocks.2.mlp.fwd.3.weight 2025-09-16 08:04:43,175 - root - WARNING - missing module.model.blocks.2.mlp.fwd.3.bias 2025-09-16 08:04:43,175 - root - WARNING - missing module.model.blocks.3.norm0.weight 2025-09-16 08:04:43,175 - root - WARNING - missing module.model.blocks.3.norm0.bias 2025-09-16 08:04:43,176 - root - WARNING - missing module.model.blocks.3.filter.filter.weight 2025-09-16 08:04:43,176 - root - WARNING - missing module.model.blocks.3.norm1.weight 2025-09-16 08:04:43,176 - root - WARNING - missing module.model.blocks.3.norm1.bias 2025-09-16 08:04:43,176 - root - WARNING - missing module.model.blocks.3.outer_skip.weight 2025-09-16 08:04:43,176 - root - WARNING - missing module.model.blocks.3.mlp.fwd.0.weight 2025-09-16 08:04:43,177 - root - WARNING - missing module.model.blocks.3.mlp.fwd.0.bias 
2025-09-16 08:04:43,177 - root - WARNING - missing module.model.blocks.3.mlp.fwd.3.weight 2025-09-16 08:04:43,177 - root - WARNING - missing module.model.blocks.3.mlp.fwd.3.bias 2025-09-16 08:04:43,177 - root - WARNING - missing module.model.blocks.4.norm0.weight 2025-09-16 08:04:43,177 - root - WARNING - missing module.model.blocks.4.norm0.bias 2025-09-16 08:04:43,178 - root - WARNING - missing module.model.blocks.4.filter.filter.weight 2025-09-16 08:04:43,178 - root - WARNING - missing module.model.blocks.4.norm1.weight 2025-09-16 08:04:43,178 - root - WARNING - missing module.model.blocks.4.norm1.bias 2025-09-16 08:04:43,178 - root - WARNING - missing module.model.blocks.4.outer_skip.weight 2025-09-16 08:04:43,178 - root - WARNING - missing module.model.blocks.4.mlp.fwd.0.weight 2025-09-16 08:04:43,178 - root - WARNING - missing module.model.blocks.4.mlp.fwd.0.bias 2025-09-16 08:04:43,179 - root - WARNING - missing module.model.blocks.4.mlp.fwd.3.weight 2025-09-16 08:04:43,179 - root - WARNING - missing module.model.blocks.4.mlp.fwd.3.bias 2025-09-16 08:04:43,179 - root - WARNING - missing module.model.blocks.5.norm0.weight 2025-09-16 08:04:43,179 - root - WARNING - missing module.model.blocks.5.norm0.bias 2025-09-16 08:04:43,179 - root - WARNING - missing module.model.blocks.5.filter.filter.weight 2025-09-16 08:04:43,180 - root - WARNING - missing module.model.blocks.5.norm1.weight 2025-09-16 08:04:43,180 - root - WARNING - missing module.model.blocks.5.norm1.bias 2025-09-16 08:04:43,180 - root - WARNING - missing module.model.blocks.5.outer_skip.weight 2025-09-16 08:04:43,180 - root - WARNING - missing module.model.blocks.5.mlp.fwd.0.weight 2025-09-16 08:04:43,180 - root - WARNING - missing module.model.blocks.5.mlp.fwd.0.bias 2025-09-16 08:04:43,180 - root - WARNING - missing module.model.blocks.5.mlp.fwd.3.weight 2025-09-16 08:04:43,181 - root - WARNING - missing module.model.blocks.5.mlp.fwd.3.bias 2025-09-16 08:04:43,181 - root - WARNING - missing 
module.model.blocks.6.norm0.weight 2025-09-16 08:04:43,181 - root - WARNING - missing module.model.blocks.6.norm0.bias 2025-09-16 08:04:43,181 - root - WARNING - missing module.model.blocks.6.filter.filter.weight 2025-09-16 08:04:43,181 - root - WARNING - missing module.model.blocks.6.norm1.weight 2025-09-16 08:04:43,182 - root - WARNING - missing module.model.blocks.6.norm1.bias 2025-09-16 08:04:43,182 - root - WARNING - missing module.model.blocks.6.outer_skip.weight 2025-09-16 08:04:43,182 - root - WARNING - missing module.model.blocks.6.mlp.fwd.0.weight 2025-09-16 08:04:43,182 - root - WARNING - missing module.model.blocks.6.mlp.fwd.0.bias 2025-09-16 08:04:43,182 - root - WARNING - missing module.model.blocks.6.mlp.fwd.3.weight 2025-09-16 08:04:43,182 - root - WARNING - missing module.model.blocks.6.mlp.fwd.3.bias 2025-09-16 08:04:43,183 - root - WARNING - missing module.model.blocks.7.norm0.weight 2025-09-16 08:04:43,183 - root - WARNING - missing module.model.blocks.7.norm0.bias 2025-09-16 08:04:43,183 - root - WARNING - missing module.model.blocks.7.filter.filter.weight 2025-09-16 08:04:43,183 - root - WARNING - missing module.model.blocks.7.norm1.weight 2025-09-16 08:04:43,183 - root - WARNING - missing module.model.blocks.7.norm1.bias 2025-09-16 08:04:43,184 - root - WARNING - missing module.model.blocks.7.outer_skip.weight 2025-09-16 08:04:43,184 - root - WARNING - missing module.model.blocks.7.mlp.fwd.0.weight 2025-09-16 08:04:43,184 - root - WARNING - missing module.model.blocks.7.mlp.fwd.0.bias 2025-09-16 08:04:43,184 - root - WARNING - missing module.model.blocks.7.mlp.fwd.3.weight 2025-09-16 08:04:43,184 - root - WARNING - missing module.model.blocks.7.mlp.fwd.3.bias 2025-09-16 08:04:43,184 - root - WARNING - missing module.model.decoder.fwd.0.weight 2025-09-16 08:04:43,185 - root - WARNING - missing module.model.decoder.fwd.0.bias 2025-09-16 08:04:43,185 - root - WARNING - missing module.model.decoder.fwd.2.weight 2025-09-16 08:04:43,185 - root - 
WARNING - missing module.model.residual_transform.weight 2025-09-16 17:28:09,924 - root - INFO - --------------- Versions --------------- 2025-09-16 17:28:10,034 - root - INFO - git branch: b'* multicheckpoint' 2025-09-16 17:28:10,048 - root - INFO - git hash: b'4b7dcddea9084b60c41957440a9f3e14f7d2567d' 2025-09-16 17:28:10,049 - root - INFO - Torch: 2.2.0a0+6a974be 2025-09-16 17:28:10,049 - root - INFO - ---------------------------------------- 2025-09-16 17:28:10,049 - root - INFO - ------------------ Configuration ------------------ 2025-09-16 17:28:10,049 - root - INFO - Configuration file: /global/u2/a/amahesh/ms_finetune/modulus-makani-fork/config/sfnonet.yaml 2025-09-16 17:28:10,049 - root - INFO - Configuration name: multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2 2025-09-16 17:28:10,049 - root - INFO - wandb_group multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2-0.1.0 2025-09-16 17:28:10,049 - root - INFO - scheduler CosineAnnealingLR 2025-09-16 17:28:10,049 - root - INFO - max_epochs 20 2025-09-16 17:28:10,049 - root - INFO - scheduler_T_max 20 2025-09-16 17:28:10,049 - root - INFO - lr 0.0001 2025-09-16 17:28:10,049 - root - INFO - load_counters False 2025-09-16 17:28:10,049 - root - INFO - load_optimizer False 2025-09-16 17:28:10,049 - root - INFO - load_scheduler False 2025-09-16 17:28:10,049 - root - INFO - finetune True 2025-09-16 17:28:10,050 - root - INFO - pretrained_checkpoint_path /pscratch/sd/a/amahesh/recovered_fcn_training/sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed77/training_checkpoints/best_ckpt_mp0.tar 2025-09-16 17:28:10,050 - root - INFO - embed_dim 620 2025-09-16 17:28:10,050 - root - INFO - num_layers 8 2025-09-16 17:28:10,050 - root - INFO - scale_factor 2 2025-09-16 17:28:10,050 - root - INFO - hard_thresholding_fraction 1.0 2025-09-16 17:28:10,050 - root - INFO - loss weighted squared temp-std geometric l2 2025-09-16 17:28:10,050 - root - INFO - valid_autoreg_steps 1 2025-09-16 17:28:10,050 - root - INFO - 
metadata_json_path /pscratch/sd/p/pharring/74var-6hourly/staging/data.json 2025-09-16 17:28:10,050 - root - INFO - train_data_path /pscratch/sd/p/pharring/74var-6hourly/staging/train 2025-09-16 17:28:10,050 - root - INFO - valid_data_path /pscratch/sd/p/pharring/74var-6hourly/staging/valid 2025-09-16 17:28:10,050 - root - INFO - exp_dir /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/ 2025-09-16 17:28:10,050 - root - INFO - n_years 1 2025-09-16 17:28:10,050 - root - INFO - img_shape_x 721 2025-09-16 17:28:10,050 - root - INFO - img_shape_y 1440 2025-09-16 17:28:10,050 - root - INFO - min_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats/mins.npy 2025-09-16 17:28:10,050 - root - INFO - max_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats/maxs.npy 2025-09-16 17:28:10,050 - root - INFO - time_means_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/time_means.npy 2025-09-16 17:28:10,050 - root - INFO - global_means_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/global_means.npy 2025-09-16 17:28:10,050 - root - INFO - global_stds_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/global_stds.npy 2025-09-16 17:28:10,050 - root - INFO - time_diff_means_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/time_diff_means.npy 2025-09-16 17:28:10,050 - root - INFO - time_diff_stds_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/time_diff_stds.npy 2025-09-16 17:28:10,051 - root - INFO - nettype SFNO 2025-09-16 17:28:10,051 - root - INFO - model_grid_type equiangular 2025-09-16 17:28:10,051 - root - INFO - sht_grid_type legendre-gauss 2025-09-16 17:28:10,051 - root - INFO - filter_type linear 2025-09-16 17:28:10,051 - root - INFO - complex_activation real 2025-09-16 17:28:10,051 - root - INFO - normalization_layer instance_norm 2025-09-16 17:28:10,051 - root - INFO - use_mlp True 2025-09-16 17:28:10,051 - root - INFO - mlp_mode serial 2025-09-16 17:28:10,051 - 
root - INFO - mlp_ratio 2 2025-09-16 17:28:10,051 - root - INFO - separable False 2025-09-16 17:28:10,051 - root - INFO - operator_type dhconv 2025-09-16 17:28:10,051 - root - INFO - activation_function gelu 2025-09-16 17:28:10,051 - root - INFO - pos_embed none 2025-09-16 17:28:10,051 - root - INFO - channel_weights auto 2025-09-16 17:28:10,051 - root - INFO - n_eval_samples 8760 2025-09-16 17:28:10,051 - root - INFO - batch_size 1 2025-09-16 17:28:10,051 - root - INFO - weight_decay 0.0 2025-09-16 17:28:10,051 - root - INFO - scheduler_factor 0.1 2025-09-16 17:28:10,051 - root - INFO - scheduler_patience 10 2025-09-16 17:28:10,051 - root - INFO - scheduler_step_size 100 2025-09-16 17:28:10,051 - root - INFO - scheduler_gamma 0.5 2025-09-16 17:28:10,051 - root - INFO - lr_warmup_steps 0 2025-09-16 17:28:10,052 - root - INFO - verbose False 2025-09-16 17:28:10,052 - root - INFO - wireup_info mpi 2025-09-16 17:28:10,052 - root - INFO - wireup_store tcp 2025-09-16 17:28:10,052 - root - INFO - num_data_workers 2 2025-09-16 17:28:10,052 - root - INFO - num_visualization_workers 2 2025-09-16 17:28:10,052 - root - INFO - dt 1 2025-09-16 17:28:10,052 - root - INFO - n_history 0 2025-09-16 17:28:10,052 - root - INFO - prediction_type iterative 2025-09-16 17:28:10,052 - root - INFO - prediction_length 35 2025-09-16 17:28:10,052 - root - INFO - n_initial_conditions 5 2025-09-16 17:28:10,052 - root - INFO - n_train_samples_per_epoch 54000 2025-09-16 17:28:10,052 - root - INFO - ics_type specify_number 2025-09-16 17:28:10,052 - root - INFO - save_raw_forecasts True 2025-09-16 17:28:10,052 - root - INFO - save_channel False 2025-09-16 17:28:10,052 - root - INFO - masked_acc False 2025-09-16 17:28:10,052 - root - INFO - maskpath None 2025-09-16 17:28:10,052 - root - INFO - perturb False 2025-09-16 17:28:10,052 - root - INFO - add_noise False 2025-09-16 17:28:10,052 - root - INFO - noise_std 0.0 2025-09-16 17:28:10,052 - root - INFO - target default 2025-09-16 17:28:10,052 - root 
- INFO - normalize_residual False
2025-09-16 17:28:10,053 - root - INFO - channel_names ['u10m', 'v10m', 'u100m', 'v100m', 't2m', 'sp', 'msl', 'tcwv', 'd2m', 'u50', 'u100', 'u150', 'u200', 'u250', 'u300', 'u400', 'u500', 'u600', 'u700', 'u850', 'u925', 'u1000', 'v50', 'v100', 'v150', 'v200', 'v250', 'v300', 'v400', 'v500', 'v600', 'v700', 'v850', 'v925', 'v1000', 'z50', 'z100', 'z150', 'z200', 'z250', 'z300', 'z400', 'z500', 'z600', 'z700', 'z850', 'z925', 'z1000', 't50', 't100', 't150', 't200', 't250', 't300', 't400', 't500', 't600', 't700', 't850', 't925', 't1000', 'q50', 'q100', 'q150', 'q200', 'q250', 'q300', 'q400', 'q500', 'q600', 'q700', 'q850', 'q925', 'q1000']
2025-09-16 17:28:10,053 - root - INFO - normalization zscore
2025-09-16 17:28:10,053 - root - INFO - add_grid True
2025-09-16 17:28:10,053 - root - INFO - gridtype sinusoidal
2025-09-16 17:28:10,053 - root - INFO - grid_num_frequencies 16
2025-09-16 17:28:10,053 - root - INFO - roll False
2025-09-16 17:28:10,053 - root - INFO - add_zenith True
2025-09-16 17:28:10,053 - root - INFO - add_orography True
2025-09-16 17:28:10,053 - root - INFO - orography_path /global/cfs/cdirs/m3522/cmip6/ERA5/e5.oper.invariant/197901/e5.oper.invariant.128_129_z.ll025sc.1979010100_1979010100.nc
2025-09-16 17:28:10,053 - root - INFO - add_landmask True
2025-09-16 17:28:10,053 - root - INFO - landmask_path /global/cfs/cdirs/m3522/cmip6/ERA5/e5.oper.invariant/197901/e5.oper.invariant.128_172_lsm.ll025sc.1979010100_1979010100.nc
2025-09-16 17:28:10,053 - root - INFO - log_to_screen True
2025-09-16 17:28:10,053 - root - INFO - log_to_wandb True
2025-09-16 17:28:10,053 - root - INFO - log_video 20
2025-09-16 17:28:10,053 - root - INFO - save_checkpoint legacy
2025-09-16 17:28:10,053 - root - INFO - optimizer_type AdamW
2025-09-16 17:28:10,053 - root - INFO - optimizer_beta1 0.9
2025-09-16 17:28:10,053 - root - INFO - optimizer_beta2 0.95
2025-09-16 17:28:10,053 - root - INFO - optimizer_max_grad_norm 32
2025-09-16 17:28:10,053 - root - INFO - crop_size_x None
2025-09-16 17:28:10,054 - root - INFO - crop_size_y None
2025-09-16 17:28:10,054 - root - INFO - inf_data_path /pscratch/sd/p/pharring/74var-6hourly/staging/out_of_sample
2025-09-16 17:28:10,054 - root - INFO - wandb_name None
2025-09-16 17:28:10,054 - root - INFO - wandb_project ERA5_sfno
2025-09-16 17:28:10,054 - root - INFO - wandb_entity weatherbenching
2025-09-16 17:28:10,054 - root - INFO - pos_drop_rate 0.1
2025-09-16 17:28:10,054 - root - INFO - initialization_seed None
2025-09-16 17:28:10,054 - root - INFO - epsilon_factor 0
2025-09-16 17:28:10,054 - root - INFO - fin_parallel_size 1
2025-09-16 17:28:10,054 - root - INFO - fout_parallel_size 1
2025-09-16 17:28:10,054 - root - INFO - h_parallel_size 4
2025-09-16 17:28:10,054 - root - INFO - w_parallel_size 1
2025-09-16 17:28:10,054 - root - INFO - model_parallel_sizes [4, 1, 1, 1]
2025-09-16 17:28:10,054 - root - INFO - model_parallel_names ['h', 'w', 'fin', 'fout']
2025-09-16 17:28:10,054 - root - INFO - parameters_reduction_buffer_count 1
2025-09-16 17:28:10,054 - root - INFO - load_checkpoint legacy
2025-09-16 17:28:10,054 - root - INFO - world_size 16
2025-09-16 17:28:10,054 - root - INFO - global_batch_size 4
2025-09-16 17:28:10,054 - root - INFO - experiment_dir /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102
2025-09-16 17:28:10,054 - root - INFO - checkpoint_path /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102/training_checkpoints/ckpt_mp{mp_rank}.tar
2025-09-16 17:28:10,054 - root - INFO - best_checkpoint_path /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102/training_checkpoints/best_ckpt_mp{mp_rank}.tar
2025-09-16 17:28:10,054 - root - INFO - resuming False
2025-09-16 17:28:10,055 - root - INFO - amp_mode bf16
2025-09-16 17:28:10,055 - root - INFO - jit_mode none
2025-09-16 17:28:10,055 - root - INFO - cuda_graph_mode none
2025-09-16 17:28:10,055 - root - INFO - skip_validation False
2025-09-16 17:28:10,055 - root - INFO - enable_odirect False
2025-09-16 17:28:10,055 - root - INFO - checkpointing 0
2025-09-16 17:28:10,055 - root - INFO - enable_synthetic_data False
2025-09-16 17:28:10,055 - root - INFO - split_data_channels False
2025-09-16 17:28:10,055 - root - INFO - print_timings_frequency -1
2025-09-16 17:28:10,055 - root - INFO - multistep_count 2
2025-09-16 17:28:10,055 - root - INFO - n_future 1
2025-09-16 17:28:10,055 - root - INFO - enable_benchy False
2025-09-16 17:28:10,055 - root - INFO - disable_ddp False
2025-09-16 17:28:10,055 - root - INFO - enable_grad_anomaly_detection False
2025-09-16 17:28:10,055 - root - INFO - wandb_dir /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102
2025-09-16 17:28:10,055 - root - INFO - _yaml_filename /global/u2/a/amahesh/ms_finetune/modulus-makani-fork/config/sfnonet.yaml
2025-09-16 17:28:10,055 - root - INFO - _config_name multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2
2025-09-16 17:28:10,055 - root - INFO - ---------------------------------------------------
2025-09-16 17:28:11,056 - root - INFO - Enabling automatic mixed precision in bf16.
2025-09-16 17:28:42,931 - root - INFO - Using channel names: ['u10m', 'v10m', 'u100m', 'v100m', 't2m', 'sp', 'msl', 'tcwv', 'd2m', 'u50', 'u100', 'u150', 'u200', 'u250', 'u300', 'u400', 'u500', 'u600', 'u700', 'u850', 'u925', 'u1000', 'v50', 'v100', 'v150', 'v200', 'v250', 'v300', 'v400', 'v500', 'v600', 'v700', 'v850', 'v925', 'v1000', 'z50', 'z100', 'z150', 'z200', 'z250', 'z300', 'z400', 'z500', 'z600', 'z700', 'z850', 'z925', 'z1000', 't50', 't100', 't150', 't200', 't250', 't300', 't400', 't500', 't600', 't700', 't850', 't925', 't1000', 'q50', 'q100', 'q150', 'q200', 'q250', 'q300', 'q400', 'q500', 'q600', 'q700', 'q850', 'q925', 'q1000']
2025-09-16 17:28:42,931 - root - INFO - initializing data loader
2025-09-16 17:28:45,133 - root - INFO - Getting file stats from /pscratch/sd/p/pharring/74var-6hourly/staging/train/1979.h5
2025-09-16 17:28:45,157 - root - INFO - Average number of samples per year: 1461.0
2025-09-16 17:28:45,158 - root - INFO - Found data at path ['/pscratch/sd/p/pharring/74var-6hourly/staging/train']. Number of examples: 54056. Full image Shape: 721 x 1440 x 74. Read Shape: 181 x 1440 x 74
2025-09-16 17:28:45,158 - root - INFO - Using 54056 from the total number of available samples with 54000 samples per epoch (corresponds to 13500 steps for 4 shards with local batch size 1)
2025-09-16 17:28:45,158 - root - INFO - Delta t: 6 hours
2025-09-16 17:28:45,158 - root - INFO - Including 6 hours of past history in training at a frequency of 6 hours
2025-09-16 17:28:45,158 - root - INFO - Including 12 hours of future targets in training at a frequency of 6 hours
2025-09-16 17:29:10,883 - root - INFO - Getting file stats from /pscratch/sd/p/pharring/74var-6hourly/staging/valid/2016.h5
2025-09-16 17:29:10,885 - root - INFO - Average number of samples per year: 1462.0
2025-09-16 17:29:10,886 - root - INFO - Found data at path ['/pscratch/sd/p/pharring/74var-6hourly/staging/valid']. Number of examples: 2924. Full image Shape: 721 x 1440 x 74. Read Shape: 181 x 1440 x 74
2025-09-16 17:29:10,886 - root - INFO - Using 2924 from the total number of available samples with 2924 samples per epoch (corresponds to 731 steps for 4 shards with local batch size 1)
2025-09-16 17:29:10,886 - root - INFO - Delta t: 6 hours
2025-09-16 17:29:10,886 - root - INFO - Including 6 hours of past history in training at a frequency of 6 hours
2025-09-16 17:29:10,887 - root - INFO - Including 12 hours of future targets in training at a frequency of 6 hours
2025-09-16 17:29:33,562 - root - INFO - data loader initialized
2025-09-16 17:29:39,854 - root - INFO - Auxiliary channel names: ['xzen', 'xgrlat', 'xgrlon', 'xoro', 'xlsml', 'xlsms']
2025-09-16 17:29:41,097 - root - INFO - MultiStepWrapper(
  (preprocessor): Preprocessor2D()
  (model): SphericalFourierNeuralOperatorNet(
    (trans_down): DistributedRealSHT(
      nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True
    )
    (itrans_up): DistributedInverseRealSHT(
      nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True
    )
    (trans): DistributedRealSHT(
      nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
    )
    (itrans): DistributedInverseRealSHT(
      nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
    )
    (encoder): EncoderDecoder(
      (fwd): Sequential(
        (0): Conv2d(110, 620, kernel_size=(1, 1), stride=(1, 1))
        (1): GELU(approximate='none')
        (2): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
    )
    (pos_drop): Dropout(p=0.1, inplace=False)
    (blocks): ModuleList(
      (0): FourierNeuralOperatorBlock(
        (norm0): DistributedInstanceNorm2d()
        (filter): SpectralFilterLayer(
          (filter): SpectralConv(
            (forward_transform): DistributedRealSHT(
              nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True
            )
            (inverse_transform): DistributedInverseRealSHT(
              nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
            )
          )
        )
        (act_layer0): GELU(approximate='none')
        (norm1): DistributedInstanceNorm2d()
        (outer_skip): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (mlp): MLP(
          (fwd): Sequential(
            (0): Conv2d(620, 1240, kernel_size=(1, 1), stride=(1, 1))
            (1): GELU(approximate='none')
            (2): Identity()
            (3): Conv2d(1240, 620, kernel_size=(1, 1), stride=(1, 1))
            (4): Identity()
          )
        )
        (drop_path): Identity()
      )
      (1-6): 6 x FourierNeuralOperatorBlock(
        (norm0): DistributedInstanceNorm2d()
        (filter): SpectralFilterLayer(
          (filter): SpectralConv(
            (forward_transform): DistributedRealSHT(
              nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
            )
            (inverse_transform): DistributedInverseRealSHT(
              nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
            )
          )
        )
        (act_layer0): GELU(approximate='none')
        (norm1): DistributedInstanceNorm2d()
        (outer_skip): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (mlp): MLP(
          (fwd): Sequential(
            (0): Conv2d(620, 1240, kernel_size=(1, 1), stride=(1, 1))
            (1): GELU(approximate='none')
            (2): Identity()
            (3): Conv2d(1240, 620, kernel_size=(1, 1), stride=(1, 1))
            (4): Identity()
          )
        )
        (drop_path): Identity()
      )
      (7): FourierNeuralOperatorBlock(
        (norm0): DistributedInstanceNorm2d()
        (filter): SpectralFilterLayer(
          (filter): SpectralConv(
            (forward_transform): DistributedRealSHT(
              nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
            )
            (inverse_transform): DistributedInverseRealSHT(
              nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True
            )
          )
        )
        (act_layer0): GELU(approximate='none')
        (norm1): DistributedInstanceNorm2d()
        (outer_skip): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (mlp): MLP(
          (fwd): Sequential(
            (0): Conv2d(620, 1240, kernel_size=(1, 1), stride=(1, 1))
            (1): GELU(approximate='none')
            (2): Identity()
            (3): Conv2d(1240, 620, kernel_size=(1, 1), stride=(1, 1))
            (4): Identity()
          )
        )
        (drop_path): Identity()
      )
    )
    (decoder): EncoderDecoder(
      (fwd): Sequential(
        (0): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1))
        (1): GELU(approximate='none')
        (2): Conv2d(620, 74, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
    )
    (residual_transform): Conv2d(110, 74, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)
2025-09-16 17:29:41,907 - root - INFO - using AdamW
2025-09-16 17:29:42,845 - root - INFO - Loading checkpoint /pscratch/sd/a/amahesh/recovered_fcn_training/sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed77/training_checkpoints/best_ckpt_mp0.tar in legacy mode
2025-09-16 17:31:39,001 - root - INFO - --------------- Versions ---------------
2025-09-16 17:31:39,037 - root - INFO - git branch: b'* multicheckpoint'
2025-09-16 17:31:39,053 - root - INFO - git hash: b'4b7dcddea9084b60c41957440a9f3e14f7d2567d'
2025-09-16 17:31:39,053 - root - INFO - Torch: 2.2.0a0+6a974be
2025-09-16 17:31:39,053 - root - INFO - ----------------------------------------
2025-09-16 17:31:39,053 - root - INFO - ------------------ Configuration ------------------
2025-09-16 17:31:39,053 - root - INFO - Configuration file: /global/u2/a/amahesh/ms_finetune/modulus-makani-fork/config/sfnonet.yaml
2025-09-16 17:31:39,053 - root - INFO - Configuration name: multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2
2025-09-16 17:31:39,053 - root - INFO - wandb_group multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2-0.1.0
2025-09-16 17:31:39,053 - root - INFO - scheduler CosineAnnealingLR
2025-09-16 17:31:39,053 - root - INFO - max_epochs 20
2025-09-16 17:31:39,053 - root - INFO - scheduler_T_max 20
2025-09-16 17:31:39,053 - root - INFO - lr 0.0001
2025-09-16 17:31:39,053 - root - INFO - load_counters False
2025-09-16 17:31:39,054 - root - INFO - load_optimizer False
2025-09-16 17:31:39,054 - root - INFO - load_scheduler False
2025-09-16 17:31:39,054 - root - INFO - finetune True
2025-09-16 17:31:39,054 - root - INFO - pretrained_checkpoint_path /pscratch/sd/a/amahesh/recovered_fcn_training/modulus-makani_runs-0.1.0-fcndev_stats/sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed77/training_checkpoints/best_ckpt_mp0.tar
2025-09-16 17:31:39,054 - root - INFO - embed_dim 620
2025-09-16 17:31:39,054 - root - INFO - num_layers 8
2025-09-16 17:31:39,054 - root - INFO - scale_factor 2
2025-09-16 17:31:39,054 - root - INFO - hard_thresholding_fraction 1.0
2025-09-16 17:31:39,054 - root - INFO - loss weighted squared temp-std geometric l2
2025-09-16 17:31:39,054 - root - INFO - valid_autoreg_steps 1
2025-09-16 17:31:39,054 - root - INFO - metadata_json_path /pscratch/sd/p/pharring/74var-6hourly/staging/data.json
2025-09-16 17:31:39,054 - root - INFO - train_data_path /pscratch/sd/p/pharring/74var-6hourly/staging/train
2025-09-16 17:31:39,054 - root - INFO - valid_data_path /pscratch/sd/p/pharring/74var-6hourly/staging/valid
2025-09-16 17:31:39,054 - root - INFO - exp_dir /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/
2025-09-16 17:31:39,054 - root - INFO - n_years 1
2025-09-16 17:31:39,054 - root - INFO - img_shape_x 721
2025-09-16 17:31:39,054 - root - INFO - img_shape_y 1440
2025-09-16 17:31:39,054 - root - INFO - min_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats/mins.npy
2025-09-16 17:31:39,054 - root - INFO - max_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats/maxs.npy
2025-09-16 17:31:39,054 - root - INFO - time_means_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/time_means.npy
2025-09-16 17:31:39,054 - root - INFO - global_means_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/global_means.npy
2025-09-16 17:31:39,055 - root - INFO - global_stds_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/global_stds.npy
2025-09-16 17:31:39,055 - root - INFO - time_diff_means_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/time_diff_means.npy
2025-09-16 17:31:39,055 - root - INFO - time_diff_stds_path /pscratch/sd/p/pharring/74var-6hourly/staging/stats_fcndev/time_diff_stds.npy
2025-09-16 17:31:39,055 - root - INFO - nettype SFNO
2025-09-16 17:31:39,055 - root - INFO - model_grid_type equiangular
2025-09-16 17:31:39,055 - root - INFO - sht_grid_type legendre-gauss
2025-09-16 17:31:39,055 - root - INFO - filter_type linear
2025-09-16 17:31:39,055 - root - INFO - complex_activation real
2025-09-16 17:31:39,055 - root - INFO - normalization_layer instance_norm
2025-09-16 17:31:39,055 - root - INFO - use_mlp True
2025-09-16 17:31:39,055 - root - INFO - mlp_mode serial
2025-09-16 17:31:39,055 - root - INFO - mlp_ratio 2
2025-09-16 17:31:39,055 - root - INFO - separable False
2025-09-16 17:31:39,055 - root - INFO - operator_type dhconv
2025-09-16 17:31:39,055 - root - INFO - activation_function gelu
2025-09-16 17:31:39,055 - root - INFO - pos_embed none
2025-09-16 17:31:39,055 - root - INFO - channel_weights auto
2025-09-16 17:31:39,055 - root - INFO - n_eval_samples 8760
2025-09-16 17:31:39,055 - root - INFO - batch_size 1
2025-09-16 17:31:39,055 - root - INFO - weight_decay 0.0
2025-09-16 17:31:39,055 - root - INFO - scheduler_factor 0.1
2025-09-16 17:31:39,056 - root - INFO - scheduler_patience 10
2025-09-16 17:31:39,056 - root - INFO - scheduler_step_size 100
2025-09-16 17:31:39,056 - root - INFO - scheduler_gamma 0.5
2025-09-16 17:31:39,056 - root - INFO - lr_warmup_steps 0
2025-09-16 17:31:39,056 - root - INFO - verbose False
2025-09-16 17:31:39,056 - root - INFO - wireup_info mpi
2025-09-16 17:31:39,056 - root - INFO - wireup_store tcp
2025-09-16 17:31:39,056 - root - INFO - num_data_workers 2
2025-09-16 17:31:39,056 - root - INFO - num_visualization_workers 2
2025-09-16 17:31:39,056 - root - INFO - dt 1
2025-09-16 17:31:39,056 - root - INFO - n_history 0
2025-09-16 17:31:39,056 - root - INFO - prediction_type iterative
2025-09-16 17:31:39,056 - root - INFO - prediction_length 35
2025-09-16 17:31:39,056 - root - INFO - n_initial_conditions 5
2025-09-16 17:31:39,056 - root - INFO - n_train_samples_per_epoch 54000
2025-09-16 17:31:39,056 - root - INFO - ics_type specify_number
2025-09-16 17:31:39,056 - root - INFO - save_raw_forecasts True
2025-09-16 17:31:39,056 - root - INFO - save_channel False
2025-09-16 17:31:39,056 - root - INFO - masked_acc False
2025-09-16 17:31:39,056 - root - INFO - maskpath None
2025-09-16 17:31:39,056 - root - INFO - perturb False
2025-09-16 17:31:39,056 - root - INFO - add_noise False
2025-09-16 17:31:39,057 - root - INFO - noise_std 0.0
2025-09-16 17:31:39,057 - root - INFO - target default
2025-09-16 17:31:39,057 - root - INFO - normalize_residual False
2025-09-16 17:31:39,057 - root - INFO - channel_names ['u10m', 'v10m', 'u100m', 'v100m', 't2m', 'sp', 'msl', 'tcwv', 'd2m', 'u50', 'u100', 'u150', 'u200', 'u250', 'u300', 'u400', 'u500', 'u600', 'u700', 'u850', 'u925', 'u1000', 'v50', 'v100', 'v150', 'v200', 'v250', 'v300', 'v400', 'v500', 'v600', 'v700', 'v850', 'v925', 'v1000', 'z50', 'z100', 'z150', 'z200', 'z250', 'z300', 'z400', 'z500', 'z600', 'z700', 'z850', 'z925', 'z1000', 't50', 't100', 't150', 't200', 't250', 't300', 't400', 't500', 't600', 't700', 't850', 't925', 't1000', 'q50', 'q100', 'q150', 'q200', 'q250', 'q300', 'q400', 'q500', 'q600', 'q700', 'q850', 'q925', 'q1000']
2025-09-16 17:31:39,057 - root - INFO - normalization zscore
2025-09-16 17:31:39,057 - root - INFO - add_grid True
2025-09-16 17:31:39,057 - root - INFO - gridtype sinusoidal
2025-09-16 17:31:39,057 - root - INFO - grid_num_frequencies 16
2025-09-16 17:31:39,057 - root - INFO - roll False
2025-09-16 17:31:39,057 - root - INFO - add_zenith True
2025-09-16 17:31:39,057 - root - INFO - add_orography True
2025-09-16 17:31:39,057 - root - INFO - orography_path /global/cfs/cdirs/m3522/cmip6/ERA5/e5.oper.invariant/197901/e5.oper.invariant.128_129_z.ll025sc.1979010100_1979010100.nc
2025-09-16 17:31:39,057 - root - INFO - add_landmask True
2025-09-16 17:31:39,057 - root - INFO - landmask_path /global/cfs/cdirs/m3522/cmip6/ERA5/e5.oper.invariant/197901/e5.oper.invariant.128_172_lsm.ll025sc.1979010100_1979010100.nc
2025-09-16 17:31:39,057 - root - INFO - log_to_screen True
2025-09-16 17:31:39,057 - root - INFO - log_to_wandb True
2025-09-16 17:31:39,057 - root - INFO - log_video 20
2025-09-16 17:31:39,057 - root - INFO - save_checkpoint legacy
2025-09-16 17:31:39,057 - root - INFO - optimizer_type AdamW
2025-09-16 17:31:39,057 - root - INFO - optimizer_beta1 0.9
2025-09-16 17:31:39,058 - root - INFO - optimizer_beta2 0.95
2025-09-16 17:31:39,058 - root - INFO - optimizer_max_grad_norm 32
2025-09-16 17:31:39,058 - root - INFO - crop_size_x None
2025-09-16 17:31:39,058 - root - INFO - crop_size_y None
2025-09-16 17:31:39,058 - root - INFO - inf_data_path /pscratch/sd/p/pharring/74var-6hourly/staging/out_of_sample
2025-09-16 17:31:39,058 - root - INFO - wandb_name None
2025-09-16 17:31:39,058 - root - INFO - wandb_project ERA5_sfno
2025-09-16 17:31:39,058 - root - INFO - wandb_entity weatherbenching
2025-09-16 17:31:39,058 - root - INFO - pos_drop_rate 0.1
2025-09-16 17:31:39,058 - root - INFO - initialization_seed None
2025-09-16 17:31:39,058 - root - INFO - epsilon_factor 0
2025-09-16 17:31:39,058 - root - INFO - fin_parallel_size 1
2025-09-16 17:31:39,058 - root - INFO - fout_parallel_size 1
2025-09-16 17:31:39,058 - root - INFO - h_parallel_size 4
2025-09-16 17:31:39,058 - root - INFO - w_parallel_size 1
2025-09-16 17:31:39,058 - root - INFO - model_parallel_sizes [4, 1, 1, 1]
2025-09-16 17:31:39,058 - root - INFO - model_parallel_names ['h', 'w', 'fin', 'fout']
2025-09-16 17:31:39,058 - root - INFO - parameters_reduction_buffer_count 1
2025-09-16 17:31:39,058 - root - INFO - load_checkpoint legacy
2025-09-16 17:31:39,058 - root - INFO - world_size 16
2025-09-16 17:31:39,058 - root - INFO - global_batch_size 4
2025-09-16 17:31:39,058 - root - INFO - experiment_dir /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102
2025-09-16 17:31:39,059 - root - INFO - checkpoint_path /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102/training_checkpoints/ckpt_mp{mp_rank}.tar
2025-09-16 17:31:39,059 - root - INFO - best_checkpoint_path /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102/training_checkpoints/best_ckpt_mp{mp_rank}.tar
2025-09-16 17:31:39,059 - root - INFO - resuming False
2025-09-16 17:31:39,059 - root - INFO - amp_mode bf16
2025-09-16 17:31:39,059 - root - INFO - jit_mode none
2025-09-16 17:31:39,059 - root - INFO - cuda_graph_mode none
2025-09-16 17:31:39,059 - root - INFO - skip_validation False
2025-09-16 17:31:39,059 - root - INFO - enable_odirect False
2025-09-16 17:31:39,059 - root - INFO - checkpointing 0
2025-09-16 17:31:39,059 - root - INFO - enable_synthetic_data False
2025-09-16 17:31:39,059 - root - INFO - split_data_channels False
2025-09-16 17:31:39,059 - root - INFO - print_timings_frequency -1
2025-09-16 17:31:39,059 - root - INFO - multistep_count 2
2025-09-16 17:31:39,059 - root - INFO - n_future 1
2025-09-16 17:31:39,059 - root - INFO - enable_benchy False
2025-09-16 17:31:39,059 - root - INFO - disable_ddp False
2025-09-16 17:31:39,059 - root - INFO - enable_grad_anomaly_detection False
2025-09-16 17:31:39,059 - root - INFO - wandb_dir /pscratch/sd/a/amahesh/fcn_training/modulus-makani_runs-0.1.0gmd-fcndev_stats/multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed102
2025-09-16 17:31:39,059 - root - INFO - _yaml_filename /global/u2/a/amahesh/ms_finetune/modulus-makani-fork/config/sfnonet.yaml
2025-09-16 17:31:39,059 - root - INFO - _config_name multistep_sfno_linear_74chq_sc2_layers8_edim620_wstgl2
2025-09-16 17:31:39,059 - root - INFO - ---------------------------------------------------
2025-09-16 17:31:39,806 - root - INFO - Enabling automatic mixed precision in bf16.
2025-09-16 17:32:23,290 - root - INFO - Using channel names: ['u10m', 'v10m', 'u100m', 'v100m', 't2m', 'sp', 'msl', 'tcwv', 'd2m', 'u50', 'u100', 'u150', 'u200', 'u250', 'u300', 'u400', 'u500', 'u600', 'u700', 'u850', 'u925', 'u1000', 'v50', 'v100', 'v150', 'v200', 'v250', 'v300', 'v400', 'v500', 'v600', 'v700', 'v850', 'v925', 'v1000', 'z50', 'z100', 'z150', 'z200', 'z250', 'z300', 'z400', 'z500', 'z600', 'z700', 'z850', 'z925', 'z1000', 't50', 't100', 't150', 't200', 't250', 't300', 't400', 't500', 't600', 't700', 't850', 't925', 't1000', 'q50', 'q100', 'q150', 'q200', 'q250', 'q300', 'q400', 'q500', 'q600', 'q700', 'q850', 'q925', 'q1000']
2025-09-16 17:32:23,291 - root - INFO - initializing data loader
2025-09-16 17:32:25,382 - root - INFO - Getting file stats from /pscratch/sd/p/pharring/74var-6hourly/staging/train/1979.h5
2025-09-16 17:32:25,401 - root - INFO - Average number of samples per year: 1461.0
2025-09-16 17:32:25,402 - root - INFO - Found data at path ['/pscratch/sd/p/pharring/74var-6hourly/staging/train']. Number of examples: 54056. Full image Shape: 721 x 1440 x 74. Read Shape: 181 x 1440 x 74
2025-09-16 17:32:25,402 - root - INFO - Using 54056 from the total number of available samples with 54000 samples per epoch (corresponds to 13500 steps for 4 shards with local batch size 1)
2025-09-16 17:32:25,402 - root - INFO - Delta t: 6 hours
2025-09-16 17:32:25,402 - root - INFO - Including 6 hours of past history in training at a frequency of 6 hours
2025-09-16 17:32:25,403 - root - INFO - Including 12 hours of future targets in training at a frequency of 6 hours
2025-09-16 17:32:49,853 - root - INFO - Getting file stats from /pscratch/sd/p/pharring/74var-6hourly/staging/valid/2016.h5
2025-09-16 17:32:49,856 - root - INFO - Average number of samples per year: 1462.0
2025-09-16 17:32:49,856 - root - INFO - Found data at path ['/pscratch/sd/p/pharring/74var-6hourly/staging/valid']. Number of examples: 2924. Full image Shape: 721 x 1440 x 74. Read Shape: 181 x 1440 x 74
2025-09-16 17:32:49,856 - root - INFO - Using 2924 from the total number of available samples with 2924 samples per epoch (corresponds to 731 steps for 4 shards with local batch size 1)
2025-09-16 17:32:49,856 - root - INFO - Delta t: 6 hours
2025-09-16 17:32:49,857 - root - INFO - Including 6 hours of past history in training at a frequency of 6 hours
2025-09-16 17:32:49,857 - root - INFO - Including 12 hours of future targets in training at a frequency of 6 hours
2025-09-16 17:33:12,639 - root - INFO - data loader initialized
2025-09-16 17:33:19,147 - root - INFO - Auxiliary channel names: ['xzen', 'xgrlat', 'xgrlon', 'xoro', 'xlsml', 'xlsms']
2025-09-16 17:33:20,334 - root - INFO - MultiStepWrapper(
  (preprocessor): Preprocessor2D()
  (model): SphericalFourierNeuralOperatorNet(
    (trans_down): DistributedRealSHT(
      nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True
    )
    (itrans_up): DistributedInverseRealSHT(
      nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True
    )
    (trans): DistributedRealSHT(
      nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
    )
    (itrans): DistributedInverseRealSHT(
      nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
    )
    (encoder): EncoderDecoder(
      (fwd): Sequential(
        (0): Conv2d(110, 620, kernel_size=(1, 1), stride=(1, 1))
        (1): GELU(approximate='none')
        (2): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
    )
    (pos_drop): Dropout(p=0.1, inplace=False)
    (blocks): ModuleList(
      (0): FourierNeuralOperatorBlock(
        (norm0): DistributedInstanceNorm2d()
        (filter): SpectralFilterLayer(
          (filter): SpectralConv(
            (forward_transform): DistributedRealSHT(
              nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True
            )
            (inverse_transform): DistributedInverseRealSHT(
              nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
            )
          )
        )
        (act_layer0): GELU(approximate='none')
        (norm1): DistributedInstanceNorm2d()
        (outer_skip): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (mlp): MLP(
          (fwd): Sequential(
            (0): Conv2d(620, 1240, kernel_size=(1, 1), stride=(1, 1))
            (1): GELU(approximate='none')
            (2): Identity()
            (3): Conv2d(1240, 620, kernel_size=(1, 1), stride=(1, 1))
            (4): Identity()
          )
        )
        (drop_path): Identity()
      )
      (1-6): 6 x FourierNeuralOperatorBlock(
        (norm0): DistributedInstanceNorm2d()
        (filter): SpectralFilterLayer(
          (filter): SpectralConv(
            (forward_transform): DistributedRealSHT(
              nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
            )
            (inverse_transform): DistributedInverseRealSHT(
              nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
            )
          )
        )
        (act_layer0): GELU(approximate='none')
        (norm1): DistributedInstanceNorm2d()
        (outer_skip): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (mlp): MLP(
          (fwd): Sequential(
            (0): Conv2d(620, 1240, kernel_size=(1, 1), stride=(1, 1))
            (1): GELU(approximate='none')
            (2): Identity()
            (3): Conv2d(1240, 620, kernel_size=(1, 1), stride=(1, 1))
            (4): Identity()
          )
        )
        (drop_path): Identity()
      )
      (7): FourierNeuralOperatorBlock(
        (norm0): DistributedInstanceNorm2d()
        (filter): SpectralFilterLayer(
          (filter): SpectralConv(
            (forward_transform): DistributedRealSHT(
              nlat=360, nlon=720, lmax=360, mmax=361, grid=legendre-gauss, csphase=True
            )
            (inverse_transform): DistributedInverseRealSHT(
              nlat=721, nlon=1440, lmax=360, mmax=361, grid=equiangular, csphase=True
            )
          )
        )
        (act_layer0): GELU(approximate='none')
        (norm1): DistributedInstanceNorm2d()
        (outer_skip): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (mlp): MLP(
          (fwd): Sequential(
            (0): Conv2d(620, 1240, kernel_size=(1, 1), stride=(1, 1))
            (1): GELU(approximate='none')
            (2): Identity()
            (3): Conv2d(1240, 620, kernel_size=(1, 1), stride=(1, 1))
            (4): Identity()
          )
        )
        (drop_path): Identity()
      )
    )
    (decoder): EncoderDecoder(
      (fwd): Sequential(
        (0): Conv2d(620, 620, kernel_size=(1, 1), stride=(1, 1))
        (1): GELU(approximate='none')
        (2): Conv2d(620, 74, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
    )
    (residual_transform): Conv2d(110, 74, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)
2025-09-16 17:33:20,754 - root - INFO - using AdamW
2025-09-16 17:33:21,578 - root - INFO - Loading checkpoint /pscratch/sd/a/amahesh/recovered_fcn_training/modulus-makani_runs-0.1.0-fcndev_stats/sfno_linear_74chq_sc2_layers8_edim620_wstgl2/v0.1.0-seed77/training_checkpoints/best_ckpt_mp0.tar in legacy mode
2025-09-16 17:33:28,013 - root - INFO - Number of trainable model parameters: 1123374980
2025-09-16 17:33:28,014 - root - INFO - Scaffolding memory high watermark: 12.631591796875 GB (6.0722336769104 GB for pytorch)
2025-09-16 17:33:28,015 - root - INFO - Starting Training Loop...
2025-09-16 17:55:00,025 - py.warnings - WARNING - /usr/local/lib/python3.10/dist-packages/wandb/wandb_torch.py:191: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/Copy.cpp:299.)
  flat = flat.type(torch.cuda.FloatTensor)