Acosta, Mario C., Sergi Palomas, Stella V Paronuzzi Ticco, Gladys Utrera, Joachim Biercamp, Pierre-Antoine Bretonniere, Reinhard Budich, Miguel Castrillo, Arnaud Caubel, Francisco J Doblas-Reyes, Italo Epicoco, Uwe Fladrich, Sylvie Joussaume, Alok Kumar Gupta, Bryan N Lawrence, Philippe Le Sager, Grenville Lister, Marie-Pierre Moine, Jean-Christophe Rioual, Sophie Valcke, Niki Zadeh, and V Balaji, April 2024: The computational and energy cost of simulation and storage for climate science: lessons from CMIP6. Geoscientific Model Development, 17(8), DOI:10.5194/gmd-17-3081-2024, 3081–3098. Abstract
The Coupled Model Intercomparison Project (CMIP) is one of the biggest international efforts aimed at better understanding the past, present, and future of climate changes in a multi-model context. A total of 21 model intercomparison projects (MIPs) were endorsed in its sixth phase (CMIP6), which included 190 different experiments that were used to simulate 40 000 years and produced around 40 PB of data in total. This paper presents the main findings obtained from the CPMIP (the Computational Performance Model Intercomparison Project), a collection of a common set of metrics specifically designed for assessing climate model performance. These metrics were exclusively collected from the production runs of experiments used in CMIP6 and primarily from institutions within the IS-ENES3 consortium. The document presents the full set of CPMIP metrics per institution and experiment, including a detailed analysis and discussion of each of the measurements. During the analysis, we found a positive correlation between the core hours needed, the complexity of the models, and the resolution used. Likewise, we show that between 5 % and 15 % of the execution cost is spent on the coupling between independent components, and that this fraction only grows as the number of resources increases. The data also make clear that queue times have a great impact on the actual speed achieved and vary widely across institutions, ranging from no overhead to as much as 78 % of execution time. Furthermore, our evaluation shows that the estimated carbon footprint of running such big simulations within the IS-ENES3 consortium is 1692 t of CO2 equivalent.
As a result of the collection, we contribute to the creation of a comprehensive database for future community reference, establishing a benchmark for evaluation and facilitating the multi-model, multi-platform comparisons crucial for understanding climate modelling performance. Given the diverse range of applications, configurations, and hardware utilised, further work is required for the standardisation and formulation of general rules. The paper concludes with recommendations for future exercises aimed at addressing the encountered challenges which will facilitate more collections of a similar nature.
Clare, Mariana C., Maike Sonnewald, Redouane Lguensat, Julie Deshayes, and V Balaji, November 2022: Explainable artificial intelligence for Bayesian neural networks: Toward trustworthy predictions of ocean dynamics. Journal of Advances in Modeling Earth Systems, 14(11), DOI:10.1029/2022MS003162. Abstract
The trustworthiness of neural networks is often challenged because they lack the ability to express uncertainty and explain their skill. This can be problematic given the increasing use of neural networks in high stakes decision-making such as in climate change applications. We address both issues by successfully implementing a Bayesian Neural Network (BNN), where parameters are distributions rather than deterministic, and applying novel implementations of explainable AI (XAI) techniques. The uncertainty analysis from the BNN provides a comprehensive overview of the prediction more suited to practitioners' needs than predictions from a classical neural network. Using a BNN means we can calculate the entropy (i.e., uncertainty) of the predictions and determine if the probability of an outcome is statistically significant. To enhance trustworthiness, we also spatially apply the two XAI techniques of Layer-wise Relevance Propagation (LRP) and SHapley Additive exPlanation (SHAP) values. These XAI methods reveal the extent to which the BNN is suitable and/or trustworthy. Using two techniques gives a more holistic view of BNN skill and its uncertainty, as LRP considers neural network parameters, whereas SHAP considers changes to outputs. We verify these techniques using comparison with intuition from physical theory. The differences in explanation identify potential areas where new physical theory guided studies are needed.
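As an illustration of the uncertainty measure described above: given Monte Carlo samples of class probabilities from a BNN's posterior predictive distribution, the predictive entropy is the entropy of their mean. A minimal sketch with invented data; the `sample_probs` layout and the function name are assumptions for illustration, not the authors' code:

```python
import numpy as np

def predictive_entropy(sample_probs):
    """Entropy of the mean predictive distribution from BNN Monte Carlo samples.

    sample_probs: array of shape (n_samples, n_classes), each row a
    class-probability vector drawn from the BNN's posterior predictive.
    """
    mean_probs = sample_probs.mean(axis=0)  # posterior predictive mean
    eps = 1e-12                             # guard against log(0)
    return -np.sum(mean_probs * np.log(mean_probs + eps))

# Hypothetical example: 100 posterior draws over 3 ocean-regime classes
rng = np.random.default_rng(0)
draws = rng.dirichlet([5.0, 2.0, 1.0], size=100)
print(predictive_entropy(draws))  # lower entropy => more confident prediction
```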
Sinha, Eva, Anna M Michalak, V Balaji, and Laure Resplandy, July 2022: India’s riverine nitrogen runoff strongly impacted by monsoon variability. Environmental Science & Technology, 56(16), DOI:10.1021/acs.est.2c01274, 11335–11342. Abstract
Agricultural intensification in India has increased nitrogen pollution, leading to water quality impairments. The fate of reactive nitrogen applied to the land is largely unknown, however. Long-term records of riverine nitrogen fluxes are nonexistent and drivers of variability remain unexamined, limiting the development of nitrogen management strategies. Here, we leverage dissolved inorganic nitrogen (DIN) and discharge data to characterize the seasonal, annual, and regional variability of DIN fluxes and their drivers for seven major river basins from 1981 to 2014. We find large seasonal and interannual variability in nitrogen runoff, with 68% to 94% of DIN fluxes occurring in June through October and with the coefficient of variation across years ranging from 44% to 93% for individual basins. This variability is primarily explained by variability in precipitation, with year- and basin-specific annual precipitation explaining 52% of the combined regional and interannual variability. We find little correlation with rising fertilizer application rates in five of the seven basins, implying that agricultural intensification has thus far primarily impacted groundwater and atmospheric emissions rather than riverine runoff. These findings suggest that riverine nitrogen runoff in India is highly sensitive to projected future increases in precipitation and intensification of the seasonal monsoon, while the impact of projected continued land use intensification is highly uncertain.
Balaji, V, February 2021: Climbing down Charney's ladder: Machine learning and the post-Dennard era of computational climate science. Philosophical Transactions of the Royal Society A, DOI:10.1098/rsta.2020.0085. Abstract
The advent of digital computing in the 1950s sparked a revolution in the science of weather and climate. Meteorology, long based on extrapolating patterns in space and time, gave way to computational methods in a decade of advances in numerical weather forecasting. Those same methods also gave rise to computational climate science, studying the behaviour of those same numerical equations over intervals much longer than weather events, and changes in external boundary conditions. Several subsequent decades of exponential growth in computational power have brought us to the present day, where models ever grow in resolution and complexity, capable of mastery of many small-scale phenomena with global repercussions, and ever more intricate feedbacks in the Earth system. The current juncture in computing, seven decades later, heralds an end to what is called Dennard scaling, the physics behind ever smaller computational units and ever faster arithmetic. This is prompting a fundamental change in our approach to the simulation of weather and climate, potentially as revolutionary as that wrought by John von Neumann in the 1950s. One approach could return us to an earlier era of pattern recognition and extrapolation, this time aided by computational power. Another approach could lead us to insights that continue to be expressed in mathematical equations. In either approach, or any synthesis of those, it is clearly no longer the steady march of the last few decades, continuing to add detail to ever more elaborate models. In this prospectus, we attempt to show the outlines of how this may unfold in the coming decades, a new harnessing of physical knowledge, computation and data.
Efforts to manage living marine resources (LMRs) under climate change need projections of future ocean conditions, yet most global climate models (GCMs) poorly represent critical coastal habitats. GCM utility for LMR applications will increase with higher spatial resolution but obstacles including computational and data storage costs, obstinate regional biases, and formulations prioritizing global robustness over regional skill will persist. Downscaling can help address GCM limitations, but significant improvements are needed to robustly support LMR science and management. We synthesize past ocean downscaling efforts to suggest a protocol to achieve this goal. The protocol emphasizes LMR-driven design to ensure delivery of decision-relevant information. It prioritizes ensembles of downscaled projections spanning the range of ocean futures with durations long enough to capture climate change signals. This demands judicious resolution refinement, with pragmatic consideration for LMR-essential ocean features superseding theoretical investigation. Statistical downscaling can complement dynamical approaches in building these ensembles. Inconsistent use of bias correction indicates a need for objective best practices. Application of the suggested protocol should yield regional ocean projections that, with effective dissemination and translation to decision-relevant analytics, can robustly support LMR science and management under climate change.
Sonnewald, Maike, Redouane Lguensat, Daniel C Jones, Peter D Dueben, Julien Brajard, and V Balaji, July 2021: Bridging observations, theory and numerical simulation of the ocean using machine learning. Environmental Research Letters, 16(7), DOI:10.1088/1748-9326/ac0eb0. Abstract
Progress within physical oceanography has been concurrent with the increasing sophistication of tools available for its study. The incorporation of machine learning (ML) techniques offers exciting possibilities for advancing the capacity and speed of established methods and for making substantial and serendipitous discoveries. Beyond vast amounts of complex data ubiquitous in many modern scientific fields, the study of the ocean poses a combination of unique challenges that ML can help address. The observational data available is largely spatially sparse, limited to the surface, and with few time series spanning more than a handful of decades. Important timescales span seconds to millennia, with strong scale interactions and numerical modelling efforts complicated by details such as coastlines. This review covers the current scientific insight offered by applying ML and points to where there is imminent potential. We cover the main three branches of the field: observations, theory, and numerical modelling. Highlighting both challenges and opportunities, we discuss both the historical context and salient ML tools. We focus on the use of ML in situ sampling and satellite observations, and the extent to which ML applications can advance theoretical oceanographic exploration, as well as aid numerical simulations. Applications that are also covered include model error and bias correction and current and potential use within data assimilation. While not without risk, there is great interest in the potential benefits of oceanographic ML applications; this review caters to this interest within the research community.
Sonnewald, Maike, Redouane Lguensat, Aparna Radhakrishnan, Zoubero Sayibou, V Balaji, and Andrew T Wittenberg, 2021: Revealing the impact of global warming on climate modes using transparent machine learning and a suite of climate models. In ICML 2021 Workshop on Tackling Climate Change with Machine Learning. Abstract
https://www.climatechange.ai/papers/icml2021/13
Balaji, V, 2020: “Data science” versus physical science: is data technology leading us towards a new synthesis? Comptes Rendus Géoscience, 352(4-5), DOI:10.5802/crgeos.24, 297–308. Abstract
We live, it is said, in the age of “data science”. Machine learning (ML) from data astonishes us with its advances, such as autonomous vehicles and translation tools, and also worries us with its ability to monitor and interpret human faces, gestures and behaviors. In science, we are witnessing a new explosion of literature around machine learning, capable of interpreting massive amounts of data, otherwise known as “big data”. Some predict that numerical computation will soon be overtaken by ML as a tool for understanding and predicting dynamic systems.
No field of science is as closely tied to HPC as meteorology and climate science. Their history dates back to the dawn of numerical computation, the technology that von Neumann and his colleagues pioneered in the post-war era. In this article, we use the numerical simulation of the Earth system as an example to highlight some of the fundamental questions posed by machine learning. We return to the history of meteorology to understand the dialectic between knowledge (our understanding of the atmosphere) and forecasting, for example knowing tomorrow's weather. Machine learning raises this question anew, because a forecast learned directly from data is not necessarily open to physical interpretation. On the other hand, the central role of Earth system simulation in helping us decipher the future of the planet and of climate change requires us to step outside the actuality of the data and make comparisons with fictitious Earths (without industrial emissions, for example) along several paths into the future, what we call “scenarios”. Here observations do have a role, but it is often data from simulations that are analyzed. Finally, these climate data carry societal weight, and the democratization of access to them has grown strongly in recent years. We show here some aspects of the evolution of simulation and data technologies and their important stakes for the Earth system sciences.
We describe the baseline coupled model configuration and simulation characteristics of GFDL's Earth System Model Version 4.1 (ESM4.1), which builds on component and coupled model developments at GFDL over 2013–2018 for coupled carbon‐chemistry‐climate simulation contributing to the sixth phase of the Coupled Model Intercomparison Project. In contrast with GFDL's CM4.0 development effort that focuses on ocean resolution for physical climate, ESM4.1 focuses on comprehensiveness of Earth system interactions. ESM4.1 features doubled horizontal resolution of both atmosphere (2° to 1°) and ocean (1° to 0.5°) relative to GFDL's previous‐generation coupled ESM2‐carbon and CM3‐chemistry models. ESM4.1 brings together key representational advances in CM4.0 dynamics and physics along with those in aerosols and their precursor emissions, land ecosystem vegetation and canopy competition, and multiday fire; ocean ecological and biogeochemical interactions, comprehensive land‐atmosphere‐ocean cycling of CO2, dust and iron, and interactive ocean‐atmosphere nitrogen cycling are described in detail across this volume of JAMES and presented here in terms of the overall coupling and resulting fidelity. ESM4.1 provides much improved fidelity in CO2 and chemistry over ESM2 and CM3, captures most of CM4.0's baseline simulations characteristics, and notably improves on CM4.0 in (1) Southern Ocean mode and intermediate water ventilation, (2) Southern Ocean aerosols, and (3) reduced spurious ocean heat uptake. ESM4.1 has reduced transient and equilibrium climate sensitivity compared to CM4.0. Fidelity concerns include (1) moderate degradation in sea surface temperature biases, (2) degradation in aerosols in some regions, and (3) strong centennial scale climate modulation by Southern Ocean convection.
We document the configuration and emergent simulation features from the Geophysical Fluid Dynamics Laboratory (GFDL) OM4.0 ocean/sea‐ice model. OM4 serves as the ocean/sea‐ice component for the GFDL climate and Earth system models. It is also used for climate science research and is contributing to the Coupled Model Intercomparison Project version 6 Ocean Model Intercomparison Project (CMIP6/OMIP). The ocean component of OM4 uses version 6 of the Modular Ocean Model (MOM6) and the sea‐ice component uses version 2 of the Sea Ice Simulator (SIS2), which have identical horizontal grid layouts (Arakawa C‐grid). We follow the Coordinated Ocean‐sea ice Reference Experiments (CORE) protocol to assess simulation quality across a broad suite of climate relevant features. We present results from two versions differing by horizontal grid spacing and physical parameterizations: OM4p5 has nominal 0.5° spacing and includes mesoscale eddy parameterizations and OM4p25 has nominal 0.25° spacing with no mesoscale eddy parameterization.
MOM6 makes use of a vertical Lagrangian‐remap algorithm that enables general vertical coordinates. We show that use of a hybrid depth‐isopycnal coordinate reduces the mid‐depth ocean warming drift commonly found in pure z* vertical coordinate ocean models. To test the need for the mesoscale eddy parameterization used in OM4p5, we examine the results from a simulation that removes the eddy parameterization. The water mass structure and model drift are physically degraded relative to OM4p5, thus supporting the key role for a mesoscale closure at this resolution.
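A schematic illustration of the conservative step at the heart of a vertical Lagrangian-remap algorithm (this is a toy sketch, not MOM6 code): after the layers have evolved with the flow, the contents of a column are remapped onto a target grid while conserving the column integral. A piecewise-constant version, with invented layer thicknesses and tracer values:

```python
import numpy as np

def remap_column(h_src, t_src, h_tgt):
    """Conservatively remap a piecewise-constant tracer from source layers
    (thicknesses h_src, layer means t_src) onto target layers (thicknesses h_tgt)."""
    z_src = np.concatenate([[0.0], np.cumsum(h_src)])   # source interfaces
    z_tgt = np.concatenate([[0.0], np.cumsum(h_tgt)])   # target interfaces
    t_tgt = np.zeros(len(h_tgt))
    for k in range(len(h_tgt)):
        # overlap of target cell k with every source cell
        top = np.maximum(z_tgt[k], z_src[:-1])
        bot = np.minimum(z_tgt[k + 1], z_src[1:])
        overlap = np.clip(bot - top, 0.0, None)
        t_tgt[k] = np.sum(overlap * t_src) / h_tgt[k]   # thickness-weighted mean
    return t_tgt

h_src = np.array([10.0, 20.0, 30.0])   # evolved (Lagrangian) layers, in m
t_src = np.array([18.0, 10.0, 4.0])    # layer-mean temperature, deg C
h_tgt = np.array([15.0, 15.0, 30.0])   # target grid spanning the same depth
t_tgt = remap_column(h_src, t_src, h_tgt)
# The column integral (heat content, up to constants) is conserved:
assert np.isclose((h_src * t_src).sum(), (h_tgt * t_tgt).sum())
```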
We describe GFDL's CM4.0 physical climate model, with emphasis on those aspects that may be of particular importance to users of this model and its simulations. The model is built with the AM4.0/LM4.0 atmosphere/land model and OM4.0 ocean model. Topics include the rationale for key choices made in the model formulation, the stability as well as drift of the pre‐industrial control simulation, and comparison of key aspects of the historical simulations with observations from recent decades. Notable achievements include the relatively small biases in seasonal spatial patterns of top‐of‐atmosphere fluxes, surface temperature, and precipitation; reduced double Intertropical Convergence Zone bias; dramatically improved representation of ocean boundary currents; a high quality simulation of climatological Arctic sea ice extent and its recent decline; and excellent simulation of the El Niño‐Southern Oscillation spectrum and structure. Areas of concern include inadequate deep convection in the Nordic Seas; an inaccurate Antarctic sea ice simulation; precipitation and wind composites still affected by the equatorial cold tongue bias; muted variability in the Atlantic Meridional Overturning Circulation; strong 100 year quasi‐periodicity in Southern Ocean ventilation; and a lack of historical warming before 1990 and too rapid warming thereafter due to high climate sensitivity and strong aerosol forcing, in contrast to the observational record. Overall, CM4.0 scores very well in its fidelity against observations compared to the Coupled Model Intercomparison Project Phase 5 generation in terms of both mean state and modes of variability and should prove a valuable new addition for analysis across a broad array of applications.
Responses of tropical cyclones (TCs) to CO2 doubling are explored using coupled global climate models (GCMs) with increasingly refined atmospheric/land horizontal grids (~ 200 km, ~ 50 km and ~ 25 km). The three models exhibit similar changes in background climate fields thought to regulate TC activity, such as relative sea surface temperature (SST), potential intensity, and wind shear. However, global TC frequency decreases substantially in the 50 km model, while the 25 km model shows no significant change. The ~ 25 km model also has a substantial and spatially-ubiquitous increase of Category 3–4–5 hurricanes. Idealized perturbation experiments are performed to understand the TC response. Each model’s transient fully-coupled 2 × CO2 TC activity response is largely recovered by “time-slice” experiments using time-invariant SST perturbations added to each model’s own SST climatology. The TC response to SST forcing depends on each model’s background climatological SST biases: removing these biases leads to a global TC intensity increase in the ~ 50 km model, and a global TC frequency increase in the ~ 25 km model, in response to CO2-induced warming patterns and CO2 doubling. Isolated CO2 doubling leads to a significant TC frequency decrease, while isolated uniform SST warming leads to a significant global TC frequency increase; the ~ 25 km model has a greater tendency for frequency increase. Global TC frequency responds to both (1) changes in TC “seeds”, which increase due to warming (more so in the ~ 25 km model) and decrease due to higher CO2 concentrations, and (2) less efficient development of these “seeds” into TCs, largely due to the nonlinear relation between temperature and saturation specific humidity.
Balaji, V, Karl E Taylor, M Juckes, M Lautenschlager, Chris Blanton, L Cinquini, S Denvil, Paul J Durack, M Elkington, F Guglielmo, Eric Guilyardi, D Hassell, S Kharin, S Kindermann, Bryan N Lawrence, Sergei Nikonov, and Aparna Radhakrishnan, et al., September 2018: Requirements for a global data infrastructure in support of CMIP6. Geoscientific Model Development, 11(9), DOI:10.5194/gmd-11-3659-2018. Abstract
The World Climate Research Programme (WCRP)'s Working Group on Climate Modeling (WGCM) Infrastructure Panel (WIP) was formed in 2014 in response to the explosive growth in size and complexity of Coupled Model Intercomparison Projects (CMIPs) between CMIP3 (2005-06) and CMIP5 (2011-12). This article presents the WIP recommendations for the global data infrastructure needed to support CMIP design, future growth and evolution. Developed in close coordination with those who build and run the existing infrastructure (the Earth System Grid Federation), the recommendations are based on several principles beginning with the need to separate requirements, implementation, and operations. Other important principles include the consideration of data as a commodity in an ecosystem of users, the importance of provenance, the need for automation, and the obligation to measure costs and benefits. This paper concentrates on requirements, recognising the diversity of communities involved (modelers, analysts, software developers, and downstream users). Such requirements include the need for scientific reproducibility and accountability alongside the need to record and track data usage for the purpose of assigning credit. One key element is to generate a dataset-centric rather than system-centric focus, with an aim to making the infrastructure less prone to systemic failure. With these overarching principles and requirements, the WIP has produced a set of position papers, which are summarized here. They provide specifications for managing and delivering model output, including strategies for replication and versioning, licensing, data quality assurance, citation, long-term archival, and dataset tracking. They also describe a new and more formal approach for specifying what data, and associated metadata, should be saved, which enables future data volumes to be estimated. The paper concludes with a future-facing consideration of the global data infrastructure evolution that follows from the blurring of boundaries between climate and weather, and the changing nature of published scientific results in the digital age.
Unprecedented high-intensity flooding induced by extreme precipitation was reported over Chennai in India during November-December of 2015, which led to extensive damage to human life and property. It is of utmost importance to determine the odds of occurrence of such extreme floods in the future and the related climate phenomena, for planning and mitigation purposes. Here, we make use of a suite of simulations from GFDL high-resolution coupled climate models to investigate the odds of occurrence of extreme floods induced by extreme precipitation over Chennai and the role of radiative forcing and/or large-scale SST forcing in enhancing the probability of such events in the future. Climate of the 20th century experiments with large ensembles suggest that radiative forcing may not enhance the probability of extreme floods over Chennai. Doubling-of-CO2 experiments also fail to show evidence for an increase of such events in a global warming scenario. Further, this study explores the role of SST forcing from the Indian and Pacific Oceans on the odds of occurrence of Chennai-like floods. Neither an El Niño nor a La Niña enhances the probability of extreme floods over Chennai. However, a warm Bay of Bengal tends to increase the odds of occurrence of extreme Chennai-like floods. Atmospheric conditions such as a tropical depression over the Bay of Bengal, favoring the transport of moisture from the warm Bay of Bengal, are conducive to intense precipitation.
In this two-part paper, a description is provided of a version of the AM4.0/LM4.0 atmosphere/land model that will serve as a base for a new set of climate and Earth system models (CM4 and ESM4) under development at NOAA's Geophysical Fluid Dynamics Laboratory (GFDL). This version, with roughly 100km horizontal resolution and 33 levels in the vertical, contains an aerosol model that generates aerosol fields from emissions and a “light” chemistry mechanism designed to support the aerosol model but with prescribed ozone. In Part I, the quality of the simulation in AMIP (Atmospheric Model Intercomparison Project) mode – with prescribed sea surface temperatures (SSTs) and sea ice distribution – is described and compared with previous GFDL models and with the CMIP5 archive of AMIP simulations. The model's Cess sensitivity (response in the top-of-atmosphere radiative flux to uniform warming of SSTs) and effective radiative forcing are also presented. In Part II, the model formulation is described more fully and key sensitivities to aspects of the model formulation are discussed, along with the approach to model tuning.
In Part II of this two-part paper, documentation is provided of key aspects of a version of the AM4.0/LM4.0 atmosphere/land model that will serve as a base for a new set of climate and Earth system models (CM4 and ESM4) under development at NOAA's Geophysical Fluid Dynamics Laboratory (GFDL). The quality of the simulation in AMIP (Atmospheric Model Intercomparison Project) mode has been provided in Part I. Part II provides documentation of key components and some sensitivities to choices of model formulation and values of parameters, highlighting the convection parameterization and orographic gravity wave drag. The approach taken to tune the model's clouds to observations is a particular focal point. Care is taken to describe the extent to which aerosol effective forcing and Cess sensitivity have been tuned through the model development process, both of which are relevant to the ability of the model to simulate the evolution of temperatures over the last century when coupled to an ocean model.
Balaji, V, E Maisonnave, Niki Zadeh, Bryan N Lawrence, Joachim Biercamp, Uwe Fladrich, G Aloisio, Rusty Benson, Arnaud Caubel, Jeffrey W Durachta, M-A Foujols, Grenville Lister, S Mocavero, Seth D Underwood, and Garrett Wright, January 2017: CPMIP: Measurements of Real Computational Performance of Earth System Models. Geoscientific Model Development, 10(1), DOI:10.5194/gmd-10-19-2017. Abstract
A climate model represents a multitude of processes on a variety of time and space scales; a canonical example of multi-physics multi-scale modeling. The underlying climate system is physically characterized by sensitive dependence on initial conditions, and natural stochastic variability, so very long integrations are needed to extract signals of climate change. Algorithms generally possess weak scaling and can be I/O and/or memory bound. Such weak-scaling, I/O and memory-bound multi-physics codes present particular challenges to computational performance.
Traditional metrics of computational efficiency such as performance counters and scaling curves do not tell us enough about real sustained performance from climate models on different machines. They also do not provide a satisfactory basis for comparative information across models.
We introduce a set of metrics that can be used for the study of computational performance of climate (and Earth System) models. These measures do not require specialized software or specific hardware counters, and should be accessible to anyone. They are independent of platform, and underlying parallel programming models. We show how these metrics can be used to measure actually attained performance of Earth system models on different machines, and identify the most fruitful areas of research and development for performance engineering.
We present results for these measures for a diverse suite of models from several modeling centres, and propose to use these measures as a basis for a CPMIP, a computational performance MIP.
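As an illustration of the spirit of these metrics, two headline CPMIP quantities, simulated years per day (SYPD) and core hours per simulated year (CHSY), require nothing beyond routine bookkeeping. A minimal sketch with hypothetical run-log numbers:

```python
def sypd(simulated_years, wallclock_days):
    """Simulated years per day: the throughput actually achieved in production."""
    return simulated_years / wallclock_days

def chsy(cores, simulated_years, wallclock_days):
    """Core hours per simulated year: the computational cost of that throughput."""
    return cores * wallclock_days * 24.0 / simulated_years

# Hypothetical production run: 50 simulated years on 1152 cores in 10 wallclock days
print(sypd(50, 10))        # 5.0 SYPD
print(chsy(1152, 50, 10))  # 5529.6 CHSY
```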
The process of parameter estimation targeting a chosen set of observations is an essential aspect of numerical modeling. This process is usually named tuning in the climate modeling community. In climate models, the variety and complexity of physical processes involved, and their interplay through a wide range of spatial and temporal scales, must be summarized in a series of approximate submodels. Most submodels depend on uncertain parameters. Tuning consists of adjusting the values of these parameters to bring the solution as a whole into line with aspects of the observed climate. Tuning is an essential aspect of climate modeling with its own scientific issues, which is probably not advertised enough outside the community of model developers. Optimization of climate models raises important questions about whether tuning methods a priori constrain the model results in unintended ways that would affect our confidence in climate projections. Here, we present the definition and rationale behind model tuning, review specific methodological aspects, and survey the diversity of tuning approaches used in current climate models. We also discuss the challenges and opportunities in applying so-called objective methods in climate model tuning. We discuss how tuning methodologies may affect fundamental results of climate models, such as climate sensitivity. The article concludes with a series of recommendations to make the process of climate model tuning more transparent.
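As a toy illustration (not drawn from the paper) of what an "objective" tuning step can look like: adjust an uncertain parameter of a fast surrogate to minimize mismatch with an observed target. The surrogate, the parameter, and all numbers below are invented:

```python
from scipy.optimize import minimize_scalar

# Toy "submodel": global-mean cloud radiative effect (W m^-2) as a function
# of an uncertain entrainment-like parameter p (purely illustrative).
def simulated_cre(p):
    return -40.0 - 15.0 * p + 6.0 * p**2

OBSERVED_CRE = -47.0  # hypothetical observational target

# Objective tuning: minimize the squared mismatch over the plausible range of p.
result = minimize_scalar(lambda p: (simulated_cre(p) - OBSERVED_CRE) ** 2,
                         bounds=(0.0, 2.0), method="bounded")
print(result.x, simulated_cre(result.x))  # tuned parameter and resulting CRE
```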
Sinha, Eva, Anna M Michalak, and V Balaji, July 2017: Eutrophication will increase during the 21st century as a result of precipitation changes. Science, 357(6349), DOI:10.1126/science.aan2409. Abstract
Eutrophication, or excessive nutrient enrichment, threatens water resources across the globe. We show that climate change–induced precipitation changes alone will substantially increase (19 ± 14%) riverine total nitrogen loading within the continental United States by the end of the century for the “business-as-usual” scenario. The impacts, driven by projected increases in both total and extreme precipitation, will be especially strong for the Northeast and the corn belt of the United States. Offsetting this increase would require a 33 ± 24% reduction in nitrogen inputs, representing a massive management challenge. Globally, changes in precipitation are especially likely to also exacerbate eutrophication in India, China, and Southeast Asia. It is therefore imperative that water quality management strategies account for the impact of projected future changes in precipitation on nitrogen loading.
Balaji, V, Rusty Benson, Bruce Wyman, and Isaac M Held, October 2016: Coarse-grained component concurrency in Earth system modeling: parallelizing atmospheric radiative transfer in the GFDL AM3 model using the Flexible Modeling System coupling framework. Geoscientific Model Development, 9(10), DOI:10.5194/gmd-9-3605-2016. Abstract
Climate models represent a large variety of processes on a variety of timescales and space scales, a canonical example of multi-physics multi-scale modeling. Current hardware trends, such as Graphical Processing Units (GPUs) and Many Integrated Core (MIC) chips, are based on, at best, marginal increases in clock speed, coupled with vast increases in concurrency, particularly at the fine grain. Multi-physics codes face particular challenges in achieving fine-grained concurrency, as different physics and dynamics components have different computational profiles, and universal solutions are hard to come by.
We propose here one approach for multi-physics codes. These codes are typically structured as components interacting via software frameworks. The component structure of a typical Earth system model consists of a hierarchical and recursive tree of components, each representing a different climate process or dynamical system. This recursive structure generally encompasses a modest level of concurrency at the highest level (e.g., atmosphere and ocean on different processor sets) with serial organization underneath.
We propose to extend concurrency much further by running more and more lower- and higher-level components in parallel with each other. Each component can further be parallelized on the fine grain, potentially offering a major increase in the scalability of Earth system models.
We present here first results from this approach, called coarse-grained component concurrency, or CCC. Within the Geophysical Fluid Dynamics Laboratory (GFDL) Flexible Modeling System (FMS), the atmospheric radiative transfer component has been configured to run in parallel with a composite component consisting of every other atmospheric component, including the atmospheric dynamics and all other atmospheric physics components. We will explore the algorithmic challenges involved in such an approach, and present results from such simulations. Plans to achieve even greater levels of coarse-grained concurrency by extending this approach within other components, such as the ocean, will be discussed.
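A toy sketch of the scheduling idea, not FMS code: the radiative transfer component consumes the previous timestep's state, which is what permits it to run concurrently with the composite of all other atmospheric components, with a join at the end of each step. Python threads stand in for FMS processor sets; all component functions are invented:

```python
from concurrent.futures import ThreadPoolExecutor

def radiation(state):
    """Radiative transfer: slow, and here assumed to read only the lagged state."""
    return {"heating_rate": 0.01 * state["temperature"]}

def dynamics_and_physics(state):
    """Composite of all other atmospheric components for one timestep."""
    return {"temperature": state["temperature"] + state["heating_rate"]}

state = {"temperature": 288.0, "heating_rate": 0.0}
with ThreadPoolExecutor(max_workers=2) as pool:
    for step in range(3):
        # Launch radiation on the lagged state while the rest of the
        # atmosphere advances; join both before starting the next step.
        rad = pool.submit(radiation, dict(state))
        dyn = pool.submit(dynamics_and_physics, dict(state))
        state.update(dyn.result())
        state.update(rad.result())
print(state)
```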
Empirical statistical downscaling (ESD) methods seek to refine global climate model (GCM) outputs via processes that glean information from a combination of observations and GCM simulations. They aim to create value-added climate projections by reducing biases and adding finer spatial detail. Analysis techniques, such as cross-validation, allow assessments of how well ESD methods meet these goals during observational periods. However, the extent to which an ESD method’s skill might differ when applied to future climate projections cannot be assessed readily in the same manner. Here we present a “perfect model” experimental design that quantifies aspects of ESD method performance for both historical and late 21st century time periods. The experimental design tests a key stationarity assumption inherent to ESD methods – namely, that ESD performance when applied to future projections is similar to that during the observational training period. Case study results employing a single ESD method (an Asynchronous Regional Regression Model variant) and climate variable (daily maximum temperature) demonstrate that violations of the stationarity assumption can vary geographically, seasonally, and with the amount of projected climate change. For the ESD method tested, the greatest challenges in downscaling daily maximum temperature projections are revealed to occur along coasts, in summer, and under conditions of greater projected warming. We conclude with a discussion of the potential use and expansion of the perfect model experimental design, both to inform the development of improved ESD methods and to provide guidance on the use of ESD products in climate impacts analyses and decision-support applications.
Eyring, Veronika, Peter J Gleckler, C Heinze, Ronald J Stouffer, Karl E Taylor, V Balaji, and Eric Guilyardi, et al., November 2016: Towards improved and more routine Earth system model evaluation in CMIP. Earth System Dynamics, 7(4), DOI:10.5194/esd-7-813-2016. Abstract
The Coupled Model Intercomparison Project (CMIP) has successfully provided the climate community with a rich collection of simulation output from Earth system models (ESMs) that can be used to understand past climate changes and make projections and uncertainty estimates of the future. Confidence in ESMs can be gained because the models are based on physical principles and reproduce many important aspects of observed climate. More research is required to identify the processes that are most responsible for systematic biases and the magnitude and uncertainty of future projections so that more relevant performance tests can be developed. At the same time, there are many aspects of ESM evaluation that are well established and considered an essential part of systematic evaluation but have been implemented ad hoc with little community coordination. Given the diversity and complexity of ESM analysis, we argue that the CMIP community has reached a critical juncture at which many baseline aspects of model evaluation need to be performed much more efficiently and consistently. Here, we provide a perspective and viewpoint on how a more systematic, open, and rapid performance assessment of the large and diverse number of models that will participate in current and future phases of CMIP can be achieved, and announce our intention to implement such a system for CMIP6. Accomplishing this could also free up valuable resources as many scientists are frequently "re-inventing the wheel" by re-writing analysis routines for well-established analysis methods. A more systematic approach for the community would be to develop and apply evaluation tools that are based on the latest scientific knowledge and observational reference, are well suited for routine use, and provide a wide range of diagnostics and performance metrics that comprehensively characterize model behaviour as soon as the output is published to the Earth System Grid Federation (ESGF). The CMIP infrastructure enforces data standards and conventions for model output and documentation accessible via the ESGF, additionally publishing observations (obs4MIPs) and reanalyses (ana4MIPs) for model intercomparison projects using the same data structure and organization as the ESM output. This largely facilitates routine evaluation of the ESMs, but to be able to process the data automatically alongside the ESGF, the infrastructure needs to be extended with processing capabilities at the ESGF data nodes where the evaluation tools can be executed on a routine basis. Efforts are already underway to develop community-based evaluation tools, and we encourage experts to provide additional diagnostic codes that would enhance this capability for CMIP. At the same time, we encourage the community to contribute observations and reanalyses for model evaluation to the obs4MIPs and ana4MIPs archives. The intention is to produce through the ESGF a widely accepted quasi-operational evaluation framework for CMIP6 that would routinely execute a series of standardized evaluation tasks. Over time, as this capability matures, we expect to produce an increasingly systematic characterization of models which, compared with early phases of CMIP, will more quickly and openly identify the strengths and weaknesses of the simulations. This will also reveal whether long-standing model errors remain evident in newer models and will assist modelling groups in improving their models. 
This framework will be designed to readily incorporate updates, including new observations and additional diagnostics and metrics as they become available from the research community.
Gaitán, Carlos F., V Balaji, and B Moore, July 2016: Can we obtain viable alternatives to Manning’s equation using genetic programming? Artificial Intelligence Research, 5(2), DOI:10.5430/air.v5n2p92. Abstract
Applied water research, such as that arising in open-channel hydraulics, traditionally links empirical formulas to observational data; for example, Manning’s formula for open-channel flow driven by gravity relates the discharge (Q), the cross-sectional average velocity (V), the hydraulic radius (R), and the slope of the water surface (S) through a friction coefficient n, characteristic of the channel’s surface, needed at the location of interest. Here we use Genetic Programming (GP), a machine learning technique inspired by nature’s evolutionary rules, to derive empirical relationships based on synthetic datasets of the aforementioned parameters. Specifically, we evaluated whether Manning’s formula could be retrieved from datasets with: a) 300 pentads of A, n, R, S, and Q (from Manning’s equation), b) datasets containing an uncorrelated variable and the parameters from (a), and c) a dataset containing the parameters from (b) but using values of Q containing noise. The cross-validated results show success in retrieving the functional form from the synthetic data in the first two experiments, and a more complex solution for Q in the third experiment. The results encourage the application of GP to problems where traditional empirical relationships show high biases or are non-parsimonious. The results also show alternative flow equations that might be used in the absence of one or more predictors; however, these equations should be used with caution outside of the training intervals.
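For reference, the standard SI form of Manning's equation that these experiments target, with A the cross-sectional area of the flow:

\[
V = \frac{1}{n}\,R^{2/3} S^{1/2}, \qquad Q = A\,V = \frac{A}{n}\,R^{2/3} S^{1/2}
\]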
Griffies, Stephen M., Gokhan Danabasoglu, Paul J Durack, Alistair Adcroft, V Balaji, C Böning, Eric P Chassignet, Enrique N Curchitser, Julie Deshayes, H Drange, Baylor Fox-Kemper, Peter J Gleckler, Jonathan M Gregory, Helmuth Haak, Robert Hallberg, Helene T Hewitt, David M Holland, Tatiana Ilyina, J H Jungclaus, Y Komuro, John P Krasting, William G Large, S J Marsland, S Masina, Trevor J McDougall, A J George Nurser, James C Orr, Anna Pirani, Fangli Qiao, Ronald J Stouffer, Karl E Taylor, A M Treguier, Hiroyuki Tsujino, P Uotila, M Valdivieso, Michael Winton, and Stephen G Yeager, September 2016: OMIP contribution to CMIP6: experimental and diagnostic protocol for the physical component of the Ocean Model Intercomparison Project. Geoscientific Model Development, 9(9), DOI:10.5194/gmd-9-3231-2016. Abstract
The Ocean Model Intercomparison Project (OMIP) aims to provide a framework for evaluating, understanding, and improving the ocean and sea-ice components of global climate and earth system models contributing to the Coupled Model Intercomparison Project Phase 6 (CMIP6). OMIP addresses these aims in two complementary manners: (A) by providing an experimental protocol for global ocean/sea-ice models run with a prescribed atmospheric forcing, (B) by providing a protocol for ocean diagnostics to be saved as part of CMIP6. We focus here on the physical component of OMIP, with a companion paper (Orr et al., 2016) offering details for the inert chemistry and interactive biogeochemistry. The physical portion of the OMIP experimental protocol follows that of the interannual Coordinated Ocean-ice Reference Experiments (CORE-II). Since 2009, CORE-I (Normal Year Forcing) and CORE-II have become the standard method to evaluate global ocean/sea-ice simulations and to examine mechanisms for forced ocean climate variability. The OMIP diagnostic protocol is relevant for any ocean model component of CMIP6, including the DECK (Diagnostic, Evaluation and Characterization of Klima experiments), historical simulations, FAFMIP (Flux Anomaly Forced MIP), C4MIP (Coupled Carbon Cycle Climate MIP), DAMIP (Detection and Attribution MIP), DCPP (Decadal Climate Prediction Project), ScenarioMIP (Scenario MIP), as well as the ocean-sea ice OMIP simulations. The bulk of this paper offers scientific rationale for saving these diagnostics.
Theurich, G, C DeLuca, T Campbell, F Liu, K Saint, M Vertenstein, J Chen, R C Oehmke, James D Doyle, T Whitcomb, A Wallcraft, M Iredell, T Black, A Da Silva, T Clune, R Ferraro, P Li, M Kelley, I Aleinov, V Balaji, and Niki Zadeh, et al., July 2016: The Earth System Prediction Suite: Toward a Coordinated U.S. Modeling Capability. Bulletin of the American Meteorological Society, 97(7), DOI:10.1175/BAMS-D-14-00164.1. Abstract
The Earth System Prediction Suite (ESPS) is a collection of flagship U.S. weather and climate models and model components that are being instrumented to conform to interoperability conventions, documented to follow metadata standards, and made available either under open source terms or to credentialed users.
The ESPS represents a culmination of efforts to create a common Earth system model architecture, and the advent of increasingly coordinated model development activities in the United States. ESPS component interfaces are based on the Earth System Modeling Framework (ESMF), community-developed software for building and coupling models, and the National Unified Operational Prediction Capability (NUOPC) Layer, a set of ESMF-based component templates and interoperability conventions. This shared infrastructure simplifies the process of model coupling by guaranteeing that components conform to a set of technical and semantic behaviors. The ESPS encourages distributed, multi-agency development of coupled modeling systems, controlled experimentation and testing, and exploration of novel model configurations, such as those motivated by research involving managed and interactive ensembles. ESPS codes include the Navy Global Environmental Model (NavGEM), HYbrid Coordinate Ocean Model (HYCOM), and Coupled Ocean Atmosphere Mesoscale Prediction System (COAMPS®); the NOAA Environmental Modeling System (NEMS) and the Modular Ocean Model (MOM); the Community Earth System Model (CESM); and the NASA ModelE climate model and GEOS-5 atmospheric general circulation model.
Williams, D N., and V Balaji, et al., May 2016: A Global Repository for Planet-Sized Experiments and Observations. Bulletin of the American Meteorological Society, 97(5), DOI:10.1175/BAMS-D-15-00132.1. Abstract
Working across U.S. federal agencies, international agencies, and multiple worldwide data centres, and spanning seven international network organizations, the Earth System Grid Federation (ESGF) allows users to access, analyse, and visualize data using a globally federated collection of networks, computers, and software. Its architecture employs a system of geographically distributed peer nodes that are independently administered yet united by common federation protocols and application programming interfaces (API). The full ESGF infrastructure has now been adopted by multiple Earth science projects and allows access to petabytes of geophysical data, including the Coupled Model Intercomparison Project (CMIP)—output used by the Intergovernmental Panel on Climate Change assessment reports. Data served by ESGF, not only include model output (i.e., CMIP simulation runs) but also include observational data from satellites and instruments, reanalysis, and generated images. Metadata summarizes basic information about the data for fast and easy data discovery.
Climate models represent a large variety of processes on different time and space scales, a canonical example of multiphysics, multiscale modeling. In addition, the system is physically characterized by sensitive dependence on initial conditions and natural stochastic variability, with very long integrations needed to extract signals of climate change. Weak scaling, I/O, and memory-bound multiphysics codes present particular challenges to computational performance. The author presents trends in climate science that are driving models toward higher resolution, greater complexity, and larger ensembles, all of which present computing challenges. He also discusses the prospects for adapting these models to novel hardware and programming models.
This study demonstrates skillful seasonal prediction of 2m air temperature and precipitation over land in a new high-resolution climate model developed by the Geophysical Fluid Dynamics Laboratory, and explores the possible sources of the skill. We employ a statistical optimization approach to identify the most predictable components of seasonal mean temperature and precipitation over land, and demonstrate the predictive skill of these components. First, we show improved skill of the high-resolution model over the previous lower-resolution model in seasonal prediction of the NINO3.4 index and other aspects of interest. Then we measure the skill of temperature and precipitation in the high-resolution model for boreal winter and summer, and diagnose the sources of the skill. Lastly, we reconstruct predictions using a few of the most predictable components to yield more skillful predictions than the raw model predictions. Over three decades of hindcasts, we find that the two most predictable components of temperature are characterized by a component that is likely due to changes in external radiative forcing in boreal winter and summer, and an ENSO-related pattern in boreal winter. The most predictable components of precipitation in both seasons are very likely ENSO-related. These components of temperature and precipitation can be predicted with significant correlation skill at least 9 months in advance. The reconstructed predictions using only the first few predictable components from the model show considerably better skill relative to observations than raw model predictions. This study shows that the use of refined statistical analysis and a high-resolution dynamical model leads to significant skill in seasonal predictions of 2m air temperature and precipitation over land.
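The paper's reconstruction uses the most predictable components from a statistical optimization analysis; purely as an illustration of the "truncate and rebuild" operation involved, here is the analogous step with ordinary EOFs obtained from an SVD, on invented data:

```python
import numpy as np

def reconstruct_leading(field, k):
    """Project a (time, space) anomaly field onto its k leading components
    (plain EOFs via SVD here, standing in for the predictable components)
    and rebuild the filtered field from those components alone."""
    u, s, vt = np.linalg.svd(field, full_matrices=False)
    return u[:, :k] * s[:k] @ vt[:k, :]

rng = np.random.default_rng(1)
hindcasts = rng.standard_normal((30, 500))   # 30 hindcast years x 500 grid points
filtered = reconstruct_leading(hindcasts - hindcasts.mean(axis=0), k=2)
print(filtered.shape)                        # (30, 500), variance from 2 components
```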
The seasonal predictability of extratropical storm tracks in Geophysical Fluid Dynamics Laboratory (GFDL)’s high-resolution climate model has been investigated using an average predictability time analysis. The leading predictable components of extratropical storm tracks are ENSO-related spatial pattern for both boreal winter and summer, and the second predictable components are mostly due to changes in external radiative forcing and multidecadal oceanic variability. These two predictable components for both seasons show significant correlation skill for all leads from 0 to 9 months, while the skill of predicting the boreal winter storm track is consistently higher than that of the austral winter. The predictable components of extratropical storm tracks are dynamically consistent with the predictable components of the upper troposphere jet flow for both seasons. Over the region with strong storm track signals in North America, the model is able to predict the changes in statistics of extremes connected to storm track changes (e.g., extreme low and high sea level pressure and extreme 2m air temperature) in response to different ENSO phases. These results point towards the possibility of providing skillful seasonal predictions of the statistics of extratropical extremes over land using high-resolution coupled models.
Moine, Marie-Pierre, Sophie Valcke, Bryan N Lawrence, C Pascoe, R W Ford, A Alias, V Balaji, P Bentley, G Devine, and Eric Guilyardi, March 2014: Development and exploitation of a controlled vocabulary in support of climate modelling. Geoscientific Model Development, 7(2), DOI:10.5194/gmd-7-479-2014. Abstract
There are three key components for developing a metadata system: a container structure laying out the key semantic issues of interest and their relationships; an extensible controlled vocabulary providing possible content; and tools to create and manipulate that content. While metadata systems must allow users to enter their own information, the use of a controlled vocabulary both imposes consistency of definition and ensures comparability of the objects described. Here we describe the controlled vocabulary (CV) and metadata creation tool built by the METAFOR project for use in the context of describing the climate models, simulations and experiments of the fifth Coupled Model Intercomparison Project (CMIP5). The CV and resulting tool chain introduced here is designed for extensibility and re-use and should find applicability in many more projects.
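As a much-simplified, invented illustration (not the METAFOR schema) of how a controlled vocabulary imposes consistency of definition: content is accepted only if it uses the agreed terms.

```python
# Hypothetical controlled vocabulary: the permitted component types are fixed
# by community agreement, so every described model is comparable.
CV_MODEL_COMPONENT_TYPES = {"atmosphere", "ocean", "land_surface", "sea_ice"}

def describe_component(name, component_type, resolution_km):
    """Build a metadata record, rejecting terms outside the controlled vocabulary."""
    if component_type not in CV_MODEL_COMPONENT_TYPES:
        raise ValueError(f"'{component_type}' is not a controlled-vocabulary term")
    return {"name": name, "type": component_type, "resolution_km": resolution_km}

record = describe_component("MOM", "ocean", 100)  # accepted
# describe_component("MOM", "sea", 100) would raise: term not in the CV
```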
Tropical cyclones (TCs) are a hazard to life and property and a prominent element of the global climate system, therefore understanding and predicting TC location, intensity and frequency is of both societal and scientific significance. Methodologies exist to predict basin-wide, seasonally-aggregated TC activity months, seasons and even years in advance. We show that a newly developed high-resolution global climate model can produce skillful forecasts of seasonal TC activity on spatial scales finer than basin-wide, from months and seasons in advance of the TC season. The climate model used here is targeted at predicting regional climate and the statistics of weather extremes on seasonal to decadal timescales, and is comprised of high-resolution (50km×50km) atmosphere and land components, and more moderate resolution (~100km) sea ice and ocean components. The simulation of TC climatology and interannual variations in this climate model is substantially improved by correcting systematic ocean biases through “flux-adjustment.” We perform a suite of 12-month duration retrospective forecasts over the 1981-2012 period, after initializing the climate model to observationally-constrained conditions at the start of each forecast period – using both the standard and flux-adjusted versions of the model. The standard and flux-adjusted forecasts exhibit equivalent skill at predicting Northern Hemisphere TC season sea surface temperature, but the flux-adjusted model exhibits substantially improved basin-wide and regional TC activity forecasts, highlighting the role of systematic biases in limiting the quality of TC forecasts. These results suggest that dynamical forecasts of seasonally-aggregated regional TC activity months in advance are feasible.
Balaji, V, R Redler, and Reinhard Budich, 2013: Earth System Modelling - Volume 4: IO and Postprocessing, New York, NY: Springer, 58pp. Abstract
The problem of reading and writing data from media such as disk and tape while performing computation has long been a challenging one. As the quotes above indicate, I/O has often been an afterthought, a sideshow to the theater of more exciting developments in the field of high-performance computing.
While Earth System models have been among the very largest consumers of high-performance computing (HPC) cycles, they very rarely figure on lists of computing tours de force, such as the Gordon Bell prize. Among the reasons is that ESMs, especially climate models, tend to be among the most I/O-intensive applications in HPC.
In order to summarize the advances in I/O technology outlined in this Brief, and chart a course to the future, it is necessary to appreciate how far we have come by looking back into the past, and the early years of supercomputing. In the pioneering era of Seymour Cray (note his epigram at the start of the Brief), I/O was somewhat of an afterthought. The Cray vector machines introduced the Flexible Formatted I/O (FFIO) software layer, which enabled setting and configuration of buffers and read/write layers that could be directly optimized for the disk hardware underneath. It required detailed knowledge of spindle and sector layout and exposed them directly in scientific code. Any changes to the underlying hardware meant rewriting considerable amounts of I/O code. The data models (ways to understand and interpret the actual byte sequences) were also hardware-specific and subject to change between systems.
Climate modeling has come a long way since von Neumann declared it a problem too hard for pencil and paper, but tailor-made for the new digital computers. As the models and computers both evolve toward ever-greater complexity, they are changing our notions of digital simulation itself.
Guilyardi, Eric, and V Balaji, et al., May 2013: Documenting climate models and their simulations. Bulletin of the American Meteorological Society, 94(5), DOI:10.1175/BAMS-D-11-00035.1. Abstract
The results of climate models are of increasing and widespread importance. No longer is climate model output of sole interest to climate scientists and researchers in the climate change impacts and adaptation fields. Now non-specialists such as government officials, policy-makers, and the general public, all have an increasing need to access climate model output and understand its implications. For this host of users, accurate and complete metadata (i.e., information about how and why the data were produced) is required to document the climate modeling results. Here we describe a pilot community initiative to collect and make available documentation of climate models and their simulations. In an initial application, a metadata repository is being established to provide information of this kind for a major internationally coordinated modeling activity known as CMIP5 (Coupled Model Intercomparison Project, Phase 5). It is expected that for a wide range of stakeholders this and similar community-managed metadata repositories will spur development of analysis tools that facilitate discovery and exploitation of earth system simulations.
Modeling entails storage of large quantities of data for varying lengths of time. It is often difficult to know when a new analytical tool or scientific insight may necessitate reanalyzing model output in a new way.
The period 2000–2010 may be considered the decade when Earth system modeling came of age. Systematic ESM-based international scientific campaigns, such as those of the IPCC, are recognized as central elements both in scientific research to understand the workings of the climate system and in providing reasoned, fact-based guidance to global governance systems on how to deal with the planetary-scale challenge of climate change.
Balaji, V, 2012: Code Parallelisation On Massively Parallel Machines In Earth System Modelling - Volume 2 Algorithms, Code Infrastructure and Optimisation, DOI:10.1007/978-3-642-23831-4_8. Abstract
The motivation for parallel computing has now been well-known for over a decade. As we reach the physical limits of how fast an individual computation can be made to run, we seek to speed up the overall computation by running as many operations as possible in parallel. The search for concurrency becomes a key element of computational science.
In climate research, with the increased emphasis on detailed representation of individual physical processes governing the climate, the construction of a model has come to require large teams working in concert, with individual sub-groups each specializing in a different component of the climate system, such as the ocean circulation, the biosphere, land hydrology, radiative transfer and chemistry, and so on. The development of model code now requires teams to be able to contribute components to an overall coupled system, with no single kernel of researchers mastering the whole. This may be called the distributed development model, in contrast with the monolithic small-team model development process of earlier decades.
DeLuca, C, G Theurich, and V Balaji, 2012: The Earth System Modeling Framework In Earth System Modelling - Volume 3 Coupling Software and Strategies, DOI:10.1007/978-3-642-23360-9_6. Abstract
The Earth System Modeling Framework or ESMF is open source software for building modeling components and coupling them together to form weather prediction, climate, coastal, and other applications. ESMF was motivated by the desire to exchange modeling components amongst centers and to reduce costs and effort by sharing codes.
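The component/coupler architecture that ESMF formalizes can be sketched in a few lines: each model is a component with initialize/run/finalize phases, it communicates only through import and export states, and a driver sequences components and coupling steps. The following is an illustrative Python sketch of that pattern, not the actual ESMF API:

```python
# Schematic of the component/coupler pattern: components expose
# initialize/run/finalize phases and exchange data only through
# import/export states; a driver sequences them. Illustrative only.

class Component:
    def __init__(self, name):
        self.name = name
        self.import_state = {}   # fields received from other components
        self.export_state = {}   # fields offered to other components

    def initialize(self):
        field = "sst" if self.name == "ocean" else "wind_stress"
        self.export_state[field] = 0.0

    def run(self, dt):
        # A real component would advance its equations here; we just
        # perturb the exported field to show data flowing each step.
        for k in self.export_state:
            self.export_state[k] += 0.1 * dt

    def finalize(self):
        pass

def couple(src, dst, field):
    """Coupler phase: copy (in general, regrid) a field between states."""
    dst.import_state[field] = src.export_state[field]

atm, ocn = Component("atmosphere"), Component("ocean")
for c in (atm, ocn):
    c.initialize()
for step in range(3):              # the driver sequences the components
    atm.run(dt=1.0)
    couple(atm, ocn, "wind_stress")
    ocn.run(dt=1.0)
    couple(ocn, atm, "sst")
for c in (atm, ocn):
    c.finalize()
```

Keeping all inter-component traffic in explicit import/export states is what makes components exchangeable between centers: a replacement component only has to honor the state contract, not the internals of its neighbors.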
We present results for simulated climate and climate change from a newly developed high-resolution global climate model (GFDL CM2.5). The GFDL CM2.5 model has an atmospheric resolution of approximately 50 km in the horizontal, with 32 vertical levels. The horizontal resolution in the ocean ranges from 28 km in the tropics to 8 km at high latitudes, with 50 vertical levels. This resolution allows the explicit simulation of some mesoscale eddies in the ocean, particularly at lower latitudes.
We present analyses based on the output of a 280 year control simulation; we also present results based on a 140 year simulation in which atmospheric CO2 increases at 1% per year until doubling after 70 years.
Results are compared to the GFDL CM2.1 climate model, which has somewhat similar physics but coarser resolution. The simulated climate in CM2.5 shows marked improvement over many regions, especially the tropics, including a reduction in the double ITCZ and an improved simulation of ENSO. Regional precipitation features are much improved. The Indian monsoon and Amazonian rainfall are also substantially more realistic in CM2.5.
The response of CM2.5 to a doubling of atmospheric CO2 has many features in common with CM2.1, with some notable differences. For example, rainfall changes over the Mediterranean appear to be tightly linked to topography in CM2.5, in contrast to CM2.1 where the response is more spatially homogeneous. In addition, in CM2.5 the near-surface ocean warms substantially in the high latitudes of the Southern Ocean, in contrast to simulations using CM2.1.
Lawrence, Bryan N., and V Balaji, et al., November 2012: Describing Earth System Simulations with the Metafor CIM. Geoscientific Model Development, 5(6), DOI:10.5194/gmd-5-1493-2012. Abstract
The Metafor project has developed a Common Information Model (CIM) using the ISO19100 series formalism to describe the sorts of numerical experiments carried out by the earth system modelling community, the models they use, and the simulations that result. Here we describe the mechanism by which the CIM was developed, and its key properties. We introduce the conceptual and application versions and the controlled vocabularies developed in the context of supporting the fifth Coupled Model Intercomparison Project (CMIP5). We describe how the CIM has been used in experiments to describe model coupling properties and describe the near-term expected evolution of the CIM.
Valcke, Sophie, and V Balaji, et al., December 2012: Coupling technologies for Earth System Modelling. Geoscientific Model Development, 5(6), DOI:10.5194/gmd-5-1589-2012. Abstract
This paper presents a review of the software currently used in climate modelling in general and in CMIP5 in particular to couple the numerical codes representing the different components of the Earth system. The coupling technologies presented show common features, such as the ability to communicate and regrid data, but also offer different functions and implementations. Design characteristics of the different approaches are discussed as well as future challenges arising from the increasing complexity of scientific problems and computing platforms.
Easterbrook, S, P Edwards, V Balaji, and Reinhard Budich, November 2011: Guest Editors' Introduction: Climate Change - Science and Software. IEEE Software, 28(6), DOI:10.1109/MS.2011.141. Abstract
Climate change is likely to be one of the defining global issues of the 21st century. The past decade—the hottest in recorded history—has witnessed countries around the world struggling to deal with drought, heat waves, and extreme weather. The sheer scale of the problem also makes it hard to understand, predict, and solve. Climate science journals regularly publish special issues on specific climate models, typically timed to present results from a major new release of a given model. However, these tend to focus on the new science that the model enables, rather than to describe the software and its development. This special issue of IEEE Software magazine focuses on the "software" behind climate change models.
Guilyardi, Eric, and V Balaji, et al., May 2011: The CMIP5 model and simulation documentation: A new standard for climate modelling metadata. Clivar Exchanges, 16(2), 42-46. PDF
Griffies, Stephen M., Alistair Adcroft, V Balaji, Robert Hallberg, Sonya Legg, Torge Martin, and Anna Pirani, et al., February 2009: Sampling Physical Ocean Fields in WCRP CMIP5 Simulations: CLIVAR Working Group on Ocean Model Development (WGOMD) Committee on CMIP5 Ocean Model Output, International CLIVAR Project Office, CLIVAR Publication Series No. 137, 56pp. PDF
Dunlap, R, L Mark, S Rugaber, and V Balaji, et al., November 2008: Earth system curator: metadata infrastructure for climate modeling. Earth Science Informatics, 1(3), DOI:10.1007/s12145-008-0016-1. Abstract
The Earth System Curator is a National Science Foundation sponsored project developing a metadata formalism for describing the digital resources used in climate simulations. The primary motivating observation of the project is that a simulation/model’s source code plus the configuration parameters required for a model run are a compact representation of the dataset generated when the model is executed. The end goal of the project is a convergence of models and data where both resources are accessed uniformly from a single registry. In this paper we review the current metadata landscape of the climate modeling community, present our work on developing a metadata formalism for describing climate models, and reflect on technical challenges we have faced that require new research in the area of Earth Science Informatics.
Zhou, S, and V Balaji, et al., April 2007: Cross-organization interoperability experiments of weather and climate models with the Earth System Modeling Framework. Concurrency and Computation: Practice and Experience, 19(5), DOI:10.1002/cpe.1120. Abstract
Typical weather and climate models need a software tool to couple sub-scale model components. The high-performance computing requirements and a variety of model interfaces make the development of such a coupling tool very challenging. In this paper, we describe the approach of the Earth System Modeling Framework, in particular its component and coupling mechanism, and present the results of three cross-organization model interoperability experiments.
We present a mechanism for exchange of quantities between components of a coupled Earth system model, where each component is independently discretized. The exchange grid is formed by overlaying two grids, such that each exchange grid cell has a unique parent cell on each of its antecedent grids. In Earth System models in particular, processes occurring near component surfaces require special surface boundary layer physical processes to be represented on the exchange grid. The exchange grid is thus more than just a stage in a sequence of regridding between component grids. We present the design and use of a 2-dimensional exchange grid on a horizontal planetary surface in the GFDL Flexible Modeling System (FMS), highlighting issues of parallelism and performance.
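The defining property of the exchange grid, that every exchange cell has exactly one parent cell on each antecedent grid, is easiest to see in one dimension, where overlaying two grids just means merging their cell edges. A minimal sketch follows (1-D and Cartesian, whereas the FMS exchange grid is 2-D on the sphere; both grids are assumed to span the same interval):

```python
import numpy as np

def exchange_grid_1d(edges_a, edges_b):
    """Overlay two 1-D grids (given by cell-edge coordinates) into an
    exchange grid: each exchange cell lies inside exactly one parent
    cell on each antecedent grid. Returns (edges, parents_a, parents_b)."""
    edges = np.union1d(edges_a, edges_b)          # merged, sorted edges
    centers = 0.5 * (edges[:-1] + edges[1:])
    parents_a = np.searchsorted(edges_a, centers) - 1
    parents_b = np.searchsorted(edges_b, centers) - 1
    return edges, parents_a, parents_b

# Two independently discretized grids covering the same interval:
edges_a = np.array([0.0, 1.0, 2.0, 3.0])
edges_b = np.array([0.0, 1.5, 3.0])
xe, pa, pb = exchange_grid_1d(edges_a, edges_b)
# Exchange cells: [0,1], [1,1.5], [1.5,2], [2,3]; one parent per grid each.

# Conservative transfer of a cell-mean field from grid A to grid B:
field_a = np.array([10.0, 20.0, 30.0])
widths = np.diff(xe)
on_exchange = field_a[pa]                    # sample A on exchange cells
field_b = np.zeros(len(edges_b) - 1)
np.add.at(field_b, pb, on_exchange * widths) # width-weighted sums per B cell
field_b /= np.diff(edges_b)                  # back to cell means
assert np.isclose((field_a * np.diff(edges_a)).sum(),
                  (field_b * np.diff(edges_b)).sum())
```

Because each exchange cell carries its parent indices, quantities computed there can be summed back to either parent grid with exact conservation, which is what allows surface boundary layer physics to be evaluated on the exchange grid itself rather than on either component grid.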
The formulation and simulation characteristics of two new global coupled climate models developed at NOAA's Geophysical Fluid Dynamics Laboratory (GFDL) are described. The models were designed to simulate atmospheric and oceanic climate and variability from the diurnal time scale through multicentury climate change, given our computational constraints. In particular, an important goal was to use the same model for both experimental seasonal to interannual forecasting and the study of multicentury global climate change, and this goal has been achieved.
Two versions of the coupled model are described, called CM2.0 and CM2.1. The versions differ primarily in the dynamical core used in the atmospheric component, along with the cloud tuning and some details of the land and ocean components. For both coupled models, the resolution of the land and atmospheric components is 2° latitude × 2.5° longitude; the atmospheric model has 24 vertical levels. The ocean resolution is 1° in latitude and longitude, with meridional resolution equatorward of 30° becoming progressively finer, such that the meridional resolution is 1/3° at the equator. There are 50 vertical levels in the ocean, with 22 evenly spaced levels within the top 220 m. The ocean component has poles over North America and Eurasia to avoid polar filtering. Neither coupled model employs flux adjustments.
The control simulations have stable, realistic climates when integrated over multiple centuries. Both models have simulations of ENSO that are substantially improved relative to previous GFDL coupled models. The CM2.0 model has been further evaluated as an ENSO forecast model and has good skill (CM2.1 has not been evaluated as an ENSO forecast model). Generally reduced temperature and salinity biases exist in CM2.1 relative to CM2.0. These reductions are associated with 1) improved simulations of surface wind stress in CM2.1 and associated changes in oceanic gyre circulations; 2) changes in cloud tuning and the land model, both of which act to increase the net surface shortwave radiation in CM2.1, thereby reducing an overall cold bias present in CM2.0; and 3) a reduction of ocean lateral viscosity in the extratropics in CM2.1, which reduces sea ice biases in the North Atlantic.
Both models have been used to conduct a suite of climate change simulations for the 2007 Intergovernmental Panel on Climate Change (IPCC) assessment report and are able to simulate the main features of the observed warming of the twentieth century. The climate sensitivities of the CM2.0 and CM2.1 models are 2.9 and 3.4 K, respectively. These sensitivities are defined by coupling the atmospheric components of CM2.0 and CM2.1 to a slab ocean model and allowing the model to come into equilibrium with a doubling of atmospheric CO2. The output from a suite of integrations conducted with these models is freely available online (see http://nomads.gfdl.noaa.gov/).
The current generation of coupled climate models run at the Geophysical Fluid Dynamics Laboratory (GFDL) as part of the Climate Change Science Program contains ocean components that differ in almost every respect from those contained in previous generations of GFDL climate models. This paper summarizes the new physical features of the models and examines the simulations that they produce. Of the two new coupled climate model versions 2.1 (CM2.1) and 2.0 (CM2.0), the CM2.1 model represents a major improvement over CM2.0 in most of the major oceanic features examined, with strikingly lower drifts in hydrographic fields such as temperature and salinity, more realistic ventilation of the deep ocean, and currents that are closer to their observed values. Regional analysis of the differences between the models highlights the importance of wind stress in determining the circulation, particularly in the Southern Ocean. At present, major errors in both models are associated with Northern Hemisphere Mode Waters and with the dense outflows from marginal seas, particularly the Mediterranean Sea and Red Sea.
Collins, N, G Theurich, C DeLuca, M J Suarez, A Trayanov, and V Balaji, et al., 2005: Design and Implementation of Components in the Earth System Modeling Framework. International Journal of High Performance Computing Applications, 19(3), DOI:10.1177/1094342005056120. Abstract
The Earth System Modeling Framework is a component-based architecture for developing and assembling climate and related models. A virtual machine underlies the component-level constructs in ESMF, providing both a foundation for performance portability and mechanisms for resource allocation and component sequencing.
As a first step toward coupled ocean–atmosphere data assimilation, a parallelized ensemble filter is implemented in a new stochastic hybrid coupled model. The model consists of a global version of the GFDL Modular Ocean Model Version 4 (MOM4), coupled to a statistical atmosphere based on a regression of National Centers for Environmental Prediction (NCEP) reanalysis surface wind stress, heat, and water flux anomalies onto analyzed tropical Pacific SST anomalies from 1979 to 2002. The residual part of the NCEP fluxes not captured by the regression is then treated as stochastic forcing, with different ensemble members feeling the residual fluxes from different years. The model provides a convenient test bed for coupled data assimilation, as well as a prototype for representing uncertainties in the surface forcing.
A parallel ensemble adjustment Kalman filter (EAKF) has been designed and implemented in the hybrid model, using a local least squares framework. Comparison experiments demonstrate that the massively parallel processing EAKF (MPPEAKF) produces assimilation results with essentially the same quality as a global sequential analysis. Observed subsurface temperature profiles from expendable bathythermographs (XBTs), Tropical Atmosphere Ocean (TAO) buoys, and Argo floats, along with analyzed SSTs from NCEP, are assimilated into the hybrid model over 1980-2002 using the MPPEAKF. The filtered ensemble of SSTs, ocean heat contents, and thermal structures converge well to the observations, in spite of the imposed stochastic forcings. Several facets of the EAKF algorithm used here have been designed to facilitate comparison to a traditional three-dimensional variational data assimilation (3DVAR) algorithm, for instance, the use of a univariate filter in which observations of temperature only directly impact temperature state variables. Despite these choices that may limit the power of the EAKF, the MPPEAKF solution appears to improve upon an earlier 3DVAR solution, producing a smoother, more physically reasonable analysis that better fits the observational data and produces, to some degree, a self-consistent estimate of analysis uncertainties. Hybrid model ENSO forecasts initialized from the MPPEAKF ensemble mean also appear to outperform those initialized from the 3DVAR analysis. This improvement stems from the EAKF's utilization of anisotropic background error covariances that may vary in time.
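For reference, the univariate EAKF update used in this local least squares framework has a compact form: the observed-variable ensemble is shifted to the posterior mean and contracted to the posterior variance (deterministically, with no perturbed observations), and the resulting increments are regressed onto each state variable. The sketch below shows that two-step update; names and the toy data are illustrative:

```python
import numpy as np

def eakf_obs_increment(ens, obs, obs_var):
    """Univariate EAKF update for one observed quantity: shift the
    ensemble to the posterior mean and rescale it to the posterior
    variance; return the per-member increments."""
    prior_mean, prior_var = ens.mean(), ens.var(ddof=1)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    shrink = np.sqrt(post_var / prior_var)
    return post_mean + shrink * (ens - prior_mean) - ens

def regress_increment(state_ens, obs_ens, obs_incr):
    """Local least squares step: map observation-space increments onto
    a state variable via the prior sample regression coefficient."""
    cov = np.cov(state_ens, obs_ens, ddof=1)[0, 1]
    beta = cov / obs_ens.var(ddof=1)
    return state_ens + beta * obs_incr

# Toy example: 20-member ensemble, one observation with error variance 0.5
rng = np.random.default_rng(0)
obs_ens = rng.normal(1.0, 1.0, 20)                 # prior at obs location
temp_ens = 0.8 * obs_ens + rng.normal(0, 0.3, 20)  # correlated state var
incr = eakf_obs_increment(obs_ens, obs=2.0, obs_var=0.5)
temp_updated = regress_increment(temp_ens, obs_ens, incr)
```

In the univariate configuration described above, the regression step would be applied only to state variables of the same type as the observation (temperature to temperature), which is one of the choices made to ease comparison with 3DVAR.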
The configuration and performance of a new global atmosphere and land model for climate research developed at the Geophysical Fluid Dynamics Laboratory (GFDL) are presented. The atmosphere model, known as AM2, includes a new gridpoint dynamical core, a prognostic cloud scheme, and a multispecies aerosol climatology, as well as components from previous models used at GFDL. The land model, known as LM2, includes soil sensible and latent heat storage, groundwater storage, and stomatal resistance. The performance of the coupled model AM2–LM2 is evaluated with a series of prescribed sea surface temperature (SST) simulations. Particular focus is given to the model's climatology and the characteristics of interannual variability related to El Niño–Southern Oscillation (ENSO).
One AM2–LM2 integration was performed according to the prescriptions of the second Atmospheric Model Intercomparison Project (AMIP II) and data were submitted to the Program for Climate Model Diagnosis and Intercomparison (PCMDI). Particular strengths of AM2–LM2, as judged by comparison to other models participating in AMIP II, include its circulation and distributions of precipitation. Prominent problems of AM2–LM2 include a cold bias to surface and tropospheric temperatures, weak tropical cyclone activity, and weak tropical intraseasonal activity associated with the Madden–Julian oscillation.
An ensemble of 10 AM2–LM2 integrations with observed SSTs for the second half of the twentieth century permits a statistically reliable assessment of the model's response to ENSO. In general, AM2–LM2 produces a realistic simulation of the anomalies in tropical precipitation and extratropical circulation that are associated with ENSO.
Hill, C, C DeLuca, V Balaji, M J Suarez, and A Da Silva, 2004: The architecture of the Earth System Modeling Framework. Computing in Science and Engineering, 6(1), 18-28.
This paper details a free surface method using an explicit time stepping scheme for use in z-coordinate ocean models. One key property that makes the method especially suitable for climate simulations is its very stable numerical time stepping scheme, which allows for the use of a long density time step, as commonly employed with coarse-resolution rigid-lid models. Additionally, the effects of the undulating free surface height are directly incorporated into the baroclinic momentum and tracer equations. The novel issues related to local and global tracer conservation when allowing for the top cell to undulate are the focus of this work. The method presented here is quasi-conservative locally and globally of tracer when the baroclinic and tracer time steps are equal. Important issues relevant for using this method in regional as well as large-scale climate models are discussed and illustrated, and examples of scaling achieved on parallel computers are provided.
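The conservation issue at the heart of this method can be illustrated with a toy 1-D analogue: if the prognostic variable is the tracer content h*T rather than the concentration T, total tracer is conserved even as the cell thickness h undulates with the free surface. The following is a minimal upwind sketch of that property only, not the paper's actual scheme (which also separates the fast free-surface mode from the slow baroclinic dynamics):

```python
import numpy as np

def advect_conservative(h, T, u_face, dx, dt):
    """One upwind finite-volume step for thickness h and tracer T on a
    1-D periodic domain. Prognosing the tracer content h*T (not T) is
    what keeps total tracer conserved as h undulates.
    u_face[i] is the velocity at the left face of cell i."""
    def upwind(q):                      # face value from the upwind cell
        return np.where(u_face > 0, np.roll(q, 1), q)
    Fh = u_face * upwind(h)             # thickness flux through faces
    FhT = u_face * upwind(h * T)        # tracer-content flux
    h_new = h - dt / dx * (np.roll(Fh, -1) - Fh)
    hT_new = h * T - dt / dx * (np.roll(FhT, -1) - FhT)
    return h_new, hT_new / h_new        # recover T = (h*T)/h

# Total tracer sum(h*T)*dx is invariant step to step (up to round-off):
n = 50
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
h = 10.0 + np.sin(x)                    # undulating "free surface"
T = 1.0 + 0.5 * np.cos(x)
u = 0.3 * np.ones(n)                    # face velocities (CFL-safe)
h1, T1 = advect_conservative(h, T, u, dx=1.0, dt=1.0)
assert np.isclose((h * T).sum(), (h1 * T1).sum())
```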
Pauluis, O M., V Balaji, and Isaac M Held, 2001: Reply. Journal of the Atmospheric Sciences, 58(9), 1178-1179.
Pauluis, O M., V Balaji, and Isaac M Held, 2000: Frictional dissipation in a precipitating atmosphere. Journal of the Atmospheric Sciences, 57(7), 989-994. Abstract PDF
The frictional dissipation in the shear zone surrounding falling hydrometeors is estimated to be 2-4 W m-2 in the Tropics. A numerical model of radiative-convective equilibrium with resolved three-dimensional moist convection confirms this estimate and shows that the precipitation-related dissipation is much larger than the dissipation associated with the turbulent energy cascade from the convective scale. Equivalently, the work performed by moist convection is used primarily to lift water rather than generate kinetic energy of the convective airflow. This fact complicates attempts to use the entropy budget to derive convective velocity scales.
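The 2-4 W m-2 figure follows from simple arithmetic: hydrometeors falling at terminal velocity dissipate kinetic energy at essentially the rate at which precipitation loses potential energy, D ≈ ρ_w g P h, for rainfall rate P generated at a mean height h. A quick check with round tropical values (the specific numbers below are illustrative, not taken from the paper):

```python
# Frictional dissipation ~ rate of potential-energy loss of falling rain:
#   D ≈ rho_w * g * P * h
rho_w = 1000.0           # density of liquid water, kg m^-3
g = 9.81                 # gravitational acceleration, m s^-2
P = 5e-3 / 86400.0       # tropical rainfall ~5 mm/day, as m s^-1 of liquid
h = 5000.0               # representative condensation height, m (assumed)

D = rho_w * g * P * h    # W m^-2
print(f"{D:.1f} W m^-2") # ~2.8 W m^-2, within the quoted 2-4 W m^-2 range
```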