Open weather and climate science in the digital era
- 1Netherlands eScience Center, Amsterdam, the Netherlands
- 2Information and Technology Services, Utrecht University, Utrecht, the Netherlands
- 3Geosciences, Utrecht University, Utrecht, the Netherlands
- 4CNRMSI/SMN, Direction de la Meteorologie Nationale Casablanca, Morocco
- 5German Climate Computing Centre (DKRZ), Hamburg, Germany
- 6Royal Netherlands Meteorological Institute (KNMI), De Bilt, the Netherlands
- 7Research Applications Laboratory, National Center for Atmospheric Research, Boulder, USA
- 8Water Resources Management, Delft University of Technology, Delft, the Netherlands
- 9Centrum Wiskunde & Informatica, Amsterdam, the Netherlands
- 10Numerical methods, European Centre for Medium-Range Weather Forecasts, Reading, UK
- 11The Weather Company/IBM, Boston, MA, USA
- 12World Weather Research Division, World Meteorological Organization, Geneva, Switzerland
- 13Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
- 14Hadley Centre for Climate Science, Met Office, Exeter, UK
Correspondence: Martine G. de Vos (email@example.com)
The need for open science has been recognized by the communities of meteorology and climate science. While these domains are mature in terms of applying digital technologies, the implementation of open science methodologies is less advanced. In a session on “Weather and Climate Science in the Digital Era” at the 14th IEEE International eScience Conference domain specialists and data and computer scientists discussed the road towards open weather and climate science.
Roughly 80 % of the studies presented in the conference session showed the added value of open data and software. These studies included open datasets from disparate sources in their analyses or developed tools and approaches that were made openly available to the research community. Furthermore, shared software is a prerequisite for the studies which presented systems like a model coupling framework or digital collaboration platform. Although these studies showed that sharing code and data is important, the consensus among the participants was that this is not sufficient to achieve open weather and climate science and that there are important issues to address.
At the level of technology, the application of the findable, accessible, interoperable, and reusable (FAIR) principles to many datasets used in weather and climate science remains a challenge. This may be due to scalability (in the case of high-resolution climate model data, for example), legal barriers such as those encountered in using weather forecast data, or issues with heterogeneity (for example, when trying to make use of citizen data). In addition, the complexity of current software platforms often limits collaboration between researchers and the optimal use of open science tools and methods.
The main challenges we observed, however, were non-technical and impact the practice of science as a whole. There is a need for new roles and responsibilities in the scientific process. People working at the interface of science and digital technology – e.g., data stewards and research software engineers – should collaborate with domain researchers to ensure the optimal use of open science tools and methods. In order to remove legal boundaries on sharing data, non-academic parties such as meteorological institutes should be allowed to act as trusted agents. Besides the creation of these new roles, novel policies regarding open weather and climate science should be developed in an inclusive way in order to engage all stakeholders.
Although there is an ongoing debate on open science in the community, the individual aspects are usually discussed in isolation. Our approach in this paper takes the discourse further by focusing on “open science in weather and climate research” as a whole. We consider all aspects of open science and discuss the challenges and opportunities of recent open science developments in data, software, and hardware. We have compiled these into a list of concrete recommendations that could bring us closer to open weather and climate science. We acknowledge that the development of open weather and climate science requires effort to change, but the benefits are large. We have observed these benefits directly in the studies presented in the conference and believe that it leads to much faster progress in understanding our complex world.
In this article we describe the main findings of a conference session on “Weather and Climate Science in the Digital Era”, with a special focus on the implementation of open science methodologies.
Meteorology and climate sciences are data- and computationally intensive areas of research by tradition. Because meteorology is primarily a physical science, empirical data collection has always been important, and it was one of the first fields to standardize data collection, from the advent of systematic instrumental observations in the mid-1800s (e.g., Maury, 1853; Quetelet, 1874). In addition, the production of meteorological forecasts was one of the first applications to be developed for electronic computers, following decades during which the calculations were performed by hand (we recall that “computer” originally meant “one who computes” and that the adjective “electronic” was introduced to distinguish the machine from the human). Numerical weather prediction (NWP) has advanced from the first operational predictions in the 1950s (Charney et al., 1950), aided by increased computing capability and the growing supply of observational data to generate initial conditions for assimilation into the model state. Climate research has benefitted from the same developments (see, e.g., Lynch, 2008, for an overview). The assimilation of observational data into NWP models has been a turning point for the development of high-resolution gridded information of the atmosphere and ocean state (e.g., Kalnay et al., 1996; Dee et al., 2011). The use of this methodology for reanalysis – that is, generating a comprehensive and physically consistent record of how the weather is changing over time – has ensured a baseline for climate research and triggered the development of downstream climate services.
Meteorologists have been using machine learning to post-process model output, blend multiple models, and optimize the weighting of models for over 20 years (Haupt et al., 2018). Neural nets were used in the 90s to speed up the calculation of outgoing longwave radiation in climate models (Chevallier et al., 1999) and for both short- and long-wave radiation parameterization in the National Center for Atmospheric Research (NCAR) Community Atmospheric Model (CAM) (Krasnopolsky et al., 2007). Present and future strategies feature an Earth system approach for assimilating environmental data into a more comprehensive coupled system including the atmosphere, ocean, biosphere, and sea ice (Penny and Hamill, 2017).
The influence and application of digital technologies have shown no sign of abatement in recent times. Three technological developments are having a strong effect on meteorology and climate research (Ruti et al., 2019). First, the increase in computing power. Exascale (i.e., 10^18 operations per second) is the next proxy in the long trajectory of exponential performance increases that has continued for more than half a century (Reed and Dongarra, 2015) and provides unprecedented opportunities with regard to the finer resolution of scales in time and space and/or the coupling of more components that represent different parts of the Earth system. However, it also poses large software development and data management challenges, such as the impact of increasing numerical model resolution, increasing code complexity, and the volumes of data that are handled (Bauer et al., 2015; Sellar et al., 2020). A second development concerns the open availability of standard meteorological data and data from a variety of sources, including citizen science projects and low-cost sensors. Modern data management tools enable handling of these data sources. Thirdly, there has been increasing use of machine learning, in particular so-called deep learning. A plethora of machine learning methods have been and are being applied to problems of weather and climate prediction, from emulating unresolved processes in numerical models to calibrating forecasts produced with numerical models and the production of forecasts based on data and machine learning methods only (Huntingford et al., 2019; Schneider et al., 2017; Reichstein et al., 2019).
Digital technologies enable new research methods, accelerate the growth of knowledge, and spur the creation of new means of communicating such knowledge amongst researchers and within the broader scientific community. As such, these technologies have reshaped the scientific enterprise and are strongly connected to open science (OECD, 2015; Bourne et al., 2012). Open science methodologies such as open access publications, open source software development, and findable, accessible, interoperable, and reusable (FAIR) data (see below) stimulate the use of data and software resources and lead to more reproducible research (Wilkinson et al., 2016; Munafò et al., 2017). The need for open research practices has been recognized by the communities of meteorology and climate science. Nonetheless, whilst these domains are mature in terms of the application of digital technologies, the implementation of open science methodologies is less advanced.
In a session on “Weather and Climate Science in the Digital Era” at the 14th IEEE International eScience Conference, domain specialists and data and computer scientists discussed the road towards open weather and climate science. This paper describes the main findings and insights from this conference session.
The remainder of this paper is organized as follows: in the Methods section we describe the set-up of the conference session in detail, since the insights and claims in this paper are based on the observations made during the session. The Open science section contains a small literature review which describes the progress of open weather and climate science in the context of open science developments in general. In the section Towards open weather and climate science we discuss the challenges and opportunities of open data and open software. The last section provides a synthesis of the issues that should be addressed in order to achieve open weather and climate science.
The “Weather and Climate Science in the Digital Era” conference session examined some of the data and compute-intensive approaches which are used in weather and climate science. The session comprised 10 oral abstract presentations, one keynote talk, and six short poster pitches. Contributions were selected after a peer review on their scientific merit and innovative nature and published in the conference proceedings (Bari, 2018; Behrens et al., 2018; Bendoukha, 2018; Brangbour et al., 2018; Garcia-Marti et al., 2018; Haupt et al., 2018; Hut et al., 2018; Jansson et al., 2018; Pelupessy et al., 2018; Ramamurthy, 2018; Schultz et al., 2018; Stringer et al., 2018; van Haren et al., 2018; van den Oord et al., 2018). The 16 session participants were either presenters or involved in the organization of the session and represented disparate science domains as well as computer and data sciences.
Following the first part of the session which was dedicated to the presentations, the participants broke into three groups to discuss “challenges and opportunities regarding open weather and climate science”. The findings of each group were presented and discussed in a final plenary session, during which observations and insights were documented.
The observations in this paper are based on both the insights from the studies presented in the session and the notes made during the discussion. The majority of the participants in the session also contributed to this paper. As such, this represents a shared view of a group of experts in weather and climate science on digital and open science developments in their field.
Based on a small literature review, this section describes the progress of open weather and climate science in the context of open science developments in general.
Open science refers to open research practices and includes but is not limited to public access to the academic literature and sharing of data and code (McKiernan et al., 2016). However, the interpretation of the concept of open science varies between different schools of thought (Fecher and Friesike, 2014). In general, open science concerns various stakeholders: besides scholars, these include institutes, research funders, librarians and archivists, publishers, and decision makers (Bourne et al., 2012; OECD, 2015; Fecher and Friesike, 2014).
It has been shown that the adoption of open research practices leads to significant benefits for researchers: specifically, increases in citations, media attention, potential collaborators, job opportunities, and funding opportunities (McKiernan et al., 2016). Europe and the United States have made efforts to adapt legal frameworks and implement policy initiatives for greater openness in scientific research (OECD, 2015; National Science Foundation, 2018). Several countries provide digital infrastructure based on rich metadata that support the optimal re-use of resources in the research environment (Mons et al., 2017). Examples include the European Open Science Cloud in Europe (Directorate-General for Research and Innovation, 2018), NIH Data Commons projects in the United States, AARNet in Australia (AARNet, 2018), and the African Data Intensive Research Cloud in South Africa (Simmonds et al., 2016). Funders and research institutes have announced policies encouraging, mandating, or specifically financing open research practices (McKiernan et al., 2016; Wilkinson et al., 2016) – for example, the National Science Foundation in the United States (National Science Board, 2011), CERN in Switzerland (CERN-OPEN-2014-049, 2014), the Netherlands Organization for Scientific Research (Executive board, 2019), and the United Nations Educational, Scientific and Cultural Organization (Board, 2013).
The need for open research practices has been recognized by the communities of meteorology and climate science and has even entered into the political arena. For instance, in its report on the so-called “Climatic Research Unit email controversy” in 2009 the Science and Technology Committee of the UK House of Commons stated that climate science is a matter of great importance and that the quality of the science should be irreproachable. The committee called for the climate science community to become more transparent by publishing raw data and detailed methodologies (House of Commons, 2010).
There are many examples of open access, open data, and open source software in meteorology and climate science. The United States has a long history of making meteorological observations, model source codes, and model output an open public commodity, available to all. The WRF regional model, MPAS global model, and CESM climate model (Skamarock et al., 2019; Hurrell et al., 2013) are good examples of shared numerical weather and climate model codes. Outputs from NOAA weather and climate prediction models are freely available. The European Centre for Medium-Range Weather Forecasts (ECMWF) provides researchers with a free and easy-to-use version of the Integrated Forecasting System (IFS), which is one of the main global NWP systems (Carver, 2019). This allows IFS to be used by a much wider community, and the academic community contributes to improving the forecast model with new developments. The UK Earth System Model (Sellar et al., 2019), a joint development between the Natural Environment Research Council (NERC) and the UK Met Office, has been made available to the research community in a similar fashion. In addition, co-ordinated coupled model intercomparison projects (CMIP) (Taylor et al., 2012; Eyring et al., 2016) are excellent examples of the climate modeling community working together. The construction of multi-model comparisons and statistics forces research groups to accept common input forcings, provide detailed documentation of the numerical schemes in their model, and produce open, standardized output data (see, e.g., Sellar et al., 2020). The result is a better understanding of climate change arising from natural, unforced variability or in response to changes in radiative forcing in a multi-model context.
The international meteorological and climate research communities have been sharing data since the 1990s, using common file and metadata formats. Besides CMIP (Taylor et al., 2012), examples include the sharing of reanalysis data, starting with the NCEP/NCAR reanalysis and ECMWF's ERA reanalysis data products (e.g., Dee et al., 2011; Kalnay et al., 1996).
There is an ongoing debate on open science in the meteorology and climate research communities, but in the literature the individual open science practices are discussed separately. Individual elements have been discussed, e.g., in Ruti et al. (2019) on a strategic programming level, in Eyring et al. (2016) on a generic software tool for Earth system model data diagnostics, in connection with the open software platform PANGEO (https://pangeo.io/, last access: 15 May 2020), and in connection with community simulation models such as the regional model WRF and the climate model CESM (Skamarock et al., 2019; Hurrell et al., 2013). Additionally, these aspects are discussed in Climate Informatics workshops (http://climateinformatics.org/, last access: 15 May 2020), workshops held as part of the European Network on Earth System Modelling (ENES, https://portal.enes.org/, last access: 15 May 2020), and workshops of operational centres such as the European Centre for Medium-Range Weather Forecasts (e.g., the biannual High Performance Computing workshop), to name a few.
The examples described above show that open research practices are growing in popularity and necessity. However, widespread adoption of these practices has not yet been achieved, which is also true for meteorology and climate science. In fact, sharing of data, software, and vocabularies is only common practice in a few fields such as astronomy and genomics (e.g., Consortium, 2004; Borgman, 2012; Shamir et al., 2013). Recent studies show that transparency and reproducibility are still a matter of concern to the scientific community as a whole. Addressing this concern requires that all stakeholders work together to create a more open and robust system (Baker, 2016; Munafò et al., 2017; Gil et al., 2016).
In the following section we present our perspective on the challenges and opportunities regarding open weather and climate science.
4.1 Open data
About 50 % of the studies reported in the proceedings of the conference session include open data from different sources in their analyses. Examples include the use of open satellite data, geolocated data via OpenStreetMap, and openly available in situ meteorological observations (Haupt et al., 2018; Garcia-Marti et al., 2018; Bari, 2018; Schultz et al., 2018, and references therein). Two studies include data that are not common in meteorological or climate research. Citizen data such as social media posts (Brangbour et al., 2018) and observations from amateur weather stations (van Haren et al., 2018) can lead to new perspectives on local conditions beyond data from traditional meteorological stations.
At least 50 % of the studies use common file formats and standard protocols to facilitate the exchange and use of data. van den Oord et al. (2018) use CF-netCDF formats. The CF conventions provide guidelines for the use of metadata in the netCDF file and are increasingly used in climate studies. Behrens et al. (2018), Pelupessy et al. (2018), Schultz et al. (2018), and Stringer et al. (2018) all use standard protocols for inter-process communication (like MPI and REST) in their numerical codes. Furthermore, the use of common file formats and standard protocols is a prerequisite for the digital collaboration platforms which were presented in the session (Ramamurthy, 2018; Hut et al., 2018; Bendoukha, 2018).
The session participants recognized that in the current weather and climate science community the focus is primarily on making data and software findable and accessible, often via web portals. Although these are necessary first steps towards open science, we acknowledge that these steps are not sufficient. Data and software that are findable and accessible may still be hard to obtain in practice or may be disseminated in a way that is still difficult to interpret and use. Wilkinson et al. (2016) defined guidelines to ensure the transparency, reproducibility, and reusability of scientific data. These state that data – and also the algorithms, tools, and workflows that led to these data – should be findable, accessible, interoperable, and reusable (FAIR). The FAIR guidelines put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.
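As a minimal sketch of what machine-actionable metadata can mean in practice, the following Python fragment checks whether variables carry the attributes a program would need to interpret them. The variable names, the example values, and the choice of `standard_name` and `units` as the required subset are assumptions made for illustration; the CF conventions themselves define which attributes are mandatory in which context, and a real check would read the attributes from the netCDF file rather than from dictionaries.

```python
# Assumption: a minimal subset of CF-style attributes chosen for illustration only.
REQUIRED_ATTRS = ("standard_name", "units")


def missing_metadata(variables):
    """Return {variable name: [missing attribute names]} for incomplete variables."""
    report = {}
    for name, attrs in variables.items():
        # An attribute counts as missing if it is absent or empty.
        missing = [key for key in REQUIRED_ATTRS if not attrs.get(key)]
        if missing:
            report[name] = missing
    return report


# Hypothetical metadata for two model output variables, written as plain
# dictionaries so that the sketch runs without a netCDF library.
variables = {
    "tas": {"standard_name": "air_temperature", "units": "K"},
    "pr": {"long_name": "precipitation"},  # lacks standard_name and units
}

report = missing_metadata(variables)
print(report)
```

A check of this kind could run automatically when a dataset is published, so that incomplete metadata is caught before the data reach other users.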
In order to make the output from weather and climate models open and interoperable, i.e., formatted according to standards such as CF-netCDF, including all necessary metadata, we consider performance scalability to be the foremost technological challenge. Whereas the simulation models are predominantly run on large clusters using many compute nodes, subsequent processing and analysis of the output are often still confined to a single CPU and do not scale easily with (say) increased model resolution. Thus, producing FAIR model output via traditional post-processing pipelines is quickly becoming unfeasible for advanced simulation models due to the sheer volume and complexity of their output.
For simulation models, this trend is a consequence of the advance of processor speed and model scalability compared to storage bandwidth and can be countered with two strategies. The first is removing the need for post-processing by incorporating as many steps as possible within the application itself. This will make the model more expensive, especially in terms of memory usage, but the overhead may often be mitigated by offloading the post-processing to a small extra set of dedicated high-memory compute nodes. This approach requires a technical effort from the data providers in the community, and it can only solve the data problem to a limited extent, since there will always be extra manipulations required for many scientific analyses. Hence we need a second strategy on the data users' side to increase parallelism in the climate data processing toolchain. Existing cloud computing technologies, like Apache Spark (Zaharia et al., 2016) or Dask (Team, 2016), may provide a suitable basis, since data processing and analysis pipelines can usually be represented by task graphs with a large degree of parallelism (over grid points, over multiple variables, over ensemble members, etc.). One of the key aspects, however, is the capability of the developer, usually a meteorologist or climate scientist, to adopt a new programming paradigm which facilitates the parallel execution of the workflow on cloud infrastructure. Here, research software engineers may play a key role by – for instance – developing higher-complexity algorithms for efficient processing of distributed climate data and adopting tools like xarray (Hoyer and Hamman, 2017) and Iris (Office, 2010).
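The task-graph structure described above can be sketched with the Python standard library alone. In this illustration each ensemble member is an independent task that is averaged over its grid points, and a final reduction combines the per-member results. The function names and the toy data are hypothetical, and a thread pool is used only to keep the sketch portable; it shows the shape of the computation rather than a real speed-up, for which a framework such as Dask or Spark (or at least a process pool) would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def member_mean(values):
    """Independent task: average one ensemble member over its grid points."""
    return fmean(values)


def ensemble_mean(members, max_workers=4):
    """Map the per-member task over a worker pool, then reduce the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the members.
        per_member = list(pool.map(member_mean, members))
    return fmean(per_member), per_member


# Hypothetical toy data: three ensemble members, four grid points each (kelvin).
members = [
    [280.0, 281.0, 282.0, 283.0],
    [279.0, 280.0, 281.0, 282.0],
    [281.0, 282.0, 283.0, 284.0],
]

overall, per_member = ensemble_mean(members)
print(per_member, overall)
```

Because the per-member tasks share no state, the same decomposition applies unchanged to parallelism over variables or over blocks of grid points.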
In addition to these technological issues, we observe that some important challenges for open data arise from the political or legal context and as such require additional efforts beyond the scientific domain. Weather services and commercial entities can see their data as a business advantage and be reluctant to make them open. Various resolutions by the World Meteorological Organization (e.g., Resolutions 40, 25, and 60) promote open access and exchange of data in order to better manage the risks from weather and climate-related hazards, but leave room for additional conditions. These resolutions have no legal status and national legislation may lead to restricted access to data and charges (Sylla, 2018). Also, policies to promote open data are less mature than those to promote open access to scientific publications (OECD, 2015). One way to address these issues is to sign nondisclosure agreements and allow the weather services to act as trusted agents who use the data for the public good without disclosing their details. These trusted agents should be considered to occupy a new role in the scientific process.
Furthermore, data need to be hosted and maintained, and their quality should be ensured. These requirements are well-addressed for large operational data services, such as the European Copernicus program, but this is not usually the case for research data of individual scientists, despite the increasing attention being paid to data management. Currently, data providers have no clear policy (such as – for example – the FAIR principles) to follow in their hosting and management of data. Publications such as Geoscience Data Journal, Scientific Data, and Earth System Science Data are a partial remedy as these provide open-access platforms where scientific data can be peer-reviewed and formally published. Some funding agencies – for example NWO in the Netherlands – are now requiring that, for all projects they fund, software becomes open source and the data are archived and findable unless there are strong reasons not to do so (e.g., privacy). Also, research funded by the European Commission should adhere to FAIR principles, and data management plans need to be in place.
4.2 Open software
The conference session provided excellent examples of tools and approaches that were developed and made openly available to the research community, for example, approaches to reduce the computational or post-processing costs of existing simulation models (Stringer et al., 2018; Behrens et al., 2018; van den Oord et al., 2018; Jansson et al., 2018) and approaches to integrate datasets from different sources (van Haren et al., 2018; Schultz et al., 2018). Four studies in the session presented an approach for which open data and software are a prerequisite, as these comprise a model coupling framework or a digital collaboration platform (Pelupessy et al., 2018; Jansson et al., 2018; Ramamurthy, 2018; Hut et al., 2018; Bendoukha, 2018).
We strongly support open publication of code, even if this code is under development, and especially when this code is used in a paper to support research findings. Open code can be inspected and reused by peers; this improves the reproducibility and quality of the corresponding research. Code sharing is crucial to science and to climate research in particular, since local and global policies depend on the scientific results. Open publication, however, requires the code to be documented and tested, which is a time-consuming effort. This level of documentation and testing is not yet standard practice, partially because there is no incentive to do so. There is a need for open science practices where incentives are developed to share scientific information beyond the final result in a scientific paper. Agile (Fowler and Highsmith, 2001) is a well-known approach in the software engineering community and may provide a means to achieve open scientific software in a feasible way. According to the Agile approach, software is developed in small increments every few weeks, which makes it possible to provide continuous feedback to the developers. With its focus on flexibility and communication, Agile lends itself naturally to scientific software projects which are characterized by frequent code alterations due to changing requirements, tight collaboration in small teams, and short planning horizons (Sletholt et al., 2012). Agile practices are used, for example, by the ECMWF to develop the Climate Data Store (Raoult et al., 2017) and the Met Office Hadley Centre to develop climate models (Easterbrook and Johns, 2009).
In four studies that were presented in the conference, machine learning technologies are used for data analysis and prediction (Haupt et al., 2018; Garcia-Marti et al., 2018; Bari, 2018; Schultz et al., 2018). Besides using standard meteorological datasets, these studies employed additional data to infer relationships that are relevant to the end user. For example, prediction of solar power output over a future time period requires the inclusion of historical and real-time solar energy production data (Haupt et al., 2018). It was observed that the use of machine learning approaches in weather and climate science is increasing. These approaches are powerful, for instance, in emulating processes that are not resolved in simulation models (because of computational costs), in calibrating or post-processing simulation results, and in building models to describe or forecast meteorological and climatological events. The caveats, on the other hand, are that trained models are not as transparent as models based on laws of physics and that their results can be hard to interpret. Following the open science principle, machine learning approaches should be understandable and reusable by other researchers. Emerging fields like explainable AI and knowledge-based machine learning may provide approaches that help human experts to understand how machine learning results are produced (Adadi and Berrada, 2018). Data-driven machine learning approaches should be combined with knowledge on physical processes (Dueben and Bauer, 2018; Reichstein et al., 2019) to gain further understanding of Earth system science problems. More broadly, machine learning methods should be accompanied by proper validation and verification.
This use of software, motivated by open science principles, requires a suitable digital infrastructure. The cloud appears to be a potential avenue as it enables individual researchers to gain access to high computing resources, vast amounts of storage, and suites of software tools. In our session, three digital platforms were presented that use cloud technologies to create a virtual research environment in which scientific end users can store, analyze, and share their data (Ramamurthy, 2018; Hut et al., 2018; Bendoukha, 2018). The session participants also observed, however, that current platforms such as the Open Geospatial Consortium (Maidment et al., 2011) and JRC Earth Observation Data and Processing Platform (Soille et al., 2017) do not seem to increase the extent of scientific collaboration, particularly across disciplines. This may be partly due to the fact that these platforms have each implemented their own set of standards both for data formats and interfaces to access these data. Since scientists are required to invest time and effort in working with a specific platform, this heterogeneity can pose obstacles to their collaboration with researchers on another platform.
This paper reflects the current discourse on open science in weather and climate research and the opportunities for sharing and combining data, software, and infrastructure. Although this is an ongoing debate in the community, the individual aspects are usually discussed in isolation. Our approach in this paper takes the discourse further by focusing on “open science in weather and climate research” as a whole, a concept which has hardly received attention so far. We consider all aspects of open science, among them compute infrastructures and stakeholders, and discuss the challenges and opportunities of recent open science developments in data, software, and hardware. We are basing our claims on the insights and observations made during the conference session on “Weather and Climate Science in the Digital Era”. These observations are representative of what we are seeing in the field, although we recognize that our analysis is not complete. However, we believe that, given our experience, we have a solid view of the accomplishments of open science along with what still needs to be implemented.
The studies presented in the session show the value of sharing open data and using and developing open source software and open platforms. Scientific advances are shown, for instance, by combining datasets and including non-standard meteorological data such as social media posts and observations from amateur weather stations. The increase in accuracy and skill of forecasts at local scales shows improved consistency of data products and improved efficiency and skill of simulations, often crossing different disciplines. The utilization of machine learning and increased computational capabilities have facilitated the use of disparate sources of data. In our conference session we concluded that sharing data and code offers many opportunities for scientific progress, leads to more reproducible science, and vastly enhances the user base. However, we realized that open publication of data and code is not sufficient to achieve open weather and climate science and that there are important issues to address, which are described below.
The findability and accessibility of data are increasingly receiving attention in weather and climate research, and common file and metadata formats increase interoperability. However, for many datasets the implementation of the FAIR principles remains a challenge due to their origin, scalability issues, or legal barriers. We also acknowledge that data quality can be difficult to judge, depending on its intended use, or the reason for its generation. Addressing this data quality challenge requires continued discussion of what aspects of open data can be implemented generically and what aspects are specific.
Technologically, the promise of modern digital technologies is not always met due to the complexity of software platforms. Although this paper does not address hardware in detail, the same holds for hardware and the software that runs on it. Further development of platforms should improve ease of use and provenance. This also calls for more attention to research software engineering, where collaboration and interaction between software engineers and domain researchers can lead to optimal use of open science tools and methods.
As mentioned before, open science concerns various stakeholders in addition to scholars. Data management and programming have become an integral part of current research practice, and these activities require specific digital skills (Akhmerov et al., 2019). It is therefore important to acknowledge and define roles, responsibilities, and mandates concerning data stewardship and research software engineering. This requires institutional change as the personnel portfolio of academic institutions needs to become more diverse, and in addition, the impact of academic work beyond scientific publications and teaching needs to be considered more broadly.
In order to remove legal boundaries on sharing data, it is important to also engage non-academic parties such as operational and commercial meteorological institutions in open science. New policies regarding open science should be developed in an inclusive way to engage all stakeholders. Open science strategies and policies facilitate a higher quality of scientific research, increased collaboration, and engagement between research and society, which in turn can lead to higher social and economic impacts of public research (OECD, 2015).
Alongside the issues and challenges regarding open weather and climate science, this paper also discusses opportunities and possible solutions for these issues. We have compiled these into the following list of concrete recommendations which will bring us closer to open weather and climate science (Table 1). Some of these recommendations are new, and others are ongoing but still hold.
Open science has implications for the stakeholders, the institutions, and the system of science as a whole. It requires effort to change, but the benefits are large. Openly sharing data, code, and knowledge vastly enhances the user base, which means manifold growth of opportunities for new discoveries. As we observed from our conference session, this can lead to an improved understanding of our complex world.
The observations in this paper are based on both the insights from the studies presented in the session and the notes made during the discussion. The notes are not publicly accessible. The studies are published in the conference proceedings of the IEEE 14th International Conference on e-Science and can be found in the reference list of this study.
MGdV and WH organized the conference session and were the lead writers of the manuscript. All the authors contributed to the presentations and discussion in the conference session and to the writing of the manuscript.
The authors declare that they have no conflict of interest.
The authors would like to acknowledge both the Netherlands eScience Center and the program committee of the Weather & Climate session for their organizational efforts. The session created a unique opportunity for specialists in the domain of weather and climate science and data and computer scientists to exchange ideas and knowledge. Andreas Mueller and Jörg Behrens acknowledge the ESCAPE projects.
This research has been supported by the European Union's Horizon 2020 research and innovation programme (grant nos. 671627 (ESCAPE) and 800897 (ESCAPE2)). Sue Ellen Haupt is with the National Center for Atmospheric Research in the US, which is a major facility sponsored by the National Science Foundation under a cooperative agreement (grant no. 1852977).
This paper was edited by Sam Illingworth and reviewed by Peter Düben and John K. Hillier.
AARNet: Annual Report/2018 Data Connector for the Future, Tech. Rep., Australia's Academic and Research Network, Chatswood, Australia, 2018.
Akhmerov, A., Cruz, M., Drost, N., Hof, C., Knapen, T., Kuzak, M., Martinez-Ortiz, C., Turkyilmaz-van der Velden, Y., and Van Werkhoven, B.: Raising the Profile of Research Software: Recommendations for Funding Agencies and Research Institutions, Tech. Rep., Netherlands eScience Center, Amsterdam, the Netherlands, Zenodo, https://doi.org/10.5281/zenodo.3378572, 2019.
Baker, M.: Is there a reproducibility crisis? A Nature survey lifts the lid on how researchers view the crisis rocking science and what they think will help, Nature, 533, 353–366, 2016.
Bari, D.: Visibility Prediction based on kilometric NWP Model Outputs using Machine-learning Regression, in: IEEE 14th International Conference on e-Science, p. 278, https://doi.org/10.1109/eScience.2018.00048, 2018.
Behrens, J., Biercamp, J., Bockelmann, H., and Neumann, P.: Increasing parallelism in climate models via additional component concurrency, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00044, 2018.
Board, E.: Open Access Policy concerning UNESCO publications, Tech. Rep., United Nations Educational, Scientific and Cultural Organization, Executive board UNESCO, decisions adopted by the executive board at its 191st session, 191 EX/Decisions, 3–4, Paris, 2013.
Bourne, P. E., Clark, T., de Ward, D. R., Herman, I., Hovy, E., and Shotton, D.: Force 11 White Paper: Improving the future of research communication and e-scholarship (Dagstuhl Perspectives Workshop 11331), Dagstuhl Manifestos, Tech. Rep., 1, 41–60, https://doi.org/10.4230/DagMan.1.1.41, 2012.
Brangbour, E., Bruneau, P., and Marchand-Maillet, S.: Extracting Flood Maps from Social Media for Assimilation, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00045, 2018.
Carver, G.: The ECMWF OpenIFS numerical weather prediction model release cycle 40r1: description and use cases, Geosci. Model Dev. Discuss., in preparation, 2019.
CERN-OPEN-2014-049: Open Access Policy for CERN Physics Publication, Tech. Rep., CERN, 2014.
Chevallier, F., Cheruy, F., Scott, N. A., and Chedin, A.: A neural network approach for a fast and accurate computation of a longwave radiative budget, J. Appl. Meteorol., 37, 1385–1397, https://doi.org/10.1175/1520-0450(1998)037<1385:annafa>2.0.co;2, 1999.
Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M. A., Balsamo, G., Bauer, P., Bechtold, P., Beljaars, A. C., van de Berg, L., Bidlot, J., Bormann, N., Delsol, C., Dragani, R., Fuentes, M., Geer, A. J., Haimberger, L., Healy, S. B., Hersbach, H., Hólm, E. V., Isaksen, L., Kållberg, P., Köhler, M., Matricardi, M., McNally, A. P., Monge-Sanz, B. M., Morcrette, J. J., Park, B. K., Peubey, C., de Rosnay, P., Tavolato, C., Thépaut, J. N., and Vitart, F.: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system, Q. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828, 2011.
Directorate-General for Research and Innovation: Prompting an EOSC in practice. Final report and recommendations of the Commission 2nd High Level Expert Group on the European Open Science Cloud (EOSC), Tech. Rep., European Commission, https://doi.org/10.2777/112658, 2018.
Dueben, P. D. and Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning, Geosci. Model Dev., 11, 3999–4009, https://doi.org/10.5194/gmd-11-3999-2018, 2018.
Executive board: Connecting Science and Society – NWO strategy 2019–2022, Tech. Rep., Netherlands Organisation for Scientific Research, https://doi.org/10.21820/23987073.2019.2.44, 2019.
Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stouffer, R. J., and Taylor, K. E.: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) Experimental Design and Organization, Geosci. Model Dev., 9, 1937–1958, https://doi.org/10.5194/gmd-9-1937-2016, 2016.
Fecher, B. and Friesike, S.: Open Science: One Term, Five Schools of Thought, in: Opening Science, edited by: Bartling, S. and Friesike, S., 1, 1–7, https://doi.org/10.1007/978-3-319-00026-8_2, 2014.
Fowler, M. and Highsmith, J.: The agile manifesto, Software Development, 9, 28–35, 2001.
Garcia-Marti, I., Noteboom, J. W., and Diks, P.: Detecting probability of ice formation on overhead lines of the Dutch railway network, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00050, 2018.
Gil, Y., David, C. H., Demir, I., Essawy, B. T., Fulweiler, R. W., Goodall, J. L., Karlstrom, L., Lee, H., Mills, H. J., Oh, J. H., Pierce, S. A., Pope, A., Tzeng, M. W., Villamizar, S. R., and Yu, X.: Toward the Geoscience Paper of the Future: Best practices for documenting and sharing research from data to software to provenance, Earth Space Sci., 3, 388–415, https://doi.org/10.1002/2015EA000136, 2016.
Haupt, S. E., Cowie, J., Linden, S., McCandless, T., Kosovic, B., and Alessandrini, S.: Machine Learning for Applied Weather Prediction, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00047, 2018.
House of Commons: The disclosure of climate data from the Climatic Research Unit at the University of East Anglia, Science and Technology Committee, available at: http://www.publications.parliament.uk/pa/cm200910/cmselect/cmsctech/387/387i.pdf (last access: 15 May 2020), 2010.
Huntingford, C., Jeffers, E. S., Bonsall, M. B., Christensen, H. M., Lees, T., and Yang, H.: Machine learning and artificial intelligence to aid climate change research and preparedness, Environ. Res. Lett., 14, 124007, https://doi.org/10.1088/1748-9326/ab4e55, 2019.
Hurrell, J. W., Holland, M., Gent, P., Ghan, S., Kay, J., Kushner, P., Lamarque, J., Large, W., Lawrence, D., Lindsay, K., and Lipscomb, W.: The Community Earth System Model: A framework for collaborative research, B. Am. Meteorol. Soc., 94, 1339–1360, https://doi.org/10.1175/BAMS-D-12-00121.1, 2013.
Jansson, F., van den Oord, G., Siebesma, P., and Crommelin, D.: Resolving clouds in a global atmosphere model – a multiscale approach with nested models, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00043, 2018.
Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L., Iredell, M., Saha, S., White, G., Woollen, J., and Zhu, Y.: The NCEP/NCAR 40-year reanalysis project, B. Am. Meteorol. Soc., 77, 437–472, https://doi.org/10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2, 1996.
Krasnopolsky, V. M., Fox-Rabinovitz, M. S., and Belochitski, A. A.: Accurate and fast neural network emulation of full, long- and short wave, model radiation used for decadal climate simulations with NCAR CAM, in: 19th conference on climate variability and change/fifth conference on artificial intelligence applications to environmental science, 87th AMS Annual Meeting, 2007.
Maury, M. F.: Explanations and Sailing Directions to Accompany the Wind and Current Charts, in: First International Maritime Conference Held for Devising an Uniform System of Meteorological Observations at Sea, Brussels, 54–96, 1853.
McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., McDougall, D., Nosek, B. A., Ram, K., and Soderberg, C. K.: How open science helps researchers succeed, eLife, 5, 1–26, https://doi.org/10.7554/eLife.16800, 2016.
Mons, B., Neylon, C., Velterop, J., Dumontier, M., Da Silva Santos, L. O. B., and Wilkinson, M. D.: Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud, Information Services and Use, 37, 49–56, https://doi.org/10.3233/ISU-170824, 2017.
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., and Ioannidis, J. P. A.: A manifesto for reproducible science, Nature Human Behaviour, 1, 1–9, https://doi.org/10.1038/s41562-016-0021, 2017.
Task Force on Data Policies, Committee on Strategy and Budget, and National Science Board: Digital Research Data Sharing and Management, available at: https://www.nsf.gov/nsb/publications/2011/nsb1124.pdf (last access: 15 July 2020), 2011.
National Science Foundation: Proposal & Award Policies and Procedures Guide (PAPPG), Tech. Rep. OMB Control Number 3145-0058, National Science Foundation, 2018.
Pelupessy, I., Werkhoven, B. V., van den Oord, G., Zwart, S. P., van Elteren, A., and Dijkstra, H.: Development of the OMUSE/AMUSE modelling system, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00092, 2018.
Penny, S. G. and Hamill, T. M.: Coupled Data Assimilation for Integrated Earth System Analysis and Prediction, B. Am. Meteorol. Soc., 98, ES169–ES172, https://doi.org/10.1175/BAMS-D-17-0036.1, 2017.
Quetelet, A.: Notice sur Le Capitaine M. F. Maury, in: Associé de l'Académie Royale de Belgique, published by the Academy, Brussels, 1874.
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., and Prabhat: Deep learning and process understanding for data-driven Earth system science, Nature, 566, 195–204, https://doi.org/10.1038/s41586-019-0912-1, 2019.
Righi, M., Andela, B., Eyring, V., Lauer, A., Predoi, V., Schlund, M., Vegas-Regidor, J., Bock, L., Brötz, B., De Mora, L., Diblen, F., Dreyer, L., Drost, N., Earnshaw, P., Hassler, B., Koldunov, N., Little, B., Loosveldt Tomas, S., and Zimmermann, K.: Earth System Model Evaluation Tool (ESMValTool) v2.0 – technical overview, Geosci. Model Dev., 13, 1179–1199, https://doi.org/10.5194/gmd-13-1179-2020, 2020.
Ruti, P., Tarasova, O., Keller, J., Carmichael, G., Hov, Ø., Jones, S., Terblanche, D., Anderson-Lefale, C., Barros, A., Bauer, P., Bouchet, V., Brasseur, G., Brunet, G., DeCola, P., Dike, V., Kane, M. D., Gan, C., Gurney, K., Hamburg, S., Hazeleger, W., Jean, M., Johnston, D., Lewis, A., Li, P., Liang, X., Lucarini, V., Lynch, A., Manaenkova, E., Jae-Cheol, N., Ohtake, S., Pinardi, N., Polcher, J., Ritchie, E., Sakya, A. E., Saulo, C., Singhee, A., Sopaheluwakan, A., Steiner, A., Thorpe, A., and Yamaji, M.: Advancing Research for Seamless Earth System Prediction, B. Am. Meteorol. Soc., 101, 23–35, https://doi.org/10.1175/bams-d-17-0302.1, 2019.
Schneider, T., Lan, S., Stuart, A., and Teixeira, J.: Earth System Modeling 2.0: A Blueprint for Models That Learn From Observations and Targeted High-Resolution Simulations, Geophys. Res. Lett., 44, 12396–12417, https://doi.org/10.1002/2017GL076101, 2017.
Schultz, M. G., Apweiler, S., Vogelsang, J., Kleinert, F., and Mallmann, D.: A web service architecture for objective station classification purposes, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00051, 2018.
Sellar, A. A., Jones, C. G., Mulcahy, J. P., Tang, Y., Yool, A., Wiltshire, A., O'Connor, F. M., Stringer, M., Hill, R., Palmieri, J., Woodward, S., de Mora, L., Kuhlbrodt, T., Rumbold, S. T., Kelley, D. I., Ellis, R., Johnson, C. E., Walton, J., Abraham, N. L., Andrews, M. B., Andrews, T., Archibald, A. T., Berthou, S., Burke, E., Blockley, E., Carslaw, K., Dalvi, M., Edwards, J., Folberth, G. A., Gedney, N., Griffiths, P. T., Harper, A. B., Hendry, M. A., Hewitt, A. J., Johnson, B., Jones, A., Jones, C. D., Keeble, J., Liddicoat, S., Morgenstern, O., Parker, R. J., Predoi, V., Robertson, E., Siahaan, A., Smith, R. S., Swaminathan, R., Woodhouse, M. T., Zeng, G., and Zerroukat, M.: UKESM1: Description and Evaluation of the U.K. Earth System Model, J. Adv. Model. Earth Syst., 11, 4513–4558, https://doi.org/10.1029/2019MS001739, 2019.
Sellar, A. A., Walton, J., Jones, C. G., Wood, R., Abraham, N. L., Andrejczuk, M., Andrews, M. B., Andrews, T., Archibald, A. T., Mora, L., Dyson, H., Elkington, M., Ellis, R., Florek, P., Good, P., Gohar, L., Haddad, S., Hardiman, S. C., Hogan, E., Iwi, A., Jones, C. D., Johnson, B., Kelley, D. I., Kettleborough, J., Knight, J. R., Köhler, M. O., Kuhlbrodt, T., Liddicoat, S., Linova-Pavlova, I., Mizielinski, M. S., Morgenstern, O., Mulcahy, J., Neininger, E., O'Connor, F. M., Petrie, R., Ridley, J., Rioual, J., Roberts, M., Robertson, E., Rumbold, S., Seddon, J., Shepherd, H., Shim, S., Stephens, A., Teixeira, J. C., Tang, Y., Williams, J., and Wiltshire, A.: Implementation of UK Earth system models for CMIP6, J. Adv. Model. Earth Syst., 12, 1–27, https://doi.org/10.1029/2019ms001946, 2020.
Shamir, L., Wallin, J. F., Allen, A., Berriman, B., Teuben, P., Nemiroff, R. J., Mink, J., Hanisch, R. J., and DuPrie, K.: Practices in source code sharing in astrophysics, Astron. Comput., 1, 54–58, 2013.
Simmonds, R., Taylor, R., Horrell, J., Fanaroff, B., Sithole, H., van Rensburg, S., et al.: The African data intensive research cloud, IST – Africa Week Conference, 1–8, https://doi.org/10.1109/ISTAFRICA.2016.7530650, 2016.
Skamarock, W. C., Klemp, J. B., Dudhia, J., Gill, D. O., Liu, Z., Berner, J., Wang, W., Powers, J. G., Duda, M. G., Barker, D. M., and Huang, X.-Y.: A Description of the Advanced Research WRF Version 4, NCAR Tech. Note NCAR/TN-556+STR, Tech. Rep., NCAR, https://doi.org/10.5065/1dfh-6p97, 2019.
Sletholt, M. T., Hannay, J. E., Pfahl, D., and Langtangen, H. P.: What do we know about scientific software development's agile practices?, Comput. Sci. Eng., 14, 24–36, https://doi.org/10.1109/MCSE.2011.113, 2012.
Soille, P., Burger, A., Hasenohr, P., Kempeneers, P., Rodriguez Aseretto, D., Syrris, V., Vasilev, V., and Marchi, D.: The JRC Earth Observation Data and Processing Platform, in: Big Data From Space, Toulouse, France, 2017.
Stringer, M., Jones, C., Hill, R., Dalvi, M., Johnson, C., and Walton, J.: A Hybrid-Resolution Earth System Model, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00042, 2018.
Sylla, M. B.: Review of meteorological/climate data sharing policy (WMO Resolution 40) to promote their use to support Climate Information Services uptake in the African continent, in: Expert Group Meeting on data sharing policy in Africa, July, Dakar, Senegal, 10–11, 2018.
van den Oord, G., Yepes, X., and Acosta, M.: Post-processing strategies for the ECMWF model, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00092, 2018.
van Haren, R., Koopmans, S., Steeneveld, G.-J., Theeuwes, N., Uijlenhoet, R., and Holtslag, A. A. M.: Weather reanalysis on an urban scale using WRF, in: IEEE 14th International Conference on e-Science, https://doi.org/10.1109/eScience.2018.00049, 2018.
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., and Mons, B.: Comment: The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, 3, 1–9, https://doi.org/10.1038/sdata.2016.18, 2016.
Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M. J., Ghodsi, A., Gonzalez, J., Shenker, S., and Stoica, I.: Apache Spark: A unified engine for big data processing, Commun. ACM, 59, 56–65, https://doi.org/10.1145/2934664, 2016.