Open Weather and Climate Science in the Digital Era

The need for open science has been recognized by the communities of meteorology and climate science. While these domains are mature in terms of applying digital technologies, the implementation of open science methodologies is less advanced. In a session on "Weather and Climate Science in the Digital Era" at the 14th IEEE International eScience Conference, domain specialists and data and computer scientists discussed the road towards open weather and climate science. Roughly 80% of the studies presented in the conference session showed the added value of open data and software. These studies included open datasets from disparate sources in their analyses, or developed tools and approaches that were made openly available to the research community.

The main challenges we observed, however, were non-technical and impact the practice of science as a whole. There is a need for new roles and responsibilities in the scientific process. People working at the interface of science and digital technology - e.g., data stewards and research software engineers - should collaborate with domain researchers to ensure the optimal use of open science tools and methods. In order to remove legal boundaries on sharing data, non-academic parties such as meteorological institutes should be allowed to act as trusted agents. Besides the creation of these new roles, novel policies regarding open weather and climate science should be developed in an inclusive way in order to engage all stakeholders.
Although there is an ongoing debate on open science in the community, the individual aspects are usually discussed in isolation. Our approach in this paper takes the discourse further by focusing on 'open science in weather and climate research' as a whole. We consider all aspects of open science and discuss the challenges and opportunities of recent open science developments in data, software and hardware. We have compiled these into a list of concrete recommendations that could bring us closer to open weather and climate science. We acknowledge that the development of open weather and climate science requires effort to change, but the benefits are large. We have observed these benefits directly in the studies presented in the conference and believe that open science leads to much faster progress in understanding our complex world.

INTRODUCTION
In this article we describe the main findings of a conference session on "Weather and Climate Science in the Digital Era", with a special focus on the implementation of open science methodologies.
Meteorology and climate sciences are data- and computationally-intensive areas of research by tradition. Being primarily a physical science, empirical data collection has always been important, and meteorology was one of the first fields to standardize data collection, from the advent of systematic instrumental observations in the mid-1800s (e.g. Maury, 1853; Quetelet, 1874). In addition, the production of meteorological forecasts was one of the first applications to be developed for electronic computers, following decades during which the calculations were performed by hand (we recall that "computer" originally meant "one who computes", and that the adjective "electronic" was introduced to distinguish the machine from the human). Numerical weather prediction (NWP) has advanced from the first operational predictions in the 1950s (Charney et al., 1950), aided by increased computing capability and the growing supply of observational data to generate initial conditions for assimilation into the model state. Climate research has benefitted from the same developments (see e.g. Lynch, 2008, for an overview). The assimilation of observational data into NWP models has been a turning point for the development of high-resolution gridded information on the atmosphere and ocean state (e.g. Kalnay et al., 1996; Dee et al., 2011). The use of this methodology for reanalysis - that is, generating a comprehensive and physically consistent record of how the weather is changing over time - has ensured a baseline for climate research and triggered the development of downstream climate services.
Meteorologists have been using machine learning to post-process model output, blend multiple models, and optimize the weighting of models for over 20 years (Haupt et al., 2018). Neural nets were used in the 1990s to speed up the calculation of outgoing longwave radiation in climate models (Chevallier et al., 1999), and for both short- and long-wave radiation parameterization in the National Center for Atmospheric Research (NCAR) Community Atmospheric Model (CAM) (Krasnopolsky et al., 2007). Present and future strategies feature an Earth System approach for assimilating environmental data into a more comprehensive coupled system including the atmosphere, ocean, biosphere and sea-ice (Penny and Hamill, 2017).

The influence and application of digital technologies has shown no sign of abatement in recent times. Three technological developments are having a strong effect on meteorology and climate research (Ruti et al., 2019). First, the increase of computing power. Exascale (i.e., 10^18 operations per second) is the next proxy in the long trajectory of exponential performance increases that has continued for more than half a century (Reed and Dongarra, 2015) and provides unprecedented opportunities with regard to the finer resolution of scales in time and space, and/or the coupling of more components that represent different parts of the Earth system. However, it also poses large software development and data management challenges, such as the impact of increasing numerical model resolution, increasing code complexity, and the volumes of data that are handled (Bauer et al., 2015; Sellar et al., 2020). A second development concerns the open availability of standard meteorological data and data from a variety of sources, including citizen science projects and low-cost sensors. Modern data management tools enable handling these data sources. Thirdly, there has been increasing use of machine learning, in particular so-called deep learning. A plethora of machine learning methods have been and are being applied to problems of weather and climate prediction, from emulating unresolved processes in numerical models to calibrating forecasts produced with numerical models and the production of forecasts based on data and machine learning methods only (Huntingford et al., 2019; Schneider et al., 2017; Reichstein et al., 2019).
Digital technologies enable new research methods, accelerate the growth of knowledge, and spur the creation of new means of communicating such knowledge amongst researchers and within the broader scientific community. As such, these technologies offer new opportunities for open weather and climate science.

In a session on "Weather and Climate Science in the Digital Era" at the 14th IEEE International eScience Conference, domain specialists and data and computer scientists discussed the road towards open weather and climate science. This paper describes the main findings and insights from this conference session.

The "Weather and Climate Science in the Digital Era" conference session examined some of the data- and compute-intensive approaches which are used in weather and climate science. The session comprised ten oral abstract presentations, one keynote talk, and six short poster pitches. Contributions were selected after a peer review on their scientific merit and innovative nature and published in the conference proceedings (Bari, 2018; Behrens et al., 2018; Bendoukha, 2018; Brangbour et al., 2018; Garcia-Marti et al., 2018; Haupt et al., 2018; Hut et al., 2018; Jansson et al., 2018; Pelupessy et al., 2018; Ramamurthy, 2018; Schultz et al., 2018; Stringer et al., 2018; van Haren et al., 2018). The sixteen session participants were either presenters or involved in the organization of the session, and represented disparate science domains, as well as computer and data sciences.

Following the first part of the session which was dedicated to the presentations, the participants broke into three groups to discuss "challenges and opportunities regarding open weather and climate science". The findings of each group were presented and discussed in a final plenary session, during which observations and insights were documented.
The observations in this paper are based on both the insights from the studies presented in the session and the notes made during the discussion. The majority of the participants in the session also contributed to this paper. As such, it represents a shared view of a group of experts in weather and climate science on digital and open science developments in their field.

OPEN SCIENCE

Open science refers to open research practices, and includes but is not limited to public access to the academic literature and sharing of data and code (McKiernan et al., 2016). However, the interpretation of the concept of open science varies between different schools of thought (Fecher and Friesike, 2014). In general, open science concerns various stakeholders: besides scholars, these include institutes, research funders, librarians and archivists, publishers and decision makers (Bourne et al., 2012; OECD, 2015; Fecher and Friesike, 2014).

It has been shown that the adoption of open research practices leads to significant benefits for researchers: specifically, increases in citations, media attention, potential collaborators, job opportunities and funding opportunities (McKiernan et al., 2016). Europe and the United States have made efforts to adapt legal frameworks and implement policy initiatives for greater openness in scientific research (OECD, 2015; National Science Foundation, 2018). Several countries provide digital infrastructure based on rich metadata that supports the optimal re-use of resources in the research environment (Mons et al., 2017). The need for open research practices has been recognized by the communities of meteorology and climate science and has even entered into the political arena. For instance, in its report on the so-called "Climatic Research Unit email controversy" in 2009, the Science and Technology Committee of the UK House of Commons stated that climate science is a matter of great importance and that the quality of the science should be irreproachable. The committee called for the climate science community to become more transparent by publishing raw data and detailed methodologies (House of Commons, 2010). The international meteorological and climate research communities have been sharing data since the 1990s, using common file and metadata formats. Besides CMIP (Taylor et al., 2012), examples include the sharing of reanalysis data, starting with the NCEP/NCAR reanalysis and ECMWF's ERA reanalysis data products (e.g. Kalnay et al., 1996; Dee et al., 2011).
The broad adoption of these practices has not yet been achieved, and this also holds for meteorology and climate science. In fact, sharing of data, software and vocabularies is only common practice in a few fields such as astronomy and genomics (e.g. Consortium, 2004; Borgman, 2012; Shamir et al., 2013). Recent studies show that transparency and reproducibility are still a matter of concern to the scientific community as a whole, and that all stakeholders need to work together to create a more open and robust system (Baker, 2016; Munafò et al., 2017; Gil et al., 2016).

TOWARDS OPEN WEATHER AND CLIMATE SCIENCE
In the following sections we present our perspective on the challenges and opportunities regarding open weather and climate science.

OPEN DATA
About 50% of the studies reported in the proceedings of the conference session include open data from different sources in their analyses. Examples include the use of open satellite data, geolocated data via OpenStreetMap and openly available in-situ meteorological observations (Haupt et al., 2018; Garcia-Marti et al., 2018; Bari, 2018; Schultz et al., 2018, and references therein). Two studies include data that are not common in meteorological or climate research. Citizen data such as social media posts (Brangbour et al., 2018) and observations from amateur weather stations (van Haren et al., 2018) can lead to new perspectives on local conditions beyond data from traditional meteorological stations.
At least 50% of the studies use common file formats and standard protocols to facilitate the exchange and use of data. Van den Oord et al. (2018) use CF-netCDF formats. The CF conventions provide guidelines for the use of metadata in the netCDF file and are increasingly used in climate studies. Behrens et al. (2018), Pelupessy et al. (2018), Schultz et al. (2018) and Stringer et al. (2018) all use standard protocols for inter-process communication (like MPI and REST) in their numerical codes. Furthermore, the use of common file formats and standard protocols is a prerequisite for the digital collaboration platforms which were presented in the session (Ramamurthy, 2018; Hut et al., 2018; Bendoukha, 2018).
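To make the role of the CF conventions concrete, the following minimal sketch (using the xarray library; the variable name, grid and values are invented stand-ins) builds a small temperature dataset whose metadata follows the CF pattern: a standard_name from the CF standard-name table, UDUNITS-compatible units, and a global Conventions attribute.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 2 m air temperature field on a coarse lat/lon grid.
temp = xr.DataArray(
    np.full((2, 3, 4), 288.0),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2018-10-29", periods=2, freq="D"),
        "lat": [50.0, 52.0, 54.0],
        "lon": [4.0, 5.0, 6.0, 7.0],
    },
    name="t2m",
    attrs={
        # CF metadata: a standard_name from the CF standard-name table
        # and UDUNITS-compatible units make the variable self-describing.
        "standard_name": "air_temperature",
        "units": "K",
        "long_name": "2 m air temperature",
    },
)
temp["lat"].attrs = {"standard_name": "latitude", "units": "degrees_north"}
temp["lon"].attrs = {"standard_name": "longitude", "units": "degrees_east"}

ds = temp.to_dataset()
ds.attrs["Conventions"] = "CF-1.8"
# ds.to_netcdf("t2m_example.nc")  # requires a netCDF backend (netCDF4 or scipy)
```

A file written this way can be interpreted by any CF-aware tool without consulting the producer, which is precisely the interoperability that the studies above rely on.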
The session participants recognized that in the current weather and climate science community the focus is primarily on making data and software findable and accessible, often via web portals. Although these are necessary first steps towards open science, we acknowledge that they are not sufficient. Data and software that are findable and accessible may still be hard to obtain in practice, or may be disseminated in a way that makes them difficult to interpret and use. Wilkinson and colleagues (2016) defined guidelines to ensure the transparency, reproducibility, and reusability of scientific data. These state that data - and also the algorithms, tools, and workflows that led to these data - should be Findable, Accessible, Interoperable and Reusable (FAIR). The FAIR guidelines put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.
In order to make the output from weather and climate models open and interoperable, i.e. formatted according to standards such as CF-netCDF including all necessary metadata, we consider performance scalability the foremost technological challenge. Whereas the simulation models are predominantly run on large clusters using many compute nodes, subsequent processing and analysis of the output is often still confined to a single CPU and does not scale easily with (say) increased model resolution. Thus, producing FAIR model output via traditional post-processing pipelines is quickly becoming infeasible for advanced simulation models due to the sheer volume and complexity of their output.
For simulation models, this trend is a consequence of the advance of processor speed and model scalability relative to storage bandwidth, and can be countered with two strategies. The first is removing the need for post-processing by incorporating as many steps as possible within the application itself. This will make the model more expensive, especially in terms of memory usage, but the overhead may often be mitigated by offloading the post-processing to a small extra set of dedicated high-memory compute nodes. This approach requires a technical effort from the data providers in the community, and it can only solve the data problem to a limited extent, since many scientific analyses will always require extra manipulations. Hence we need a second strategy on the data users' side: increasing parallelism in the climate data processing toolchain. Existing cloud computing technologies, like Apache Spark (Zaharia et al., 2016) or Dask (Dask Development Team, 2016), may provide a suitable basis, since data processing and analysis pipelines can usually be represented by task graphs with a large degree of parallelism (over grid points, over multiple variables, over ensemble members, etc.). One of the key aspects, however, is the capability of the developer, usually a meteorologist or climate scientist, to adopt a new programming paradigm which facilitates the parallel execution of the workflow on cloud infrastructure. Here, research software engineers may play a key role by - for instance - developing more sophisticated algorithms for efficient processing of distributed climate data and adopting tools like xarray (Hoyer and Hamman, 2017) and Iris (Met Office, 2010).
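The task-graph idea can be sketched in a few lines with xarray backed by Dask (the ensemble dimensions and sizes below are invented for illustration): reductions over a chunked array are not executed immediately but recorded as a graph of independent per-chunk tasks, which Dask then runs in parallel, locally or on a cluster.

```python
import dask.array as da
import xarray as xr

# A hypothetical small ensemble of 2 m temperature fields, chunked per
# ensemble member so each member can be processed as an independent task.
temp = xr.DataArray(
    da.random.random((4, 12, 45, 90), chunks=(1, 12, 45, 90)),
    dims=("member", "time", "lat", "lon"),
    name="t2m",
)

climatology = temp.mean(dim=("member", "time"))    # lazy: builds a task graph
anomaly = (temp - climatology).mean(dim="member")  # still lazy, graph grows

result = anomaly.compute()  # executes the graph, parallel over chunks
```

The same code scales from a laptop to a distributed cluster by swapping the Dask scheduler, which is the main attraction for climate data volumes that no longer fit a single machine.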
In addition to these technological issues, we observe that some important challenges for open data arise from the political or legal context, and as such require additional efforts beyond the scientific domain. Weather services and commercial entities may see their data as a business advantage and be reluctant to make them open. Various resolutions by the World Meteorological Organisation (e.g. Resolutions 40, 25 and 60) promote open access to and exchange of data in order to better manage the risks from weather and climate-related hazards, but leave room for additional conditions. These resolutions have no legal status, and national legislation may lead to restricted access to data and charges (Sylla, 2018). Also, policies to promote open data are less mature than those to promote open access to scientific publications (OECD, 2015). Another way to solve these issues is by signing non-disclosure agreements and allowing the weather services to act as trusted agents who use the data for the public good without disclosing their details. These trusted agents should be considered as occupying a new role in the scientific process.
Furthermore, data need to be hosted and maintained, and their quality should be ensured. These requirements are well addressed for large operational data services, such as the European Copernicus programme, but this is not usually the case for the research data of individual scientists, despite the increasing attention being paid to data management. Currently, data providers have no clear policy (such as - for example - the FAIR principles) to follow in their hosting and management of data. Publications such as Geoscience Data Journal, Scientific Data and Earth System Science Data are a partial remedy, as these provide open access platforms where scientific data can be peer-reviewed and formally published. Some funding agencies - for example NWO in the Netherlands - now require that, for all projects they fund, software becomes open source and the data are archived and findable unless there are strong reasons not to do so (e.g. privacy). Also, research funded by the European Commission should adhere to FAIR principles, and data management plans need to be in place.

OPEN SOFTWARE
The conference session provided excellent examples of tools and approaches that were developed and made openly available to the research community. Examples include approaches to reduce the computational or post-processing costs of existing simulation models (Stringer et al., 2018; Behrens et al., 2018; Jansson et al., 2018) and approaches to integrate data sets from different sources (van Haren et al., 2018; Schultz et al., 2018). Four studies in the session presented an approach for which open data and software is a prerequisite, as these comprise a model coupling framework or a digital collaboration platform (Pelupessy et al., 2018; Jansson et al., 2018; Ramamurthy, 2018; Hut et al., 2018; Bendoukha, 2018).
We strongly support open publication of code, even if this code is under development, and especially when this code is used in a paper to support research findings. Open code can be inspected and reused by peers; this improves the reproducibility and quality of the corresponding research. Code sharing is crucial to science, and to climate research in particular, since local and global policies depend on the scientific results. Open publication, however, requires the code to be documented and tested, which is a time-consuming effort. This level of documentation and testing is not yet standard practice, partially because there is no incentive to do so. There is a need for open science practices in which incentives are developed to share scientific information beyond the final result in a scientific paper. Agile (Fowler and Highsmith, 2001) is a well-known approach in the software engineering community, and may provide a means to achieve open scientific software in a feasible way. According to the Agile approach, software is developed in small increments every few weeks, which makes it possible to provide continuous feedback to the developers. With its focus on flexibility and communication, Agile lends itself naturally to scientific software projects, which are characterized by frequent code alterations due to changing requirements, tight collaboration in small teams, and short planning horizons (Sletholt et al., 2012). Agile practices are used, for example, by the ECMWF to develop the Climate Data Store (Raoult et al., 2017) and by the Met Office Hadley Centre to develop climate models (Easterbrook and Johns, 2009).

235
In four studies that were presented in the conference, machine learning technologies are used for data analysis and prediction (Haupt et al., 2018;Garcia-Marti et al., 2018;Bari, 2018;Schultz et al., 2018). Besides using standard meteorological datasets, these studies employed additional data to infer relationships that are relevant to the end user. For example, prediction of solar power output over a future time period requires the inclusion of historical and real-time solar energy production data (Haupt et al., 2018). It was observed that the use of machine learning approaches in weather and climate science is increasing.

These approaches are powerful, for instance, in emulating processes that are not resolved in simulation models (because of computational costs), in calibrating or post-processing simulation results, and in building models to describe or forecast meteorological and climatological events. The caveats, on the other hand, are that trained models are not as transparent as models based on the laws of physics, and their results can be hard to interpret. Following the open science principle, machine learning approaches should be understandable and reusable by other researchers. Emerging fields like explainable AI and knowledge-based machine learning may provide approaches that help human experts to understand how machine learning results are produced (Adadi and Berrada, 2018). Data-driven machine learning approaches should be combined with knowledge of physical processes (Dueben and Bauer, 2018; Reichstein et al., 2019) to gain further understanding of Earth system science problems.
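The calibration use case mentioned above can be illustrated with a deliberately minimal, fully synthetic sketch in the spirit of model output statistics: a linear correction is fitted that maps biased raw forecasts onto observations. All numbers are invented stand-ins, and real post-processing would of course use richer predictors and models.

```python
import numpy as np

# Synthetic "observations" and biased raw model output of 2 m temperature (K).
rng = np.random.default_rng(42)
obs = rng.uniform(270.0, 300.0, size=500)
raw = 1.1 * obs - 25.0 + rng.normal(0.0, 1.5, size=500)

# Least-squares fit of obs ~ a * raw + b (the classic linear MOS correction).
A = np.column_stack([raw, np.ones_like(raw)])
(a, b), *_ = np.linalg.lstsq(A, obs, rcond=None)
calibrated = a * raw + b

def rmse(x):
    return float(np.sqrt(np.mean((x - obs) ** 2)))

print(f"raw RMSE {rmse(raw):.2f} K, calibrated RMSE {rmse(calibrated):.2f} K")
```

Even this toy example shows the pattern that makes sharing such code valuable: the fitted coefficients and verification scores are only interpretable, and reusable by peers, when the training data and fitting procedure are published alongside the result.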
More broadly, machine learning methods should be accompanied by proper validation and verification.

This use of software, motivated by open science principles, requires a suitable digital infrastructure. The cloud appears to be a potential avenue, as it enables individual researchers to gain access to high-end computing resources, vast amounts of storage and suites of software tools. In our session, three digital platforms were presented that use cloud technologies to create a virtual research environment in which scientific end-users can store, analyze and share their data (Ramamurthy, 2018; Hut et al., 2018; Bendoukha, 2018). The session participants also observed, however, that current platforms, such as those of the Open Geospatial Consortium (Maidment et al., 2011) and the JRC Earth Observation Data and Processing Platform (Soille et al., 2017), do not seem to increase the extent of scientific collaboration, particularly across disciplines. This may be partly due to the fact that these platforms have each implemented their own set of standards, both for data formats and for the interfaces to access these data.
Since scientists are required to invest time and effort in working with a specific platform, this heterogeneity can pose obstacles to their collaboration with researchers on another platform.

Scientific advances are shown, for instance, through combining data sets and including non-standard meteorological data such as social media posts and observations from amateur weather stations. The increase in accuracy and skill of forecasts at local scales shows improved consistency of data products and improved efficiency and skill of simulations, often crossing different disciplines. The utilisation of machine learning and increased computational capabilities have facilitated the use of disparate sources of data. In our conference session we concluded that sharing data and code offers many opportunities for scientific progress, leads to better reproducible science and vastly enhances the user base. However, we realized that open publication of data and code is not sufficient to achieve open weather and climate science and that there are important issues to address, which are described below.

DISCUSSION
The findability and accessibility of data increasingly receive attention in weather and climate research, and common file and metadata formats increase interoperability. However, for many data sets the implementation of the FAIR principles remains a challenge due to their origin, scalability issues or legal barriers. We also acknowledge that data quality can be difficult to judge, depending on its intended use or the reason for its generation. Addressing this data quality challenge requires continued discussion on which aspects of open data can be implemented generically and which aspects are specific.
Technologically, the promise of modern digital technologies is not always met, due to the complexity of software platforms. While this paper does not address hardware, this holds for hardware, and for the software run on this hardware, as well. Further development of platforms should facilitate ease of use and provenance tracking. This also calls for more attention to research software engineering, where collaboration and interaction between software engineers and domain researchers can lead to optimal use of open science tools and methods.
As mentioned before, open science concerns various stakeholders in addition to scholars. Data management and programming have become an integral part of current research practice, and these activities require specific digital skills (Akhmerov et al., 2019). It is therefore important to acknowledge and define roles, responsibilities and mandates concerning data stewardship and research software engineering. This requires institutional change, as the personnel portfolio of academic institutions needs to become more diverse, and, in addition, a broader consideration of the impact of academic work beyond scientific publications and teaching.
In order to remove legal boundaries on sharing data, it is important to also engage non-academic parties such as operational weather services, which could act as trusted agents that use the data for the public good without disclosing their details.

Alongside the issues and challenges regarding open weather and climate science, this paper also discusses opportunities and possible solutions for these issues. We have compiled these into the following list of concrete recommendations that could bring us closer to open weather and climate science. Some of these recommendations are new; others are ongoing, but still hold.

Recommendation 1: Developers should include post-processing steps in their simulation models. Requires additional compute and memory. Status: ongoing.

Recommendation 2: Users of simulation data should increase parallelism in the data processing tool chain. Requires additional expertise in cloud computing and parallel and distributed computing. Status: recent; see e.g. Zaharia et al. (2016), Dask Development Team (2016), Hoyer and Hamman (2017) and Met Office (2010).

Recommendation 3: Individual researchers should be encouraged to publish scientific data in dedicated data journals.

Open science has implications for the stakeholders, the institutions and the system of science as a whole. It requires effort to change, but the benefits are large. Openly sharing data, code, and knowledge vastly enhances the user base, which means manifold growth of opportunities for new discoveries. As we observed from our conference session, this can lead to an improved understanding of our complex world.
Author contributions. MGdV and WH organized the conference session and were lead writers of the manuscript. All authors contributed to the presentations and discussion in the conference session and to the writing of the manuscript.
Competing interests. The authors declare that they have no conflict of interest.

Acknowledgements. The authors would like to acknowledge both the Netherlands eScience Center and the program committee of the