Xarray is an open-source Python package that makes working with complex, multi-dimensional arrays elegant, intuitive, and efficient. Real-world datasets, such as those generated by NASA, are often a collection of many related variables on a common grid. These datasets are more than just arrays of values: they have labels that describe how array values map to locations in dimensions such as space and time, and metadata that describes how the data was collected and processed. Xarray embraces the complexity of real-world datasets and enables users to use metadata such as dimension names and coordinate labels to easily analyze, manipulate, and visualize their datasets. This makes data analysis more intuitive and enjoyable while preserving the record of how the data was collected and processed.
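As a rough illustration of what working with dimension names and coordinate labels looks like, here is a minimal sketch. The variable name, grid, values, and attributes are all made up for the example and do not come from any NASA dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical gridded variable: 3 time steps on a 2 x 2 lat/lon grid.
# All values and metadata below are synthetic, for illustration only.
temperature = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 2, 2),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2023-01-01", periods=3),
        "lat": [10.0, 20.0],
        "lon": [100.0, 110.0],
    },
    name="temperature",
    attrs={"units": "degC", "source": "synthetic example values"},
)

# Select by coordinate label rather than by integer position.
first_day = temperature.sel(time="2023-01-01")

# Reduce over a dimension by name rather than by axis number.
time_mean = temperature.mean(dim="time")

# Pick the grid cell nearest to a requested location.
point = temperature.sel(lat=12.0, lon=108.0, method="nearest")
```

Here `time_mean` keeps only the `lat` and `lon` dimensions, and `point` is a one-dimensional time series for the closest grid cell. None of these operations requires remembering which axis number corresponds to time or space.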
A Vital Role in Handling NASA’s Evolving Data Demands
Consider for a moment that NASA’s Science Mission Directorate (SMD) collectively stores over 100 Petabytes (PB) of data and estimates doubling that to 200 PB per year within the next five years. Handling data at this scale is clearly an important consideration as the volume of data from modern sensors continues to grow. Xarray’s flexibility has played a pivotal role in NASA’s transition to cloud computing infrastructure, ensuring efficient and robust data processing for the agency’s vast repositories of information. Xarray is a common component in workflows involving NASA datasets across many domains, including physical oceanography and glaciology.
In 2021, NASA selected Xarray as one of eight open-source projects for funding under the Open Source Tools, Frameworks, and Libraries program. This financial support has allowed the Xarray project not only to flourish, but also to expand its support for working with NASA data through maintenance and outreach activities (see the full proposal announcement).
Committed support from NASA has been instrumental in allowing Xarray maintainers to make major progress on long-term goals such as reorganizing the code base for long-term sustainability, substantially revamping the Xarray tutorial website, and implementing new features that benefit a wide range of domains. NASA’s ongoing support has also allowed maintainers to spend time on day-to-day maintenance tasks and handle user support requests more quickly. Previously, such work was performed on a volunteer basis and was hard to sustain.
Xarray has used funds to help build its community through the SIParCS summer internship program at NCAR (blog), participate in conferences such as SciPy 2023, and host virtual office hours. Over the next year, the team is looking to get more involved with domain-specific extensions for the needs of NASA’s remote sensing data through rioxarray, continue the office hours program, and represent Xarray at a number of additional conferences.
Xarray at SciPy 2023
Thanks to NASA funding, Xarray was able to participate in SciPy 2023 in a significant way. The tutorial at SciPy 2023 was an exciting opportunity for scientists already familiar with Xarray to delve into advanced topics. It covered intermediate-to-advanced material and built on the fundamentals tutorial delivered at SciPy 2022.
Tutorial participants reported they were able to streamline their workflows by using more of Xarray’s built-in functions after gaining insight into concepts that were initially intimidating, such as parallelizing computations on very large datasets.
The team had good turnout at the SciPy “sprints”, where the Xarray community worked together with allied projects like Zarr to discuss and quickly solve problems. Emma Marshall presented a great talk, building on her 2022 SIParCS internship work with Xarray, on how to organize tidy remote sensing datasets in a manner that facilitates easy analysis in the future.
An Open and Inclusive Community
In addition to impressive technical capabilities, one of Xarray’s greatest strengths is its vibrant and inclusive community. Xarray has been publicly developed on GitHub since 2014, with over 270 contributors improving the project through open development practices. Thanks to these active GitHub contributions, conference tutorials, and virtual office hours, Xarray has garnered interest from over 10,000 active users across various scientific disciplines.
NASA funding from the OSTFL program supports Xarray maintainers from historically underrepresented groups in the fields of Earth Science and open-source software development, demonstrating a commitment to inclusivity. Tutorials and virtual office hours increase the visibility of these individuals so they can serve as role models within their communities. Xarray places high value on a diverse group of users and contributors at all levels of software development expertise in order to improve the overall quality and accessibility of the software.
If you are interested in contributing your skills and enthusiasm to the Xarray project by reporting bugs, improving documentation, suggesting enhancements, or sharing any other ideas, visit the contributions page today.
Xarray’s commitment to these principles of openness and inclusivity is in close alignment with NASA’s vision of open science. At NASA, 2023 is the Year of Open Science, and one of the core ideas of open science is that breaking down barriers and having scientists from diverse backgrounds engage with research will accelerate scientific discovery. To aid in growing the open science community, NASA is developing a curriculum to train scientists, researchers, and citizen scientists to use open science tools, like Xarray, in their research. To learn more and to pre-enroll in the curriculum, visit the Transform to Open Science (TOPS) GitHub page.
Conclusion
With its powerful capabilities and inclusive community, Xarray is a compelling tool for data science work at NASA and beyond, and we hope readers will explore its potential. Head over to the Xarray GitHub repository to explore the wealth of resources and the vibrant community that has made this project what it is today.