Open Science Analysis Working Groups: Collaborating on the Future

By Anna White

Dr. Willian da Silveira

It’s a closely held belief among those who practice open science that it can lead to wonderful discoveries and collaborations that can close the gap between the now and the future of science. For some in the open science community, those collaborative opportunities can take them down a path they never expected.

Early in his career in medical sciences at the University of South Carolina, Dr. Willian da Silveira jumped at the opportunity to work with NASA through a funding grant. After everything was settled and work started, he was added to an email list where he saw GeneLab, NASA’s space-biology database, which is now a part of the Open Science Data Repository, promoting its Analysis Working Groups.

Established in 2018, NASA’s GeneLab project created the Analysis Working Groups (AWGs) to optimize data processing and improve the effectiveness of the group’s data system through the extensive use of analytics. These groups are composed of volunteer researchers, principal investigators, professors, and students who band together to establish analytical processes and generate higher-order data from data housed in the GeneLab Data System with relevance to one or more specific application areas.

It is interdisciplinary communication like this that allows for science to grow at breakneck speeds. Mathematicians can offer new perspectives to biologists, biologists to geologists, geologists to astronomers, and so on in a never-ending loop of infinite possibilities.

“I never worked on anything so close to science fiction,” recalled da Silveira. “I was once in a conference room with a person that was giving a presentation about new concepts on starships. During a coffee break, he asked me my opinion about starships…I’m a pharmacist!”

But in addition to furthering science across the globe, the AWGs have also served as simply “a good platform for meeting people,” according to da Silveira.

After connecting with other members of the Multi-Omics AWG, da Silveira and his fellow AWG members wrote a publication that ended up on the cover of Cell. From there, his career blasted off; he earned his first lecturer position at Staffordshire University in England and started teaching space omics as an adjunct professor at the International Space University, where he still uses GeneLab data in his lessons. Without the AWGs, he would likely never have had these experiences that were essential to the building of his career.

“It started as a side project that I did after hours or on the weekends, but as time passed this started to move to the center of my career, something I never planned,” said da Silveira. “GeneLab was essential to what I have now in my professional career.”

Beyond helping with careers, open science binds people from all around the world together in community rather than competition. This free sharing of ideas and projects allows scientific minds from all experiences and backgrounds to assist in things larger than themselves, and in turn, potentially receive assistance with their own projects and gain new perspectives.

“When I entered the AWG, I was quite happy just to be part of it, and now I’m using AWG data literally around the world,” said da Silveira. “I’ve said to people, ‘Oh you think this is boring? NASA thinks this is NOT boring. At least if NASA thinks this is important, I don’t have to fight too much to show how it can be important for them.’”

This is the goal of Analysis Working Groups, and by extension, open science. The freer knowledge is, the easier collaboration is, and therefore the faster the future becomes the present.

If you are interested in learning more about Open Science’s many AWGs, which include GeneLab, click here.

Anna White is an intern from the University of Alabama, supporting NASA’s Marshall Space Flight Center’s Office of Strategic Analysis & Communications.

NEON: Combining Open Science and Wildfire Studies

By Anna White

At the University of Colorado Boulder, fire ecologist Dr. Jennifer Balch is studying how forests recover after wildfires and how they contribute to atmospheric carbon levels. This work is vital to climate change studies and could ultimately yield more accurate models of forest fire carbon release for future studies.

When forests burn, the massive quantities of carbon stored within the vegetation are rapidly released into the atmosphere. As forests recover, trees and undergrowth mature and gradually reabsorb carbon, storing it away once again. Because carbon dioxide is a greenhouse gas, this topic is becoming increasingly relevant to our warming planet, and research on it is being facilitated by the efforts of the open science community.

Seasonal prescribed burn at the Konza Prairie Biological Station, location of a NEON field site. Photo Credit: NEON/Battelle

The National Science Foundation’s National Ecological Observatory Network (NEON), a major proponent of open science, has provided Balch with test and control sites that are in various stages of wildfire recovery. To observe and gather data from these plots, Balch uses drones, NEON’s Airborne Observation Platform (AOP), and light detection and ranging (LiDAR) data from NASA’s Global Ecosystem Dynamics Investigation (GEDI). Through the combination of all this information, Balch and her colleague, Dr. Nayani Ilangakoon, have access to data on over 600 fires that have burned more than 1,000 acres, which has led to extensive studies on wildfire recovery in western U.S. forests.

A joint mission between NASA and the University of Maryland, GEDI acquires data using an instrument installed aboard the International Space Station (ISS) that uses lasers to construct detailed three-dimensional forest maps. By accurately measuring these forests in 3D, GEDI data helps scientists understand Earth’s carbon cycle and how much carbon is stored and lost when forests are disturbed.

High-resolution LiDAR data gathered during 2012 flyovers of NEON’s Harvard Forest site. Photo Credit: NEON/Battelle

Thanks to NASA’s vast collection of satellite and LiDAR data and NEON’s facilities, drones and AOP, the data gathered for this project can be accurate on scales as large as kilometers and as small as centimeters. If fed into improved models, this data could provide extreme detail to scientists studying climate change, demonstrating a true win for open science.  

Anna White is an intern from the University of Alabama, supporting NASA’s Marshall Space Flight Center’s Office of Strategic Analysis & Communications.

Embracing Celestial Insights: Exploring Eclipses, Open Science, and Heliophysics Big Year

You are invited to our upcoming NASA Transform to Open Science (TOPS) Community Forum on September 14 at 1 p.m. ET for a webinar on eclipse events and their connection to the world of open science. We will discuss how eclipse research can contribute to collective research efforts, leading to innovative findings and a deeper comprehension of these phenomena. 

We are excited to have two distinguished speakers who will share their expertise and insights: Mitzi Adams, assistant manager of the Heliophysics and Planetary Science Branch at NASA’s Marshall Space Flight Center, and Dr. Kelly Korreck, program manager for the 2023 and 2024 Solar Eclipses and program scientist for the Heliophysics Division in the Science Mission Directorate at NASA HQ.

Adams will explore the data-driven side of eclipses and how open science principles facilitate the sharing and accessibility of data, which can lead to innovative findings and a deeper comprehension of eclipse events. 

In addition, Dr. Korreck will discuss eclipse events and how they relate to the world of open science, with an introduction to NASA’s Heliophysics Big Year. This is a global celebration of science and the Sun’s influence on Earth and the entire solar system.

Register now to secure your spot. This event is open to the public and will include an interactive Q&A following the presentations.

Webinar contains imagery of eclipses.

Xarray: Empowering Scientific Data Analysis within the NASA community

Xarray is an open-source Python package that makes working with complex, multi-dimensional arrays elegant, intuitive, and efficient. Real-world datasets, such as those generated by NASA, are often a collection of many related variables on a common grid. These datasets are more than just arrays of values: they have labels that describe how array values map to locations in dimensions such as space and time, and metadata that describes how the data was collected and processed. Xarray embraces the complexity of real-world datasets and enables users to use metadata such as dimension names and coordinate labels to easily analyze, manipulate, and visualize their datasets. Xarray makes data analysis more intuitive and enjoyable while keeping the description of how data was collected and processed attached to the data itself.
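To make the idea of labeled dimensions concrete, here is a minimal sketch in Python. The variable name sst, the grid, and all values are synthetic and invented for illustration; they do not come from any NASA dataset. The sketch assumes the xarray and numpy packages are installed.

```python
import numpy as np
import xarray as xr

# Synthetic values on a small time/lat/lon grid (illustrative only, not real data).
times = np.array(["2023-01-01", "2023-01-02", "2023-01-03"], dtype="datetime64[ns]")
lats = [10.0, 20.0]
lons = [100.0, 110.0, 120.0, 130.0]
values = np.arange(24, dtype=float).reshape(3, 2, 4)

# Attach dimension names, coordinate labels, and descriptive metadata to the array.
sst = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lats, "lon": lons},
    name="sst",
    attrs={"units": "degC", "source": "synthetic example data"},
)

# Select a time series by coordinate label rather than by integer position.
point = sst.sel(lat=20.0, lon=110.0)

# Average over a named dimension instead of remembering an axis number.
time_mean = sst.mean(dim="time")

print(point.values)
print(time_mean.dims)
```

Selecting with sel(lat=..., lon=...) and reducing with mean(dim="time") both rely on names rather than axis positions, so the same code keeps working if the array's dimension order changes.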

A Vital Role in Handling NASA’s Evolving Data Demands

Consider for a moment that NASA’s Science Mission Directorate (SMD) collectively stores over 100 Petabytes (PB) of data and estimates doubling that to 200 PB per year within the next five years. Handling large amounts of data at scale is clearly an important consideration as the volume of data from modern sensors continues to grow. Xarray’s flexibility has played a pivotal role in NASA’s transition to cloud computing infrastructure, ensuring efficient and robust data processing for the agency’s vast repositories of information. Xarray is a common component in workflows involving NASA datasets across many domains, including physical oceanography and glaciology.

In 2021, NASA selected Xarray as one of eight open-source projects for funding under the Open Source Tools, Frameworks, and Libraries program. This financial support has not only allowed the Xarray project to flourish, but also to broaden its use with NASA data through maintenance and outreach activities (see the full proposal announcement).

Committed support from NASA has been instrumental in allowing Xarray maintainers to make major progress on long-term goals such as reorganizing the code base for long-term sustainability, substantially revamping the Xarray tutorial website, and spending time to implement new features that benefit a wide range of domains. NASA’s ongoing support has also allowed maintainers to spend time on day-to-day maintenance tasks and handle user support requests more quickly. Previously, such work was performed on a volunteer basis and was hard to sustain.

Xarray has used funds to help build its community through the SIParCS summer internship program at NCAR (blog), participate in conferences such as SciPy 2023, and host virtual office hours. Over the next year, the team is looking to get more involved with domain-specific extensions for the needs of NASA’s remote sensing data through rioxarray, continue the office hours program, and represent Xarray at a number of additional conferences.

Xarray at SciPy 2023

Xarray at SciPy 2023! (top left) Deepak Cherian (NCAR), Scott Henderson (UW), Yuta Norden (U. Hawaii; 2023 SIParCS Intern), Maxwell Grover (Argonne) enjoying themselves at SciPy 2023. (bottom left) Participants “sprinting” and collaborating on Xarray and related projects. (right) Negin Sobhani (NCAR) delivers a part of the Xarray tutorial.

Thanks to NASA funding, Xarray was able to participate in SciPy 2023 in a significant way. The tutorial at SciPy 2023 was an exciting opportunity for scientists already familiar with Xarray to delve into advanced topics. The 2023 tutorial targeted intermediate-to-advanced material and built on the fundamentals-level tutorial delivered at SciPy 2022.

Tutorial participants reported they were able to streamline their workflows by using more of Xarray’s built-in functions after gaining insight into concepts that were initially intimidating, such as parallelizing computations on very large datasets.

The team had good turnout at the SciPy “sprints”, where the Xarray community worked together with allied projects like Zarr to discuss and quickly solve problems. Emma Marshall presented a great talk, building on her 2022 SIParCS internship work with Xarray, on how to organize tidy remote sensing datasets in a manner that facilitates easy analysis in the future.

An Open and Inclusive Community

In addition to impressive technical capabilities, one of Xarray’s greatest strengths is its vibrant and inclusive community. Xarray has been publicly developed on GitHub since 2014, with over 270 contributors improving the project through open development practices. Thanks to these active GitHub contributions, conference tutorials, and virtual office hours, Xarray has garnered interest from over 10,000 active users across various scientific disciplines.

NASA funding from the OSTFL program supports Xarray maintainers from historically underrepresented groups in the fields of Earth Science and open-source software development, demonstrating a commitment to inclusivity. Tutorials and virtual office hours increase the visibility of these individuals so they can serve as role models within their communities. Xarray places high value on a diverse group of users and contributors at all levels of software development expertise in order to improve the overall quality and accessibility of the software.

If you are interested in contributing your skills and enthusiasm to the Xarray project by reporting bugs, improving documentation, suggesting enhancements, or sharing any other ideas, visit the contributions page today.

Xarray’s commitment to these principles of openness and inclusivity is in close alignment with NASA’s vision of open science. At NASA, 2023 is the year of open science, and one of the core ideas of open science is that by breaking down barriers and having scientists from diverse backgrounds engage with research, scientific discoveries will be accelerated. To aid in growing the open science community, NASA is developing a curriculum to train scientists, researchers, and citizen scientists to use open science tools, like Xarray, in their research. To learn more and to pre-enroll in the curriculum, visit the Transform to Open Science (TOPS) GitHub page.

Conclusion

With its powerful capabilities and inclusive community, Xarray is a compelling tool that will hopefully entice readers to explore its potential in their data science endeavors at NASA and beyond. Head over to their GitHub to explore the wealth of resources and the vibrant community that has made this project what it is today.

Explore Xarray’s GitHub

Unveiling the Power of Open Science for Indigenous Communities

Join us on August 10 at 1 p.m. ET for the NASA Transform to Open Science (TOPS) monthly forum, “Unveiling the Power of Open Science for Indigenous Communities,” in honor of World Indigenous Peoples Day.

This forum features speakers from NASA’s Indigenous People Initiative and SERVIR Amazonia, along with Dr. Sierra Brown of Million Concepts LLC, a TOPS curriculum writer and member of the Shinnecock Nation. It will delve into the incredible ways open science is empowering indigenous communities. The speakers will also discuss how these communities benefit from open science practices and share insights into the remarkable work being done by these teams and the open science community at large.

Open science plays a crucial role in empowering indigenous communities by providing them with access to scientific knowledge and data, as well as tools that can support their sustainable development and cultural preservation efforts. By sharing their unique perspectives, traditional knowledge and valuable insights, indigenous communities can contribute to scientific research and policy discussions, ensuring that their voices and perspectives are included and valued in decision-making processes.

Register now to secure your spot and participate in the enlightening Q&A session that follows the presentations. This event is open to the public, so spread the word and be a part of this transformative conversation.

Join Us Virtually for the “Accelerating the Adoption of Open Science” NASA/CERN Summit!

In celebration of the Year of Open Science initiative, NASA and CERN are hosting a week-long hybrid summit titled “Accelerating the Adoption of Open Science” at CERN in Geneva, Switzerland, from July 10-14. We invite you to participate virtually in this remarkable gathering, where experts and stakeholders from the open science community will come together to advance open science policies and practices.

The primary aim of the “Accelerating the Adoption of Open Science” summit is to foster collaboration, innovation, and the exchange of ideas among attendees. As a virtual participant, you will have the opportunity to listen to keynote presentations by speakers from NASA, CERN, and other renowned scientific institutions from around the world, while engaging in panel discussions on various aspects of open science. These talks will provide valuable insights into the importance of open science and its potential impact on the scientific community.

Event Highlights:

Keynote Presentations: Gain inspiration and knowledge as leading experts discuss the significance of open science and its implications for scientific research. 

Panel Discussions: Engage in panel sessions where experts will address challenges and opportunities in the adoption of open science. Gain valuable insights and perspectives on various aspects of open science, including data sharing, collaboration, and reproducibility. As a virtual participant, you can submit questions in advance or join live Q&A sessions.

To join the “Accelerating the Adoption of Open Science” summit virtually, visit the event website. Register today and be a part of the future of scientific research. Together, let’s accelerate the adoption of open science and shape the scientific landscape for years to come.