Open Science: Contributing to a Culture of Inclusivity for the Next Generation

By Kennedi White 

To me, open science represents the easily available, worldwide access to data and science for future generations to expand on. Acknowledging that fact, it’s hard to separate open science from cultural inclusivity.

Before I was introduced to NASA’s Transform to Open Science (TOPS) initiative, I attended a high school that had a scientific research program. Oftentimes conducting or replicating methodology proved to be difficult because we weren’t allowed access to needed information, partly because we were considered too young to engage in science. With open science, those barriers are dismantled so that anyone, regardless of their background, has access to scientific journals to find more literature on a topic, the datasets necessary to replicate any methodology, and access to instrumentation and remote computers. What ultimately cemented this belief in my head was seeing a Canadian high school student present her research at NASA’s Exploration Science Forum, which was achieved due to the resources and opportunities presented to her because of open science. Had the drive towards open science been available at that time, I’m sure younger Kennedi would have had a field day.   

From a collaborative standpoint, open science encourages a more diverse set of researchers to engage with and build upon existing knowledge. As a lover of creative writing, I’ve often found in literature that certain concepts about space and celestial bodies have been shared across a myriad of cultures at different periods of time, with some concepts and methodologies for studying the stars coming long before Galileo. In the end, access to new scientific discoveries, especially as a means of recording them historically, is made easier with open science. A collaborative environment filled with people from different cultural, educational, and economic backgrounds will allow for new perspectives to drive us forward in meaningful ways.

I believe open science promotes inclusivity by democratizing access to knowledge and fostering diverse participation. By encouraging global collaboration, engaging with communities and providing open educational resources, open science contributes to creating a more inclusive and equitable scientific culture for the next generation, one I am more than happy to be a part of. 

Kennedi White is an intern from Howard University, supporting NASA’s Jet Propulsion Laboratory’s Planetary and Space Sciences   

NASA’s Science Discovery Engine Transforms Science Accessibility

As a key component of NASA’s Open Source Science Initiative (OSSI), the Science Discovery Engine (SDE) offers a new, powerful discovery capability for all of NASA’s open science data and information. The SDE beta version launch was announced a little over a year ago at the 2022 American Geophysical Union conference. Since then, the SDE team has continued to improve the interface to enable more insightful, relevant search results and to provide an engaging user experience.

Building the SDE 

When NASA announced a strategic focus on open-source science in 2018, one major recommendation from the scientific community was to develop an integrated search portal. Scientists advocated for an interface that would create simultaneous access to content from all five of NASA’s science topic areas: Astrophysics, Biological and Physical Science, Earth Science, Heliophysics, and Planetary Science.

A team of data scientists, subject matter experts, and developers began formulating a plan in early 2020 to meet this ambitious objective. Given the diversity, distribution, and vast size of NASA science data and information, many challenges emerged. 

For instance, each topic area uses its own metadata standards and vocabularies, making it difficult to create comprehensive and accurate metadata across disciplines. In addition, much of the content relevant to understanding and using data is dispersed across many websites and code repositories, making identification and curation of information sources a time-consuming task. 

Nevertheless, the SDE team has successfully corralled over 600,000 science documents, creating a pathway for researchers to pursue transformative, interdisciplinary science.

Fine-tuning the Engine

The SDE user interface is customized to meet the needs of the scientific community. Users can perform free-text searches and apply text-based facets to refine results by their chosen parameters. Users can also filter by information type such as data, images, or documentation. Dataset landing pages are standardized to ensure a consistent, cohesive user experience.

A primary goal of SDE is making the scientific research process more efficient by helping scientists more quickly find and gain access to necessary data and information. To that end, the SDE has served as a pathfinder for adopting and operating an emerging technology: an insight engine. Insight engine software applies relevancy methods to discover, analyze, describe, and organize content and data from diverse sources. An advantage that insight engines offer over traditional search engines is the ability to incorporate natural language processing and machine learning. Infusing search processes with these artificial intelligence techniques helps calibrate retrievals with context enrichment, providing users with more accurate and relevant results.

In early January, the SDE team rolled out an updated user interface to facilitate even greater ease and expediency in search processes. Some of the improvements include a new acronym search feature and additional filtering options that allow users to search within a single science topic area, such as planetary science or heliophysics. With these upgrades and more, the SDE is promoting research efficiency through a commitment to open science principles.

Looking to the Future

In the coming months, the team will complete the initial round of curating NASA science content and incorporating it into the SDE. In addition, the SDE team is planning a full roll-out of the tool in the fall of 2024. Finally, the team is beginning to prototype tailored search applications that meet the needs of individual scientific audiences. For example, a specialized search interface designed for users who are interested in environmental justice information is already in development.

The SDE team is also testing advanced search techniques made possible through emerging technologies like large language models (LLMs). These models are changing the way search is conducted, and search capabilities like the SDE will need to adapt to enable both conversational and keyword search in the future. The SDE team is also interested in exploring how LLMs can enhance the efficiency and accuracy of curation workflows.

From its inception, the SDE has been a beacon on the path to open science, illuminating the opportunities that emerge when science information is fundamentally accessible to all. This comprehensive, nimble search interface will continue to impress and inspire as researchers explore how to maximize scientific progress through information discovery and collaboration.

NASA Science Explorer (SciX) Accelerates Open Science Discovery

The launch of the NASA Science Explorer (SciX) at the upcoming American Geophysical Union conference marks a significant leap in open science discovery. This new literature portal, an expansion of the NASA Astrophysics Data System (ADS), is set to revolutionize how science is found, accessed, and utilized. The beta release, accessible at SciXplorer.org starting December 11, is a testament to NASA’s commitment to open science.

The Origin of SciX

Back in 2019, NASA’s Science Mission Directorate envisioned an interdisciplinary literature portal that would span its divisions (Earth Science, Planetary Science, Astrophysics, Heliophysics, and Biological and Physical Sciences), bolstering open science. The ADS was the natural choice for this expansion: for decades it has supported open science goals by aggregating and linking open access publications, data, and software to facilitate their discovery and dissemination. This initiative was more than just creating a repository; it was about weaving a digital tapestry that connects publications, data, and software in a meaningful and accessible way.

What Does SciX Offer?

SciX stands out as more than just a digital library; it’s a purposefully designed portal tailored for the NASA community’s diverse requirements. What sets SciX apart is its ability to aggregate research content relevant to all Science Mission Directorate (SMD) divisions, creating a unified platform for exploring scientific literature. SciX provides authoritative coverage of the research literature in Earth Science, Planetary Science, and Heliophysics in addition to its core collection of astrophysics material inherited from ADS. It goes beyond mere aggregation by capturing the unique nuances and semantics of various scientific disciplines through intelligent use of relevant taxonomies. This approach significantly enriches the user search experience, allowing for more precise and relevant results.

Moreover, SciX excels in creating meaningful connections. It links research papers to an extensive array of related resources, such as datasets, software, and notebooks. This integration provides a comprehensive view of research work, offering users a more holistic understanding of scientific studies. Tailoring its capabilities and analytic services, SciX caters to the specific needs of different research communities within NASA, demonstrating its flexibility and user-centric approach.

The real power of SciX lies in how these features empower its users. With diverse search options that include topics, people, organizations, or objects, SciX offers versatile and comprehensive search pathways. Users can gain rich insights into each paper, exploring metrics, collaborations, and topics to get a thorough understanding of the research landscape. Additionally, SciX champions the principles of open science. It facilitates access to open-access versions of papers and enriches the research experience by providing citations, co-readership, similar papers, and metrics. The platform goes a step further by linking to related software and datasets, seamlessly connecting all aspects of NASA Science. This holistic approach makes SciX an invaluable tool for researchers, enhancing accessibility and fostering a deeper engagement with scientific content.

The Future of SciX

SciX is set to become more than a portal; it’s evolving into a community cornerstone for NASA scientists and the broader research community. Its open, trustworthy, complete, innovative, and interdisciplinary nature sets it apart, making it not just a tool but a partner in scientific discovery. Developed by scientists for scientists, SciX exemplifies what it means to be a part of the NASA community: it’s about exploring the unknown, connecting dots across disciplines, and sharing knowledge openly and freely.

As SciX continues to develop, its impact on the scientific community will only grow. It’s not just about accessing information; it’s about engaging with it in ways that propel research forward. SciX is a gift to open science, a resource that underscores NASA’s dedication to fostering an environment where knowledge is accessible, connections are intuitive, and discoveries are shared for the greater good.

SciX is more than just a digital library; it’s a dynamic, evolving platform that reflects NASA’s ongoing commitment to open science. By bridging gaps, connecting diverse resources, and offering tailored tools, SciX is poised to become an indispensable asset for researchers, echoing the spirit of exploration and discovery that defines NASA.

Elevating Science Communication: Build a Better Poster Workshop

Join NASA’s Transform to Open Science (TOPS) team for a Special Topic Webinar on November 9 at 1 p.m. EST with Dr. Mike Morrison, founder of the #BetterPoster Movement. Dr. Morrison has a mission to refine how scientists communicate their work. He will lead an engaging virtual session with a distinctive approach to science communication using evidence-based design principles.

Throughout the one-hour workshop, Dr. Morrison will share live demonstrations, interactive games, and graphics designed to make a lasting impact and change your approach to science communication. This particular focus on presentation slides and posters will have you ready for the winter conference circuit.

If you are curious about what makes his presentation style unique, past attendees describe it as providing valuable insights for improving scientific communication. Get ready to acquire new knowledge, broaden your perspective, and leave with a deeper understanding of how to engage your audience.

Renowned for his YouTube series, #BetterPoster, which introduced innovative design techniques for scientific research presentations, Dr. Morrison’s work has influenced the style of thousands of scientific presentations.

In addition to his work with the #BetterPoster Movement, Dr. Morrison serves as the Lead User Experience Designer at Curvenote, a scientific software startup, where he is at the forefront of reshaping the authoring and publishing of scientific journal articles. His dedication to enhancing scientific communication is evident, and his vision for the future is inspiring. We are looking forward to your participation in this educational journey with us. 

Register now to secure your spot. This event is open to the public and will include an interactive Q&A session following the presentation.

Embracing Celestial Insights: Exploring Eclipses, Open Science, and Heliophysics Big Year

You are invited to our upcoming NASA Transform to Open Science (TOPS) Community Forum on September 14 at 1 p.m. ET for a webinar on eclipse events and their connection to the world of open science. We will discuss how eclipse research can contribute to collective research efforts, leading to innovative findings and a deeper comprehension of these phenomena. 

We are excited to welcome two distinguished speakers who will share their expertise and insights: Mitzi Adams, assistant manager of the Heliophysics and Planetary Science Branch at NASA’s Marshall Space Flight Center, and Dr. Kelly Korreck, program manager for the 2023 and 2024 Solar Eclipses and program scientist for the Heliophysics Division in the Science Mission Directorate at NASA HQ.

Adams will explore the data-driven side of eclipses and how open science principles facilitate the sharing and accessibility of data, which can lead to innovative findings and a deeper comprehension of eclipse events. 

In addition, Dr. Korreck will discuss eclipse events and how they relate to the world of open science, with an introduction to NASA’s Heliophysics Big Year. This is a global celebration of science and the Sun’s influence on Earth and the entire solar system.

Register now to secure your spot. This event is open to the public and will include an interactive Q&A following the presentations.

Webinar contains imagery of eclipses.

Xarray: Empowering Scientific Data Analysis within the NASA community

Xarray is an open-source Python package that makes working with complex, multi-dimensional arrays elegant, intuitive, and efficient. Real-world datasets, such as those generated by NASA, are often a collection of many related variables on a common grid. These datasets are more than just arrays of values: they have labels that describe how array values map to locations in dimensions such as space and time, and metadata that describes how the data was collected and processed. Xarray embraces the complexity of real-world datasets and enables users to use metadata such as dimension names and coordinate labels to easily analyze, manipulate, and visualize their datasets. Xarray makes data analysis more intuitive and enjoyable, while keeping the description of how the data was collected and processed attached to the data itself.
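To make the idea of labeled dimensions concrete, here is a minimal sketch, assuming a small made-up temperature grid (the values, coordinate labels, and attribute text are illustrative and not taken from any NASA dataset). It shows how dimension names and coordinate labels can stand in for integer axis positions when selecting and reducing data.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 2 x 2 grid of values laid out as (time, lat, lon).
values = np.arange(12, dtype=float).reshape(3, 2, 2)

temperature = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={
        "time": ["2023-01", "2023-02", "2023-03"],
        "lat": [10.0, 20.0],
        "lon": [100.0, 110.0],
    },
    # Free-form metadata travels with the array.
    attrs={"units": "degC", "source": "illustrative values, not real data"},
    name="temperature",
)

# Select by coordinate label rather than by integer position.
february = temperature.sel(time="2023-02")

# Reduce over a named dimension instead of remembering its axis number.
time_mean = temperature.mean(dim="time")

# Nearest-neighbor lookup on the numeric coordinate labels.
point = temperature.sel(lat=12.0, lon=108.0, method="nearest")

print(february.dims, time_mean.dims, point.dims)
```

Because every operation names the dimension it acts on, the same code keeps working if the underlying array is stored in a different axis order.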

A Vital Role in Handling NASA’s Evolving Data Demands

Consider for a moment that NASA’s Science Mission Directorate (SMD) collectively stores over 100 petabytes (PB) of data and estimates doubling that to 200 PB per year within the next five years. Handling data at this scale is clearly an important consideration as the volume of data from modern sensors continues to grow. Xarray’s flexibility has played a pivotal role in NASA’s transition to cloud computing infrastructure, ensuring efficient and robust data processing for the agency’s vast repositories of information. Xarray is a common component in workflows involving NASA datasets across many domains, including physical oceanography and glaciology.

In 2021, NASA selected Xarray as one of eight open-source projects for funding under the Open Source Tools, Frameworks, and Libraries (OSTFL) program. This financial support has not only allowed the Xarray project to flourish, but also to expand its support for NASA data through maintenance and outreach activities (see the full proposal announcement).

Committed support from NASA has been instrumental in allowing Xarray maintainers to make major progress on long-term goals such as reorganizing the code base for long-term sustainability, substantially revamping the Xarray tutorial website, and spending time to implement new features that benefit a wide range of domains. NASA’s ongoing support has also allowed maintainers to spend time on day-to-day maintenance tasks and handle user support requests more quickly. Previously, such work was performed on a volunteer basis and was hard to sustain.

Xarray has used funds to help build its community through the SIParCS summer internship program at NCAR (blog), participate in conferences such as SciPy 2023, and host virtual office hours. Over the next year, the team is looking to get more involved with domain-specific extensions for the needs of NASA’s remote sensing data through rioxarray, continue the office hours program, and represent Xarray at a number of additional conferences.

Xarray at SciPy 2023

Xarray at SciPy 2023! (top left) Deepak Cherian (NCAR), Scott Henderson (UW), Yuta Norden (U. Hawaii; 2023 SIParCS Intern), Maxwell Grover (Argonne) enjoying themselves at SciPy 2023. (bottom left) Participants “sprinting” and collaborating on Xarray and related projects. (right) Negin Sobhani (NCAR) delivers a part of the Xarray tutorial.

Thanks to NASA funding, Xarray was able to participate in SciPy 2023 in a significant way. The tutorial at SciPy 2023 was an exciting opportunity for scientists already familiar with Xarray to delve into advanced topics. The 2023 tutorial targeted intermediate-advanced level material and built on the fundamental level tutorial delivered at SciPy 2022. 

Tutorial participants reported they were able to streamline their workflows by using more of Xarray’s built-in functions after gaining insight into concepts that were initially intimidating, such as parallelizing computations on very large datasets.

The team had good turnout at the SciPy “sprints”, where the Xarray community worked together with allied projects like Zarr to discuss and quickly solve problems. Emma Marshall presented a great talk, building on her 2022 SIParCS internship work with Xarray, on how to organize tidy remote sensing datasets in a manner that facilitates easy analysis in the future.

An Open and Inclusive Community

In addition to impressive technical capabilities, one of Xarray’s greatest strengths is its vibrant and inclusive community. Xarray has been publicly developed on GitHub since 2014, with over 270 contributors improving the project through open development practices. Thanks to these active GitHub contributions, conference tutorials, and virtual office hours, Xarray has garnered interest from over 10,000 active users across various scientific disciplines.

NASA funding from the OSTFL program supports Xarray maintainers from historically underrepresented groups in the fields of Earth Science and open-source software development, demonstrating a commitment to inclusivity. Tutorials and virtual office hours increase the visibility of these individuals so they can serve as role models within their communities. Xarray places high value on a diverse group of users and contributors at all levels of software development expertise in order to improve the overall quality and accessibility of the software.

If you are interested in contributing your skills and enthusiasm to the Xarray project by reporting bugs, improving documentation, suggesting enhancements, or sharing any other ideas, visit the contributions page today.

Xarray’s commitment to these principles of openness and inclusivity is in close alignment with NASA’s vision of open science. At NASA, 2023 is the year of open science, and one of the core ideas of open science is that by breaking down barriers and having scientists from diverse backgrounds engage with research, scientific discoveries will be accelerated. To aid in growing the open science community, NASA is developing a curriculum to train scientists, researchers, and citizen scientists to use open science tools, like Xarray, in their research. To learn more and to pre-enroll in the curriculum, visit the Transform to Open Science (TOPS) GitHub page.

Conclusion

With its powerful capabilities and inclusive community, Xarray is a compelling tool that will hopefully entice readers to explore its potential in their data science endeavors at NASA and beyond. Head over to their GitHub to explore the wealth of resources and the vibrant community that has made this project what it is today.

Explore Xarray’s GitHub

TOPS Community Panel – June 14-16, 2023

The June TOPS Monthly Community Forum will be integrated into the bi-annual Community Panel, held from June 14 – June 16 in a hybrid format broadcast live from NASA HQ in Washington, D.C.

The TOPS Community Panel reviews and provides input on NASA’s strategy for transitioning to open-source science. The panel meeting will bring together leaders from the open science, open source software, and data science communities with the NASA TOPS team for a detailed review of TOPS plans. The meeting will be public and have tools for the public to submit questions.

We want to hear from you!
During the panel, we’ll dedicate an entire hour to answering your questions live on air. We invite you to register to attend virtually and submit your questions during the event, using our IO tool.

To participate in the community panel, please register here

Where: Public (virtual) meeting
When: 14-16 June 2023, 12-4 EST (9-11 PST), each day. (Full Agenda)
Questions? Submit questions before and during panel using our IO tool here
Discussion topics? Start a discussion on GitHub

Success Stories of Open Science Series: With a large network of people, you can do a lot of fun things in science: Q&A with Dr. Afshin Beheshti on open science practices


Dr. Afshin Beheshti is a bioinformatician and principal investigator at KBR at NASA’s Ames Research Center. With a background in physics, he works on space biology projects related to microRNA (miRNA) and mitochondrial changes. However, he refuses to be a one-dimensional researcher, which led him to work on various topics including COVID-19/SARS-CoV-2, cancer, high altitude impact on biology, and traumatic brain injury. He leads the Multi-Omics Analysis Working Group (AWG) at NASA’s GeneLab, where everyone collaborates to develop guidelines and ways to process and analyze Big Data. Through this model, his team discovered that mitochondrial dysfunction, when the mitochondria do not work as well as they should, is a key biological issue during spaceflight. In addition, he is the president of a nonprofit called COVID-19 International Research Team (COV-IRT).

What is your definition of open science?
Open science is open data and open community. First, open science is when data are free and made publicly available to the community and the public. For example, if I generate a dataset for a National Institutes of Health (NIH)-funded project, it should become public because the taxpayers paid for this; it’s not like a private company. Even if you can’t publish the results, you should still make the data public after a period of time so that someone else will be able to come up with some useful analysis. You’re wasting taxpayers’ money if you don’t share the data and use it to its full potential. And sharing is not just about the data; it could be tissues from animal experiments, or anything else to share with the public.

Open science is also the idea of the scientific community coming together, sharing ideas and working together. I think that’s the main part of open science that people in biology don’t do as much because they are afraid of backstabbing. A lot can be done when people come together and share everything, including grants. You’re not taking away from someone else. I would consider open science as “the socialism of science.” Some people think socialism is a bad word, but it’s actually a good word. COV-IRT is a nonprofit with $0 in its bank account, but we all shared our effort and resources, which allowed us as a group to do amazing work.

So, open science is that whole grassroots idea of everyone coming together. You can’t just have open data without people sharing or having an open community. If no one shares anything, you need to continually reinvent the wheel—it will be a waste of time. Without an open mindset, open data will go nowhere.

Why do you practice open science?
Actually, I’ve always done open science coming from physics. High energy physics involves billions of dollars, with hundreds of people working on experiments with synchrotrons or high energy colliders. Usually, you can find a list of 200 authors on a paper. There’s no last author or the senior author—everyone knows what they’re doing and gets credit for what they do. That’s the whole physics mentality.

What steps are you taking to accelerate open science?
Since I work at KBR at NASA’s Ames Research Center remotely, I don’t have a traditional lab situation. I set up a structure where I have a community of co-investigators and collaborators with the lab infrastructure. From my grants, we account for costs and resources to be shared and give credit to everyone who has worked on the projects. For my experiments, I typically have lots of biological tissues in my freezers that I originally did not intend to use for my grants, but I save them so I can share them with others with minimal red tape so that they don’t have to pay for those experiments. Once they get the tissues, they work on their own dime, but we all get credit—whoever helped with those experiments.  Such things expedite science. That’s how collaborations are built. The whole “my lab” mentality should be tossed out—it’s outdated.

What challenges have you faced while practicing open science and open scholarship? What strategies did you use to overcome these challenges?
The biggest challenge is people with closed mindsets. Some people never change their minds. Just show by example. Once you start collaborating with people, you see how well it works. You realize that no one’s backstabbing anyone or stealing data from anyone because everyone starts trusting each other. With my colleagues at COV-IRT, we’re putting out a lot of nice papers together in high-impact journals. I have people who are not as active over a period of time but will start participating more after seeing our results. I always say yes to people participating—if you want to change your mind, you can come and play with us. I’m sure some people might take advantage of it. But I don’t cut them out completely. I just make it completely transparent to everyone involved. If you lose one person who doesn’t want to play with the open science rules, that’s fine—it’s their loss. That person just lost the entire community because everyone knew what was happening.

Funding is another hurdle, but that’s not been the main issue because people usually share resources within the larger community. Eventually, we all apply for grants together and get funding together. The more you do that, the more funding opportunities come. But, a lot of governmental red tape slows down open science. Of course, you don’t want to break any rules that are set for good reason. You still have to protect yourself because not everyone’s always as open or honest, even though you do open science.

Have you made mistakes while practicing open science? How would you address them differently if you were to do it again?
For any kind of research, you’re going to make mistakes. If you don’t make mistakes, you don’t learn. I think sometimes you can be too open. For example, if you pitch your idea to 200 people, they start chiming in. And some people may misconstrue what you meant to do. Then, the project may seem to them like something you didn’t actually propose. Those people jump ship without actually discussing what the issues are and clarifying the misunderstanding for the final product. To avoid a situation like that, you want to run important things by a subset of 10-15 people before passing them by the bigger group. I think those are probably some of the biggest mistakes I made, because I would be too eager to share my ideas.

Another thing to note is that some people might have nefarious reasons to take full credit for your data. Most people who you give tissues to are very grateful. They’re excited, and we’re excited. But some people take those ideas, especially if you don’t publish them. I heard a story about a professor who published a paper using open data that was shared in a group years ago when they were a postdoc. But, in their paper, there’s no reference to the person who originally shared the data. So, this professor had other goals to advance their own career and basically took advantage of the open science platform. Mistakes like that don’t really hinder your open science, but they do make people more cautious about open science. People do that, unfortunately. The silver lining is that in situations like that, people in the open science community are aware of the truth.

So, I try to deploy Confidential Disclosure Agreements (CDAs) or Non-Disclosure Agreements (NDAs) to prevent all that. Also, publish early. Don’t sit on it for 10 years. Your papers are your key to showing that you’ve done it first. When you submit a paper, put a preprint out, too. People even do that before they submit it to any journal, just so that the preprint is out there. Sometimes, the preprint never becomes a peer-reviewed paper. But it’s still documented online to show people that you did it. Whether it was peer-reviewed or not doesn’t matter.

How has open science improved your research? Are there other benefits you have experienced from practicing open science?
The more you’re known, the more your network grows. The more you do open science, the more people want to collaborate with you. They know that’s the guy that gives tissues away for free or has 50 people on their papers and supports students. It’s like that little light with the flies coming in. It’s kind of like a beacon. With a large network of people, by default, you can do a lot of fun things in science. You get more publications. People promote different research ideas and build a nice comprehensive story together. All the pieces of the puzzle start to come together. You start learning the whole story that connects to everything. It’s not just factor X. What about everything around that which you’ve ignored? When you’re isolated, you never see that. Open science makes science more interesting.

What would you say to early career researchers who want to practice open science?
For students, leave the advisor if they are not doing open science. I know it’s easier said than done—when you’re a student, it’s tough for you. So, the way to get around that is to talk to other labs and other graduate students in other departments as soon as you can. Talk to the people outside of that lab, and you can get the real stories.

For young faculty, don’t pigeonhole yourself into just one subject. Think broadly so that you can apply your ideas to many different fields. You will eventually find people with the same mindset. And join communities. If you want to get a lot of publications, you should collaborate. I’ve had at least one publication a month, if not more. And there’s more in the pipeline right now. Of course, you could be stuck in a place that doesn’t promote open science. Then, just leave that place. It should become so dry that they’ll have to open up their rules and their old mindsets. There are plenty of people out there who have an open science mindset. But you have to find them. If there’s a will, there’s always a way. Again, joining big networks opens up opportunities. You could approach people and say you’re considering leaving your current institution. You could ask if people know about any positions in other institutions.

What are some of your favorite open science tools or resources that you’d like to share?
There’s a whole bunch of them. But, for citizen scientists, NASA’s GeneLab provides visualization tools. Even someone who doesn’t have any data skills can play around with raw data. The National Cancer Institute also has an interactive data portal where you can access the data from The Cancer Genome Atlas (TCGA) program even if you’re a non-scientist. Another cancer-related data website is cBioPortal. Cancer researchers have thought for many years about how to make this friendly for people. And the COVID Data Tracker at the Centers for Disease Control and Prevention (CDC) makes data easily accessible. GISAID is also a nice tool that a non-scientist could use for different kinds of viruses.

When it comes to tools, the bioinformatics community is all about open science because no one wants to pay for expensive proprietary software unless their university or institution has a license or they have grant money. These people are smart enough that they can create their own software and algorithms and then make them available. It might not be as good as the expensive software, but it’s still good enough. And you can do a lot with it. Cytoscape is a free tool funded by an NIH project that you can download on your computer and use to plot networks. You can find examples on the homepage. It’s a nice tool because you don’t need to worry about programming. You can enter data and make neat networks out of it. There’s another tool called STRING, which you can use to look at protein interactions. These are just two examples of many.

Lastly, is there anything you would like to share regarding open science?
I think more people should do open science. Join my groups! I’ve been doing open science since before it was a fad. I’m more than happy to collaborate with anyone from any walk of life. As long as you are breathing and have a good idea, why not?

By Steffie S. Kim
Digital Marketing Intern at NASA Transform to Open Science
*Session Mentor: Cynthia R. Hall

Success Stories of Open Science Series: Q&A with Dr. Leo Singer on open science practices

“Get involved as early as possible and contribute to others’ projects.” – Dr. Leo Singer

Dr. Leo Singer has been a research astrophysicist at NASA’s Goddard Space Flight Center since 2015, working on the General Coordinates Network (GCN), NASA’s next-generation time-domain and multimessenger astronomy alert system. With a background in gravitational waves, his research projects concern ground-based optical telescopes and multiwavelength follow-up. He also works on real-time analysis of the Laser Interferometer Gravitational-Wave Observatory (LIGO) data, signal processing, Bayesian inference, and synoptic optical transient surveys. He is actively participating in the Astropy Project, a community effort to develop Python packages for astronomy. He co-authored and co-maintains LIGO’s open-source public alerts pipeline and science outreach material for astronomers, the LIGO/Virgo/KAGRA Public Alerts User Guide.

What is your definition of open science?
When dealing with data, software, and publications, reproducibility and having complete descriptions of the experiment or the analysis you’ve done are important so that someone else can repeat it. That’s what open science is all about. My whole career has been dedicated to more openness because of the way many NASA missions operate. When the Neil Gehrels Swift Observatory launched, its data policy was revolutionary—it had no proprietary period. All the data was public. The science team at NASA didn’t have any privileged access to the data that the public didn’t have. And that’s been a role model. All of the projects that I’ve worked on have been open-source. If someone doesn’t release their code, you have to wonder what bugs they are hiding. In astronomy, there are proprietary periods for data, but the data eventually becomes public. If the data or code isn’t ultimately public, I’m not sure it’s science because it may not be reproducible. You need to ensure that there is enough of the derivation there that someone else who comes along can follow it. Also, whatever code you’ve developed to write a paper should be released along with the paper. I think that packaging and distributing software is almost as important a tool for scientists as being able to write manuscripts in LaTeX.

What are the steps you are taking to accelerate open science?
When I was a graduate student, I wanted to use several software packages that weren’t easy to install on the computing cluster. So, I volunteered with the MacPorts Project and Debian to help package some open-source astronomy software for various distributions. I also started using NumPy, Matplotlib, and SciPy so intensely that occasionally I’d find a bug or a faster way to do things. I started contributing little fixes and pull requests. One of the projects I worked on as a graduate student was a gravitational wave signal processing pipeline that used GStreamer. Because this software had not been designed for science applications, there were types of filters that needed a little bit more improvement. So I contributed some code to GStreamer to improve the signal processing. That was my first significant open-source contribution to a really general-purpose project rather than a project that applied narrowly to my own research. And, around the time that I came to NASA’s Goddard Space Flight Center, I got involved in the Astropy Project because I think that they are writing some of the highest quality astronomy software out there. Also, they’re just fantastic people to work with and to learn from.

How has open science improved your research? Are there other benefits you have experienced from practicing open science?
Most of my papers have a source code component, and people come across packages I’ve developed and cite those papers. I kind of pride myself on how I use my open-source contributions to promote my papers and get citations, which is one of the measures by which people get promotions. People don’t get promotions because of the software they develop; they get promotions based on how well-cited their journal articles are. Same thing for meeting collaborators. I’ll encounter people at conferences, and they’ll say, “Oh, I’ve heard your name.” They’ve heard my name because they used my software. So it’s a great way to make connections with people. Also, I think a lot of soft skills are involved in crafting open-source contributions so that a stranger can understand, review, and accept them. Learning how to write code for someone else’s project in a way that’s easy to understand and having technical discussions with strangers about code makes it easier for me to effect the changes I want to make. Through years and years of practice with submitting pull requests and working with people on GitHub, I’m usually pretty confident that when I find a bug in Astropy, NumPy, or Matplotlib, I can fix it. And the fix will likely go into production and benefit everyone. It’s just so empowering.

What challenges have you faced while practicing open science? What strategies did you use to overcome these challenges?
My biggest challenge is how to transfer expertise to first-time contributors. I often have some idea of what the final product might look like. But, the skill set of the contributor might not be where they can produce that final product. This is one of the difficult things about developing open-source software. It does the first-time contributor no good if the maintainer tells them exactly what to do. You have to teach best practices without taking the agency away from the contributor—you have to do it in a way that’s respectful and helps that person grow. Also, anyone in science who deals with software eventually needs to become very proficient with Git. Git is hard to learn—it’s hard to get to the point where you can solve problems with it on your own. It’s a fundamental skill for working on any project moving with some substantial velocity. It’s just a complex skill to teach, and it takes time for these skills to percolate through the community.

What would you say to early career researchers who want to practice open science?
I think prospective graduate students selecting a research group to work for should ask these questions: What is their open-source output? Does this lab have relevant software skill sets, or are they 20 years behind the times? Do they put their code on GitHub? Do they publish code along with their papers? Are the data and code produced by this group having an impact? If the answer is no, you’re unlikely to learn open science practices by working in that group. But also, these questions require a lot of awareness and savviness on the student’s part. Not everyone has enough information to make informed decisions. So I think this is mainly on the senior people—they need to know how to promote open science and develop a recruitment pipeline that keeps their group’s software and data skills current.

Other than that, if you’re lucky enough to work in a group that values open science, my next suggestion would be: Don’t assume that the tools you use are immutable. In astronomy, at least, almost everyone uses NumPy, SciPy, and Astropy. Those are all open-source packages. They are among the best projects in terms of the quality of the implementation, but their developer communities are also very welcoming. For example, it can be as simple as contributing a pull request if you notice a typo in the NumPy documentation. Get involved as early as possible because it takes years to get really good at contributing to other people’s projects.

What are the most urgent things that should be addressed in the field to accelerate open science further?
For me as a NASA civil servant, the most urgent thing is to reform NASA Procedural Requirements (NPR) 2210, abolish the NASA Open Source Agreement and NASA Contributor License Agreements, and make it clear that NPR 7150 does not apply to science software intended mainly for the public. Then, I can do science with open data and open software much more effectively.

What are some of your favorite open science tools or resources that you’d like to share with us?
I recommend Docker, GitHub, Astropy, and Codecov. I’m also a huge Visual Studio Code fan.

Lastly, is there anything you would like to add regarding open science?
My colleagues and I at NASA’s Goddard Space Flight Center are working on a new science data portal called the General Coordinates Network. This is a system that takes in detections of gamma-ray and other high-energy transients made by space missions, physics experiments, and observatories around the world. It sends real-time alerts publicly to a community of thousands of astronomers worldwide, who follow up on the sources we’ve detected from space using their telescopes on the ground. Then, they publish astronomical bulletins called Circulars to share their observations. This system has been running since the 1990s. But we’re modernizing it and converting it from its antiquated network protocol stack to Apache Kafka, a modern streaming framework for data distribution and analysis. We’re doing a soft launch right now, and I’m really excited about this. Not only is our client software open-source, but our website is open-source too, which I’m quite proud of.

 

Success Stories of Open Science Series: Equitable science for all: Q&A with Dr. Flavio Azevedo on open science practices

Dr. Flavio Azevedo is a political psychologist and an associate researcher at the University of Cambridge. Primarily, his research investigates the role of ideology and identity in supporting policies that perpetuate social and economic injustices. Having come from a low socioeconomic status in Brazil, he fell in love with the promise of academia as a great equalizer. This passion has led him to co-found and direct the Framework for Open and Reproducible Research Training (FORRT), an award-winning interdisciplinary community of almost 500 early-career scholars, aiming to integrate open scholarship principles into higher education and advance research transparency, reproducibility, rigor, and ethics through pedagogical reform and meta-science. In addition, he is participating in NASA TOPS’s effort to build the ScienceCore curriculum.

What is your definition of open science?
While definitions vary, there are usually six components to open science: open methodology, open data, open access, open peer review, open education, and open collaboration. But, for me, open science is about social justice—the inclusion of disenfranchised folks and building a coalition. It paves the way toward a more transparent, rigorous, robust, inclusive, diverse, accessible, and equitable science.

Essentially, open science means better science for all, including people of low- and middle-income countries. Currently, there is a huge gap between how science benefits the Global North and the Global South. For a rather recent example, we witnessed this during the pandemic, especially with the distribution of the COVID-19 vaccines, but examples abound across time and space. It is about breaking down barriers in accessing science, not only its outputs but its many benefits. It is about breaking that glass ceiling that pervades everything in science and research: that veneer of exclusiveness, of a select club or of the genius scholar, the idea that only the best and brightest can ever dare to dream of becoming a researcher and succeeding in it. Taken together, open science is about social justice, and I practice it to leave behind a world that’s a tiny bit better than the one I found. So, I think open science is science done right.

At FORRT, we use the phrase “open scholarship” instead of open science. Folks talk about open scholarship and open science as synonyms. But I think open scholarship is more of a redefinition and reframing. Open scholarship is more inclusive in that it extends open science to all knowledge systems, including those not traditionally identified as science. Open scholarship also includes mentoring, teaching, and producing educational materials. And open scholarship particularly makes explicit the importance of inclusion, diversity, equity, and accessibility as necessary conditions for improving the way we practice science. This is very important to me because open scholarship becomes a tool to develop strategies for addressing structural disadvantages faced by minoritized groups. So, open scholarship is a way of looking at open science from a more humanistic view.

What steps are you taking to accelerate open science and open scholarship?
As the director of FORRT, I try to steer the organization’s ethos alongside other like-minded folks to achieve three things. The first is to respond to calls to consider open scholarship as inclusive scholarship. The second is to raise awareness of the pedagogical implications of open science/open scholarship and its associated challenges (e.g., curricular reform, epistemological uncertainty, methods of education, epistemological pluralism). The third is to reframe the “methodological reform” debate in academia as an opportunity to consider individual and systemic factors when evaluating research and the norms that sustain it.

Let me give you a few examples. We believe there is a need to reform how we teach and mentor our students. In social sciences, we focus almost exclusively on teaching the facts of science instead of the processes by which a given piece of knowledge was acquired. In light of the reproducibility crisis (or credibility revolution, to put a positive spin on it), it would likely be better for students if higher education focused more on scientific literacy (i.e., what does robust research look like?) and less on the facts of science. Another idea to consider is epistemological pluralism. There is a plethora of methods for acquiring knowledge; quantitative methods are not the only way. There are also qualitative ways. We need to recognize that there are multiple ways to go about knowledge acquisition and accumulation.

Aside from the initiatives at FORRT, as an independent researcher, I have also conducted research using preregistrations and registered reports. I am part of several open science organizations, such as the Center for Open Science (COS) as an ambassador, the Psychological Science Accelerator as a funding and finance committee member and researcher, and the Berkeley Initiative for Transparency in the Social Sciences as a catalyst for open science. I have also participated in big-team science projects like COS’ Systematizing Confidence in Open Research and Evidence (SCORE), the International Consortium for Social and Moral Psychology, the Crowdsourced Replication Initiative, and, more recently, ManyLabs on Climate Change attitudes and Trust in Science. I think my approach to open scholarship is not only FORRT or reproducing things but also “interrogating the questions we pose,” as my dear colleague and feminist scholar, Madeleine Pownall, says.

What challenges have you faced while practicing open science or open scholarship? What strategies did you use to overcome these challenges?
I have faced a lack of funding, both for my research and for producing open educational resources as the director of FORRT. We have folks who provide unpaid contributions to the public good. We are constantly seeking institutional and financial support, but, unfortunately, we are often faced with a scarcity mindset across several academic and funding institutions. This is a big challenge in big-team science, especially when it comes to working with volunteer-based communities.

The other issue has to do with equitable authorship and credit. Credit is usually given in an unfair or non-credible way. In big-team science, for example, you see research projects where only Americans are the leaders, and they get most of the credit. And then, authorship becomes embedded into the current system of incentives and not a tool for mitigating inequalities in research, education, and funding schemes. Even in open science, there are not enough initiatives that try to include the Global South and low-income countries, or even minoritized groups in the U.S. You see the same patterns when it comes to credit-giving—it’s not equitable. I find that troubling, and it is a challenge for me as an immigrant and a non-native English speaker. My work is devalued based on my identity. (A lot of the identities that we have are not internal. They come from external stereotypes.) Thankfully, there are at least two newly founded organizations aiming to bring light to these issues: Advancing Big-team Reproducible science with Increased Representation (ABRIR), led by Dr. Nadia Corral-Frías and colleagues, and NowhereLab, led by Dr. Priya Silverstein and colleagues.

At FORRT, we try to use a contributorship model based on the CRediT system as a means to fairly document every contribution to our research outputs. FORRT projects also try to include everybody by being proactive in recruiting contributors from the Global South and minoritized groups. Some of these contributors, depending on their conditions, might not be able to contribute a lot, but we still appreciate the time that they can give to our projects. Using this inclusive model, FORRT has produced over 15 large-scale Open Educational Resources (OER) and several peer-reviewed publications, including a paper in Nature Human Behaviour with 112 authors, where we defined over 250 Open Science terms (see OA postprint here). While we have papers with few authors, the majority have close to 50 authors. I’m not saying that all papers should have that many authors. However, often in big-team science, and in regular science as well, there’s lots of invisible work that is not credited. We’re tracking and giving credit to every meaningful contributor because big and meaningful educational and scientific projects need contributions from a lot of folks, and it is important (and just!) to recognize them.

Can you elaborate on the credit-giving system you are using at FORRT?
We usually use Tenzing, which is essentially a Google Sheets document where everybody can enter their contact information and their contributor roles. There are 14 types of contributions as per the CRediT system (sometimes we include a few more), and the sheet tracks how much people have participated in a particular project. For example, we track idea formulation, conceptualization, or project administration, each in its own column. It’s a very nifty tool that enables you to give credit for what people did.
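To make the spreadsheet idea concrete, here is a minimal sketch in Python of how a contributor sheet with one column per role could be turned into a credit statement. It is an illustration only and not Tenzing’s actual template or code: the sheet layout, the TRUE/FALSE convention, the function names, and the contributor names are all assumptions made for this example, and only three of the CRediT role names are shown.

```python
import csv
import io

# Hypothetical contributor sheet: one row per person, one column per role.
# The role columns are a subset of the CRediT roles; the people are made up.
SHEET = """\
name,Conceptualization,Software,Project administration
Ada Example,TRUE,FALSE,TRUE
Ben Sample,FALSE,TRUE,FALSE
Cy Placeholder,TRUE,TRUE,FALSE
"""


def roles_by_contributor(sheet_text):
    """Map each contributor name to the list of roles ticked in their row."""
    result = {}
    for row in csv.DictReader(io.StringIO(sheet_text)):
        name = row.pop("name")
        result[name] = [role for role, ticked in row.items()
                        if ticked.strip().upper() == "TRUE"]
    return result


def contributors_by_role(sheet_text):
    """Invert the mapping: each role lists everyone who contributed to it."""
    result = {}
    for name, roles in roles_by_contributor(sheet_text).items():
        for role in roles:
            result.setdefault(role, []).append(name)
    return result


def contribution_statement(sheet_text):
    """Build a one-line-per-role statement for a paper's credit section."""
    by_role = contributors_by_role(sheet_text)
    return "\n".join(f"{role}: {', '.join(names)}"
                     for role, names in by_role.items())


if __name__ == "__main__":
    print(contribution_statement(SHEET))
```

Keeping one column per role means that adding a role beyond the standard set is just a matter of adding a column, and nothing in the code depends on the number of authors.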

Another thing I want to share is how we went about our work that aimed to bridge neurodiversity and open scholarship (see page 86 in this hyperlinked citation and the figure below). What we essentially tried to do here was to find a different way to think of authorship. The authorship for this paper was not decided based on the authors’ contributions but rather on their privileges. We put the folks who are least privileged first, as this brings the most prestige in academia. We adopted a critical lens about how we give authorship because, often, there is a privilege in your ability to contribute—you need to have had successive opportunities, time, and funding, for example. So, this is a way to provide an alternative way to give credit, contingent upon one’s privileges. Maybe this is not the right way, but when all you have is the current system—which is gamified, unfair, and not transparent—we thought that providing an alternative is the first step forward.

Academic Wheel of Privilege, a color wheel made up of many factors. Least privileged factors (homeless, dark-skinned, trans, etc) are on the outside and most privileged (rich, white, hetero, etc) are on the inside.

Have you experienced failures while practicing open science? How would you address them differently if you were to do it again?
I came to open science from a methodological, statistical, and even a little dogmatic point of view. Having a community, exchanging knowledge, and taking on other people’s perspectives allowed me to see further. If I could start FORRT again, I would make it an early focus to find funding for marginalized people and those who are trying to give their best to the community. I was not aware of the institutions that I had to contact or the grant applications I had to apply for so that we would have ways to compensate folks. I wish I had been better at communicating and understanding our community members and the system of incentives. Shout out to Sam Parsons, Amy Orben and CERES, Thomas Rhys Evans, Madeleine Pownall, Jackie Thompson, and folks at Invest in Open Infrastructure (IOI), who are helping FORRT to do better in this regard!

How has open science improved your research? Are there other benefits you have experienced from practicing open science?
I have benefited immensely from getting to know folks around the globe. Being open to learning from different perspectives has been the most valuable benefit. Open Science also helped me connect my “regular” research, which is essentially on justice, with better research practices. I’m a political psychologist by training, and the main question driving my research is: Why do some believe that a nation, people, race, gender, or species is justified in dominating, controlling, and exploiting another? Practicing open science has also opened my eyes to the inequities of academia and how we conduct research in a very non-inclusive and non-participatory way. And I hope to contribute to changing this.

What are your recommendations for practicing open science? What are the most important things that should be addressed in the field to accelerate open science further?
One aspect of open science is often absent from the discussion: the pedagogical consequences of how we teach, mentor, and supervise students through open scholarship. The future of education requires open scholarship principles to be integrated into research training. I think that pedagogical communities play a significant role in that integration, especially in this super-connected world. Pedagogical communities can help co-create materials available for everyone while fostering an inclusive community across all career stages, diverse disciplines, and different regions. Pedagogical communities also offer a low entry point for research and better practices in a much-needed environment. People can identify common hurdles by exchanging opinions. Pedagogical communities are definitely something that we should pay more attention to and recognize. FORRT folks have published a manuscript entitled “Towards a culture of open scholarship: The role of pedagogical communities,” which discusses this further. It describes the benefits of pedagogical communities and the role they play in fostering an inclusive culture of open scholarship, and it calls for greater collaboration with pedagogical communities to pave the way for a much-needed integration of top-down and grassroots open scholarship initiatives.

What would you say to early career researchers who want to practice open science?
Admittedly, what I’m about to say is a very narrow and personal view because I think there are so many more qualified folks that could provide a plethora of helpful answers. But, I would say that joining a community that speaks to your heart and your science would be the best way to go. That community can be any open science community, whether it is about sharing data, creating education materials, or conducting research projects. Through community, we learn how to best conduct open scholarship on a day-to-day basis. Especially for early career scholars from the Global South, having a community to talk about issues, ask for help, and see other people asking for help can have a positive effect.

What are some of your favorite open science tools or resources that you’d like to share with us?
From a social science perspective, I like pedagogical communities because they are essentially trying to provide a common good for others. The Turing Way is one of my favorites; it is a very well-run organization. Also, Open Life Science, The Carpentries, and the Berkeley Initiative for Transparency in the Social Sciences (BITSS) are great. The RIOT Science Club offers seminars where people can learn about open science, open scholarship, and new methodologies. A few projects try to implement replication, such as The Institute for Replication. These are major institutions, but there are also so many great researchers that I try to mirror, like Charlotte Pennington, Julia Strand, Gilad Feldman, Katie Button, Lisa DeBruine, and Jordan Wagge.

Regarding open tools, I use OSF, Zenodo, and SHARE. CRediT and Tenzing are a really good way to give fair credit to people based on what they did on a given paper. You can use Tenzing as a Shiny app that exports all the information neatly, which helps a lot when writing papers with dozens of co-authors. As I mentioned about communities, the RStudio Community is great. They are amazing folks with amazing open products. I also want to give a shout-out to R-Ladies, which is an amazing group.

Lastly, is there anything you would like to share regarding open science?
A few weeks ago, I shared with my friends and colleagues on Twitter that I am going through a medical problem. I got overwhelmingly supportive reactions from folks. But the thing that struck my heart the most was the messages I got from people who shared their own stories about their disability and how, in academia, it is a reason for shame, something that should be hidden. And even people who are extremely well-known in my field sent me direct messages about their struggles. This made me sad. We internalize the invisibleness and powerlessness across every tier of academia. On top of dealing with a debilitating condition, disability becomes a source of shame and weakness. We need to do more in this regard, be more open, and normalize these struggles we go through in life. We need to talk more about disability in academia.

 

By Steffie S. Kim
Digital Marketing Intern at NASA Transform to Open Science
*Session Mentor: Isabella B. Martinez