NASA’s Science Discovery Engine Transforms Science Accessibility

As a key component of NASA’s Open Source Science Initiative (OSSI), the Science Discovery Engine (SDE) offers a new, powerful discovery capability for all of NASA’s open science data and information. The SDE beta version launch was announced a little over a year ago at the 2022 American Geophysical Union conference. Since then, the SDE team has continued to improve the interface to enable more insightful, relevant search results and to provide an engaging user experience.

Building the SDE 

When NASA announced a strategic focus on open-source science in 2018, one major recommendation from the scientific community was to develop an integrated search portal. Scientists advocated for an interface that would provide simultaneous access to content from all five of NASA’s science topic areas: Astrophysics, Biological and Physical Science, Earth Science, Heliophysics, and Planetary Science.

A team of data scientists, subject matter experts, and developers began formulating a plan in early 2020 to meet this ambitious objective. Given the diversity, distribution, and vast size of NASA science data and information, many challenges emerged. 

For instance, each topic area uses its own metadata standards and vocabularies, making it difficult to create comprehensive and accurate metadata across disciplines. In addition, much of the content relevant to understanding and using data is dispersed across many websites and code repositories, making identification and curation of information sources a time-consuming task. 
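To make the metadata challenge concrete, here is a minimal sketch of a field crosswalk that maps discipline-specific records onto one shared schema. It is an illustration only, not the SDE's actual approach: the field names, the source labels, and the `normalize` helper are all hypothetical.

```python
# Hypothetical crosswalk: native field name -> shared field name, per source.
# None of these field names are taken from real NASA metadata standards.
FIELD_CROSSWALK = {
    "earth_science": {"ShortName": "title", "Abstract": "description"},
    "planetary_science": {"dataset_name": "title", "dataset_desc": "description"},
}


def normalize(record, source):
    """Rename a record's native fields to the shared schema, dropping unmapped fields."""
    mapping = FIELD_CROSSWALK[source]
    return {
        common: record[native]
        for native, common in mapping.items()
        if native in record
    }


earth = normalize({"ShortName": "SST Daily", "Abstract": "Sea surface temperature."}, "earth_science")
planetary = normalize({"dataset_name": "Mars Imagery", "dataset_desc": "Orbital images."}, "planetary_science")
print(earth)
print(planetary)
```

Once both records share the same keys, a single index can search them together, which is the kind of cross-discipline access the search portal is meant to provide.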

Nevertheless, the SDE team has successfully corralled over 600,000 science documents, creating a pathway for researchers to pursue transformative, interdisciplinary science.

Fine-tuning the Engine

The SDE user interface is customized to meet the needs of the scientific community. Users can perform free-text searches and apply text-based facets to refine results by their chosen parameters. Users can also filter by information type such as data, images, or documentation. Dataset landing pages are standardized to ensure a consistent, cohesive user experience.
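As a rough illustration of how free-text search combines with facets, the sketch below filters a small in-memory catalog. It assumes a toy data model and is not the SDE's implementation: the `Document` class, the `search` function, and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    topic: str      # science topic area, e.g. "Earth Science"
    info_type: str  # e.g. "data", "images", "documentation"


def search(docs, query, **facets):
    """Return documents whose title contains every query term and that match every facet."""
    terms = query.lower().split()
    results = []
    for doc in docs:
        title = doc.title.lower()
        if not all(term in title for term in terms):
            continue  # free-text match failed
        if all(getattr(doc, name) == value for name, value in facets.items()):
            results.append(doc)
    return results


# Hypothetical sample catalog for illustration.
CATALOG = [
    Document("Sea surface temperature dataset", "Earth Science", "data"),
    Document("Sea surface temperature user guide", "Earth Science", "documentation"),
    Document("Solar wind plasma dataset", "Heliophysics", "data"),
]

for doc in search(CATALOG, "temperature", info_type="data"):
    print(doc.title)
```

Here the free-text query narrows the catalog to the two temperature documents, and the `info_type` facet then keeps only the dataset. A production search system would rank results by relevance rather than apply simple substring matching.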

A primary goal of the SDE is to make the scientific research process more efficient by helping scientists find and access the data and information they need more quickly. To that end, the SDE has served as a pathfinder for adopting and operating an emerging technology: an insight engine. Insight engine software applies relevancy methods to discover, analyze, describe, and organize content and data from diverse sources. An advantage that insight engines offer over traditional search engines is the ability to incorporate natural language processing and machine learning. Infusing search processes with these artificial intelligence techniques helps calibrate retrievals with context enrichment, providing users with more accurate and relevant results.

In early January, the SDE team rolled out an updated user interface to make searching even easier and faster. Improvements include a new acronym search feature and additional filtering options that allow users to search within a single science topic area, such as planetary science or heliophysics. With these upgrades and more, the SDE is promoting research efficiency through a commitment to open science principles.

Looking to the Future

In the coming months, the team will complete the initial round of curating NASA science content and incorporating it into the SDE. In addition, the SDE team is planning a full roll-out of the tool in the fall of 2024. Finally, the SDE team is beginning to prototype tailored search applications that meet the needs of individual scientific audiences. For example, a specialized search interface designed for users who are interested in environmental justice information is already in development.

The SDE team is also testing advanced search techniques made possible through emerging technologies like large language models (LLMs). These models are changing the way search is conducted, and search tools like the SDE will need to adapt to support both conversational and keyword search in the future. The team is also interested in exploring how LLMs can enhance the efficiency and accuracy of curation workflows.

From its inception, the SDE has been a beacon on the path to open science, illuminating the opportunities that emerge when science information is fundamentally accessible to all. This comprehensive, nimble search interface will continue to impress and inspire as researchers explore how to maximize scientific progress through information discovery and collaboration.