What's the Deal with Big Data?

Big Data affects nearly all of us at NASA, and it is exploding: the average annual growth rate is 60%, and by the end of 2012 the digital universe is estimated to reach 2.7 zettabytes. Goddard manages enormous volumes of data related to building and maintaining satellites, analytical simulations, and support functions.
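
To put the 60% figure in perspective, here is a minimal sketch of the compound-growth arithmetic. It assumes, purely for illustration, that the 60% rate stays constant from the 2.7-zettabyte estimate for the end of 2012. The function name and the projection years are hypothetical and do not come from any NASA source.

```python
import math

ANNUAL_GROWTH = 0.60    # average annual growth rate cited above
BASE_ZETTABYTES = 2.7   # estimated size of the digital universe, end of 2012


def projected_size(years_after_2012: int) -> float:
    """Project the data volume, assuming the growth rate stays constant."""
    return BASE_ZETTABYTES * (1 + ANNUAL_GROWTH) ** years_after_2012


# Time for the volume to double at a constant 60% annual rate.
doubling_time_years = math.log(2) / math.log(1 + ANNUAL_GROWTH)

if __name__ == "__main__":
    print(f"Doubling time: {doubling_time_years:.2f} years")
    for n in range(4):
        print(f"2012 + {n} years: {projected_size(n):.2f} ZB")
```

Under that assumption, the volume doubles roughly every year and a half.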

So what is Big Data? In the IT industry, Big Data is defined by four V’s: volume, velocity, variety, and veracity. Volume is the sheer amount of data. Velocity is the speed with which new data is created and existing data is modified. Variety is the management of various data formats and types. Veracity, the trustworthiness of the data, is a concern of many business leaders; it is estimated that one in three CEOs don’t trust the information they use. I’ll also add a fifth V, for Value: the considerable usefulness of Big Data. It allows us to see data patterns and anomalies, and it shifts our decision-making from reactive to proactive. So how is NASA managing the many challenges of Big Data? The NASA Open Government Plan outlines many of our approaches, such as managing and processing, archiving and distribution, and sharing data.

On managing and processing, here’s an example. The Mission Data Processing and Control System (MPCS) was recently used by the Curiosity rover on Mars. MPCS interfaces with NASA’s Deep Space Network, and in turn the Mars Reconnaissance Orbiter, to relay data to and from Curiosity and to process the raw data in real time, a task that previously took hours, if not days, to accomplish.

For archiving and distribution, consider the Atmospheric Science Data Center (ASDC) at Langley, which processes, archives, and distributes Earth science data, and the Planetary Data System (PDS), which contains considerable planetary science data. PDS offers access to over 100 TB of space images, telemetry, models, and other data associated with planetary missions from the past 30 years.

NASA is a leader at sharing Big Data. The Earth Observing System Data and Information System (EOSDIS) manages and shares Earth science data from various sources, including satellites, aircraft, and field measurements. The EOSDIS science operations are performed within 12 interconnected Distributed Active Archive Centers (DAACs), each with specific responsibilities for producing, archiving, and distributing Earth science data products.

To enhance our ability to manage Big Data, I believe that the IT industry should adopt the Predictive Model Markup Language (PMML), an XML-based, vendor-agnostic markup language that provides an easier way to share predictive analytical model data. With PMML, proprietary issues and incompatibilities are no longer a barrier to the exchange of data and models between applications.
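
To make the idea of a vendor-agnostic model file concrete, here is a minimal sketch that reads a small PMML regression model using only Python’s standard library. The PMML fragment is hand-written for illustration: the field names, intercept, and coefficient are hypothetical and do not come from any NASA model. The reader handles only a simple RegressionTable with NumericPredictor entries (ignoring the optional exponent attribute), so treat it as a sketch rather than a full PMML consumer.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML fragment (hypothetical fields and values).
PMML_DOC = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="sensor_temp" optype="continuous" dataType="double"/>
    <DataField name="wear_index" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="sensor_temp"/>
      <MiningField name="wear_index" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="sensor_temp" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_linear_model(pmml_text):
    """Return (intercept, {field name: coefficient}) from a simple PMML model."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def predict(intercept, coefficients, inputs):
    """Evaluate the linear model for one record of input values."""
    return intercept + sum(c * inputs[name] for name, c in coefficients.items())


if __name__ == "__main__":
    intercept, coefficients = load_linear_model(PMML_DOC)
    print(predict(intercept, coefficients, {"sensor_temp": 10.0}))
```

Because the model is plain XML, any application that understands the PMML schema could load the same file, regardless of which tool produced it.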

One real-world example of how NASA leverages its expertise in Big Data, and one that directly affects your life, is in the field of airline safety. NASA analyzes data from planes to study safety implications, which in turn helps to improve the maintenance procedures of commercial airlines and potentially prevent equipment failures. Using advanced algorithms, the agency helps sift through mountains of unstructured data to find key information that helps predict and prevent safety problems.