What is NASA doing with Big Data?

Nicholas Skytland
Oct 30, 2014

In the time it took you to read this sentence, NASA gathered approximately 1.73 gigabytes of data from our nearly 100 currently active missions! We do this every hour, every day, every year, and the collection rate is growing exponentially. Handling, storing, and managing this data is a massive challenge. Our data is one of our most valuable assets, and its strategic importance to our research and science is huge. We are committed to making our data as accessible as possible, both for the benefit of our work and for the betterment of humankind through the innovation and creativity of the more than seven billion other people on this planet who don’t work at NASA.

What is Big Data?

The whole idea of big data is still relatively new and not well understood, so most discussions of the subject start with a definition of what big data really is. Definitions are certainly helpful, so we’ll start there ourselves. Big data is, very simply, a collection of data sets so large and complex that your legacy IT systems cannot handle them. When the volume, velocity, variety, and veracity of an organization’s data exceed its storage or computing capacity, some big challenges need to be addressed. You know you have a big data challenge when your traditional data management systems and analysis tools are overwhelmed and it becomes difficult to process your data with the analytic or visualization tools you currently have.

Approaching the big data challenge often requires advanced algorithms, infrastructure, and frameworks, and it can all seem very daunting to those just starting out. The reality for information-age organizations, however, is that success is throttled by the ability to navigate the big data universe rapidly and comprehensively.

But of course, big data is relative. In the end, big data by itself has no value; it’s meaningless. What matters most is what you do with the data. Today’s big data discussion often centers on how to target advertisements or customize a user experience, which makes sense given that growth in the marketplace is closely tied to the fact that our interaction with the physical world increasingly depends on pervasive mobile devices that are connected to the world through sensors. For NASA, the ability to leverage our rich history of data and combine it with the new data we are receiving is a huge asset in making our missions successful.

If you are still trying to wrap your head around the difference between petabytes, exabytes, zettabytes, and yottabytes, check out the overview presentation “What is Big Data and why does it matter” by Tom Soderstrom, the Chief Technology Officer for IT at NASA JPL.

NASA’s Big Data Challenge

NASA’s big data challenge is not just a terrestrial one, and it goes beyond the stereotypical challenge. Many of our “big data” sets are described by significant metadata, but on a scale that challenges current and future data management practice. We regularly engage in missions where data streams continually from spacecraft on Earth and in space faster than we can store, manage, and interpret it. NASA has two very different types of spacecraft. Deep-space spacecraft send back data on the order of megabytes per second (MB/s), while Earth orbiters can send back data on the order of gigabytes per second (GB/s). In our current missions, data is transferred by radio frequency, which is relatively slow. In the future, NASA will employ technology such as optical (laser) communication to increase download rates, which will mean a 1000x increase in the volume of data. This is much more than we can handle today, and it is what we are starting to prepare for now.

We are planning missions today that will easily stream more than 24 TB a day. That’s roughly 2.4 times the entire Library of Congress. EVERY DAY. For one mission.
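To make 24 TB a day concrete, the short Python sketch below converts a daily volume into the average link rate needed to move it. It assumes decimal units (1 TB = 10^12 bytes) and a link that runs around the clock with no overhead; the function name `sustained_rate` is purely illustrative.

```python
# Convert a daily data volume into the average link rate needed to move it.
# Assumes decimal (SI) units, 1 TB = 10**12 bytes, and a link that is
# available 24 hours a day with no protocol overhead.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_rate(bytes_per_day: float) -> tuple[float, float]:
    """Return (megabytes per second, gigabits per second) for a daily volume."""
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    megabytes_per_second = bytes_per_second / 10**6
    gigabits_per_second = bytes_per_second * 8 / 10**9  # 8 bits per byte
    return megabytes_per_second, gigabits_per_second


if __name__ == "__main__":
    mb_s, gbit_s = sustained_rate(24 * 10**12)  # 24 TB/day, from the text
    print(f"{mb_s:.1f} MB/s, {gbit_s:.2f} Gbit/s")
```

Run as a script, it prints `277.8 MB/s, 2.22 Gbit/s`. That is only an average: a spacecraft that can downlink only during limited contact windows would need a proportionally higher peak rate. The Library of Congress comparison above also implies roughly 10 TB per Library of Congress (24 / 2.4).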

It’s still very expensive to transfer even one bit down from a spacecraft, so we want to make sure we collect what is most important. Once the data makes its way to our data centers, storing, managing, visualizing, and analyzing it becomes an issue. To give you an idea of what we are dealing with, the Climate Change data repositories alone are projected to grow to nearly 350 petabytes by 2030. For scale, 5 PB is equivalent to all the letters delivered by the US Postal Service in one year!

One great example of the unique challenge we face in managing space data is just starting to be demonstrated by the Australian Square Kilometre Array Pathfinder (ASKAP), a large array of 36 antennas, each 12 meters in diameter, with a combined collecting area of about 4,000 square meters, working together as a single instrument to unlock the mysteries of our universe. The array, which was scheduled to be officially turned on and open for business on Friday, October 5, 2012, can survey the whole sky very quickly and enables research that could never have been done before. Check out this great time-lapse video showing off the new telescope’s capabilities! The array is a precursor to the larger Square Kilometre Array telescope, which will open in 2016 and will combine the signals received from thousands of small antennas spread over a distance of more than 3,000 km. When operational, as much as 700 TB per second of data will flow from the Square Kilometre Array! This is a big data challenge.
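Two quick calculations put the figures in this paragraph in perspective. This Python sketch, using decimal units and helper names of our own choosing, computes the combined dish area of 36 antennas, each 12 meters across, and the volume that a constant 700 TB per second adds up to in a single day.

```python
import math

# Quick scale checks for the radio-array figures quoted above.
# Decimal (SI) units assumed: 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.


def total_dish_area(num_dishes: int, diameter_m: float) -> float:
    """Combined geometric area of identical circular dishes, in square meters."""
    radius = diameter_m / 2
    return num_dishes * math.pi * radius**2


def daily_volume_eb(terabytes_per_second: float) -> float:
    """Exabytes accumulated in one day at a constant rate given in TB/s."""
    bytes_per_day = terabytes_per_second * 10**12 * 86_400
    return bytes_per_day / 10**18


if __name__ == "__main__":
    print(f"ASKAP dish area: {total_dish_area(36, 12.0):,.0f} m^2")
    print(f"SKA at 700 TB/s: {daily_volume_eb(700):.1f} EB per day")
```

It prints a dish area of about 4,072 m^2 and about 60.5 EB per day, assuming the 700 TB/s figure is sustained for a full day.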

And of course, spacecraft are not the only source of our data, thanks to an ever-growing supply of mobile devices, low-cost sensors, and online platforms. As an article in Harvard Business Review put it, “each of us is now a walking data generator.” The scale of the big data challenge for NASA, like many organizations, is daunting.

As you can probably imagine, increasing data volumes are not our only challenge. As our wealth of data grows, the challenges of indexing, searching, transferring, and so on all increase exponentially as well. Additionally, the increasing complexity of instruments and algorithms, the increasing rate of technology refresh, and the decreasing budget environment are all significant factors in our approach. Fortunately, the entire federal government has turned its attention toward this growing challenge. In March 2012, the Obama administration announced the Big Data Research and Development Initiative to “greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.” The goal is to transform the government’s ability to use big data for scientific discovery, environmental and biomedical research, education, and national security.

Current Approaches

Developing new approaches to understanding, analyzing, and visualizing the data we have en masse is of vital interest to NASA. Within government, there is a push to get ahead of big data from both the top down and the bottom up. Below are six world-class examples of how we manage, store, archive, analyze, visualize, and apply our big data.

Managing and Processing Data

NASA’s approach to managing and processing data is demonstrated by the Mission Data Processing and Control System (MPCS), which was used by the Curiosity rover on Mars. MPCS interfaces with NASA’s Deep Space Network, and in turn the Mars Reconnaissance Orbiter, to relay data to and from Curiosity and process the raw data in real time, a process that previously took hours if not days. The system produces custom data visualizations that are used by the flight operations team.

Data Storage

The NASA Center for Climate Simulation (NCCS), which is primarily used by NASA’s Global Modeling and Assimilation Office and the Goddard Institute for Space Studies, demonstrates the Agency’s approach to storing big data. The NCCS focuses on climate and weather data and currently houses 32 petabytes of data, with a total capacity of 37 petabytes. The center also has advanced visualization tools, such as its 17-by-6-foot visualization wall, which provides a single high-resolution surface on which scientists can display still images, video, and animated content from data housed in the system.

Archiving and Distribution of Data

Two examples of how NASA approaches processing and archiving are the Atmospheric Science Data Center (ASDC), which is focused on Earth science, and the Planetary Data System (PDS), which is focused on planetary science. The Atmospheric Science Data Center at NASA Langley Research Center is responsible for processing, archiving, and distributing NASA Earth science data. It specializes in atmospheric data important to understanding the causes and processes of global climate change and the consequences of human activities on the climate, and it holds petabytes of climate data collected over decades. The Planetary Data System archives scientific data from NASA planetary missions, astronomical observations, and laboratory measurements and distributes it through a single website. It offers access to over 100 TB of space images, telemetry, models, and anything else associated with planetary missions from the past 30 years.

Data Analysis

NASA’s Pleiades supercomputer is used to analyze challenging projects, from solar flare and space weather scenarios to detailed space vehicle designs. Pleiades was recently used to process massive amounts of star data gathered by NASA’s Kepler spacecraft, leading to the discovery of new Earth-sized planets in the Milky Way galaxy. More than 1,200 users across the country rely on the system to perform large, complex calculations. It was also used to generate the Bolshoi cosmological simulation, which explores how galaxies and the large-scale structure of the universe have formed over billions of years.

Data Visualization

The NASA Earth Exchange (NEX) is a virtual laboratory that integrates supercomputing, data systems, data visualization, large amounts of online data, and models and algorithms with social networking and collaborative technology. Prior to NEX, scientists had to invest tremendous amounts of time and effort in developing high-end computational methods rather than focusing on important scientific problems. Now, scientists can use the supercomputer to visualize large Earth science data sets, run and share modeling algorithms, and collaborate on new or existing projects. Recently, a research team from around the U.S. used the NEX environment to assemble an atmospherically corrected mosaic of 9,000 Landsat Thematic Mapper scenes and retrieve global vegetation density at 30-meter resolution. Processing the nearly 340 billion pixels in the composite took just a few hours on the Pleiades supercomputer, allowing the team to experiment with new algorithms and approaches with ease. We’ve also invested in a number of collaboration and knowledge-sharing platforms for the Earth science community that combine supercomputing, Earth system modeling, workflow management, and NASA remote sensing data feeds to give researchers a holistic view of our work.
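As a sanity check on the mosaic figures, this Python sketch divides the nearly 340 billion pixels among the 9,000 scenes and asks how wide a square scene at 30-meter resolution would have to be. Square scenes with no overlap are a simplifying assumption for illustration, and `implied_scene_width_km` is a hypothetical helper.

```python
import math

# Sanity-check the mosaic numbers quoted above, assuming (for illustration)
# square scenes and no overlap between adjacent scenes.
TOTAL_PIXELS = 340e9  # "nearly 340 billion pixels"
NUM_SCENES = 9_000    # Landsat Thematic Mapper scenes
PIXEL_SIZE_M = 30     # 30-meter resolution


def implied_scene_width_km(total_pixels: float, num_scenes: int,
                           pixel_size_m: float) -> float:
    """Side length, in km, of a square scene implied by the pixel count."""
    pixels_per_scene = total_pixels / num_scenes
    pixels_per_side = math.sqrt(pixels_per_scene)
    return pixels_per_side * pixel_size_m / 1000


if __name__ == "__main__":
    print(f"{TOTAL_PIXELS / NUM_SCENES / 1e6:.1f} million pixels per scene")
    width = implied_scene_width_km(TOTAL_PIXELS, NUM_SCENES, PIXEL_SIZE_M)
    print(f"~{width:.0f} km per side")
```

It prints 37.8 million pixels per scene and about 184 km per side, in line with the roughly 185 km swath of a Landsat scene.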

Commercial Cloud Computing Services

The Mars Science Laboratory (MSL) mission demonstrates how NASA is modernizing its approach to big data by using cloud computing and commercially available cloud storage solutions. In less than four months, NASA engineered and migrated its legacy content management system and websites to Amazon Web Services. MSL relied heavily on mission-critical applications that could sustain the failure of over a dozen data centers while delivering over 150 gigabits per second of traffic to a global community of operators, scientists, and the general public. The team developed a solution that downloaded raw images and telemetry directly from Curiosity and placed them into Amazon S3 storage buckets. As the data streamed in, every image from Mars was uploaded, processed, stored, and delivered from the cloud. The data was then catalogued in highly available, scalable databases and exposed to applications and users through a RESTful interface. This allowed the content managers for the Mars websites to easily create informative web pages with powerful real-time images. This modern approach allowed NASA to deliver 120 TB of dynamic content and 30 TB of static content the first night, and to meet demand when its websites received over 8 million requests in less than one minute. It also allowed the team to take advantage of the JPL Galaxy and JPL Nebula supercomputers, which ran close to 200 24-hour Monte Carlo simulations at 20 GB each during the mission.

[Image: NASA’s Center for Climate Simulation]

Real-World Applications of What NASA Is Doing with Big Data

The benefits of what NASA is doing in big data are not limited to the government! In fact, this work has very real implications for you. One real-world example of how NASA leverages its big data expertise in a way that directly affects your life is in the field of airline safety. NASA analyzes data collected from planes to study safety implications, which in turn will help commercial airlines improve their maintenance procedures and potentially prevent equipment failures. Using advanced algorithms, the agency helped tease relevant information out of a mountain of unstructured data to help predict and prevent safety problems. Using the open-source Multiple Kernel Anomaly Detection (MKAD) algorithm, the agency determined how similar two continuous data streams or networks are, and then analyzed them within a single framework to detect patterns and automatically discover precursors to adverse events while an airplane is in flight.
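MKAD is publicly described as combining kernels over heterogeneous flight data (discrete switch sequences and continuous sensor readings) and passing the combined kernel to a one-class support vector machine. The Python sketch below is not MKAD and is not NASA’s code; it only illustrates the multiple-kernel idea under stated assumptions. It averages a normalized longest-common-subsequence similarity over hypothetical discrete switch sequences with a Gaussian (RBF) similarity over hypothetical continuous features, then swaps the one-class SVM for a much simpler score: one minus a flight’s mean similarity to the rest of the fleet. The flight records, the 0.5 weight, and `gamma` are all made up for illustration.

```python
import math
from typing import Sequence

# Hypothetical flight record: a sequence of discrete cockpit switch states
# plus a vector of continuous sensor summaries. Names are illustrative only.
Flight = tuple[Sequence[str], Sequence[float]]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def discrete_kernel(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalized LCS similarity in [0, 1] for switch sequences."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / math.sqrt(len(a) * len(b))


def continuous_kernel(u: Sequence[float], v: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) similarity in (0, 1] for sensor feature vectors."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(u, v))
    return math.exp(-gamma * sq_dist)


def combined_kernel(f: Flight, g: Flight, weight: float = 0.5) -> float:
    """Convex combination of the discrete and continuous kernels."""
    return (weight * discrete_kernel(f[0], g[0])
            + (1 - weight) * continuous_kernel(f[1], g[1]))


def anomaly_scores(flights: list[Flight]) -> list[float]:
    """Score each flight as 1 - mean similarity to every other flight.

    Simplified stand-in for a one-class SVM: a higher score means more unusual.
    """
    scores = []
    for i, f in enumerate(flights):
        sims = [combined_kernel(f, g) for j, g in enumerate(flights) if j != i]
        scores.append(1 - sum(sims) / len(sims))
    return scores


if __name__ == "__main__":
    normal = (["gear_down", "flaps_15", "flaps_30", "autopilot_off"], [1.0, 0.2])
    odd = (["flaps_15", "gear_down", "autopilot_off", "flaps_30"], [3.0, 1.5])
    for score in anomaly_scores([normal, normal, normal, odd]):
        print(f"{score:.3f}")
```

With three identical normal flights and one flight whose switch order and sensor values differ, the fourth flight receives the highest score. A production system would fit a proper one-class model, such as a one-class SVM, over a large fleet rather than rely on this averaging heuristic.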

A Big Data Opportunity

From analyzing solar plasma ejections in real time and monitoring global climate change to optimizing large-scale engineering designs and modernizing the way we approach mission operations, NASA is a leader in the application of big data. At NASA we continue to experiment with new ways to harness this shifting environment and tackle the many challenges it poses to government and to the way we do business. We are currently exploring new ways for NASA to help drive innovation in big data technology. Although we are just in the beginning stages of exploring the big data universe, the opportunities are truly limitless.

If you are looking for NASA data, check out data.nasa.gov as a starting point to engage with the many different datasets and easy-to-use tools that we make available. For more technical information, check out a great study titled “Frontiers in Massive Data Analysis,” recently published by the National Research Council, which dives into the computational and inferential technical issues that surround massive data.

This post was originally published at open.nasa.gov on 4 October 2012 and recently updated for medium.com. To follow along with what NASA is doing with data, visit @nasadata or data.nasa.gov.
