More data are being created, consumed, and transported than ever before, and in all areas of society, including business, government, health care, and science. The hope and promise is that this influx of information—known as big data—will be transformative: armed with insightful information, businesses will make more money while providing improved customer service, governments will be more efficient, medical science will be better able to prevent and treat diseases, and science will make discoveries that otherwise would not be possible.
But to do all that, people have to be able to make sense of the data. Scientists and engineers are employing new computational techniques and algorithms to do just that, but sometimes, to gain significant insights from data, you have to see them, and as the data become increasingly complex and bountiful, traditional bar graphs and scatter plots just won't do. And so, not only will scientists and engineers have to overcome the technical challenges of processing and analyzing such large amounts of complex data (often in real time, as the data are being collected), but they also will need new ways to visualize and interact with that information.
In a recent symposium hosted at Caltech in collaboration with the Jet Propulsion Laboratory (JPL) and Art Center College of Design in Pasadena, computer scientists, artists, and designers gathered to discuss what they called the "emerging science of big-data visualization." The speakers laid out their vision for the potential of data visualization and demonstrated its utility, power, and beauty with myriad examples.
Data visualization represents a natural intersection of art and science, says Hillary Mushkin, a visiting professor of art and design in mechanical and civil engineering at Caltech and one of the co-organizers of the symposium. Artists and scientists alike pursue endeavors that involve questions, research, and creativity, she explains. "Visualization is another entry point to the same practice—another kind of inquiry that we are already engaged in."
Traditionally, data visualization tends to be reductionist, said self-described data artist Jer Thorp in his talk at the symposium. Charts and graphs are usually used to distill complex data into a simpler message or idea. But the promise of big data is that it contains hidden insight and knowledge. To gain that deeper understanding, he explained, we must embrace the inherent complexity of data. Data visualization, therefore, should be revelatory instead of just reductionist—not simply a way to convey information or find answers, Thorp said, but to generate and cultivate questions. Or, as he put it, the goal is question farming rather than answer farming.
For example, Thorp used Twitter to generate a model of worldwide travel, a project that, he said, grew out of a desire to model how viruses spread. He searched for tweets that included the phrase "just landed in" and recorded the tweeted destinations. Combining that information with the original locations of the travelers, as listed in their Twitter profiles, Thorp created an animated graphic depicting air travel. Since one way that diseases are spread globally is through air travel, the graphic, while rudimentary, he admitted, could be a starting point for epidemiological models of disease outbreaks.
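The pairing step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Thorp's actual code: the record fields (`text`, `profile_location`), the function name `extract_routes`, and the sample tweets are all hypothetical, and the pattern is deliberately naive (it reads everything after "just landed in" up to the next punctuation mark as the destination).

```python
import re
from collections import Counter

# Naive pattern: capture the words after "just landed in" up to the next
# punctuation mark or the end of the text. Real tweets would need more care.
LANDED = re.compile(
    r"just landed in\s+([A-Za-z '\-]+?)(?=\s*(?:[.,!?#@]|$))",
    re.IGNORECASE,
)

def extract_routes(tweets):
    """Count (home, destination) pairs from tweet records.

    Each record is assumed to be a dict with a 'text' field and a
    'profile_location' field (hypothetical names for this sketch).
    """
    routes = Counter()
    for tweet in tweets:
        match = LANDED.search(tweet["text"])
        home = tweet.get("profile_location", "").strip()
        # Keep only tweets that name a destination and have a home location.
        if match and home:
            destination = match.group(1).strip()
            routes[(home, destination)] += 1
    return routes

# Made-up sample records, for illustration only.
sample_tweets = [
    {"text": "Just landed in Tokyo! So tired.", "profile_location": "Los Angeles"},
    {"text": "just landed in New York", "profile_location": "London"},
    {"text": "Finally, just landed in Tokyo.", "profile_location": "Los Angeles"},
    {"text": "Missed my flight again", "profile_location": "Paris"},
    {"text": "Just landed in Berlin", "profile_location": ""},
]

routes = extract_routes(sample_tweets)
for (home, destination), count in routes.most_common():
    print(f"{home} -> {destination}: {count}")
```

Each counted pair could then be drawn as an arc between two points on a map, which is roughly what an animated travel graphic needs as input. Turning free-text place names into coordinates is a separate step that this sketch leaves out.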
Data visualization also may be interactive, allowing users to manipulate the data and peel back multiple layers of information. Thorp created such a graphic to represent the first four months of data from NASA's Kepler mission—the space telescope that has discovered thousands of possible planets. A user not only can visualize all the planet candidates from the data set but also can reorganize the planets in terms of size, temperature, or other variables.
Artist Golan Levin demonstrated how art and data can be used to provoke thoughts and ideas. One example is a project called The Secret Life of Numbers, in which he counted the number of pages that Google returned when searching for each integer from 0 to 1,000,000. He used those data to create an interactive graphic that shows interesting trends, such as the popularity of certain numbers like 911, 1234, or 90210.
Anja-Silvia Goeing, a lecturer in history at Caltech, described the history of data visualization, highlighting centuries-old depictions of data, including maps of diseases in London; drawings and etchings of architecture, mechanical devices, and human anatomy; letter collections and address books as manifestations of social networks; and physical models of crystals and gemstones.
Data visualization, Goeing noted, has been around for many generations. What's new now is the need to visualize lots of complex data, and that, the symposium speakers argued, means we need to change how we think about data visualization. The idea that it is simply a technique or a tool is limiting, said designer Eric Rodenbeck during his presentation. Instead, we must think of visualization as a medium through which data can be explored, understood, and communicated.
This summer, Mushkin and the symposium's other co-organizers (Scott Davidoff, the manager of the human interfaces group at JPL, and Maggie Hendrie, the chair of interaction design at Art Center) are mentoring undergraduate students from around the country who will work on data-visualization research projects at Caltech and JPL. One group of students will work with two Caltech researchers to visualize neural interactions deep inside the brain: Ralph Adolphs, Bren Professor of Psychology and Neuroscience, professor of biology, and director of the Caltech Brain Imaging Center; and Mike Tyszka, associate director of the Caltech Brain Imaging Center. Another will work with Caltech professor of aeronautics Beverley McKeon to visualize how fluid flows over walls. The third project involves systems engineering at JPL, challenging students to create a tool to visualize the complex process by which space missions are designed. Mushkin, Davidoff, and Hendrie also plan to invite more speakers to Caltech to talk about data visualization later in the summer.
You can watch all of the symposium talks on Caltech's YouTube channel.