Astronomy Colloquium
Starting with the Sloan Digital Sky Survey and the Hubble Deep Field, astronomy has entered the Era of Surveys. Today we have covered a substantial fraction of the sky in multiple wavelengths. Much of this data is now available online, as an easy-to-use virtual telescope. The data sets are interoperable and it is easy to cross-correlate between surveys. Astronomers have become proficient in databases, and today they use these not as tools but rather like musical instruments. Over the centuries science has gone through several paradigms, starting with the "empirical", followed by "theoretical" and "computational" approaches to science. Today, the large surveys have led us to the so-called Fourth Paradigm of Science, where discoveries are "data-driven". Astronomers were early adopters, as we can only observe the sky, but cannot undertake experiments which change the behavior of celestial objects. This data-intensive approach to astronomy has resulted in disruptive changes, both technological and sociological. This talk will discuss the journey over the last 20 years, where these changes have led us, and what may lie ahead. The Large Synoptic Survey Telescope, LSST, will open up the time domain and will produce the largest dataset astronomers have yet encountered. Such data sets will bring new challenges, as systematic errors will increasingly dominate over statistical noise. We already see how machine learning is turning new detections into discoveries. But the most interesting changes are still ahead: just as in self-driving cars, algorithms are making the decisions, and soon we will see AI tools making adaptive choices about survey strategies, like target selection. This may be the beginning of the Fifth Paradigm of Science, where computers decide objectively which experiments will yield the biggest gain in our knowledge.
Finally, I will also discuss structural and organizational changes that should happen, to make sure that legacy data sets, which have cost hundreds of millions to acquire, can be safely preserved and analyzed throughout their useful lifetime. This will require a fresh look at long term data curation - how to be FAIR (Findable, Accessible, Interoperable, Reusable) and how to be open, free and sustainable, all at the same time.