Einstein Online: An Interview with Diana Kormos-Buchwald

The Einstein Papers Project, housed at Caltech since 2000, has worked in collaboration with Princeton University Press, the Hebrew University of Jerusalem, and the digital publishing platform Tizra to produce a digital edition of The Collected Papers of Albert Einstein. This new edition presents the world-renowned physicist's annotated writings and correspondence through 1923 on a free and publicly accessible website.

Upon its launch today, the digital papers will contain all 13 published volumes of The Collected Papers, in Einstein's original German and translated into English, along with an index volume. Additional volumes will be added to the site about 18 months after each new volume is published. The 14th print volume, covering the period from April 1923 through May 1925 and including Einstein's trip to South America, is scheduled for publication in February 2015.

We recently sat down with Diana Kormos-Buchwald, professor of history at Caltech and director and general editor of the Einstein Papers Project, to talk about the project's new digital endeavor.


The digital edition makes it so that anyone with access to the Internet can read Einstein's papers and correspondence from the first 44 years of his life for free. Why have you and your colleagues undertaken this massive project?

The Collected Papers of Albert Einstein is a unique project in and of itself. Einstein is the most revolutionary and famous scientist of the 20th century, and there is no similar integrated project that compiles and annotates a scientist's writings and correspondence. These scholarly volumes are addressed, in a way, to a specialist audience—the historian of science, the philosopher of science, the physicist who wants to read Einstein in his own words.

But Einstein is and always has been of great interest to the general public as well. His is the most recognized face on the Internet in all cultures. People are attracted to him because of his creativity, maybe because of his image as an unconventional scientist.

So we are now making available these volumes that have explanations and footnotes in English, introductions in English, bibliographies, plus full translations, along with the ability to see some of the original manuscripts in high-definition scans through links to the Einstein Archives Online, another project that we launched a few years ago in collaboration with the Hebrew University's Einstein Archives. We are presenting all of this in an integrated platform in which the user can search for words and phrases in both English and German.

Biographers and historians need to focus their attention and highlight a selection of documents. But we can present everything—his scientific papers, his letters to his children, his travel diaries, his impressions of foreign lands and cultures, etc.

I think it's a great achievement that we were able to put these volumes up without putting them behind a pay wall. The Press has done a wonderful job. Each volume is equivalent to something like 100 scientific papers, plus the translations. And we're making them free and open. This is a joint effort, and it furthers what I think of as an authoritative way of doing digital humanities.


What do you hope readers will take away from reading Einstein's papers?

What I would hope the reader would find is how extraordinarily hard working Einstein was. Things didn't happen with flashes of insight. In the famous year 1905, when he publishes his papers on the special theory of relativity, quantum theory, Brownian motion, and E = mc², he also publishes 20 reviews of other people's work.

We're putting up 5,000 documents. Einstein is known for 5 or 10, maybe 15 major papers; the 5,000 documents provide a context for those well-known papers. He was an extremely productive scientist who wrote two to three pieces per month for the rest of his career, between 1905 and the late 1930s. We have 1,000 writings, many of them unpublished. So the beauty of these volumes is also that they include drafts and writings on a variety of topics that were never published during his lifetime.

Also, Einstein was interested in a lot of fields of science. He started with great interest in physical chemistry and mastered that literature. And he continued through his entire career to be interested in applied physics, theoretical physics, experimental physics, chemistry, biochemistry. He has exchanges with doctors about physiology. So while Einstein is not a Renaissance figure the way let's say Helmholtz was—he is a specialized physicist—nevertheless, he is very curious.

We also hope to demolish some outstanding myths: Einstein was not the isolated theoretician working by himself in an attic with pen and paper. He was a modern, professional scientist, who earned his living through his work as a scientist and as a professor. He was not wealthy. He was the exemplar of the transformation, if you want, in academia at the end of the 19th century and early 20th century, when science expanded a lot in universities. And the correspondence shows he has this ever-growing circle of friends and colleagues in science and engineering, and young people whom he shepherds and advises.


How long have you been working on this digital project with Princeton University Press, Tizra, and the Hebrew University of Jerusalem?

We have been planning this for several years. We wanted to present an accurate rendering of our volumes, which are highly specialized. And we wanted to make these volumes searchable—not only the scholarly annotations but also the scans, facsimiles, and reproductions.


Einstein famously spent several winter terms here at Caltech in the early 1930s, but the published volumes of The Collected Papers only cover his life through 1923. Are there items referencing Caltech in those volumes that we can look for in the digital edition?

Yes, Einstein visited Caltech in 1931, '32, and '33, but his correspondence with scientists at Caltech goes back much further. For example, in 1913, Einstein wrote a letter to George Ellery Hale asking whether the deflection of starlight in the sun's gravitational field could be observed in the daytime. Hale wrote back saying no, we cannot see that.

He also had contacts with Robert A. Millikan quite early on. In 1922, Millikan officially informed Einstein that the National Academy of Sciences had elected him as a foreign associate. They also discuss scientific work quite a bit, and Millikan and Einstein both serve on the International Committee on Intellectual Cooperation of the League of Nations.

Einstein was instrumental in recommending several prominent scientists for recruitment very early in the founding of the Institute. The volumes also show correspondence between Einstein, Millikan, and Richard Tolman, professor of physical chemistry and mathematical physics, who was one of the earliest relativists.

Einstein knows, right at the beginning, in the early 1920s, that Caltech is going to be an exciting place.


Was Einstein unusual in the size of his correspondence?

Yes, his correspondence is very large for a scientist. It amounts to about 30,000 items to and from Einstein. It's of the size of Napoleon's papers, orders of magnitude larger than that of any other modern scientist.

This amount of correspondence testifies to Einstein's centrality in the scientific life of Europe in the 1920s. He does become a nexus, at least in physics. And he is flooded by requests—everything from requests from indigent students up to requests from very famous people that he should endorse this or that appeal, contribute to this or that volume, or participate in this or that conference. He gets to be in great demand.

He also gets a lot of inquiries from the general public about general relativity.


Does he answer them?

Yes, he tries to respond to every letter he gets. He was extremely disciplined. He spent quite a lot of time answering correspondence.


Have any of your team's discoveries been particularly exciting for you?

I was excited when, a few years ago, we discovered some new letters from Croatia—from a Croatian physicist dating back to early in Einstein's career. These were letters dating to 1911 and '12, before Einstein finished general relativity. I'm always very pleased when we find material prior to 1915 or '16 because Einstein's path from special relativity to general relativity is one of the most exciting intellectual journeys. Whenever we uncover new material from that decade, it is quite significant, because we have so little material for the young Einstein compared to the older Einstein. Later, his correspondence grows exponentially.

Kimm Fesenmaier

New Center Supports Data-Driven Research

With the advanced capabilities of today's computer technologies, researchers can now collect vast amounts of information with unprecedented speed. However, gathering information is only one half of a scientific discovery, as the data also need to be analyzed and interpreted. A new center on campus aims to hasten such data-driven discoveries by making expertise and advanced computational tools available to Caltech researchers in many disciplines within the sciences and the humanities.

The new Center for Data-Driven Discovery (CD3), which became operational this fall, is a hub for researchers to apply advanced data exploration and analysis tools to their work in fields such as biology, environmental science, physics, astronomy, chemistry, engineering, and the humanities.

The Caltech center will also complement the resources available at JPL's Center for Data Science and Technology, says director of CD3 and professor of astronomy George Djorgovski.

"Bringing together the research, technical expertise, and respective disciplines of the two centers to form this joint initiative creates a wonderful synergy that will allow us opportunities to explore and innovate new capabilities in data-driven science for many of our sponsors," adds Daniel Crichton, director of the Center for Data Science and Technology at JPL.

At the core of the Caltech center are staff members who specialize in both computational methodology and various domains of science, such as biology, chemistry, and physics. Faculty-led research groups from each of Caltech's six divisions and JPL will be able to collaborate with center staff to find new ways to get the most from their research data. Resources at CD3 will range from data storage and cataloguing that meet the highest "housekeeping" standards, to custom data-analysis methods that combine statistics with machine learning—the development of algorithms that can "learn" from data. The staff will also help develop new research projects that could benefit from large amounts of existing data.

"The volume, quality, and complexity of data are growing such that the tools that we used to use—on our desktops or even on serious computing machines—10 years ago are no longer adequate. These are not problems that can be solved by just buying a bigger computer or better software; we need to actually invent new methods that allow us to make discoveries from these data sets," says Djorgovski.

Rather than turning to off-the-shelf data-analysis methods, Caltech researchers can now collaborate with CD3 staff to develop new customized computational methods and tools that are specialized for their unique goals. For example, astronomers like Djorgovski can use data-driven computing in the development of new ways to quickly scan large digital sky surveys for rare or interesting targets, such as distant quasars or new kinds of supernova explosions—targets that can be examined more closely with telescopes, such as those at the W. M. Keck Observatory, he says.

Mary Kennedy, the Allen and Lenabelle Davis Professor of Biology and a coleader of CD3, says that the center will serve as a bridge between the laboratory-science and computer-science communities at Caltech. In addition to matching up Caltech faculty members with the expertise they will need to analyze their data, the center will also minimize the gap between those communities by providing educational opportunities for undergraduate and graduate students.

"Scientific development has moved so quickly that the education of most experimental scientists has not included the techniques one needs to synthesize or mine large data sets efficiently," Kennedy says. "Another way to say this is that 'domain' sciences—biology, engineering, astronomy, geology, chemistry, sociology, etc.—have developed in isolation from theoretical computer science and mathematics aimed at analysis of high-dimensional data. The goal of the new center is to provide a link between the two."

Work in Kennedy's laboratory focuses on understanding what takes place at the molecular level in the brain when neuronal synapses are altered to store information during learning. She says that methods and tools developed at the new center will assist her group in creating computer simulations that can help them understand how synapses are regulated by enzymes during learning.

"The ability to simulate molecular mechanisms in detail and then test predictions of the simulations with experiments will revolutionize our understanding of highly interconnected control mechanisms in cells," she says. "To some, this seems like science fiction, but it won't stay fictional for long. Caltech needs to lead in these endeavors."

Assistant Professor of Biology Mitchell Guttman says that the center will also be an asset to groups like his that are trying to make sense out of big sets of genomic data. "Biology is becoming a big-data science—genome sequences are available at an unprecedented pace. Whereas it took more than $1 billion to sequence the first genome, it now costs less than $1,000," he says. "Making sense of all this data is a challenge, but it is the future of biomedical research."

In his own work, Guttman studies the genetic code of lncRNAs, a new class of gene that he discovered, largely through computational methods like those available at the new center. "I am excited about the new CD3 center because it represents an opportunity to leverage the best ideas and approaches across disciplines to solve a major challenge in our own research," he says.

But the most valuable findings from the center could be those that stem not from a single project, but from the multidisciplinary collaborations that CD3 will enable, Djorgovski says. "To me, the most interesting outcome is to have successful methodology transfers between different fields—for example, to see if a solution developed in astronomy can be used in biology," he says.

In fact, one such crossover method has already been identified, says Matthew Graham, a computational scientist at the center. "One of the challenges in data-rich science is dealing with very heterogeneous data—data of different types from different instruments," says Graham. "Using the experience and the methods we developed in astronomy for the Virtual Observatory, I worked with biologists to develop a smart data-management system for a collection of expression and gene-integration data for genetic lines in zebrafish. We are now starting a project along similar methodology transfer lines with Professor Barbara Wold's group on RNA genomics."

And, through the discovery of more tools and methods like these, "the center could really develop new projects that bridge the boundaries between different traditional fields through new collaborations," Djorgovski says.


Using Simulation and Optimization to Cut Wait Times for Voters

No one ever likes long lines. Waiting in line may be inconvenient at the coffee shop or the bank, but it's a serious matter at voting centers, where a long wait time can discourage voters—and can be seen as an impediment to democracy.

However, with millions of Americans showing up at the polls, can long lines really be avoided on Election Day? By developing a tool to help better prepare polling places, Caltech sophomore Sean McKenna is using his Summer Undergraduate Research Fellowship (SURF) project as an opportunity to address that problem.

Over the summer, McKenna, an applied and computational mathematics major who works with Professor of Political Science Michael Alvarez, has been building a mathematics-informed tool that will predict busy times in precincts on Election Day and allocate voting machines in response to those predictions. This information could help election administrators minimize wait times for millions of voters.

"My project is based on a report from the Presidential Commission on Election Administration, which asserted that no American should ever have to wait more than 30 minutes to vote," McKenna says. "And so we're trying to see if we can help reach that goal by allocating voting machines in a new way."

McKenna's work is part of the Caltech/MIT Voting Technology Project (VTP), which has been working on voting technology and election administration since the 2000 election. At a June workshop for the collaborative VTP, which aims to improve the voting process through research, McKenna met with academics and election administrators who suggested how he might apply his background in mathematics to create a tool for voting administrators to use on the VTP's website.

The tool he is developing uses a branch of applied mathematics called queueing theory to quantify the formation of lines on Election Day. "Queueing theory assumes that arrivals to a system like a polling place have a random, memoryless pattern. Under this assumption, the fact that one person just showed up to the precinct doesn't tell us whether the next person will show up two seconds from now or two minutes from now," he says. "Furthermore, queueing theory predicts line lengths and wait times as long-term averages, which scientists might call a steady-state approximation."
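The steady-state picture McKenna describes can be illustrated with the textbook M/M/c queue: memoryless arrivals, several identical voting machines, and one shared line. The sketch below is not code from the VTP tool. The function name `erlang_c_wait` and the example numbers are assumptions made for illustration, and the model further assumes that the time each voter spends at a machine is exponentially distributed.

```python
import math

def erlang_c_wait(arrival_rate, service_rate, machines):
    """Long-run average wait in line for an M/M/c queue.

    arrival_rate: voters arriving per unit time (memoryless arrivals)
    service_rate: voters one machine can serve per unit time
    machines:     number of identical machines sharing a single line
    The result is in the same time unit as the two rates.
    """
    offered_load = arrival_rate / service_rate   # average number of busy machines
    utilization = offered_load / machines        # fraction of capacity in use
    if utilization >= 1:
        return math.inf  # arrivals outpace service, so no steady state exists

    # Erlang C formula: probability that an arriving voter has to wait at all.
    below_capacity = sum(offered_load**k / math.factorial(k) for k in range(machines))
    at_capacity = offered_load**machines / (math.factorial(machines) * (1 - utilization))
    prob_wait = at_capacity / (below_capacity + at_capacity)

    # Mean wait in line = P(wait) / (spare service capacity).
    return prob_wait / (machines * service_rate - arrival_rate)

# Hypothetical precinct: one voter per minute on average, five minutes per
# ballot (service rate 0.2 per minute), six machines.
print(erlang_c_wait(arrival_rate=1.0, service_rate=0.2, machines=6))
```

Because the result is a long-term average, it says nothing about how the line builds up and drains over a single day.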

Although queueing theory provided a good jumping-off point, there were a few real-world problems that an analytical model on its own couldn't address, McKenna says. For example, voter arrival behavior is not completely random on Election Day; early morning and late afternoon spikes in arrivals are the norm. In addition, polls are usually only open for 12 or 13 hours, which is not considered to be enough time for steady-state queueing approximations to be applicable.

"These challenges led us to review the literature and determine that running a simulation with actual data from administrators, as opposed to attempting to adjust strictly analytical models, was the best way to represent what actually happens in an election," McKenna says.
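A simulation of the kind described here can be sketched in a few lines. The example below is a minimal illustration and not McKenna's implementation: the function name `simulate_precinct`, the hourly arrival profile, and the exponentially distributed voting times are all assumptions. It lets the arrival rate change from hour to hour, which captures the morning and late-afternoon spikes, and it follows a single finite day rather than a long-run average.

```python
import heapq
import random

def simulate_precinct(hourly_arrivals, machines, mean_vote_minutes, seed=0):
    """Simulate one Election Day; return (average wait, longest wait) in minutes.

    hourly_arrivals: expected number of voters arriving in each hour the polls
                     are open (arrivals are memoryless within each hour)
    Everyone who arrives before closing is served; there is no cutoff.
    """
    rng = random.Random(seed)

    # Draw arrival times hour by hour, using that hour's own arrival rate.
    arrivals = []
    for hour, expected in enumerate(hourly_arrivals):
        if expected <= 0:
            continue
        t = hour * 60.0
        while True:
            t += rng.expovariate(expected / 60.0)  # gap to next arrival, minutes
            if t >= (hour + 1) * 60.0:
                break
            arrivals.append(t)

    # Min-heap holding the time at which each machine next becomes free.
    free_at = [0.0] * machines
    heapq.heapify(free_at)

    waits = []
    for t in arrivals:  # first come, first served
        start = max(t, heapq.heappop(free_at))
        waits.append(start - t)
        finish = start + rng.expovariate(1.0 / mean_vote_minutes)
        heapq.heappush(free_at, finish)

    if not waits:
        return 0.0, 0.0
    return sum(waits) / len(waits), max(waits)

# Hypothetical 13-hour day with morning and late-afternoon spikes.
profile = [120, 90, 40, 30, 30, 40, 50, 40, 30, 40, 70, 110, 100]
print(simulate_precinct(profile, machines=5, mean_vote_minutes=4.0))
```

Running the function with several seeds and averaging the results would give a more stable estimate than a single simulated day.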

The goal of the research is to create a simulation of an entire jurisdiction, such as a county with multiple polling places. The simulation would estimate wait times on Election Day based on information election administrators enter about their jurisdiction into the web-based tool. Administrators would then receive a customized output prior to Election Day, suggesting how to allocate voting machines across the jurisdiction and detailing the anticipated crowds—information that could both predict the severity of long lines and prompt new strategies for allocating voting machines to preempt long waits.
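One simple way to turn predicted crowds into a machine allocation is a greedy rule: give every polling place one machine, then hand each remaining machine to whichever polling place currently has the most peak-hour voters per machine. The sketch below shows that heuristic only. It is not the method used in the VTP tool, and the function name `allocate_machines`, the precinct names, and the numbers are hypothetical.

```python
import heapq

def allocate_machines(peak_voters_per_hour, total_machines):
    """Greedily split machines across polling places.

    peak_voters_per_hour: dict mapping polling-place name to its predicted
                          busiest-hour arrivals
    Returns a dict mapping each polling place to its number of machines.
    """
    if total_machines < len(peak_voters_per_hour):
        raise ValueError("need at least one machine per polling place")

    allocation = {name: 1 for name in peak_voters_per_hour}

    # Max-heap (keys negated) ordered by peak voters per machine.
    heap = [(-load, name) for name, load in peak_voters_per_hour.items()]
    heapq.heapify(heap)

    for _ in range(total_machines - len(allocation)):
        _, name = heapq.heappop(heap)            # most heavily loaded place
        allocation[name] += 1
        new_load = peak_voters_per_hour[name] / allocation[name]
        heapq.heappush(heap, (-new_load, name))

    return allocation

# Hypothetical county with three polling places and 12 machines.
print(allocate_machines({"North": 300, "Central": 180, "South": 90}, 12))
```

A fuller tool would replace the load-per-machine score with wait times estimated by simulation, but the allocation loop could keep the same shape.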

Several other Caltech undergraduates in Alvarez's group also have been working on alternative ways to improve the voting process. Senior physics major Jacob Shenker has been developing a system for more secure and user-friendly postal voting, and recent graduates Eugene Vinitsky (BS '14, physics) and Jonathan Schor (BS '14, biology and chemistry) produced a prototype of a mobile phone app that could help voters determine if there is a long line at their polling place.

While these projects were completed separately, McKenna says there may be room for collaboration in the future. "One thing that we're hoping my tool will be able to do is to predict for administrators what times are going to be busiest, and we could also export this information to the app for voters," he says. "For example, the app could alert someone that their polling place is very likely to have long lines in the morning so they should try to go in the afternoon."

The technologies that McKenna and his student colleagues are developing could change the way that millions of Americans participate in democracy in the future—which would be an impressive accomplishment for a young student who has yet to experience the physical aspect of lining up to vote.

"So that's one kind of sticky situation about my working on this project: I've never actually been in to vote in person. I've only been able to vote once, and since I'm from Minnesota, it had to be absentee by mail," he says.


The Risk and Reward of Venture Capital: An Interview with Michael Ewens

Michael J. Ewens recently joined the faculty at Caltech as associate professor of finance and entrepreneurship after four years at the Tepper School of Business at Carnegie Mellon University. A native of Wisconsin, Ewens attended Washington University in St. Louis, majoring in mathematics and economics before moving on to UC San Diego for graduate studies in economics.

Ewens explains how he discovered venture capital through a summer job in graduate school, and shares his ambitions for his future at Caltech.


What field do you specialize in?

Entrepreneurial finance. I study the financing and development of high-growth start-ups such as Twitter, biotech start-ups, or new clean-energy firms. I study how money and investors get matched to start-ups and what value is created after they are financed. What are the factors that lead to them receiving the right money at the right prices, or failing to? How is capital raised?

Entrepreneurship is a fascinating area because it is at the extreme of many problems that come up in economics. A classic issue in economics is what happens in a situation where one person knows a lot more than the other—information asymmetry—and can take advantage. This is often the case in entrepreneurship, where you have people who are new to the business world seeking venture capital from people who have expertise in finance and money.


Is your interest in what goes into making start-ups successful purely theoretical?

No, it's a very important issue in practical terms. For example, Caltech allocates a part of its endowment toward the private equity asset class, which includes venture capital. So understanding how investments in start-ups behave in terms of risk and return is fundamentally important.

And, of course, it's important for entrepreneurs and policy makers. Most government officials think it's good to have more start-ups, and they think they know how to set policy to lead to more start-ups. But every economist who studies entrepreneurship comes from the position that we really don't know how to encourage start-ups and make them more profitable. Take the example of health care. It is thought that one reason people don't leave large companies to start new ones is that they are locked into their health insurance plans. Now, with the introduction of the Affordable Care Act ("Obamacare"), we can begin to look at the data and see if this supposition is correct.


How did you get interested in venture capital?

It was happenstance. I was a graduate student in economics at UC San Diego, studying international trade. I wanted to live close to campus, near the beach, but not in graduate housing. To do that, I needed to earn more than my research-assistant salary. So I started consulting for a venture-capital firm called Correlation Ventures. They are a unique firm. They introduced a different kind of econometrics into venture capital, the sort of techniques used in Moneyball, which revolutionized the business of creating a winning baseball team. I fell in love with the idea.

I had initially planned to work for them just over the summer, but they offered me access to a wonderful set of data that I could use in my graduate studies, so I stayed on as their "data guy." I continue to work as a part-time advisor to the fund.

What inspires you to choose particular topics in venture capital for further research?

Venture capital is a very dynamic field, so new research topics are not hard to come by. The challenge is collecting rich data and using quality empirical strategies. For example, changes on the legislative side over the last couple of years have provided unique research opportunities. The JOBS Act [Jumpstart Our Business Startups Act], passed by Congress in 2012, significantly alters the way start-ups are financed, who can invest in them, and how such firms can eventually go public. These policy changes provide what economists call natural experiments. For example, the legislative changes make it possible for us to test theories concerning the types and magnitudes of financing frictions facing start-ups.

The underlying assumption behind such policies is that having many new small businesses is great, because, as everyone says, they create the most jobs. But what people forget is that new small businesses also destroy the most jobs, because most small businesses fail. So that's part of my research: to shine a light on what makes start-ups succeed or fail.


Are there other important issues for venture capital that you study besides changes in legislation?

Yes. For example, I'm working on a paper now with some coauthors that investigates the impact of new cloud software that has grown rapidly in use since 2005. Think the Amazon cloud. This software has made it possible for individuals to start certain types of businesses with very little money: information technology businesses, say, but obviously not something like developing new drugs, for which you need laboratory space. Then we can ask how this changes the venture capital investment choice. For example, if an investor can give you ten thousand dollars rather than a million to get your company up and running, how does that affect the investor's selection of entrepreneurs and the fate of start-ups generally?

It's also becoming easier and easier to collect disparate sources of economic data from the web. So questions that economists have studied in the past using small datasets can now be checked against much larger datasets of hundreds of thousands of observations.


What will you be teaching at Caltech?

Next January I'm going to teach a graduate course in applied econometrics, and in the spring I will be teaching a class in venture capital finance [BEM 110] that mirrors a class I taught to MBA students at Carnegie Mellon. I'm not worried about the undergrads at Caltech handling the course, though. In fact, I'm looking forward to being able to throw more mathematics into the course. This course will give students background on how investors and entrepreneurs behave through the lens of economics and finance.


What attracted you to Caltech?

I liked my time at Carnegie Mellon, because in a business school you have a very close connection to industry and the "real world." But Caltech is "research first" in a way that a business school cannot be. Writing as many quality papers as possible and teaching the kind of things I was taught as a PhD student is what I'm best suited for, I think, and Caltech is the perfect place for that. In 15 years, I want to look back and say that I took on some risk and made a small but significant impact, changing the way people think about economics. Caltech shares that interest.
