Caltech

CD3/CDST Special Seminar

Thursday, July 7, 2016
5:30pm to 8:00pm
Annenberg 105
Big-Data Technology Innovation: Hadoop, Real-time, and Machine Learning
Andy Feng, VP of Architecture, Yahoo
**REGISTRATION REQUIRED (Deadline: July 5)**
Attendance is free for Caltech/JPL staff and students. To register, and for more information, visit: http://tinyurl.com/incose0716

In this talk, we walk through Yahoo use cases (search, advertising, personalization, and Flickr) where our big-data technologies are best exemplified. We explain how Yahoo leverages these technologies to perform real-time processing and advanced machine learning against 600 petabytes of data, and describe the system architecture of our heterogeneous clusters of 40,000 servers for supporting a variety of workloads. We provide an overview of open source technologies (Apache Storm, Apache HBase, Apache Omid, and Yahoo CaffeOnSpark) and our in-house technology for large-scale machine learning. We discuss how academic researchers and industry technologists can help advance big-data technologies further.

Abstract: Yahoo started developing big-data technology with Hadoop MapReduce and File System in 2006, and made it an Apache open source project in 2009. Since then, big data has become a major component of the global tech industry, and Yahoo is leading the way. In the past three years, Yahoo has been a leading contributor to Apache Storm for event processing, Apache HBase for distributed NoSQL stores, Apache Spark for faster processing, and Druid for sub-second analytics. We have created new open source projects such as Apache Omid for transactional support of NoSQL stores, Yahoo Data Sketches for approximate analytics, and Yahoo CaffeOnSpark for distributed deep learning.

Bio: Dr. Andy Feng is a VP of Architecture at Yahoo, leading the architecture and design of big data and machine learning initiatives. He has architected major platforms for personalization, ad serving, NoSQL, and cloud infrastructure. Prior to Yahoo, he was a Chief Architect at Netscape/AOL and a Principal Scientist at Xerox.