【6.28】Academic Lecture: Toward HPC/Big Data convergence: the YML+Hadoop 2.x opportunity

2016-06-26

Title: Toward HPC/Big Data convergence: the YML+Hadoop 2.x opportunity

Speaker: Dr. Laurent Bobelin

Time: 15:00, June 28

Place: Meeting Room on the second floor of the Computing Center

About the speaker:

Dr. Laurent Bobelin has fifteen years of experience in distributed systems research, in both industry and academia. He obtained his Ph.D. from the Université de la Méditerranée, Marseille, France, in 2008. He has held academic positions in France at CNRS, INRIA, and ENS Lyon, and in distributed computing teams at Grenoble, Bourges, and Tours. In industry, he worked on grid research projects funded by Europe as well as by the French Ministry of Research. As an academic he has worked on various subjects related to the role of networks in distributed systems architecture and operation, such as automatic discovery and modeling of distributed systems networks, architecture and design of computing grids, automatic network security in cloud platforms, distributed system modeling for simulation, and Hadoop scheduling. His research interests include distributed systems architecture and security, networking, and big data.

Abstract:

One of the biggest challenges attracting the attention of large-scale systems researchers today is the convergence of High Performance Computing and Big Data Computing. But anyone who tries to tackle this challenge will face many problems, ranging from hardware inefficiency to major software incompatibilities.

Hadoop is one of the major frameworks dedicated to big data distributed computing. Initially dedicated to the MapReduce programming model in version 1, it has evolved since version 2 into generic middleware able to handle any kind of distributed application. To do so, Hadoop has undergone tremendous changes in its architecture and design since version 1.x. However, its performance remains highly dependent on MapReduce's initial assumptions about the underlying architecture.

YML is a workflow framework for HPC applications, developed for more than 10 years by the French research community, that has been used on many supercomputers in France and abroad, such as the K computer at RIKEN. It turns out that the Hadoop v2 and YML schedulers share much in terms of architecture: there is thus an opportunity to mix big data and HPC workflows, so that an application can use methods coming from both worlds concurrently, by integrating these two tools. This seminar presents some of the major obstacles to HPC and Big Data convergence, as well as the ongoing work of integrating Hadoop and YML.