Muhammad Mustafa Rafique

 

Title: Energy-aware Scheduling in Hadoop
Abstract

Cloud computing has become a paradigm of choice for hosting large-scale applications on scalable infrastructures, and Hadoop has become the software infrastructure of choice for supporting a variety of big data applications in cloud environments. In cloud computing, it is typical to assume homogeneous resources, and data locality is often the only constraint considered when scheduling workloads. However, specialized energy-efficient architectures and regular system upgrades lead to hardware heterogeneity, such that a data center may have multiple clusters with different performance and energy characteristics. In this talk, I will present the design and implementation of a workflow management system that schedules jobs across multiple heterogeneous clusters while considering their energy and performance goals. The workflow manager tracks, records, and analyzes the performance and energy characteristics of Hadoop applications on the available heterogeneous clusters and schedules jobs on a cluster based on those characteristics. When evaluated using representative Hadoop applications and benchmarks, the workflow manager yields significantly improved system performance, because workloads are scheduled on the clusters that match the computing and energy profile of each application.

Bio

M. Mustafa Rafique is a Research Scientist in the High Performance Systems Group at IBM Research Dublin. His research focuses on designing and developing accelerator- and coprocessor-based heterogeneous clusters for high-performance computing (HPC). His research interests include resource management for cloud computing, resource management for asymmetric clusters, programming models for heterogeneous systems, and I/O techniques for asymmetric multiprocessors. Prior to joining IBM, Mustafa worked at NEC Labs and Qatar Computing Research Institute (QCRI) on designing innovative solutions for adaptive and efficient resource management in massively parallel, distributed, and high-performance computing systems. Mustafa received his MS and Ph.D. degrees in Computer Science from Virginia Tech in 2010 and 2011, respectively.