A contention-aware hybrid evaluator for schedulers of big data applications in computer clusters
Large enterprises use clusters of computers to process Big Data workloads that are heterogeneous in terms of the type of jobs and the nature of their arrival processes. The scheduling of jobs from such workloads has a significant impact on their execution times. This paper presents a Trace Driven Analytic Model (TDAM) methodology to assess the impact of different scheduling schemes on job execution times. The analytic models used by this method are closed queuing network models that estimate congestion at the various nodes of the cluster. The paper demonstrates the usefulness of this approach by showing how four different types of common schedulers affect the execution times of jobs derived from well-known benchmarks. The method is then implemented inside a popular Hadoop job-trace simulator called Mumak, making Mumak contention-aware. The original Mumak tool completely ignores contention for processors and I/O at each node of the cluster. Our contention-aware Mumak predicts job completion times with significantly higher accuracy than the original tool.
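
To make concrete how a closed queuing network model can estimate congestion at a node, the sketch below applies exact Mean Value Analysis (MVA) to a single-class network. MVA is a standard solution technique for such networks, but the abstract does not say which technique TDAM uses, so its use here is an assumption. The function name `mva`, the two-resource node (one processor, one disk), and the service demand values are all hypothetical and chosen only for illustration.

```python
def mva(demands, n_jobs, think_time=0.0):
    """Exact single-class Mean Value Analysis for a closed queuing network.

    demands    : service demand (seconds per job) at each queuing resource
    n_jobs     : number of jobs circulating in the network
    think_time : delay outside the queuing resources (assumed 0 here)

    Returns (throughput, residence_times, queue_lengths) at population n_jobs.
    """
    queue = [0.0] * len(demands)      # mean queue lengths with 0 jobs
    residence = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, n_jobs + 1):
        # An arriving job sees the queue lengths of the network with n-1 jobs.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        # Little's law applied to the whole network.
        throughput = n / (think_time + sum(residence))
        # Little's law applied to each resource.
        queue = [throughput * r for r in residence]
    return throughput, residence, queue


# Hypothetical node: CPU demand 0.4 s and disk demand 0.6 s per task.
demands = [0.4, 0.6]
for tasks in (1, 2, 4, 8):
    x, r, q = mva(demands, tasks)
    print(f"tasks={tasks} response={sum(r):.3f}s throughput={x:.3f}/s")
```

With one task the response time equals the sum of the service demands, since there is no contention. As more tasks share the node the response time grows, which is the effect a contention-unaware simulator leaves out.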