Wednesday, December 3, 2008

Blog: How to Run a Million Jobs; megajobs, processes that involve thousands to millions of similar or identical, though still independent, jobs using different processors

How to Run a Million Jobs
International Science Grid This Week (12/03/08) Heavey, Anne; Williamson, Amelia; Abramson, David

Experts at the recent SC08 conference held a session on emerging solutions to the challenges of running megajobs: processes that involve thousands to millions of similar or identical, though still independent, jobs running on different processors. Researchers want to be able to specify and manage such tasks easily, and to readily identify which jobs succeeded and which failed. The University of Chicago's Ben Clifford says that as tools and resources change, people describe their computing jobs differently. Some established job management solutions offer a variety of features, but they tend to carry a high scheduling overhead and are inefficient at executing many short jobs on numerous processors. Other systems are designed specifically for the data-intensive, loosely coupled, high-throughput grid computing model, which works well for many thousands of jobs, both short and long. Ioan Raicu and Ian Foster, both of the University of Chicago and Argonne National Laboratory, have defined a class of applications called Many-Task Computing (MTC): applications composed of many tasks, both independent and dependent, that are "communication-intensive but not naturally expressed in Message Passing Interface," Foster says. Unlike high-throughput computing, MTC uses numerous computing resources over short periods of time to process tasks. Some computer systems are being adapted to run megajobs, including IBM's Blue Gene/P supercomputer, which has a new throughput, grid-style mode. Clifford adds that if users can break an application into separately schedulable, restartable, relocatable "application procedures," then they need only a tool to describe how the pieces connect, which makes the jobs easier to run.
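To make the idea of independent, restartable jobs a little more concrete, here is a minimal sketch in Python. It is not any of the tools discussed at the session; it uses only the standard library's thread pool on a single machine, and the names `simulate` and `run_megajob` are hypothetical, invented for illustration. It shows one way to submit many independent jobs and sort them into successes and failures.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def simulate(job_id):
    """Hypothetical independent job; ids divisible by 7 fail on purpose."""
    if job_id % 7 == 0:
        raise ValueError(f"job {job_id} failed")
    return job_id * job_id


def run_megajob(job_fn, job_ids, max_workers=8):
    """Run job_fn once per id and separate successes from failures."""
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the job id that produced it.
        futures = {pool.submit(job_fn, j): j for j in job_ids}
        for fut in as_completed(futures):
            job_id = futures[fut]
            try:
                succeeded[job_id] = fut.result()
            except Exception as exc:
                # Keep the exception so the job can be inspected or rerun.
                failed[job_id] = exc
    return succeeded, failed


succeeded, failed = run_megajob(simulate, range(1, 101))
print(len(succeeded), "succeeded;", len(failed), "failed")
```

Because each job depends only on its own id, the failed ones can be resubmitted by calling `run_megajob(simulate, failed.keys())` without touching the rest. A real megajob tool would also have to handle scheduling across many machines, which this sketch leaves out.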

