Is it possible to pick specific machines to run a particular type of Hadoop job?

Tag: hadoop Author: wygf2009 Date: 2010-07-28

As far as I understand, the Hadoop architecture treats all machines as equal: any task or job can run on any machine in the cluster.

Is there a way to change this model so that certain machines are tagged as having certain capabilities, and a job runs only on machines that have the capabilities it requires?

Best Answer

Figured this one out. Since I am using the FairScheduler, there is an extensibility point that lets me achieve my goal by writing a simple class implementing the LoadManager interface.

According to http://hadoop.apache.org/common/docs/current/fair_scheduler.html, the FairScheduler uses an instance of the class specified in the mapred.fairscheduler.loadmanager config property (CapBasedLoadManager by default). The LoadManager interface provides a convenient method:

boolean canLaunchTask(TaskTrackerStatus tracker, JobInProgress job, TaskType type)

which lets me add custom logic to allow or deny a particular job on a particular task tracker. Problem solved.
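
For anyone wanting to try the same thing, here is a rough sketch of the kind of decision canLaunchTask could make. It is not my actual class and it does not use the Hadoop types, so it compiles on its own: CapabilityMatcher, the host-to-capabilities map and the capability names are all hypothetical stand-ins. In a real LoadManager you would presumably derive the host from the TaskTrackerStatus and the required capabilities from the job's configuration; how exactly you do that is an assumption left open here.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT part of Hadoop: it models only the allow/deny
// decision that a custom LoadManager.canLaunchTask(...) would delegate to.
final class CapabilityMatcher {
    // Assumed input: task tracker host name -> capabilities tagged on that machine.
    private final Map<String, Set<String>> capabilitiesByHost;

    CapabilityMatcher(Map<String, Set<String>> capabilitiesByHost) {
        this.capabilitiesByHost = capabilitiesByHost;
    }

    // Returns true only if the tracker's machine offers every capability
    // the job requires. Unknown hosts are treated as having no capabilities.
    boolean canLaunchTask(String trackerHost, Set<String> requiredCapabilities) {
        Set<String> offered =
                capabilitiesByHost.getOrDefault(trackerHost, Collections.<String>emptySet());
        return offered.containsAll(requiredCapabilities);
    }
}
```

With this shape, a job that requires nothing can still run anywhere, while a job that requires, say, a "gpu" tag is refused on every tracker not tagged with it.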

Lesson learned: reading source code is useful.

Other Answer1

Well, this seems useful, but the data may not be local, right? One could also run two JobTrackers, each managing a different pool of TaskTrackers, and submit each job to the right JobTracker.