Hadoop dfs.include file

Tag: hadoop Author: zchongyu Date: 2011-09-02

Please explain the purpose of the dfs.include file and how to define it. I've added a new node to the Hadoop cluster, but it is not recognized by the namenode. In one of the posts I found that dfs.include can resolve this issue. Thank you in advance, Vladi

Answer 1

Just adding the node name to dfs.include and mapred.include is not sufficient. The slaves file has to be updated on the namenode/jobtracker host as well. Then the datanode and tasktracker have to be started on the new node, and the refreshNodes command has to be run on the NameNode and the JobTracker to make them aware of it.
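In outline, the steps look something like this. This is a sketch, not exact commands for every install: the hostname, the config path, and the Hadoop 1.x-era daemon scripts are assumptions you should adapt to your setup.

    # On the master: add the new node to the include files and the slaves file
    # (hostname and paths below are placeholders)
    echo "newnode.example.com" >> /etc/hadoop/conf/dfs.include
    echo "newnode.example.com" >> /etc/hadoop/conf/mapred.include
    echo "newnode.example.com" >> /etc/hadoop/conf/slaves

    # On the new node: start the worker daemons
    hadoop-daemon.sh start datanode
    hadoop-daemon.sh start tasktracker

    # Back on the master: tell the namenode and jobtracker
    # to re-read their include files
    hadoop dfsadmin -refreshNodes
    hadoop mradmin -refreshNodes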

For background on the difference between the include files and the slaves file, 'Hadoop: The Definitive Guide' explains:

The file (or files) specified by the dfs.hosts and mapred.hosts properties is different from the slaves file. The former is used by the namenode and jobtracker to determine which worker nodes may connect. The slaves file is used by the Hadoop control scripts to perform cluster-wide operations, such as cluster restarts. It is never used by the Hadoop daemons.
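In other words, dfs.include and mapred.include are just conventional names: the namenode and jobtracker read whatever files the dfs.hosts and mapred.hosts properties point to. A minimal sketch of that wiring (the file paths are placeholders) might look like this:

    <!-- hdfs-site.xml: file listing hosts allowed to connect as datanodes -->
    <property>
      <name>dfs.hosts</name>
      <value>/etc/hadoop/conf/dfs.include</value> <!-- placeholder path -->
    </property>

    <!-- mapred-site.xml: file listing hosts allowed to connect as tasktrackers -->
    <property>
      <name>mapred.hosts</name>
      <value>/etc/hadoop/conf/mapred.include</value> <!-- placeholder path -->
    </property>

The include files themselves are plain text, one hostname per line.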

Comments:

Thank you, Praveen! Your answer was very helpful.