Why does the Hadoop incompatible namespaceIDs issue happen?

Tag: hadoop Author: W473959566 Date: 2010-07-21

This is a fairly well-documented error and the fix is easy, but does anyone know why Hadoop datanode NamespaceIDs can get screwed up so easily or how Hadoop assigns the NamespaceIDs when it starts up the datanodes?

Here's the error:

2010-08-06 12:12:06,900 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /Users/jchen/Data/Hadoop/dfs/data: namenode namespaceID = 773619367; datanode namespaceID = 2049079249
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)

This seems to happen even for single-node instances.

Best Answer

The namenode generates a new namespaceID every time you format HDFS. I think this is to distinguish the current version from previous versions. You can always roll back to a previous version if something goes wrong, which might not be possible if the namespaceID were not unique for every formatted instance.

The namespaceID is also what ties the namenode and datanodes together: datanodes bind themselves to the namenode through the namespaceID.
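You can see that binding directly in the VERSION files each daemon keeps under its storage directory. Below is a minimal sketch that reads both files and compares the IDs; the two paths are assumptions based on the error message above (a dfs/name directory next to the dfs/data one shown there), so substitute your own dfs.name.dir and dfs.data.dir.

    import java.io.FileInputStream;
    import java.util.Properties;

    // Minimal sketch: VERSION is a plain Java properties file, so we can load the
    // namespaceID each daemon persisted and check whether they still agree.
    // Both paths are assumptions taken from the error message above; adjust them
    // to your own dfs.name.dir and dfs.data.dir.
    public class CheckNamespaceIds {

        static String readNamespaceId(String versionFile) throws Exception {
            Properties props = new Properties();
            FileInputStream in = new FileInputStream(versionFile);
            try {
                props.load(in);
            } finally {
                in.close();
            }
            return props.getProperty("namespaceID");
        }

        public static void main(String[] args) throws Exception {
            String nnId = readNamespaceId("/Users/jchen/Data/Hadoop/dfs/name/current/VERSION");
            String dnId = readNamespaceId("/Users/jchen/Data/Hadoop/dfs/data/current/VERSION");
            System.out.println("namenode namespaceID = " + nnId);
            System.out.println("datanode namespaceID = " + dnId);
            System.out.println(nnId.equals(dnId)
                    ? "IDs match"
                    : "Incompatible namespaceIDs -- the datanode will refuse to start");
        }
    }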

Other Answer1

This problem is explained well, and the fix walked through, in the following fine guide.

comments:

It's a fine guide, and it explains how to fix the problem, but not what causes it or how to avoid it... though it does link to the bug report issues.apache.org/jira/browse/HDFS-107, formerly known as issues.apache.org/jira/browse/HADOOP-1212.

Other Answer2

I was getting this too, and then I tried putting my configuration in hdfs-site.xml instead of core-site.xml.

Seems to stop and start without that error now.

[EDIT, 2010-08-13]

Actually this is still happening, and it is caused by formatting.

If you watch the VERSION files when you do a format, you'll see (at least I do) that the namenode gets assigned a new namespaceID, but the datanode does not.

A quick solution is to delete the datanode's VERSION file before formatting.
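For what it's worth, here is a rough sketch of that cleanup step; the path is an assumption taken from the error message above, so point it at your own dfs.data.dir, run it (or just rm the file) before hadoop namenode -format, and then restart the datanode so it picks up the freshly generated namespaceID.

    import java.io.File;

    // Rough sketch of the "delete the datanode's VERSION before formatting" step.
    // The path is an assumption based on the error message above; adjust it to
    // your dfs.data.dir before using it.
    public class ResetDatanodeVersion {
        public static void main(String[] args) {
            File version = new File("/Users/jchen/Data/Hadoop/dfs/data/current/VERSION");
            if (version.delete()) {
                System.out.println("Deleted " + version);
            } else {
                System.out.println("Could not delete (or did not find) " + version);
            }
        }
    }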

[/EDIT, 2010-08-13]

comments:

Very interesting... and there's no error after a namenode format and restart?
Thanks, removing VERSION helped.
Where is VERSION defined?

Other Answer3

When I formatted my HDFS I also encountered this error. Apart from the datanode not starting, the jobtracker also wouldn't start. For the datanode I manually changed the namespaceID, but for the jobtracker you have to create the /mapred/system directory (as the hdfs user) and change its owner to mapred. The jobtracker should then start after the format.
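In case it helps, the /mapred/system step can be done with hadoop fs -mkdir and hadoop fs -chown run as the hdfs user; here is a rough Java equivalent using the FileSystem API. The group name mapred is an assumption (use whatever group your cluster has), and it assumes your Hadoop configuration files are on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Rough sketch of the jobtracker prerequisite described above: create
    // /mapred/system in HDFS and hand it over to the mapred user. Run it as a
    // user with superuser rights on HDFS (e.g. the hdfs user). The group name
    // "mapred" is an assumption; substitute your cluster's group if it differs.
    public class PrepareMapredSystemDir {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);

            Path systemDir = new Path("/mapred/system");
            fs.mkdirs(systemDir);                        // no-op if the directory already exists
            fs.setOwner(systemDir, "mapred", "mapred");  // equivalent to: hadoop fs -chown mapred:mapred /mapred/system

            System.out.println("Prepared " + systemDir + " owned by mapred");
            fs.close();
        }
    }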