Wiping out DFS in Hadoop

Tag: hadoop Author: suyan945 Date: 2009-08-25

How do I wipe out the DFS in Hadoop?

Other Answer1

bin/hadoop namenode -format

comments:

Watch out: existing old datanodes won't work with this newly formatted dfs. See issues.apache.org/jira/browse/HDFS-107
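A hedged workaround sketch for that issue: the old datanodes refuse to join because their stored namespace ID no longer matches the freshly formatted namenode, so the usual fix is to wipe the datanode storage directory on each worker before restarting it. The path below is only a placeholder; use whatever dfs.data.dir (by default ${hadoop.tmp.dir}/dfs/data) points to in your setup.

# on each datanode, after stopping it (placeholder path)
rm -rf /app/hadoop/tmp/dfs/data/*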

Other Answer2

You need to do two things:

  1. Delete the main Hadoop storage directory from every node. This directory is defined by the hadoop.tmp.dir property in your configuration (typically core-site.xml, although some setups put it in hdfs-site.xml).

  2. Reformat the namenode:

hadoop namenode -format

If you only do (2), you remove just the metadata stored by the namenode, but you don't get rid of the temporary storage and the datanode blocks.
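A minimal sketch of both steps on a single node, assuming a reasonably recent Hadoop where hdfs getconf is available and that the printed path is really where your data lives:

hdfs getconf -confKey hadoop.tmp.dir   # prints the configured storage directory
rm -rf /path/printed/above/*           # placeholder -- substitute the path from the previous command
hadoop namenode -format                # run this part on the namenode only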

comments:

Deleting the main Hadoop storage directory from every single node is not feasible!
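If logging into every machine by hand is the objection, here is a hedged sketch run from the master: it assumes passwordless ssh (normally already set up for a Hadoop cluster), that conf/slaves lists your datanodes, and that /app/hadoop/tmp is a stand-in for your real hadoop.tmp.dir.

# wipe the storage directory on every host listed in conf/slaves
for host in $(cat conf/slaves); do
  ssh "$host" 'rm -rf /app/hadoop/tmp/*'
done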

Other Answer3

hdfs dfs -rm -r "/*"

(the old answer was deprecated)

comments:

I get a "delete failed" error when I try this; I can delete subdirectories, but not the root.
hdfs dfs -rmr is now deprecated and also won't work for /. You should try hdfs dfs -rm -r "/*" instead.
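One hedged addition: if HDFS trash is enabled (fs.trash.interval > 0), a plain -rm -r only moves everything into each user's .Trash directory, so the space is not freed right away. Passing -skipTrash deletes the blocks outright:

hdfs dfs -rm -r -skipTrash "/*"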

Other Answer4

You may issue

hadoop fs -rmr /

This would delete all directories and sub-directories under DFS.

Another option is to stop your cluster and then issue:

hadoop namenode -format

This would erase all contents on DFS. Afterwards, start the cluster again.
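A hedged sketch of that second option end to end, using the stock scripts shipped with Hadoop (run from the Hadoop install directory on the namenode):

bin/stop-all.sh               # stops MapReduce and DFS daemons
bin/hadoop namenode -format   # answer Y at the re-format prompt
bin/start-all.sh              # brings everything back up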

Other Answer5

So this is what I have had to do in the past.

1. Navigate to your Hadoop directory on your NameNode, then stop all the Hadoop processes by running the default stop-all script. This will also stop DFS. e.g.

cd myhadoopdirectory
bin/stop-all.sh

2. Now, on every machine in your cluster (NameNodes, JobTrackers, datanodes, etc.), delete all files in your main Hadoop storage directory. Mine is set to the temp folder in the root folder; yours can be found in your conf directory (hdfs-site.xml or core-site.xml) under the hadoop.tmp.dir property. e.g.

cd /temp/
rm -r *

3. Finally, go back to your NameNode and format it by going to your Hadoop directory and running 'bin/hadoop namenode -format', e.g.

cd myhadoopdirectory
bin/hadoop namenode -format

4. Start up your cluster again by running the following command. It will also start up DFS again.

bin/start-all.sh

5. And it should work.

Other Answer6

  1. You need to call bin/stop-all.sh to stop dfs and mapreduce.
  2. Delete the data dirs configured in conf/hdfs-site.xml and conf/mapred-site.xml.
  3. Make sure that you have also deleted any Hadoop temporary files left in the /tmp dir.

After all the above steps, you can call bin/hadoop namenode -format to regenerate the dfs.
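For step 3, a hedged pointer at what those temporary files usually are: with default settings hadoop.tmp.dir is /tmp/hadoop-${user.name} and the daemon pid files land in /tmp as well, so (assuming those defaults) something like:

ls -d /tmp/hadoop-*    # inspect first
rm -rf /tmp/hadoop-*   # removes the per-user temp dirs and pid files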

Other Answer7

  1. Stop your cluster

    ${HADOOP_HOME}/bin/stop-mapred.sh

    ${HADOOP_HOME}/bin/stop-dfs.sh

    or if it's pseudo-distributed, simply issue:

    ${HADOOP_HOME}/bin/stop-all.sh

  2. Format your hdfs

    hadoop namenode -format
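This answer stops at the format step; echoing the earlier answers, the hedged follow-up is to bring the daemons back up once the format finishes:

${HADOOP_HOME}/bin/start-dfs.sh
${HADOOP_HOME}/bin/start-mapred.sh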