How to check whether a file exists or not using hdfs shell commands

Tag: hadoop Author: d1s2c3 Date: 2011-09-04

I am new to Hadoop and need a little help.

Suppose I run a job in the background from a shell script. How do I know whether the job has completed? I'm asking because once the job completes, my script has to move the output file to another location. How can I check, using HDFS commands, whether the job has completed or whether the output file exists?

Thanks MRK

Other Answer1

Be careful when you use the presence of output to decide that the job is done: output files can appear before the job has completely finished.
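A check that is less likely to fire on partial output is to wait for a completion marker rather than for any output at all. The sketch below assumes a Hadoop version whose output committer writes an empty _SUCCESS file into the output directory when the job succeeds; older releases may not write it, in which case this loop never finds it and gives up after the poll limit. It polls with hadoop fs -test -e, which exits with status 0 when the path exists. The function name, poll interval and poll limit are illustrative choices, not part of any Hadoop API.

```shell
# Hypothetical helper: wait until "$1/_SUCCESS" exists in HDFS.
# Assumes the job's output committer writes a _SUCCESS marker on success.
# Args: output dir, poll interval in seconds (default 10), max polls (default 360).
# Returns 0 once the marker appears, 1 if it never does within the limit.
wait_for_success_marker() {
    local out=$1 interval=${2:-10} max_polls=${3:-360}
    local polls=0
    # hadoop fs -test -e exits 0 if the path exists, non-zero otherwise.
    until hadoop fs -test -e "$out/_SUCCESS"; do
        polls=$((polls + 1))
        if [ "$polls" -ge "$max_polls" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

For example: wait_for_success_marker "$output" && hadoop fs -mv "$output" "$new_output"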

To answer your direct question: to test for existence, I typically run hadoop fs -ls $output | wc -l and check that the number is greater than 0.
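The line-count test above, wrapped as a function a script can call. This is a sketch: the function name is hypothetical, and it assumes (as in the Hadoop releases I know of) that the error message for a missing path goes to stderr, so a missing path yields zero lines on stdout. It only tells you that -ls printed something; hadoop fs -test -e PATH, which exits 0 when the path exists, is a more direct alternative.

```shell
# Hypothetical helper: succeed if `hadoop fs -ls PATH` prints at least one
# line. A missing path prints nothing on stdout (stderr is discarded here),
# so a count of 0 means "does not exist".
hdfs_path_exists() {
    local count
    # tr strips the padding some wc implementations add around the number.
    count=$(hadoop fs -ls "$1" 2>/dev/null | wc -l | tr -d ' ')
    [ "$count" -gt 0 ]
}
```

For example: if hdfs_path_exists "$output"; then hadoop fs -mv "$output" "$new_output"; fi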

My suggestion is to use && to chain the move onto the job:

hadoop ... myjob.jar ... && hadoop fs -mv $output $new_output &

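This runs the job and then performs the move, but only if the job command exits successfully (&& skips the move otherwise). The trailing & puts the whole chain, job and move together, in the background.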
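The same chaining as a reusable function, so a script can start it in the background and later wait on it. This is a sketch: run_then_move is a hypothetical name, and the job command is passed in as arguments rather than spelled out, because the command line above is elided.

```shell
# Hypothetical helper: run the job command given after the first two
# arguments; only if it exits with status 0, move the HDFS output.
# Args: source path, destination path, then the job command and its args.
run_then_move() {
    local src=$1 dst=$2
    shift 2
    "$@" && hadoop fs -mv "$src" "$dst"
}
```

Usage: start it with run_then_move "$output" "$new_output" hadoop ... myjob.jar ... & and save job_pid=$!. Later, wait "$job_pid" blocks until the chain finishes and returns its exit status, which is 0 only if both the job command and the move succeeded.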

Other Answer2

You can use JobConf.setJobEndNotificationURI() to get notified when the job completes.

I think you can also use the ps command to check whether the process that started the Hadoop job (identified by its PID) is still running.
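A minimal polling loop for the PID approach, assuming the script captured the PID with $! right after starting the job in the background. It uses the shell builtin kill -0, which sends no signal and only reports whether the PID exists; ps -p "$pid" > /dev/null would do the same check. The function name and the default poll interval are hypothetical. Note that this only tells you the process is gone, not whether the job succeeded; if the job was started from the same shell, wait "$pid" both blocks and returns its exit status.

```shell
# Hypothetical helper: poll until the process with the given PID has exited.
# kill -0 sends no signal; it only tests whether the PID exists.
# Args: PID, poll interval in seconds (default 5, an arbitrary choice).
wait_for_pid() {
    local pid=$1 interval=${2:-5}
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
    done
}
```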

comments:

Hi, I am running the job from a shell script and adding another HDFS command to move the output file to the local file system once the job has completed. Now I would like to submit the job in the background using &. How will I know when to move the output file? In other words, is there an HDFS command to check whether the output file exists?
You can also use a file scheme (something like 'file://location') in the URI. Create a shell script that copies the file from HDFS to the local file system and does whatever else is required, then pass the script's URI to setJobEndNotificationURI(). When the job completes (whether it succeeds or fails), the script is invoked automatically. The URI can contain two special parameters: $jobId and $jobStatus.
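A sketch of the kind of copy script this comment describes, written as a shell function that receives the values substituted for $jobId and $jobStatus. Several things here are assumptions: that a successful job is reported as SUCCEEDED, that the HDFS output path and local destination are passed as extra arguments, and the function name itself. How the script gets triggered is not shown here.

```shell
# Hypothetical end-of-job handler.
# Args: job id, job status, HDFS output path, local destination.
# Assumes a successful job is reported as "SUCCEEDED"; any other status
# is treated as a failure and nothing is copied.
on_job_end() {
    local job_id=$1 job_status=$2 src=$3 dst=$4
    if [ "$job_status" != "SUCCEEDED" ]; then
        echo "job $job_id ended with status $job_status; output not copied" >&2
        return 1
    fi
    # hadoop fs -get copies from HDFS to the local file system.
    hadoop fs -get "$src" "$dst"
}
```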