Hadoop: determine if a file is being written to

Tag: hadoop Author: wangyunzhongbaihe Date: 2009-11-03

Is there a way to determine if a file in Hadoop is being written to? For example, I have a process that puts logs into HDFS. I have another process that monitors HDFS for new logs, but I'd like it to make sure a file has been completely uploaded before processing it. Is something like this possible?

Other Answer1

The Hadoop filesystem API doesn't appear to provide any information about whether a file is currently being written to. As a workaround, you could check the modification time of the file in question: if no write has occurred for some time (for example, 20 minutes), it is probably safe to assume that the copy has either completed or died.
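
A minimal sketch of that check in Java, assuming the modification time is read as epoch milliseconds (for example from `FileStatus.getModificationTime()` on the result of `FileSystem.getFileStatus(path)`). The class name `QuietFileCheck` and the method `isQuiet` are made up for illustration and are not part of Hadoop. Whether the modification time actually advances during a long write depends on the filesystem, so treat this as a heuristic rather than a guarantee.

```java
import java.time.Duration;

// Hypothetical helper, not part of the Hadoop API.
final class QuietFileCheck {
    private QuietFileCheck() {}

    /**
     * Returns true if the file has not been modified for at least quietPeriod.
     *
     * @param modificationTimeMs last modification time in epoch milliseconds
     *                           (assumed to come from FileStatus.getModificationTime())
     * @param nowMs              current time in epoch milliseconds
     * @param quietPeriod        how long the file must be untouched, e.g. 20 minutes
     */
    static boolean isQuiet(long modificationTimeMs, long nowMs, Duration quietPeriod) {
        if (quietPeriod.isNegative()) {
            throw new IllegalArgumentException("quietPeriod must not be negative");
        }
        // A negative age means the timestamp is in the future (clock skew);
        // in that case the file is treated as not yet quiet.
        long ageMs = nowMs - modificationTimeMs;
        return ageMs >= quietPeriod.toMillis();
    }
}
```

The monitoring process would call something like `QuietFileCheck.isQuiet(status.getModificationTime(), System.currentTimeMillis(), Duration.ofMinutes(20))` for each candidate file and skip any file for which it returns false.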