Hadoop: Basic question regarding input to the mapper function

Tag: hadoop Author: lanbaihe525 Date: 2010-08-30

We can provide input files to the mapper as

FileInputFormat.setInputPaths(conf, inputPath);

Is it possible to pass a reference to an in-memory object, say a DOM tree built by a DOM parser from an XML file, as the input to the mapper function of the Hadoop framework?

What other possibilities are there?

Thanks, L

Best Answer

No, you can't specify memory (RAM) based information as input.

The reason is that Hadoop applications are, in general, distributed over many physically separate systems, so a reference into the memory of one machine means nothing to the others. The current version of Hadoop "only" supports distributed data using HDFS, which is a file system.

What you can do is run the DOM parser as a preprocessing step inside your mapper and simply specify your XML file as the input. The easiest way to do that is to create your own subclass of FileInputFormat.
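As a rough illustration of the preprocessing step, here is a minimal sketch of the parsing part only, using nothing but the JDK. It assumes your input format hands the mapper one complete XML document per record as text; the class name XmlRecordParser and the method countElements are hypothetical names made up for this example, not part of Hadoop.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper (not a Hadoop class): turns the XML text of one
// input record into a DOM tree that lives only inside the current task.
final class XmlRecordParser {

    // Parse a complete XML document held in a String.
    // Assumption: the record contains one whole, well-formed document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Example of work a mapper might do on the tree: count the elements
    // with a given tag name anywhere in the document.
    static int countElements(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).getLength();
    }
}
```

Inside map() you would convert the incoming value to a String, call XmlRecordParser.parse on it, and then walk the resulting Document. The DOM tree is rebuilt on whichever node runs the task, so nothing in memory has to be shipped between machines.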

HTH

comments:

I have been doing exactly that. I just wanted to make sure that Hadoop takes only files as input.
Thank you very much for the answer.