How to run a Hadoop program?

Tag: hadoop Author: cao_hui88888 Date: 2010-09-10

I have set up Hadoop on my laptop and successfully ran the example program given in the installation guide. However, I am not able to run my own program.

~/hadoop/ch2$ hadoop MaxTemperature input/ncdc/sample.txt output
Exception in thread "main" java.lang.NoClassDefFoundError: MaxTemperature
Caused by: java.lang.ClassNotFoundException: MaxTemperature
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: MaxTemperature.  Program will exit.

The book says to set the Hadoop classpath by running:

~/hadoop/ch2$ export HADOOP_CLASSPATH=build/classes

The main class is defined in the MaxTemperature.java file that I am trying to run. How do we set the Hadoop classpath? Do we have to set it for every program execution or only once? Where should I put the input folder? My code is at /home/rohit/hadoop/ch2 and my Hadoop installation is at /home/hadoop.

Best Answer

You should package your application into a JAR file; that is much easier and less error-prone than fiddling with classpath folders.

In your case, you must also compile the .java file. You said it's MaxTemperature.java, but there must also be a MaxTemperature.class before you can run it.
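
As a rough sketch of that workflow (compile, package, run), assuming the location of the Hadoop core JAR is held in a variable named HADOOP_CORE_JAR (an illustrative name, not one Hadoop defines) and that the class is in the default package. The helper name run_job_from_jar is made up for this example:

```shell
# Hypothetical helper: compile a job class, package it into a JAR, and run it.
# Assumes HADOOP_CORE_JAR is set to the Hadoop core JAR for your version and
# that the sources (e.g. MaxTemperature*.java) are in the current directory.
run_job_from_jar() {
  main_class=$1
  shift
  # javac -d needs an existing output directory on older JDKs
  mkdir -p build/classes || return 1
  # Compile the main class and its companions (mapper, reducer, ...)
  javac -classpath "$HADOOP_CORE_JAR" -d build/classes "$main_class"*.java || return 1
  # Package everything under build/classes into <main_class>.jar
  jar cf "$main_class.jar" -C build/classes . || return 1
  # Remaining arguments are passed to the job (input and output paths)
  hadoop jar "$main_class.jar" "$main_class" "$@"
}
```

With the sources in ~/hadoop/ch2, this would be called as: run_job_from_jar MaxTemperature input/ncdc/sample.txt output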

comments:

The book I am referring to does say that we should package the files in a JAR file for easy distribution over a cluster. But for simplicity, it uses individual files at the start.

Other Answer1

I ran into this problem as well when going through the Hadoop book (O'Reilly). I fixed it by setting the HADOOP_CLASSPATH variable in the hadoop-env.sh file in the configuration directory.
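
For illustration, the line added to hadoop-env.sh might look like the following. The path is an assumption built from the question (code in /home/rohit/hadoop/ch2, classes under build/classes); an absolute path is used because a relative one would depend on the directory you run hadoop from:

```shell
# In hadoop-env.sh (conf/ under the Hadoop install in older releases).
# Path is an assumption based on the question; adjust to your own layout.
export HADOOP_CLASSPATH=/home/rohit/hadoop/ch2/build/classes
```

Since the hadoop script reads hadoop-env.sh on every invocation, the variable is set once here rather than exported again in each new shell session.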

Other Answer2

Here is the answer in 3 steps:

1:

javac -verbose -classpath C:\\hadoop\\hadoop-0.19.2-core.jar MaxTemperature*.java -d build/classes

2:

Put the *.class files in build/classes (the -d option in step 1 already does this).

3:

export HADOOP_CLASSPATH=${HADOOP_HOME}/path/to/build/classes

(You have to create the build/classes directory first.)

Best Regards walid

comments:

+1 for step by step instructions

Other Answer3

  1. First, compile the Java files as described by walid:

    javac -classpath path-to-hadoop-0.19.2-core.jar .java-files -d folder-to-contain-classes
    
  2. Create a JAR file of the application classes:

    jar cf filename.jar *.class
    

    In either case, whether you package the classes into a JAR file or keep them in a folder, you should set HADOOP_CLASSPATH to point to that JAR file or to the folder containing the class files, so that the hadoop command knows where to look for the specified main class.

  3. Set HADOOP_CLASSPATH:

    export HADOOP_CLASSPATH=path-to-filename.jar
    

    or

    export HADOOP_CLASSPATH=path-to-folder-containing-classes
    
  4. Run using the hadoop command:

    hadoop main-class args
    

Other Answer4

You do not necessarily need a jar file, but did you put MaxTemperature in a package?

If so, say your MaxTemperature.class file is in yourdir/bin/yourpackage/, all you need to do is:

export HADOOP_CLASSPATH=yourdir/bin
hadoop yourpackage.MaxTemperature
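
The package matters because Java maps a fully qualified class name to a directory path under each classpath entry. A small helper can check whether a class file is where the JVM would look for it. This is only a sketch: the name class_on_path is made up for this example, and it checks directory entries only (JAR entries are ignored):

```shell
# Hypothetical helper (not part of Hadoop): print where a class file would be
# found under a colon-separated classpath. Only directory entries are checked;
# JAR entries are skipped.
class_on_path() {
  class_name=$1
  class_path=$2
  # yourpackage.MaxTemperature -> yourpackage/MaxTemperature.class
  rel_path="$(printf '%s' "$class_name" | tr '.' '/').class"
  old_ifs=$IFS
  IFS=:
  for entry in $class_path; do
    if [ -f "$entry/$rel_path" ]; then
      IFS=$old_ifs
      printf '%s\n' "$entry/$rel_path"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}
```

For example, class_on_path yourpackage.MaxTemperature "$HADOOP_CLASSPATH" prints the matching path when HADOOP_CLASSPATH is yourdir/bin, and returns non-zero if you ask for plain MaxTemperature, which is the situation that produces the ClassNotFoundException.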

comments:

Thank you for the clean answer

Other Answer5

After you package your class into a JAR file:

hadoop jar MaxTemperature.jar MaxTemperature

Basically:

hadoop jar jarfile main-class [args]