Hadoop Unable to Find Mapper

Tags: hadoop, mapreduce | Author: erming324 | Date: 2012-07-04

I get a ClassNotFoundException when I run my Hadoop job (new API, Hadoop 1.0.3). I have a Main class that contains MapClass and ReduceClass as static nested classes.

I configure my job as follows:

    Job job = new Job();

    // use the jar that contains Main as the job jar
    job.setJarByClass(Main.class);
    job.setJobName("My Job");

    job.setMapperClass(Main.MapClass.class);
    job.setReducerClass(Main.ReduceClass.class);

The Main class is as follows:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class Main {

        // Nested static Mapper
        public static class MapClass extends Mapper<Text, Text, Text, Text> {

            @Override
            public void map(Text key, Text value, Context context) {
                ...
            }
        }

        // Nested static Reducer
        public static class ReduceClass extends Reducer<Text, Text, Text, Text> {

            @Override
            public void reduce(Text key, Iterable<Text> values, Context context) {
                ...
            }
        }
    }

I did not export a jar from the project, because Eclipse already creates Main.class, Main$MapClass.class and Main$ReduceClass.class inside the bin directory of the project folder, which I assumed would be part of the classpath. However, the job is unable to find the Mapper class:

    java.lang.RuntimeException: java.lang.ClassNotFoundException: MapClass
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:867)
        at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

What is the problem?

Best Answer

Try exporting a jar file of your project. Then use this jar in the following command:

hadoop jar "jar name" package.subpackage.DriverClass [-conf configXML] 
inputDir outputDir

The driver class here is your Main class. The -conf option is optional but recommended; it points to an XML file where you can set job-specific configuration options.
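For reference, a minimal sketch of what such a driver could look like for the command above, assuming the two remaining arguments are the input and output paths (the Tool/ToolRunner pattern is what makes -conf and other generic options work; the output types and the elided input format are whatever your job actually needs):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class Main extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // getConf() already contains anything passed via -conf or -D
            Job job = new Job(getConf(), "My Job");

            // ship the jar that contains Main to the cluster
            job.setJarByClass(Main.class);

            job.setMapperClass(MapClass.class);
            job.setReducerClass(ReduceClass.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner strips the generic options before calling run()
            System.exit(ToolRunner.run(new Configuration(), new Main(), args));
        }

        // ... MapClass and ReduceClass nested here, as in the question ...
    }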

comments:

That works. If I export the Eclipse project as a jar and add it back to the project as an external jar, the program runs. However, Main.class, Main$MapClass.class and Main$ReduceClass.class are already sitting in the bin folder of the Eclipse project, so why doesn't Hadoop pick them up from there? And when I tried adding the bin folder as an external library, the program threw the same exception. It looks like adding the jar is the only option that works.
Because you call job.setJarByClass(Main.class): Hadoop uses that to locate a jar on the classpath containing Main and ships it to the cluster, and when the class only exists as loose .class files in bin, there is no such jar.

Other Answer 1

If you run the job as a plain Java application from Eclipse, Eclipse does not ship the required classes (i.e. the Mapper and Reducer) to Hadoop. Use the Eclipse Plugin for Hadoop to submit applications to the cluster directly from Eclipse.
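If you do want to keep launching it as a plain Java application from the IDE, a common workaround (a sketch, assuming you have exported the jar somewhere; the path below is a placeholder) is to point the job at the exported jar explicitly instead of relying on setJarByClass() to find one, via the mapred.jar property that Hadoop 1.x uses for the job jar. It would replace the new Job() line in the question's configuration:

    Configuration conf = new Configuration();
    // Name the job jar explicitly; setJarByClass() cannot find a jar
    // when the classes are loaded from Eclipse's bin directory.
    // The path is a placeholder for wherever you exported the jar.
    conf.set("mapred.jar", "/path/to/exported/myjob.jar");

    Job job = new Job(conf, "My Job");
    job.setMapperClass(Main.MapClass.class);
    job.setReducerClass(Main.ReduceClass.class);
    // ... rest of the job setup as before ...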