MapReduce
MarkizaSckuza, 2016-04-20 01:50:22

How to run MapReduce2 program remotely from Windows?

Hello.
I want to write a simple MapReduce v2 (Hadoop YARN) example and run it remotely.
What I have done so far:
1. Installed the Hortonworks Sandbox in VirtualBox. The connection works: opening http://localhost:8888 shows the Hadoop start page.
2. Wrote a simple "word count" example using the classic org.apache.hadoop.mapred API:

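import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static void main(String[] args) throws IOException {
        JobConf job = new JobConf(WordCount.class);

        // Submit to YARN; the default ("local") runs the job in-process
        job.set("mapreduce.framework.name", "yarn");

        // YARN and JobHistory service addresses are host:port, not hdfs:// URIs
        job.set("yarn.resourcemanager.address", "localhost:8032");
        job.set("yarn.nodemanager.address", "localhost:8041");
        job.set("yarn.nodemanager.localizer.address", "localhost:8040");
        job.set("mapreduce.jobhistory.address", "localhost:10020");

        // fs.defaultFS is a filesystem URI, so the hdfs:// scheme belongs here
        job.set("fs.defaultFS", "hdfs://localhost:8020");
        // HBase setting (a list of ZooKeeper hosts); not used by this job
        job.set("hbase.zookeeper.quorum", "localhost");

        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);

        // Relative paths resolve against the user's HDFS home directory
        FileInputFormat.addInputPath(job, new Path("text"));
        // The output path is a directory and must not exist before the run
        FileOutputFormat.setOutputPath(job, new Path("output.txt"));

        JobClient.runJob(job);
    }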

    public static class Map implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable longWritable, Text text, OutputCollector<Text, IntWritable> outputCollector, Reporter reporter) throws IOException {
            String line = text.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);

            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                outputCollector.collect(word, one);
            }
        }

        public void close() throws IOException {}

        public void configure(JobConf jobConf) {}
    }

    public static class Reduce implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text text, Iterator<IntWritable> iterator, OutputCollector<Text, IntWritable> outputCollector, Reporter reporter) throws IOException {
            int sum = 0;

            while (iterator.hasNext()) {
                sum += iterator.next().get();
            }

            outputCollector.collect(text, new IntWritable(sum));
        }

        public void close() throws IOException {}

        public void configure(JobConf jobConf) {}
    }
}
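To know what output to expect before a cluster is involved, the Map/Reduce logic above can be mirrored in plain Java. This is a minimal sketch: the class LocalWordCount and its countWords method are hypothetical names, it splits tokens the same way (java.util.StringTokenizer, on whitespace), and it uses a TreeMap so keys come out sorted, roughly as the reducer receives them for ASCII input.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // Hypothetical helper: mirrors what Map (emit (word, 1) per token) and
    // Reduce (sum the 1s per word) compute, but inside a single JVM.
    public static Map<String, Integer> countWords(Iterable<String> lines) {
        // TreeMap keeps keys sorted, similar to the reducer's sorted keys
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // the "map" step emits (word, 1); merge plays the "reduce" sum
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("to be or", "not to be"));
        // TextOutputFormat separates key and value with a tab by default
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the two sample lines it prints be 2, not 1, or 1 and to 2, one pair per line, tab-separated.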

And I get this error:

Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
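The "null" in that path suggests the client never found a Hadoop home directory at all. The sketch below roughly approximates, in plain Java, how Hadoop 2.x's Shell class builds the winutils path: the hadoop.home.dir system property first, then the HADOOP_HOME environment variable. The class WinutilsLookup and its methods are hypothetical; this is not Hadoop's actual code.

```java
import java.io.File;

public class WinutilsLookup {

    // Hypothetical helper: the system property wins, else the environment
    // variable; both may be null when nothing is configured.
    public static String resolveHadoopHome(String sysProp, String envVar) {
        return sysProp != null ? sysProp : envVar;
    }

    // Builds <home><sep>bin<sep><exe>. With no home configured, string
    // concatenation turns the null into the literal text "null", which
    // matches the "null\bin\winutils.exe" in the error message.
    public static String expectedExecutablePath(String home, String exe, String sep) {
        return home + sep + "bin" + sep + exe;
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome(System.getProperty("hadoop.home.dir"),
                                        System.getenv("HADOOP_HOME"));
        String path = expectedExecutablePath(home, "winutils.exe", File.separator);
        System.out.println("Would look for: " + path
                + " (exists: " + new File(path).exists() + ")");
    }
}
```

A workaround often suggested for Windows clients is to place a winutils.exe matching the cluster's Hadoop version in a local folder such as C:\hadoop\bin (a hypothetical path) and call System.setProperty("hadoop.home.dir", "C:\\hadoop") at the start of main, before any Hadoop class is loaded; that needs only the binary, not a full Hadoop installation.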
I found some solutions on Stack Overflow; the gist is that you need to set an environment variable (HADOOP_HOME) pointing to a Hadoop installation. The problem is that I want to connect to Hadoop running on a remote machine, and Hadoop is not installed on my Windows machine at all.
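Since the goal is a remote submission, the client mainly needs a handful of settings. The sketch below, assuming the sandbox ports from the question and a hypothetical RemoteClientSettings class, collects them in a plain Map so it runs without Hadoop on the classpath; in a real job each entry would go to job.set(key, value). It also shows the format difference: fs.defaultFS is a URI, while service addresses are plain host:port. mapreduce.app-submission.cross-platform is the property Hadoop 2.4+ provides for submitting from a Windows client to a Linux cluster; whether these settings are sufficient for this particular sandbox is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RemoteClientSettings {

    // Hypothetical helper collecting the client-side keys a remote YARN
    // submission typically needs; ports are the ones used in the question.
    public static Map<String, String> forHost(String host) {
        Map<String, String> conf = new LinkedHashMap<>();
        // fs.defaultFS is a filesystem URI, so it takes the hdfs:// scheme
        conf.put("fs.defaultFS", "hdfs://" + host + ":8020");
        // The default value "local" would run the job on the Windows client
        conf.put("mapreduce.framework.name", "yarn");
        // Service addresses are host:port, with no URI scheme
        conf.put("yarn.resourcemanager.address", host + ":8032");
        conf.put("mapreduce.jobhistory.address", host + ":10020");
        // Windows client submitting to a Linux cluster (Hadoop 2.4+)
        conf.put("mapreduce.app-submission.cross-platform", "true");
        return conf;
    }

    // Rejects values such as "hdfs://localhost:8032" where host:port is expected.
    public static boolean looksLikeHostPort(String value) {
        return !value.contains("://") && value.matches("[^:]+:\\d+");
    }

    public static void main(String[] args) {
        forHost("localhost").forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Keeping the settings in one place makes it easy to check each address format before a submission is attempted.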
What am I doing wrong? Am I missing something?
Hadoop version is 2.7.1.2.4.0.0-169
HDP 2.4.0.0-169
Java 8
Windows 10
