Hadoop
facha, 2013-12-12 01:53:13

Hadoop: Why can't I run a MapReduce task?

Hello.
I'm taking my first steps in learning Hadoop. I set up a "cluster" of two virtual machines, but I can't get a MapReduce job to run.

sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 1

In the ResourceManager log, it gets as far as this:
13/12/10 23:30:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1386714123362_0001
13/12/10 23:30:02 INFO client.YarnClientImpl: Submitted application application_1386714123362_0001 to ResourceManager at master/192.168.122.175:8032
13/12/10 23:30:02 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1386714123362_0001/
13/12/10 23:30:02 INFO mapreduce.Job: Running job: job_1386714123362_0001

... and then silence. The job hangs forever, and no new messages appear in the NodeManager log.
# hadoop job -list
...
                  JobId	     State	     StartTime	    UserName	       Queue	  Priority	 UsedContainers	 RsvdContainers	 UsedMem	 RsvdMem	 NeededMem	   AM info
 job_1386790587985_0001	      PREP	 1386801628645	        hdfs	     default	    NORMAL	              0	              0	      0M	      0M	        0M	master:8088/proxy/application_1386790587985_0001/
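A job stuck in the PREP state with 0 used containers and 0 needed memory usually means the ResourceManager has no healthy NodeManager to allocate containers from. One way to check this (these commands are a suggestion, run on the master; they are not from the original post):

```
# List the NodeManagers registered with the ResourceManager;
# an empty list (or 0 active nodes) explains why no container is ever allocated
yarn node -list -all

# Query the ResourceManager REST API for cluster-wide memory/vcore totals;
# totalMB of 0 means the scheduler has no resources to hand out
curl http://master:8088/ws/v1/cluster/metrics
```

If no NodeManagers show up, check that the NodeManager processes are running on the slaves and that yarn.resourcemanager.resource-tracker.address points at the master.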


2 answers
Anton Martsen, 2013-12-15
@facha

Here are working configs for a Hadoop 2.2.0 cluster with master and slave nodes:
etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/tmp/hdfs/datanode</value>
  </property>
</configuration>

etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
</configuration>
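On small virtual machines, a job can also hang in PREP because the NodeManager advertises less memory than the smallest container request, so nothing can ever be scheduled. If that is the case, sizing the NodeManager resources explicitly in yarn-site.xml can help. This fragment is not part of the original answer, and the values are only examples for a VM with about 2 GB of RAM:

```
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1536</value> <!-- example: memory the NodeManager may hand out -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>  <!-- example: smallest container the scheduler grants -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>1536</value> <!-- example: largest single container request allowed -->
</property>
```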

Also make sure everything is in order in your /etc/hosts file: every node must resolve the cluster hostnames consistently.
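For a two-node setup, a minimal /etc/hosts might look like this (the master IP is taken from the log above; the slave IP and hostnames are examples only, so substitute your own, and keep the file identical on every node):

```
192.168.122.175  master
192.168.122.176  slave    # example address, use your slave VM's real IP
```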

Anton Martsen, 2013-12-15
@martsen

I ran into the same problem. Did you build a cluster of several virtual machines right away? Have you tried running in pseudo-distributed mode on a single node? That works for me.
I'm reconfiguring the cluster a bit now and will write to you in more detail this evening if I manage to overcome the problem. And I'll post the solution.
