How to perform a mapreduce task using hadoop-streaming?


denislysenko, 2021-11-27 22:14:29

Python

I created a cluster on Google Cloud.
This is how I am trying to run a MapReduce job:

[email protected]:~$ $HADOOP_HOME/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
> -D mapred.map.tasks=1 \
> -D mapred.reduce.tasks=1 \
> -input /movies.csv \
> -output /result \
> -file ~/mapreduce_hadoop/homework5/mapreduce_hadoop/mapper.py ~/mapreduce_hadoop/homework5/mapreduce_hadoop/reducer.py \
> -mapper "python mapper.py" -reducer "python reducer.py"
JAR does not exist or is not a normal file: /usr/lib/hadoop-mapreduce/hadoop-streaming.jar

But the following error appears: JAR does not exist or is not a normal file: /usr/lib/hadoop-mapreduce/hadoop-streaming.jar

How can I fix it?

If it matters, here is the output of hdfs dfs -ls /:

[email protected]:~$ hdfs dfs -ls /
Found 4 items
-rw-r--r--   2 denislysenko0001 hadoop     484688 2021-11-27 18:58 /movies.csv
drwxrwxrwt   - hdfs             hadoop          0 2021-11-26 20:30 /tmp
drwxrwxrwt   - hdfs             hadoop          0 2021-11-27 15:15 /user
drwx-wx-wx   - hive             hadoop          0 2021-11-26 20:30 /var
[email protected]:~$
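
For context, the JAR path given to hadoop jar is resolved on the local filesystem of the machine where the command runs, not in HDFS, so the listing above does not show whether the file exists. Below is a minimal sketch that searches local directories for a streaming JAR. The function name find_streaming_jars and the search roots (/usr/lib and $HADOOP_HOME) are assumptions chosen for illustration and may not match the layout of this particular cluster.

```python
import fnmatch
import os


def find_streaming_jars(roots):
    """Return sorted local paths of files named hadoop-streaming*.jar under the given directories."""
    matches = []
    for root in roots:
        # os.walk silently yields nothing for a directory that does not exist
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if fnmatch.fnmatch(name, "hadoop-streaming*.jar"):
                    matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed search roots; adjust them for the actual installation
    roots = ["/usr/lib"]
    hadoop_home = os.environ.get("HADOOP_HOME")
    if hadoop_home:
        roots.append(hadoop_home)
    for path in find_streaming_jars(roots):
        print(path)
```

If the script prints a path, that path could be tried in place of /usr/lib/hadoop-mapreduce/hadoop-streaming.jar in the command above.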