Nikolay Baranenko, 2017-10-14 03:09:56
Python

What is an example of a file/directory path for a Python module that connects remotely to HDFS?

Hello.
I am trying to use the following code:

from pywebhdfs.webhdfs import PyWebHdfsClient
from pprint import pprint

hdfs = PyWebHdfsClient(host='192.168.0.70',port='50070', user_name='hadoop')  # your Namenode IP & username here
my_dir = 'logs'
pprint(hdfs.list_dir(my_dir))

It returns an error saying that the directory does not exist:
Traceback (most recent call last):
  File "D:/Server/Repositories/projects/um/templates/Test/hdfs.py", line 6, in <module>
    pprint(hdfs.list_dir(my_dir))
  File "C:\Python36\lib\site-packages\pywebhdfs\webhdfs.py", line 482, in list_dir
    _raise_pywebhdfs_exception(response.status_code, response.content)
  File "C:\Python36\lib\site-packages\pywebhdfs\webhdfs.py", line 718, in _raise_pywebhdfs_exception
    raise errors.FileNotFound(msg=message)
pywebhdfs.errors.FileNotFound: b'{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File /app/dfs/name/data does not exist."}}'

The DFS data folder exists on disk at
/app/dfs/name/data
and the WebHDFS service is enabled in the configuration:
<configuration>
<!-- Add the following inside the configuration tag -->
<property>
        <name>dfs.data.dir</name>
        <value>/app/dfs/name/data</value>
        <final>true</final>
</property>
<property>
        <name>dfs.name.dir</name>
        <value>/app/dfs/name</value>
        <final>true</final>
</property>
<property>
        <name>dfs.replication</name>
        <value>1</value>
</property>
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
</configuration>
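As an aside (not from the original question): pywebhdfs issues plain HTTP requests against the WebHDFS REST endpoint, so the same listing can be reproduced with curl or a browser to see what the NameNode actually answers. The sketch below only builds the `LISTSTATUS` URL that a call like `hdfs.list_dir('logs')` would hit, using the host, port, and user from the question; the helper name is hypothetical. Note that WebHDFS treats every path as absolute, so `'logs'` means `/logs`.

```python
# Sketch: the raw WebHDFS REST URL behind pywebhdfs' list_dir().
# Pasting this URL into curl or a browser shows the NameNode's
# actual JSON response, which helps narrow down path problems.

def liststatus_url(host, port, user, path):
    """Build the WebHDFS LISTSTATUS URL for an HDFS path."""
    # WebHDFS expects an absolute HDFS path after /webhdfs/v1
    path = '/' + path.lstrip('/')
    return ('http://{0}:{1}/webhdfs/v1{2}'
            '?op=LISTSTATUS&user.name={3}').format(host, port, path, user)

print(liststatus_url('192.168.0.70', '50070', 'hadoop', 'logs'))
# http://192.168.0.70:50070/webhdfs/v1/logs?op=LISTSTATUS&user.name=hadoop
```

If that URL returns a `FileNotFoundException` for `/logs` as well, the problem is the HDFS path itself rather than the Python client.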



1 answer
Nikolay Baranenko, 2017-10-17
@drno-reg

First, you need to create a folder, for example /examples, put the Reutov_mos_obl.csv file there, and then access it:

from pywebhdfs.webhdfs import PyWebHdfsClient
from pprint import pprint


hdfs = PyWebHdfsClient(host='hadoop01',port='50070', user_name='hadoop')  # your Namenode IP & username here
my_dir = '/examples/Reutov_mos_obl.csv'
pprint(hdfs.list_dir(my_dir))
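To sketch the two steps above in WebHDFS terms (an illustration, not part of the original answer): creating the directory and uploading the file correspond to the `MKDIRS` and `CREATE` REST operations, which pywebhdfs wraps in `make_dir` and `create_file`. The helper below only builds the URLs and does not talk to a cluster; its name is hypothetical, and the host, port, and user are the values from the answer.

```python
# Illustration: the WebHDFS REST operations behind "create /examples
# and put the CSV there". Nothing here contacts a cluster; it only
# shows the URLs an HTTP client (or pywebhdfs) would PUT to.

def op_url(host, port, user, path, op):
    """Build a WebHDFS URL for the given operation on an HDFS path."""
    path = '/' + path.lstrip('/')  # WebHDFS paths are absolute
    return 'http://{0}:{1}/webhdfs/v1{2}?op={3}&user.name={4}'.format(
        host, port, path, op, user)

# Step 1: create the target directory (HTTP PUT).
print(op_url('hadoop01', '50070', 'hadoop', '/examples', 'MKDIRS'))

# Step 2: create the file (HTTP PUT). The NameNode replies with a
# 307 redirect to a DataNode, and the file data is sent there.
print(op_url('hadoop01', '50070', 'hadoop',
             '/examples/Reutov_mos_obl.csv', 'CREATE'))
```

Once the file exists, `list_dir` on its full path returns its status entry, as in the snippet above.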
