Scrapy
malvinfch, 2019-03-30 12:16:51

How to properly create a multipart/form-data request in Scrapy?

I'm trying to build a Scrapy request that reproduces a POST request sent by a page. The POST redirects and returns a link to a PDF.
Here is the data from DevTools:
FormData

hostURL: http://oris.co.palm-beach.fl.us/or_web1/
pdfPath: \\wcp01zfs-03.clerk.local\files2\ORISPDF\
pdfURL: http://oris.co.palm-beach.fl.us/pdf/
pages: 1
id: 22591587
mpages: 1
doc_id: 22591587
page1: image_from_file.asp?imageurl=\\ors_fs\ORImage\O\30338\O.30338.0268.0001.tif
WaterMarkText: 1

headers:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,ru;q=0.8,uk;q=0.7
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 1095
Content-Type: multipart/form-data; boundary=----WebKitFormBoundarysGRfL8zMeuSs4zsH
Cookie: ASPSESSIONIDACDTTCTA=OJBBKBLCGCBLFGODHMLKCIFG; BIGipServer~external~coc_oris.mypalmbeachclerk.com_80=rd102o00000000000000000000ffff978432b7o80; BIGipServer~external~coc_oris.mypalmbeachclerk.com_8080=rd102o00000000000000000000ffff978433b7o8080
DNT: 1
Host: oris.co.palm-beach.fl.us:8080
Origin: http://oris.co.palm-beach.fl.us
Referer: http://oris.co.palm-beach.fl.us/or_web1/details_img.asp?doc_id=22591587&pg_count=1&pg_num=1&click=1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36

If I pass the form data as a dict, the trailing backslash in one of the values escapes the closing quote, and without that backslash the resulting link is wrong.
If I copy the form data from DevTools in its source view instead, some characters are not displayed correctly.
As a result, the redirect goes to a URL like this:
http://oris.co.palm-beach.fl.us:8080/PdfServlet/nv_433__img.pdf
and returns 404, whereas the working link looks like this:
http://oris.co.palm-beach.fl.us/pdf/nv_643_22591587_img.pdf
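
The backslash trouble described above looks like a Python string-literal issue rather than a Scrapy one. Here is a minimal sketch of writing the two Windows-style values from the form data. The variable names are only illustrative, and whether this alone fixes the 404 is an assumption. A regular literal treats `\"` as an escaped quote and `\303` as an octal escape, while a raw literal keeps backslashes as-is but cannot end with a single backslash.

```python
# Regular literal: every backslash is doubled, so the trailing one is safe.
pdf_path = "\\\\wcp01zfs-03.clerk.local\\files2\\ORISPDF\\"

# Raw literal for the bulk of the path; the final backslash is appended
# separately because a raw string cannot end with a single backslash.
pdf_path_raw = r"\\wcp01zfs-03.clerk.local\files2\ORISPDF" + "\\"

# This value does not end with a backslash, so a raw literal is enough.
page1 = r"image_from_file.asp?imageurl=\\ors_fs\ORImage\O\30338\O.30338.0268.0001.tif"

# What goes wrong without raw strings: "\303" is read as an octal escape,
# so the path segment "\30338" silently turns into a non-ASCII character + "38".
naive = "\30338"

print(pdf_path)
print(page1)
print(repr(naive))
```

Both spellings of `pdfPath` produce the same string, so either can be used as a dict value.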

Here's the code for the form I'm trying to simulate.
<td valign="bottom">
    
        <form name="courtform" action="http://oris.co.palm-beach.fl.us:8080/PdfServlet/PdfServlet27" method="post" enctype="multipart/form-data">
    
    <input type="hidden" name="hostURL" value="http://oris.co.palm-beach.fl.us/or_web1/" size="60">
    <input type="hidden" name="pdfPath" value="\\wcp01zfs-03.clerk.local\files2\ORISPDF\" size="60">
    <input type="hidden" name="pdfURL" value="http://oris.co.palm-beach.fl.us/pdf/" size="60">
    
    <input type="hidden" name="pages" value="1" size="60">
    <!--<input type="hidden" name="pages" value="1" size="60">-->
    <input type="hidden" name="id" value="22590889" size="60">
    <input type="hidden" name="mpages" value="1" size="60">
    <input type="hidden" name="doc_id" value="22590889" size="60">
    
        <input type="hidden" name="page1" value="image_from_file.asp?imageurl=\\ors_fs\ORImage\O\30336\O.30336.1200.0001.tif" size="60">
    
    <input type="hidden" name="WaterMarkText" value="1" size="60">
  
      &nbsp;&nbsp;<input name="button" type="button" value="View PDF" onclick="javascript:ValidateAndSubmit(this.form)">&nbsp;&nbsp;

</form></td>

What am I doing wrong?
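
One possible direction, offered as a sketch rather than a confirmed fix: Scrapy's `FormRequest` encodes `formdata` as `application/x-www-form-urlencoded`, while the form above declares `enctype="multipart/form-data"`, so the body may need to be built by hand. The helper `encode_multipart` and the boundary string below are hypothetical names introduced for illustration. The field values are copied from the DevTools dump.

```python
import uuid


def encode_multipart(fields, boundary=None):
    """Encode a dict of text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(str(value))
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type


# Values taken from the FormData section; raw strings keep the backslashes.
form_fields = {
    "hostURL": "http://oris.co.palm-beach.fl.us/or_web1/",
    "pdfPath": r"\\wcp01zfs-03.clerk.local\files2\ORISPDF" + "\\",
    "pdfURL": "http://oris.co.palm-beach.fl.us/pdf/",
    "pages": "1",
    "id": "22591587",
    "mpages": "1",
    "doc_id": "22591587",
    "page1": r"image_from_file.asp?imageurl=\\ors_fs\ORImage\O\30338\O.30338.0268.0001.tif",
    "WaterMarkText": "1",
}

# A fixed boundary is used here only to make the output reproducible.
body, content_type = encode_multipart(form_fields, boundary="sketchboundary")
print(content_type)
print(body.decode("utf-8"))
```

The result could then be passed to a plain `scrapy.Request(url, method="POST", body=body, headers={"Content-Type": content_type})`, with `url` set to the form's `action`. Whether the servlet also needs the session cookies or the `Referer` header from the dump is not something this sketch verifies.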
