O
O
opex132015-03-02 17:21:35
Google Sheets
opex13, 2015-03-02 17:21:35

Google spreadsheet, import from website with Jsp?

There is a link like - ec.europa.eu/taxation_customs/dds2/tra/transit_home.jsp?Lang=fi&MRN=15FI000000010520T8&Expand=true
The task is to extract the last date from the open list using google tables.
ImportHTML replies that there is no table there, because It cannot make a request to Javascript, and ImportXML behaves in approximately the same way.
I would be glad for any tips on the right path or crutches.

Answer the question

In order to leave comments, you need to log in

1 answer(s)
A
Alexander Ivanov, 2015-03-03
@oshliaer

Hello.
It's called something like "browser on server".
In theory, we must force the server to load the page to ourselves, call all client procedures, take the context and parse it, and send the result to the network.
Problems:

  • Where can I get/run/beg/configure the server for mere mortals?
  • What!?

Idea:
  • Organize part of the automation in the workplace.
  • Publish data, aka "publish the result to the web", on Google Drive.

Step by step:
  1. It is necessary to download and unpack phantomjs.org/download.html Let the folder with the phantomjs.exe file be called
    'C:\phantomjs\bin'
  2. You need to download the file parse_jsp_to_file.js Save it to the same place where the phantomjs.exe file is located
  3. Need to install Drive for Mac/PC
  4. You need to set up a folder on the disk for public access (Visible to everyone on the Internet). Let's call it FOLDER.
  5. The FOLDER folder needs to be synced to Drive for Mac/PC. May her path be
    'C:\Диск Google\FOLDER'
  6. Open the parse_jsp_to_file.js file with a text editor and change the line
    var folder = 'C:\\_\\Google Drive\\temp (main)\\(public)\\';

    on the line *watch for slashes. Setting specified for Windows
    var folder = 'C:\\Диск Google\\FOLDER\\';

  7. It remains to run the command line and paste there
    'C:\phantomjs\bin\phantomjs.exe parse_jsp_to_file.js'

  8. If the yyyy-mm-dd.xml file appeared in the C:\Google Drive\FOLDER folder, then you can go further, otherwise, pick and check.
  9. You need to get the ID of the FOLDER folder on the Disk. This is the longest number/letter combination in a shared address.
  10. You need to take the IMPORTXML() formula from the goo.gl/GUpaf2 file
  11. It is necessary to replace the ID in this formula with your own.
  12. All.

It works like clockwork for me.
Problems:
  • I didn't add exception handlers.
  • JS file must be encoded in UTF-8
  • I added a BAT file for my setup. But I have C:\phantomjs\bin in my environment variables.

What is the result:
  • BAT can be configured to run every day.
  • Files are written one for each day.
  • In Tables, it becomes possible to track the date the context was changed.

PS:
For multiple links, you just need to use another simple_cycle.js script . It needs the existence of a file.txt file , which describes the parameters that should form the URL for the request.
The result in the "Multiple File Example" worksheet is goo.gl/GUpaf2 .

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question