K
K
Kirill Strokin2021-10-20 11:20:07
Parsing
Kirill Strokin, 2021-10-20 11:20:07

How to parse HTML code into a Google Sheets cell?

Is it possible to parse the html code of a table into a cell using Google Sheets?

Importhtml is not suitable because:
1. data is stored in multiple cells;
2. only the content is saved, but the html-code of the table is needed directly <table><tbody><td>, etc.

Answer the question

In order to leave comments, you need to log in

1 answer(s)
G
Grigory Boev, 2021-10-28
@ProgrammerForever

I've used this.

/**
* Открывает URL и возвращает код страницы
* Telegram - @ProgrammerForever
*
* @param {string} URL URL который нужно открыть
* @param {boolean} isCut Указывает, нужно ли обрезать страницу до 50000 символов по длине, по умолчанию false
* @param {boolean} noScript Указывает, нужно ли удалять скрипты из кода
* @return Исходный код страницы
* @customfunction
*/
function getHTML(URL,isCut,noScript) {
  if ((URL === undefined)||(URL == "")) { return "#ОШИБКА Пустой URL";};
  if (isCut === undefined) {var isCut=true;};
  if (noScript === undefined) {var noScript=true;};
  if (URL.map){     //Если задан диапазон
    return URL.map(getHTML);
  }else{
    try {
      var payload = {
      };
      var headers={        
      };
      var options = {
        'method' : 'get',
        'headers' : headers,
        'payload': payload
      };
      
      var response = UrlFetchApp.fetch(URL,options);
      var charset=response.getAllHeaders["charset"];
      var responseText=response.getContentText(charset?charset:"UTF-8");
      if (noScript){ 
        responseText=responseText.replace(/<script[^>]*>(?:(?!<\/script>)[^])*<\/script>/gmi,"");
        responseText=responseText.replace(/<!--.*?-->/gmi,"");
        responseText=responseText.replace(/<link.*?\/>/gmi,"");
        responseText=responseText.replace(/<meta.*?\/>/gmi,"");
        responseText=responseText.replace(/[\n\r\t]/gmi,"");
        
      };
      if (isCut&&(responseText.length>50000)){return responseText.substring(0,50000);}else{return responseText;};
    } catch (err) {
      return "#ОШИБКА "+err.message;
    };
  };
};

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question