davin4u, 2013-11-23 22:20:00
go

Why does http.Get() hang in Go?

I have a task that involves downloading a large number of pages. At first I used the goquery library to make it easier to extract the text of the blocks I needed, but I noticed that in some cases the program simply stopped: no errors, no exit, it just hung. I thought the problem was in that library, so I dropped it and started fetching the page source the standard Go way with http.Get(url), but the problem did not go away. All I could find out is that the program stops at the http.Get() call. What could the problem be?

func getPage(url string) (HTML string, err error) {
  res, err := http.Get(url)
  if err != nil {
    return "", err
  }
  defer res.Body.Close()
  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    return "", err
  }
  return string(body), nil
}


2 answers
Ivan Egorov, 2013-11-27
@davin4u

I would add a timeout (and reduce it to something acceptable).
If no response is received within N seconds, return an error.
The timeout is set by passing a custom http.Transport to the http.Client (the Transport setup at the top of getPage below).

Program code with a timeout:
package main

import (
  "bytes"
  "errors"
  "fmt"
  "io/ioutil"
  "net"
  "net/http"
  "time"
)

func getPage(url string, timeout time.Duration) (HTML string, e error) {
  // The Dial timeout limits establishing the TCP connection;
  // ResponseHeaderTimeout limits waiting for the response headers.
  client := &http.Client{
    Transport: &http.Transport{
      Dial: func(network, addr string) (net.Conn, error) {
        return net.DialTimeout(network, addr, timeout)
      },
      ResponseHeaderTimeout: timeout,
    },
  }

  req, e := http.NewRequest("GET", url, nil)
  if e != nil {
    return "", fmt.Errorf("http.NewRequest failed: %s", e.Error())
  }

  resp, e := client.Do(req)
  if e != nil {
    return "", fmt.Errorf("client.Do failed: %s", e.Error())
  }
  defer resp.Body.Close()

  bodyAsBytes, e := ioutil.ReadAll(resp.Body)
  if e != nil {
    return "", fmt.Errorf("ioutil.ReadAll failed: %s", e.Error())
  }

  return string(bodyAsBytes), nil
}

func main() {
  HTML, e := getPage("http://google.com/", time.Duration(1*time.Second))
  if e != nil {
    fmt.Printf("[ERROR] %s\n", e.Error())
  } else {
    fmt.Printf("[INFO] %s\n", HTML)
  }
}

If the timeout is set like this (1 nanosecond), the program prints an error message:
HTML, e := getPage("http://google.com/", time.Duration(1*time.Nanosecond))
[ERROR] client.Do failed: Get http://google.com/: i/o timeout

If the timeout is set like this (5 seconds), the program prints the HTML of the page:
HTML, e := getPage("http://google.com/", time.Duration(5*time.Second))
[INFO] <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head><meta content="........

I would also make a pool of download tasks. If some of them fail, restart them after a while. In my opinion, this is a typical pool/worker problem:
1) You can solve it with resque + goworker, for example. This gives visibility: failed tasks can be restarted right away, you can see what went wrong, and statistics accumulate;
2) You can solve it by creating one goroutine as the pool and N goroutines as workers, and implement the handling of failed tasks yourself (see the sketch after this list).
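A minimal sketch of option 2, reusing the getPage function with a timeout from the code above (it would replace the main above); the worker count, attempt limit and URL list are placeholders, not part of the original answer:

// Worker-pool sketch: N workers read URLs from a channel, fetch each page
// with getPage (defined above), and requeue failed URLs a limited number of times.
package main

import (
  "fmt"
  "sync"
  "time"
)

const workers = 4     // number of concurrent downloaders (placeholder)
const maxAttempts = 3 // how many times a URL may be tried (placeholder)

type task struct {
  url      string
  attempts int
}

func main() {
  urls := []string{"http://google.com/", "http://example.com/"} // placeholder list

  // The buffer is large enough to hold every possible requeue, so sends never block.
  tasks := make(chan task, len(urls)*maxAttempts)
  for _, u := range urls {
    tasks <- task{url: u}
  }

  var pending sync.WaitGroup // counts URLs that are not finished yet
  pending.Add(len(urls))

  var wg sync.WaitGroup // counts running workers
  for i := 0; i < workers; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      for t := range tasks {
        _, err := getPage(t.url, 5*time.Second)
        if err != nil && t.attempts+1 < maxAttempts {
          tasks <- task{url: t.url, attempts: t.attempts + 1} // retry later
          continue
        }
        if err != nil {
          fmt.Printf("[ERROR] %s: %s\n", t.url, err)
        } else {
          fmt.Printf("[INFO] %s downloaded\n", t.url)
        }
        pending.Done()
      }
    }()
  }

  pending.Wait() // every URL either succeeded or ran out of attempts
  close(tasks)   // let the workers drain the channel and exit
  wg.Wait()
}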

Anatoly, 2013-11-23
@taliban

Maybe the site you are downloading from is banning you? They see that you start pulling everything down at once and simply turn on some kind of protection. Try setting a small timeout.
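A minimal sketch of that suggestion, with placeholder URLs and delays: cap every request with a client-wide timeout and pause between downloads so the remote site is hit less aggressively. The Timeout field on http.Client only exists in newer Go releases; on older ones, use the Transport-based timeout from the answer above.

package main

import (
  "fmt"
  "io/ioutil"
  "net/http"
  "time"
)

func main() {
  // Timeout covers the whole request: connecting, headers and reading the body.
  client := &http.Client{Timeout: 10 * time.Second}

  urls := []string{"http://google.com/", "http://example.com/"} // placeholder list
  for _, url := range urls {
    resp, err := client.Get(url)
    if err != nil {
      fmt.Printf("[ERROR] %s: %s\n", url, err)
    } else {
      body, _ := ioutil.ReadAll(resp.Body)
      resp.Body.Close()
      fmt.Printf("[INFO] %s: %d bytes\n", url, len(body))
    }
    time.Sleep(500 * time.Millisecond) // pause so the target is less likely to block us
  }
}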
