Python
astrotrain, 2016-11-11 19:42:47

Return a value from a thread in Python 3?

A quick question: threads in Python are easy to work with, but I couldn't find a proper way to return a value from one. Here is some example code:

import requests
import random
import re
import queue
import threading

pattern = 'http://www.astateoftrance.com/episodes/episode-'
k = 700
lock = threading.Lock()

def getPage():
  #print ("Hello")
  global k
  while (k < 800):
    url = pattern+str(k)+'/'
    response = requests.get(url)
    content = response.content.decode('utf-8')
    contLow = content.lower()
    #print(contLow)
    if (re.findall('', contLow)):
      print(url)
    lock.acquire()
    k = k + 1
    lock.release()

threads = []
for i in range(20):
  t = threading.Thread(target=getPage)
  threads.append(t)
  t.start()

Some people suggest using multiprocessing instead of threading, others advise subclassing the Thread class so that it returns a value. Isn't there a more direct way to get a value out of a thread, or do I have to resort to a workaround?
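
For context, the "subclass the Thread class" approach mentioned above usually looks something like this (a minimal, illustrative sketch; the ReturningThread name and the square worker are made up here, not taken from the post):

import threading

class ReturningThread(threading.Thread):
    def __init__(self, fn, args=()):
        super().__init__()
        self._fn = fn
        self._fn_args = args
        self.result = None

    def run(self):
        # Store the target's return value so it can be read after join()
        self.result = self._fn(*self._fn_args)

def square(x):
    return x * x

t = ReturningThread(square, args=(7,))
t.start()
t.join()
print(t.result)  # 49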

2 answers
Roman Kitaev, 2016-11-11
@deliro

Threads in Python are easy to work with

Is that supposed to be a compliment? Threads in Python, like Python itself, are slow. And then there is the GIL.
You can write the value into a thread-safe structure, for example a list, and have the main thread check whether the list is long enough yet; if not, sleep for 0.1 seconds. Just don't forget to catch timeouts and errors.
P.S. What you want to do is handled conveniently with aiohttp / eventlet.
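
A rough sketch of what that looks like in practice (the worker function, the lock name, and the counts are placeholders, not taken from the question):

import threading
import time

results = []
results_lock = threading.Lock()

def worker(n):
    value = n * n  # stand-in for the real work, e.g. requests.get(url)
    with results_lock:
        results.append(value)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()

# Main thread: check whether the list is long enough, otherwise sleep briefly
while True:
    with results_lock:
        if len(results) >= 10:
            break
    time.sleep(0.1)

print(sorted(results))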

asd111, 2016-11-12
@asd111

Usually threads don't return values; instead they write them to a list or some other shared structure.
I would write your example like this (note that all the downloaded pages end up in results):

from multiprocessing.dummy import Pool as ThreadPool
from pprint import pprint

import requests

pattern = 'http://www.astateoftrance.com/episodes/episode-'
start_url = 700
urls_list = []


def gen_urls(start):
    for i in range(start, 800):
        url = pattern + str(i) + '/'
        urls_list.append(url)


def my_url_get(url):
    result = requests.get(url)
    print("{url} was Downloaded".format(url=url))
    return result


gen_urls(start_url)
pprint(urls_list)

pool = ThreadPool(20)
results = pool.map(my_url_get, urls_list)
pool.close()
pool.join()

pprint(results[0].content.decode('utf-8'))
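
The same idea can also be expressed with the standard library's concurrent.futures, where pool.map() hands back the workers' return values directly (this is an illustration, not part of the original answer):

from concurrent.futures import ThreadPoolExecutor

import requests

# Build the same URL list and let the executor collect the return values
urls = ['http://www.astateoftrance.com/episodes/episode-{}/'.format(i)
        for i in range(700, 800)]

with ThreadPoolExecutor(max_workers=20) as pool:
    responses = list(pool.map(requests.get, urls))

print(responses[0].status_code)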
