Amazon Web Services
febarabash, 2019-07-17 13:08:28

How to use Scrapy with AWS Lambda?

There are two applications: one on Flask and one on Scrapy. Each is deployed to a separate Lambda through Zappa. The Flask application has 3 endpoints, each of which triggers the Scrapy Lambda through SQS. The trigger itself works fine, but I have 3 questions:
1) Is it possible to somehow remove the execution time limit for the Scrapy Lambda? (I only found a way to raise the limit to 15 minutes, and in that time Scrapy cannot collect all the items.)
2) Is it possible to trigger the Scrapy Lambda through SQS without API Gateway, and is it possible to deploy the application through Zappa so that no API Gateway is created? Or do I need to deploy the Scrapy Lambda manually?
3) If Lambdas cannot be triggered without API Gateway, how can I return a correct response?
Right now I have the following function:

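import json

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# The three spider classes used below come from the project's own code.


def lambda_event(event, context):
  try:
    # The request payload is read as a JSON string from event['body'].
    data = json.loads(event['body'])
    scrapy_settings = get_project_settings()
    scrapy_settings['ITEM_PIPELINES'] = {
      'sunbiz_spiders.pipelines.DynamodbPipeline': 300,
    }
    scrapy_settings['DOWNLOAD_DELAY'] = 0.5
    process = CrawlerProcess(settings=scrapy_settings)
    # Pick the spider class by name, falling back to SearchByNameSpider.
    if data['spider_name'] == 'SearchByPersonSpider':
      spider = SearchByPersonSpider
    elif data['spider_name'] == 'GetDetailSpider':
      spider = GetDetailSpider
    else:
      spider = SearchByNameSpider
    process.crawl(spider, search_params=data['spider_name'])
    # Blocks until the crawl has finished.
    process.start()
  except Exception:
    # Any error is swallowed here, so the handler always reports success.
    pass

  return {
    'statusCode': 200,
    'body': json.dumps('All done.'),
  }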

Zappa config:
{
    "production": {
        "app_function": "main.lambda_event",
        "aws_region": "us-east-1",
        "profile_name": "default",
        "project_name": "sunbiz-search-s",
        "runtime": "python3.6",
        "s3_bucket": "zappa-envjkpiz6"
    }
}

And when I send a request, I get "list index out of range" from werkzeug/test.py, line 1146.


1 answer(s)
Ivan Shumov, 2019-07-17
@inoise

1. No, you can't raise the time limit beyond 15 minutes. Maybe increasing the Lambda's memory would be enough? Or can you split the work into separate steps with AWS Step Functions?
2. You do not need to use Zappa at all, and you really can use SQS and Lambda without API Gateway.
3. I don't have enough Python expertise to answer this one, sorry.
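
To illustrate point 2, here is a minimal sketch of a handler that SQS invokes directly, with no API Gateway in between. It assumes the queue is attached to the Lambda as an event source and that each message body is the same JSON the question sends (with a spider_name key). The run_spider function is a hypothetical placeholder for the Scrapy crawl, not part of any library.

```python
import json


def parse_sqs_event(event):
    """Return the decoded JSON body of every SQS record in the event."""
    # An SQS-triggered Lambda receives {"Records": [{"body": "..."}, ...]},
    # not the API Gateway shape with a single top-level "body" key.
    return [json.loads(record["body"]) for record in event.get("Records", [])]


def run_spider(job):
    """Hypothetical placeholder: start the Scrapy crawl for one job here."""
    print("would crawl with", job["spider_name"])


def lambda_event(event, context, run=run_spider):
    jobs = parse_sqs_event(event)
    for job in jobs:
        run(job)
    # No statusCode/body is needed because nothing reads an HTTP response.
    # Returning normally marks the batch as processed; raising an exception
    # leaves the messages in the queue so they are retried later.
    return {"processed": len(jobs)}
```

With a batch size of 1 on the SQS trigger, each invocation would handle a single job, which also keeps each run short. That relates to question 3 as well: without API Gateway there is no HTTP response to format, so success or failure is signalled by returning or raising.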
