How to use Scrapy with AWS Lambda?
There are 2 applications: one on Flask and one on Scrapy. Each of them is deployed to a separate Lambda through Zappa. The Flask application has 3 endpoints, each of which triggers the Scrapy Lambda through SQS. The trigger itself works fine, but I have 3 questions:
1) Is it possible to somehow remove the execution time limit on the Scrapy Lambda? (I only found a way to raise the limit to 15 minutes, and Scrapy does not manage to collect all items in that time.)
2) Is it possible to trigger the Scrapy Lambda through SQS without API Gateway, and is it possible to deploy the application through Zappa so that no API Gateway is created? Or do I need to deploy the Scrapy application manually?
3) If Lambdas cannot be triggered without API Gateway, how can I return a correct response?
Now I have the following function:
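import json

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# SearchByPersonSpider, GetDetailSpider and SearchByNameSpider come from the
# project's spiders module (the import is not shown in the question).


def lambda_event(event, context):
    try:
        # event['body'] is where an API Gateway proxy request carries its payload.
        data = json.loads(event['body'])
        scrapy_settings = get_project_settings()
        scrapy_settings['ITEM_PIPELINES'] = {
            'sunbiz_spiders.pipelines.DynamodbPipeline': 300,
        }
        scrapy_settings['DOWNLOAD_DELAY'] = 0.5
        process = CrawlerProcess(settings=scrapy_settings)
        # Pick the spider class by name; fall back to SearchByNameSpider.
        if data['spider_name'] == 'SearchByPersonSpider':
            spider = SearchByPersonSpider
        elif data['spider_name'] == 'GetDetailSpider':
            spider = GetDetailSpider
        else:
            spider = SearchByNameSpider
        process.crawl(spider, search_params=data['spider_name'])
        # Blocks until the crawl has finished.
        process.start()
    except Exception:
        # Every error is swallowed, so the caller always receives a 200.
        pass
    return {
        'statusCode': 200,
        'body': json.dumps('All done.'),
    }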
{
    "production": {
        "app_function": "main.lambda_event",
        "aws_region": "us-east-1",
        "profile_name": "default",
        "project_name": "sunbiz-search-s",
        "runtime": "python3.6",
        "s3_bucket": "zappa-envjkpiz6"
    }
}
Answer:
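1. No, you can't raise the time limit above 15 minutes. Maybe increasing the Lambda's memory would be enough for you? (Lambda allocates CPU in proportion to memory, so the crawl may finish sooner.) Or can you split the work across several Lambda invocations with AWS Step Functions?
2. You do not need to use Zappa at all, and you really can use SQS and Lambda without API Gateway.
3. I don't have enough Python expertise to answer this one, sorry.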
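To make points 1 and 2 more concrete, here is a minimal sketch of a handler that SQS invokes directly, with no API Gateway involved. In that setup the event carries the messages under `Records`, each with a string `body`, and nobody receives an HTTP response: if the handler returns without raising, Lambda deletes the messages from the queue, and if it raises, they become visible again and are retried. The names `run_spider` and `crawl_time_budget`, the `search_params` message key, and the 30-second safety margin are assumptions made for illustration, not part of the original project. The computed budget could, for example, be passed to Scrapy's `CLOSESPIDER_TIMEOUT` setting so that a crawl stops before the 15-minute limit.

```python
import json


def parse_sqs_records(event):
    """Return the decoded JSON body of every SQS message in the event."""
    return [json.loads(record["body"]) for record in event.get("Records", [])]


def crawl_time_budget(remaining_ms, safety_margin_s=30):
    """Seconds a crawl may run before Lambda cuts the invocation off.

    The 30-second margin is an assumed value that leaves time for cleanup.
    """
    return max(0, remaining_ms // 1000 - safety_margin_s)


def run_spider(spider_name, search_params, timeout_s):
    """Hypothetical placeholder: start the Scrapy crawl here.

    timeout_s could be applied as the CLOSESPIDER_TIMEOUT setting.
    """
    raise NotImplementedError("plug the project's crawl code in here")


def lambda_event(event, context, crawl=run_spider):
    """Entry point for an SQS-triggered Lambda (no API Gateway)."""
    jobs = parse_sqs_records(event)
    for job in jobs:
        # Recompute the budget for each message, since earlier crawls
        # have already used up part of the invocation time.
        budget = crawl_time_budget(context.get_remaining_time_in_millis())
        crawl(job["spider_name"], job.get("search_params"), budget)
    # SQS ignores this value; it is only useful in logs and tests.
    return {"processed": len(jobs)}
```

With a batch size of 1 on the SQS trigger, each invocation handles a single search, which keeps one crawl per 15-minute window and is the simplest way to split the work.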