Building a big data architecture based on Kinesis + Spark Streaming?
Hello, I have the following task:
Build a flexible architecture for processing big data in real time (streaming).
IoT and mobile devices will act as producers.
I chose the following stack: Kinesis (for streaming) + a cluster running Spark Streaming for data processing.
I've watched conference talks on best-practice designs and patterns and read the AWS docs. In principle everything is clear, but there are a couple of points I did not understand.
1) Kinesis collects data from the end devices into streams and passes it to Spark Streaming, where we process the data and save it to a database. Which database is the better choice (DynamoDB?)? The data will only ever be JSON, so I don't see any alternative to a NoSQL solution that integrates with this stack. The only other option I see is S3 as storage, but I read that it is better suited to files, pictures and videos, and in general to batch processing.
2) I only know Python well and don't have time to learn Scala or Java. Is it possible to set everything up using Python code only? (Not counting the producers.)
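For illustration, the flow described in point 1 (read JSON from Kinesis, process it, save it to DynamoDB) could be sketched in Python alone roughly as follows. This is only a minimal sketch under stated assumptions: it uses plain boto3 in place of Spark Streaming, and the stream name `iot-stream`, the table name `iot_events`, the region, and the `device_id` field are all hypothetical placeholders, not part of the task.

```python
import json
from decimal import Decimal


def decode_record(raw: bytes) -> dict:
    """Parse one record payload (assumed to be UTF-8 JSON) into a dict."""
    # boto3's DynamoDB resource API does not accept Python floats,
    # so floats are parsed as Decimal up front.
    return json.loads(raw.decode("utf-8"), parse_float=Decimal)


def to_items(payloads):
    """Decode payloads and keep only events that carry a device id.

    'device_id' is a hypothetical key used here as the table's partition key.
    """
    items = [decode_record(p) for p in payloads]
    return [item for item in items if "device_id" in item]


def read_payloads(stream="iot-stream", shard_id="shardId-000000000000",
                  region="us-east-1", limit=100):
    """Fetch one batch of raw payloads from a single shard (names are assumptions)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("kinesis", region_name=region)
    iterator = client.get_shard_iterator(
        StreamName=stream, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    return [record["Data"] for record in response["Records"]]


def write_items(items, table_name="iot_events", region="us-east-1"):
    """Write items to a DynamoDB table in batches (table name is an assumption)."""
    import boto3

    table = boto3.resource("dynamodb", region_name=region).Table(table_name)
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
```

With AWS credentials configured, `write_items(to_items(read_payloads()))` would move one batch from the stream to the table. A real consumer would also have to loop over `NextShardIterator`, cover every shard, and checkpoint its position, which is the part a streaming framework normally handles.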