I want to start a project (startup) using Big Data - where to start?
I have an idea for a startup that uses big data and visualization. Where can I start studying the topic?
The plan is also to analyze the collected statistical data (ideally, millions of records). Does anyone have experience with how this is usually implemented, and which direction should I look in (study)?
First, you need to figure out whether you need Big Data at all: habrahabr.ru/post/194434
If you don’t have Big Data, you can use these tools:
1. Pandas - data processing, I/O
2. Sklearn - building models
3. For database storage, there are several options:
3.1 SQL databases - SQLite, Postgres
3.2 NoSQL - Mongo, etc.
4. If you expect some of the data to be used more actively, i.e. you need hot caching - take Redis or its analogues
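As a rough illustration of the "no Big Data" stack above, here is a minimal sketch that pushes synthetic records through SQLite and Pandas. Everything about the data is an assumption made up for the example: the table name `events`, the columns `user_id`, `category` and `amount`, and the helper names `build_events` and `summarize` are hypothetical, not something from the question.

```python
import sqlite3

import numpy as np
import pandas as pd


def build_events(n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Generate synthetic records (hypothetical columns, for illustration only)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "user_id": rng.integers(0, 1000, size=n_rows),
        "category": rng.choice(["a", "b", "c"], size=n_rows),
        "amount": rng.random(n_rows) * 100,
    })


def summarize(n_rows: int = 1_000_000) -> pd.DataFrame:
    """Store the records in SQLite, read them back, and aggregate with Pandas."""
    events = build_events(n_rows)
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches disk
    try:
        # Write the DataFrame into a table named "events" (assumed name).
        events.to_sql("events", conn, index=False)
        # Read back only the columns needed for the aggregation.
        loaded = pd.read_sql_query("SELECT category, amount FROM events", conn)
    finally:
        conn.close()
    # Number of records and average amount per category.
    return loaded.groupby("category")["amount"].agg(["count", "mean"])
```

Calling `summarize()` with the default of one million rows should be feasible on an ordinary laptop, which is the point: at this size a single machine with Pandas and SQLite is usually enough. A file-based SQLite database or Postgres could be swapped in by changing only the connection.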
Apache Hive - to store all this in a digestible form
Apache Spark - to build predictive models and all sorts of non-classical groupings
Visualization is more complicated. First you need to decide what kind of visualization you need, static or dynamic, and which language you personally find most convenient for writing it.
For static visualization (.jpg files, for example):
R - lattice, ggplot2
Python - matplotlib, seaborn
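For the static case, a minimal matplotlib sketch could look like this. The function name `save_histogram`, the output file name and the axis labels are hypothetical; PNG is used instead of .jpg only because, as far as I know, saving JPEG from matplotlib needs the extra Pillow package.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render straight to files, no display needed

import matplotlib.pyplot as plt


def save_histogram(values, path="amounts.png") -> Path:
    """Draw a histogram of the values and save it as a static image."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(values, bins=30)
    ax.set_xlabel("amount")   # hypothetical axis labels
    ax.set_ylabel("records")
    ax.set_title("Distribution of amounts")
    fig.savefig(path, dpi=150)
    plt.close(fig)  # free the figure so repeated calls do not pile up in memory
    return Path(path)
```

Usage would be something like `save_histogram(df["amount"])`, where `df` is whatever DataFrame you already have. seaborn builds on matplotlib, so the same save-to-file approach should carry over.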
If we want super cool real-time dashboards, then like this:
R - Shiny
Python - bokeh
If you describe what data sources you have, it will be easier to understand what to dig into and which tools to use.
A million records is not big data. For starters, I advise you to watch www.youtube.com/watch?v=TEHdfPa1eJA