Scrapy
Fyodor Buzinov, 2017-12-06 15:22:21

What is the correct way to save data with Scrapy?

Good afternoon!
I need to scrape project data from a page.
All the data is on one page:
project name
description
team
The team usually has many members, and for each one I need the name, their status in the project, and a link to their profile.
This is what I do now:

class ProjectItem(scrapy.Item):
    id = scrapy.Field()
    name = scrapy.Field()       
    team = scrapy.Field()

I already pass the team to the item as a JSON string:
for people in all_team:
    ...
    team.append({
        "id": id,
        "full_name": full_name,
        "current_position": current_position,
        "website": website,
    })

l.add_value('team', json.dumps(team))

Later I plan to save this to a NoSQL database through pipelines, in two tables: project and team.
Storing the team as a string feels like a poor solution to me.
Is this approach acceptable, or should I use two separate Items, one for the project and one for the team? If so, how do I link them?
Is it possible to store a list in the team field without converting it to a string?

1 answer(s)
Evgen, 2017-12-06
@Verz1Lka

1) Since Scrapy 1.0 you no longer have to define a scrapy.Item; a regular dict works.
2) Yes, an Item field can hold a value of any type, so a list of dicts is fine.
3) You can create two types of Item, but then each pipeline has to check the item's type. In my opinion it is easier to use a single item type and split it up in the pipeline however you like, writing the insert queries for the two NoSQL tables there.
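
A minimal sketch of point 3, assuming the item is a plain dict whose team field is a list of dicts. The class name ProjectTeamPipeline, the project_id link field, and the sample values are made up for illustration, and the two in-memory containers only stand in for the two NoSQL tables; a real pipeline would call the database client in their place.

```python
class ProjectTeamPipeline:
    """Splits one scraped project item into project and team records.

    self.projects and self.team are hypothetical stand-ins for two
    NoSQL tables; replace them with real insert calls.
    """

    def open_spider(self, spider):
        self.projects = {}  # project id -> project record
        self.team = []      # one record per team member

    def process_item(self, item, spider):
        # Store the project itself, without the nested team list.
        self.projects[item["id"]] = {"id": item["id"], "name": item["name"]}
        # Store each member separately, linked back through project_id.
        for member in item.get("team", []):
            self.team.append({**member, "project_id": item["id"]})
        return item


# Usage outside Scrapy, with a plain dict as the item (sample data).
item = {
    "id": 1,
    "name": "Example project",
    "team": [
        {"id": 10, "full_name": "First Member",
         "current_position": "founder", "website": "https://example.com/10"},
        {"id": 11, "full_name": "Second Member",
         "current_position": "advisor", "website": "https://example.com/11"},
    ],
}

pipeline = ProjectTeamPipeline()
pipeline.open_spider(spider=None)
pipeline.process_item(item, spider=None)
```

In a Scrapy project the class would be enabled through ITEM_PIPELINES in settings.py, and the spider would yield the dict with team as a list, without json.dumps.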
