Python
wolron, 2020-05-28 16:02:22

How do I get all values of the "title" key from the parsed page source?

The page was fetched and parsed into a variable. It contains HTML tags along with JSON data as a set of keys and values.
How do I extract the value of a specific key from this mix of different data?

Here is part of the parsed page's source:

[<form action="/fater/json/productlist?type=PRODUCT_LIST" class="js-ajax-request" data-ajax='{"url":"/fater/json/productlist?type=PRODUCT_LIST" ,  "requestParamSeries": [{"name":"standardFilters","checkFor":"values"},{"name":"rangeFilters","checkFor":"minValue","checkFor2":"maxValue"}], "dataRendering":true, "method":"POST"}' data-ajax-id="ajax-productlist" data-current-state-ajax-uri="/fater/json/productlist?categoryString=outletproducts&amp;type=PRODUCT_LIST" id="command" method="post" onsubmit="return false;"><script data-init-data="ajax-productlist" data-no-initial-callup="true" type="application/json">
                        {"response":{"cheap":"0","expensive":"0","items":[{"productIndex":"0","sku":"KA90IVI20R","type":"shop","title":"Холодильник Side by Side","isInComparison":false,"comparable":true,"productsInComparisonSize":0,"headers":["iQ500","Холодильник Side by Side","","177 x 91 cm","Inox-easyclean","KA90IVI20R"],"price":{"value":164990.0,"displayValue":"164 990,00 ₽"},"stockStatus":{"trafficLight":"green","text":"[G11]","buyable":true,"permanentlyNotAvailable":false},"link":"/fater/outlet/KA90IVI20R?breadcrumb=","productImage":{"src":"//media3.123.com/Product_Shots/{width}x{height}/MCSA00762608_E6797_KA90IVI20G_407519_def.jpg","alt":"KA90IVI20R"},"hookline":"Холодильник coolDuo серии iQ 500 типа \"side-by-side\" с технологией NoFrost, дополнительно оснащен дозатором для воды и льда.","keyBenefits":["Многопоточная система охлаждения multiAirflow обеспечивает равномерное распределение воздуха и охлаждение на всех уровнях холодильника.","Технология noFrost защищает от образования инея и избавит вас от необходимости размораживать холодильник.","Холодильник шириной 70см - существенное увеличение полезного объема для хранения продуктов.","Функция superFreezing понижает температуру на заданный промежуток времени, чтобы быстрее заморозить только что добавленные продукты.","Функция superCooling, или 'суперохлаждение', уменьшает температуру на заданное время,


In the script itself:
import requests  # module for fetching the page
from bs4 import BeautifulSoup  # module for parsing the HTML

s = requests.Session()
loging = s.get(URL, headers=HEADERS, params=None)
soup = BeautifulSoup(loging.content, 'html.parser')


Example: key "title", value "Холодильник Side by Side". I want to write this data into a dictionary. The result should be a dictionary containing every value of the "title" keys that occur in the page source. There are dozens of them.

1 answer(s)
soremix, 2020-05-30
@SoreMix

The full page source isn't visible. If it contains many JSON objects that can be located and separated from one another, the first step is to find all the JSON strings, parse each one into an actual object with json.loads(s), and then walk through all the keys looking for "title".
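The JSON route described above could look roughly like the sketch below. It is only an illustration under assumptions: the helper names collect_values and titles_from_json_strings are made up here, the sample data is a shortened stand-in for the real page, and keying the final dictionary by position is an arbitrary choice, since the question does not say what the keys should be.

```python
import json


def collect_values(node, key):
    """Recursively gather every value stored under `key` in nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(collect_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


def titles_from_json_strings(json_strings):
    """Parse each string as JSON and collect all "title" values; skip unparsable input."""
    titles = []
    for raw in json_strings:
        try:
            data = json.loads(raw)
        except (TypeError, json.JSONDecodeError):
            continue  # None, empty, or not valid JSON
        titles.extend(collect_values(data, "title"))
    return titles


# With the soup object from the question, the raw strings could be gathered as:
#   raw_blocks = [tag.string for tag in soup.find_all("script", type="application/json")]
# A shortened, made-up stand-in is used here so the sketch runs on its own.
raw_blocks = [
    '{"response": {"items": ['
    '{"sku": "KA90IVI20R", "title": "Холодильник Side by Side"},'
    '{"sku": "X1", "title": "Second product"}]}}',
    "not json at all",
]

titles = titles_from_json_strings(raw_blocks)
titles_by_index = dict(enumerate(titles))  # assumption: position as the dictionary key
print(titles_by_index)
```

Because the JSON is actually parsed, escaped quotes and apostrophes inside a title are handled correctly, which a pattern-based search cannot guarantee.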
I can offer a simpler option: regular expressions.

import re

# \s* around the colon also matches "title":"..." with no space, as in the page sample
titles = re.findall(r'[\'\"]title[\'\"]\s*:\s*[\'\"](.+?)[\'\"]', loging.text)

The [\'\"] character class is there to support both quoting styles:
{"title": "Холодильник Side by Side"} and {'title': 'Холодильник Side by Side'}
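As a quick check of the regular-expression approach, here is a small self-contained sketch. It uses a variant of the pattern that tolerates missing or extra whitespace around the colon, because the page sample in the question writes "title":"..." without a space. The name TITLE_RE and the sample strings are illustrative only.

```python
import re

# Either quote style around the key and the value; optional whitespace around the colon.
TITLE_RE = re.compile(r'[\'"]title[\'"]\s*:\s*[\'"](.+?)[\'"]')

samples = [
    '{"title": "Холодильник Side by Side"}',
    "{'title': 'Холодильник Side by Side'}",
    # Compact form, as in the page sample from the question (no space after the colon).
    '{"sku":"KA90IVI20R","type":"shop","title":"Холодильник Side by Side","comparable":true}',
]

for text in samples:
    print(TITLE_RE.findall(text))
```

Each of the three samples yields ['Холодильник Side by Side']. One limitation to keep in mind: the match stops at the first quote character of either kind, so a title that itself contains an apostrophe or an escaped quote would be cut short.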
