Have links to documents in VK changed (hashes)?
Good afternoon. A week ago I asked a similar question and the user SoreMix helped me, but the problem has come up again ((
[Link to the question] Have the links to documents and VK files changed?
In short, there is a discussion in VK where users post packages for the game Own Game (SIGame). To make life easier for people, I decided to pick the best packages and list them in a Google doc that everyone could use to easily search for packages by topic.
Now VK has changed the hashes in the links to all attached files, so the links in my spreadsheet no longer work. SoreMix pointed me in the right direction, but unfortunately I can't contact him anymore ((
SoreMix gave the following advice:
1. Collect all links from the table
2. Collect all links from the topic in VK
3. Compare each link from the table, truncated before the hash, with every link from the VK topic, truncated the same way
4. If the links match, save the matching link from the topic in a list
5. Go through all the columns with links in the table; if a truncated link matches a truncated restored one, write the restored link into that cell
Unfortunately, I have practically no experience with the VK API, the Google Docs API, or the other tools needed to implement this ((
If anyone has done something similar or can give advice, I will be glad of any help.
Sincerely yours, Ilya
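The five steps above boil down to comparing links with everything after `?` cut off. Here is a minimal sketch of that idea, assuming the changed hash sits in the query string (which is how the scripts in the answer treat it). The function names `base_link` and `restore_links` and the sample URLs are hypothetical, chosen only for illustration.

```python
def base_link(url):
    """Cut the link before '?', dropping the query string that carries the hash."""
    return url.strip().split('?')[0]


def restore_links(table_links, topic_links):
    """Map each old table link to the current topic link with the same base."""
    # Index the current links by their truncated form for exact lookups
    current = {base_link(link): link.strip() for link in topic_links}
    restored = {}
    for old in table_links:
        new = current.get(base_link(old))
        if new is not None:
            restored[old] = new
    return restored


# Hypothetical sample data, not real VK links
table_links = [
    'https://vk.com/doc1_2?hash=old&dl=old',
    'https://vk.com/doc3_4?hash=gone',
]
topic_links = ['https://vk.com/doc1_2?hash=new&dl=new']
print(restore_links(table_links, topic_links))
```

Looking up the truncated link in a dictionary compares whole links, so a link ending in `doc1_2` cannot be confused with one ending in `doc1_23`, which a plain substring check would allow.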
Yes, I forgot by accident; I rarely use Telegram. There is a lot of hardcoding here because I wrote it for one-time use.
I can't glue everything into one file right now, so I can only give you the sources. First you need to collect all the existing links.
second.xlsx - the file with the data from the table
urls.txt - a text file where the links to the VK documents will be saved
The tables had different structures, so there are two options: one where the cell contains a plain link to the document, and one where the link is written as a hyperlink.
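# Script 1: save the links from the table to urls.txt
from openpyxl import load_workbook
import re  # only needed for the HYPERLINK option below

wb = load_workbook('second.xlsx')

with open('urls.txt', 'w', encoding='utf-8') as f:
    for sheetname in wb.sheetnames:
        sheet = wb[sheetname]
        for i in range(1, sheet.max_row + 1):
            # The links are in the fourth column; skip empty and non-text cells
            content = sheet.cell(row=i, column=4).value
            if isinstance(content, str):
                # Option 1: the cell contains a plain link
                f.write(content + '\n')
                # Option 2: the cell contains a =HYPERLINK("...") formula;
                # use this instead of the line above
                # url = re.search(r'=HYPERLINK\("(.+?)"', content)
                # if url:
                #     f.write(url.group(1).split('?')[0] + '\n')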
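The two cell formats (a plain link and a `=HYPERLINK(...)` formula) could also be handled by a single helper instead of commenting one option in and out. This is only a sketch: `extract_url` is a hypothetical name, the sample links are made up, and it reuses the same regular expression as the script above.

```python
import re

# Same pattern as in the script: the first quoted argument of HYPERLINK
HYPERLINK_RE = re.compile(r'=HYPERLINK\("(.+?)"')


def extract_url(content):
    """Return the link from a cell value, cut before '?', or None for non-text cells."""
    if not isinstance(content, str):
        return None
    match = HYPERLINK_RE.search(content)
    # Fall back to the raw cell text when there is no HYPERLINK formula
    url = match.group(1) if match else content
    return url.strip().split('?')[0]


# Hypothetical cell values, not real VK links
print(extract_url('=HYPERLINK("https://vk.com/doc1_2?hash=abc";"Скачать")'))
print(extract_url('https://vk.com/doc1_2?hash=abc'))
```

Unlike option 1 in the script, this helper always returns the truncated link. That is enough for the later comparison, which only looks at the part before `?`.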
# Script 2: collect document links from the VK topic and match them against urls.txt
import vk_requests
import time

app_id = 'todo'
login = 'todo'
password = 'todo'

api = vk_requests.create_api(app_id=app_id, login=login, password=password)


def get_docs():
    """Return the URLs of all documents attached to comments in the topic."""
    all_docs = []
    # A one-comment request is enough to learn the total number of comments
    comments_count = api.board.getComments(group_id=135725718, topic_id=34975471, count=1, offset=3)['count']
    for x in range(comments_count//100 + 1):
        print('Parsing {}/{} page'.format(x + 1, comments_count//100 + 1))
        comments = api.board.getComments(group_id=135725718, topic_id=34975471, count=100, offset=x * 100)['items']
        for comment in comments:
            attachments = comment.get('attachments', None)
            if attachments:
                for attachment in attachments:
                    # Only document attachments are of interest
                    if attachment['type'] != 'doc':
                        continue
                    attachment_url = attachment['doc'].get('url', None)
                    if attachment_url:
                        all_docs.append(attachment_url)
        # Pause between requests so as not to hit the API rate limit
        time.sleep(0.3)
    return all_docs


if __name__ == '__main__':
    docs = get_docs()
    with open('urls.txt', 'r', encoding='utf-8') as f:
        urls = f.readlines()
    with open('restored.txt', 'w', encoding='utf-8') as f:
        for doc in docs:
            # Compare the links without the part after '?'
            base_doc = doc.split('?')[0]
            for url in urls:
                if base_doc in url:
                    f.write(doc + '\n')
                    break  # one match is enough; avoids duplicate lines
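The page loop in `get_docs()` walks the topic 100 comments at a time using `comments_count//100 + 1` pages, which makes one extra, empty request when the count is an exact multiple of 100. A small sketch of computing the offsets directly is shown below; `page_offsets` is a hypothetical helper, and the page size of 100 is taken from the script.

```python
def page_offsets(total, page_size=100):
    """Return the offset of every page needed to cover `total` items."""
    return list(range(0, total, page_size))


print(page_offsets(250))  # three pages
print(page_offsets(200))  # two pages, no empty third request
```

Each offset can then be passed as the `offset` argument of the request in place of `x * 100`.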
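# Script 3: write the restored links back into the table
from openpyxl import load_workbook
import re  # only needed for the HYPERLINK option below

wb = load_workbook('second.xlsx')

with open('restored.txt', 'r', encoding='utf-8') as f:
    # Strip the trailing newline so it does not end up in the cell
    restored_urls = [line.strip() for line in f if line.strip()]

for sheetname in wb.sheetnames:
    sheet = wb[sheetname]
    for i in range(1, sheet.max_row + 1):
        content = sheet.cell(row=i, column=4).value
        # Option 1: the cell contains a plain link
        if isinstance(content, str):
            base_url = content.split('?')[0]
            for restored_url in restored_urls:
                if base_url in restored_url:
                    sheet.cell(row=i, column=4).value = restored_url
                    break
        # Option 2: the cell contains a =HYPERLINK("...") formula;
        # use this instead of the block above.
        # openpyxl expects commas between formula arguments.
        # if isinstance(content, str):
        #     url = re.search(r'=HYPERLINK\("(.+?)"', content)
        #     if url:
        #         base_url = url.group(1).split('?')[0]
        #         for restored_url in restored_urls:
        #             if base_url in restored_url:
        #                 cell_text = '=HYPERLINK("{}","Скачать")'.format(restored_url)
        #                 sheet.cell(row=i, column=4).value = cell_text

wb.save('restored2.xlsx')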