Have links to documents in VK changed (hashes)?

ilonikso, 2020-06-11 13:34:30

JavaScript

Good afternoon. A week ago I asked a similar question and the user SoreMix helped me, but the problem has come up again ((

[Link to the question] Have the links to documents and VK files changed?

In short, there is a VK discussion in which users post packages for the game Own Game (SIGame). To make life easier for people, I decided to pick the best packages and record them in a Google spreadsheet that everyone could use to easily search for packages by topic.

Now VK has changed the hashes in the links for all attached files, and the links in my spreadsheet no longer work. SoreMix pointed me in the right direction, but unfortunately I can't contact him anymore ((

SoreMix gave the following advice:

1. Collect all links from the table
2. Collect all links from the topic in VK
3. Truncate each link from the table at the hash and compare it with every link from the VK topic, truncated the same way
4. If the links match, save the topic link to a list
5. Go through all the cells with links in the table; if a cell's truncated link matches a truncated restored link, write the restored link into that cell
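
The steps above can be tried on in-memory lists before touching the VK or Google APIs. This is only a minimal sketch, not SoreMix's actual script: the names `base_of` and `restore_links` and the sample URLs are made up for illustration, and it assumes that the part of a link before `?` identifies the document.

```python
def base_of(url):
    """Return the link truncated at '?', i.e. without the hash part."""
    return url.strip().split('?')[0]


def restore_links(table_links, topic_links):
    """Map each table link to the fresh topic link with the same base.

    Table links with no counterpart in the topic are left out of the result.
    """
    # Steps 3-4: index the fresh links by their truncated form
    fresh_by_base = {base_of(link): link.strip() for link in topic_links}

    restored = {}
    for old in table_links:
        new = fresh_by_base.get(base_of(old))
        if new is not None:
            # Step 5: this is the value to write back into the cell
            restored[old] = new
    return restored


# Hypothetical sample data
table = ['https://vk.com/doc1_100?hash=old1', 'https://vk.com/doc1_200?hash=old2']
topic = ['https://vk.com/doc1_100?hash=new1&dl=x', 'https://vk.com/doc1_300?hash=new3']

result = restore_links(table, topic)
print(result)
```

Comparing the truncated links for exact equality, rather than checking whether one contains the other, avoids confusing documents whose IDs share a prefix.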

But unfortunately, I have practically no experience with the VK API, the Google Docs API, or the other tools needed to implement this ((

If anyone has a similar experience, or can give advice, I will be glad for any help.

Sincerely yours, Ilya

1 answer(s)

soremix, 2020-06-11
@ilonikso

Yes, I forgot by accident; I rarely use Telegram. There is a lot of hardcoding in my scripts because I wrote them for one-time use.
Right now I can't glue everything into a single file, so I can only share the sources. First you need to collect all the links that are currently in the table.
second.xlsx - the name of the file with the data from the table
urls.txt - a text file where the links to the VK docs will be saved
The tables differed in structure, so there are two options: one for when the cell contains just a link to the doc, and a second (commented out) for when the link is written as a hyperlink formula.

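from openpyxl import load_workbook
import re  # only needed for the HYPERLINK variant below

wb = load_workbook('second.xlsx')

with open('urls.txt', 'w', encoding='utf-8') as f:

    for sheetname in wb.sheetnames:
        sheet = wb[sheetname]

        for i in range(1, sheet.max_row+1):
            # The links are expected in the fourth column
            content = sheet.cell(row=i, column=4).value

            # Skip empty cells and non-text values such as numbers
            if not content or not isinstance(content, str):
                continue

            # Option 1: the cell holds a plain link
            f.write(content + '\n')

            # Option 2: the cell holds a =HYPERLINK("...") formula.
            # Use these lines instead of the f.write above.
            # url = re.search(r'=HYPERLINK\("(.+?)"', content)
            #
            # if url:
            #     f.write(url.group(1).split('?')[0] + '\n')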
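
If a table mixes both layouts, the two options can be folded into one helper instead of switching commented blocks. The sketch below is an illustration only: `extract_link` is a hypothetical name, and it assumes that any text cell that is not a `=HYPERLINK(...)` formula contains a bare link.

```python
import re

# Captures the first quoted argument of a =HYPERLINK("...") formula
HYPERLINK_RE = re.compile(r'=HYPERLINK\("(.+?)"')


def extract_link(cell_value):
    """Return the link stored in a cell, or None if the cell has no text."""
    if not isinstance(cell_value, str) or not cell_value:
        return None

    match = HYPERLINK_RE.search(cell_value)
    if match:
        # Formula cell: take the URL argument
        return match.group(1)

    # Plain cell: assume the whole value is the link
    return cell_value


print(extract_link('=HYPERLINK("https://vk.com/doc1_2?hash=a";"Download")'))
print(extract_link('https://vk.com/doc1_3?hash=b'))
```

In the first script, the value returned by `extract_link(content)` could be written to urls.txt whenever it is not None.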

The second script parses the topic and finds all the documents that were uploaded to it.
app_id - application ID from https://vk.com/apps?act=manage
login, password - login and password from VK
First, all the docs are collected, then the links saved in the previous step are read.
We go through each doc and cut its link at the ? sign to get the base, i.e. vk.com/document123_123.
We then go through all the URLs from the previous step; if the base part of the doc is contained in a URL, the new link is written to the restored.txt file.

import vk_requests
import time

app_id = 'todo'
login = 'todo'
password = 'todo'

api = vk_requests.create_api(app_id=app_id, login=login, password=password)

# Group and topic that hold the packages
GROUP_ID = 135725718
TOPIC_ID = 34975471
PAGE_SIZE = 100


def get_docs():
    """Collect the URLs of all docs attached to comments in the topic."""

    all_docs = []

    # A one-comment request is enough to read the total number of comments
    comments_count = api.board.getComments(group_id=GROUP_ID, topic_id=TOPIC_ID, count=1)['count']
    pages = comments_count // PAGE_SIZE + 1

    for x in range(pages):

        print('Parsing {}/{} page'.format(x + 1, pages))

        comments = api.board.getComments(group_id=GROUP_ID, topic_id=TOPIC_ID, count=PAGE_SIZE, offset=x * PAGE_SIZE)['items']

        for comment in comments:
            # Comments without files have no 'attachments' key
            for attachment in comment.get('attachments') or []:
                if attachment['type'] != 'doc':
                    continue

                attachment_url = attachment['doc'].get('url')

                if attachment_url:
                    all_docs.append(attachment_url)

        # Pause between requests so as not to hit the API rate limit
        time.sleep(0.3)

    return all_docs



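if __name__ == '__main__':

    docs = get_docs()

    # Old links saved by the first script, one per line
    with open('urls.txt', 'r', encoding='utf-8') as f:
        urls = [line.strip() for line in f if line.strip()]

    with open('restored.txt', 'w', encoding='utf-8') as f:
        for doc in docs:
            # Drop everything after '?' to get the base of the link
            base_doc = doc.split('?')[0]

            # Write the new link once if any old link contains the same base
            if any(base_doc in url for url in urls):
                f.write(doc + '\n')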
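
The attachment filtering inside get_docs can be checked without logging in to VK by running the same logic on hand-written data. This is a hedged sketch: `extract_doc_urls` is a hypothetical helper, and the sample comments contain only the keys the script above reads ('attachments', 'type', 'doc', 'url'); real API responses carry many more fields.

```python
def extract_doc_urls(comments):
    """Return the URLs of all 'doc' attachments found in the given comments."""
    urls = []
    for comment in comments:
        # Comments without files have no 'attachments' key
        for attachment in comment.get('attachments') or []:
            if attachment.get('type') != 'doc':
                continue
            url = attachment.get('doc', {}).get('url')
            if url:
                urls.append(url)
    return urls


# Made-up comments shaped like the items the script iterates over
sample_comments = [
    {'id': 1, 'text': 'no files here'},
    {'id': 2, 'attachments': [
        {'type': 'photo', 'photo': {'id': 10}},
        {'type': 'doc', 'doc': {'id': 20, 'url': 'https://vk.com/doc1_20?hash=abc'}},
    ]},
    {'id': 3, 'attachments': [{'type': 'doc', 'doc': {'id': 30}}]},
]

print(extract_doc_urls(sample_comments))
```

Only the second comment contributes a link: the first has no attachments, and the doc in the third has no 'url' field.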

The last script reads the source file with the table and all the links from the previous step. It goes through every cell in the link column; if the base of the cell's link is found in the list of restored links, the cell is overwritten with the restored link. The result is then saved to restored2.xlsx.
second.xlsx - the table with data
restored.txt - the file with new links from the previous step

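from openpyxl import load_workbook
import re  # only needed for the HYPERLINK variant below


wb = load_workbook('second.xlsx')

# Strip the trailing newline so that it does not end up inside the cell
with open('restored.txt', 'r', encoding='utf-8') as f:
    restored_urls = [line.strip() for line in f if line.strip()]


for sheetname in wb.sheetnames:
    sheet = wb[sheetname]

    for i in range(1, sheet.max_row+1):
        content = sheet.cell(row=i, column=4).value

        # Option 1: the cell holds a plain link
        if content and isinstance(content, str):
            base_url = content.split('?')[0]

            for restored_url in restored_urls:
                if base_url in restored_url:
                    sheet.cell(row=i, column=4).value = restored_url
                    break

        # Option 2: the cell holds a =HYPERLINK("...") formula.
        # Use this block instead of the one above.
        # if content and isinstance(content, str):
        #     url = re.search(r'=HYPERLINK\("(.+?)"', content)
        #
        #     if url:
        #         base_url = url.group(1).split('?')[0]
        #
        #         for restored_url in restored_urls:
        #             if base_url in restored_url:
        #                 # "Скачать" is the link label ("Download");
        #                 # openpyxl expects commas between formula arguments
        #                 cell_text = '=HYPERLINK("{}","Скачать")'.format(restored_url)
        #                 sheet.cell(row=i, column=4).value = cell_text
        #                 break


wb.save('restored2.xlsx')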
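
The hyperlink variant above replaces every label with the same text. If the original labels should survive, one possible approach is to swap only the URL argument of the formula. This is a sketch under assumptions: `replace_hyperlink_url` is a hypothetical helper, and the formula is assumed to start its argument list with a double-quoted URL, as in the regular expression used above.

```python
import re

# Group 1: formula prefix, group 2: old URL, group 3: closing quote
HYPERLINK_URL_RE = re.compile(r'(=HYPERLINK\(")(.+?)(")')


def replace_hyperlink_url(formula, new_url):
    """Replace the URL in a =HYPERLINK("url", ...) formula, keeping the rest."""
    # A function is used as the replacement so that backslashes or group
    # references in new_url are never interpreted by the regex engine
    return HYPERLINK_URL_RE.sub(
        lambda m: m.group(1) + new_url + m.group(3),
        formula,
        count=1,
    )


old_formula = '=HYPERLINK("https://vk.com/doc1_2?hash=old","Пакет 1")'
print(replace_hyperlink_url(old_formula, 'https://vk.com/doc1_2?hash=new'))
```

A value that is not a HYPERLINK formula is returned unchanged, so the helper can be applied to every text cell in the column.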