How to compare two xlsx files?

SQL

maximka12, 2017-04-20 21:35:28

Task:
Two tables are given, each with a different layout, but both have a "NUMBER" column. I need to compare the two files so that whenever a value in the NUMBER column of the first file matches a value in the NUMBER column of the second file, the rows containing that value are written to another sheet or to a separate Excel workbook. How can this be implemented?
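For readers landing here, a minimal sketch of the task with pandas: an inner join on NUMBER keeps exactly the rows whose value occurs in both tables, regardless of where the column sits in each layout. The function names, the `_1`/`_2` suffixes and the demo data are illustrative assumptions, and reading or writing .xlsx through pandas assumes the openpyxl package is installed.

```python
import pandas as pd


def match_by_number(df1: pd.DataFrame, df2: pd.DataFrame, key: str = "NUMBER") -> pd.DataFrame:
    """Return the rows whose `key` value appears in both tables, joined side by side.

    how="inner" keeps only key values present in both frames; other columns
    that share a name get the suffixes _1 and _2.
    """
    return df1.merge(df2, on=key, how="inner", suffixes=("_1", "_2"))


def compare_files(path1: str, path2: str, out_path: str, key: str = "NUMBER") -> None:
    """Hypothetical helper: read the first sheet of each workbook, write matches to out_path."""
    df1 = pd.read_excel(path1)  # sheet_name defaults to the first sheet
    df2 = pd.read_excel(path2)
    match_by_number(df1, df2, key).to_excel(out_path, index=False)


# Small in-memory demo: the tables have different layouts but share NUMBER
first = pd.DataFrame({"NUMBER": [1, 2, 3], "name": ["a", "b", "c"]})
second = pd.DataFrame({"city": ["x", "y", "z"], "NUMBER": [3, 4, 1]})
result = match_by_number(first, second)
print(result)
```

Real files would be handled with something like `compare_files('1.xlsx', '2.xlsx', 'out.xlsx')`. If one file stores NUMBER as text and the other as numbers, the values will not compare equal, so casting both columns first (for example with `astype(str)`) may be necessary.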

5 answer(s)

freeExec, 2018-05-05
@freeExec

There is only one source: open data published by federal and municipal authorities. Whether that data is any good is for each person to decide.

Timokins, 2018-05-05
@timokins

These services do not have an API. Everything is done manually or semi-manually.
As already mentioned above, there is only open data. If you didn't find it, you didn't search well enough.
The Ministry of Internal Affairs website has a detailed report covering the regions of the Russian Federation in a single file, which is easy to parse.
If you need individual cities, the approach is the same: go to the regional Ministry of Internal Affairs site, select a city and view the report for that city.
The same goes for the other indicators, except road quality.

Alexander Aksentiev, 2018-05-05
@Sanasol

https://vk.com/dev/database
You will probably not find more data, or a more convenient source, anywhere else.

Elvis, 2017-04-21
@maximka12

It took me about an hour to write this (mostly figuring out which libraries to use). It's a brute-force solution and may be wrong in places, but it seems to do what is needed (if I understood the task correctly).

import pandas as pd
import xlrd as xr  # xlrd 2.0+ reads only .xls, so opening .xlsx requires xlrd < 2.0

# Open the first sheet of each workbook
one_sheet = xr.open_workbook('1.xlsx').sheet_by_index(0)
two_sheet = xr.open_workbook('2.xlsx').sheet_by_index(0)

# NUMBER is assumed to be column 0 in both files; row 0 (the header) is skipped
list_j = []
for i in range(1, one_sheet.nrows):
    for j in range(1, two_sheet.nrows):
        if two_sheet.row_values(j)[0] == one_sheet.row_values(i)[0]:
            # Full row from file 1 plus the file 2 row without its NUMBER column
            list_j.append(one_sheet.row_values(i) + two_sheet.row_values(j)[1:])
df = pd.DataFrame(list_j)
df.to_excel('out.xlsx', header=False, index=False)

Screenshots:
https://s.mail.ru/4GV2/xZo3GG6uR
https://s.mail.ru/3HT1/Sxs4MUztB
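The nested loop in this answer compares every row of the first file with every row of the second, which gets slow on large sheets. A sketch of the same matching with a dictionary index (a hash join) is shown below; it is a swapped-in technique rather than the answerer's code, and `join_rows` is a hypothetical name. It expects plain lists of rows, such as those returned by `row_values` for rows 1 onward, and produces rows in the same shape and order as the loop: the full row from table 1 followed by the table 2 row without its key column.

```python
from collections import defaultdict


def join_rows(rows1, rows2, key_index=0):
    """Join two lists of rows on the value in column key_index.

    Indexes rows2 by key once, so each row of rows1 needs a single dict
    lookup instead of a scan over all of rows2.
    """
    index = defaultdict(list)
    for row in rows2:
        index[row[key_index]].append(row)

    joined = []
    for row in rows1:
        # .get() does not insert missing keys into the defaultdict
        for match in index.get(row[key_index], []):
            # Drop the key column from the second row, as row_values(j)[1:] does
            rest = [value for i, value in enumerate(match) if i != key_index]
            joined.append(list(row) + rest)
    return joined


rows1 = [[1, "a"], [2, "b"], [3, "c"]]
rows2 = [[3, "x"], [1, "y"], [1, "z"]]
print(join_rows(rows1, rows2))
```

The result can be passed to `pd.DataFrame(...)` and saved with `to_excel` exactly as in the answer.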

shushpanio, 2017-04-21
@shushpanio

I can suggest a clunky workaround, but it works:
1. Make sure the NUMBER column is the leftmost column in both tables.
2. Copy the values of the NUMBER column from both tables onto a separate sheet, one list directly below the other.
3. Highlight duplicate values with conditional formatting.
4. Use Filter by Color to get the list of duplicate values.
5. Move the resulting list to another sheet and remove duplicates.
6. Use VLOOKUP to pull the corresponding data from the original tables.
But again, this is a roundabout solution. For a one-off task it will do.
For regular use, macros or scripts are the way to go (I can't help with those).
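As a script counterpart to the steps above, here is a minimal pandas sketch, assuming NUMBER has the same name in both files. The set intersection stands in for steps 2 to 5 (finding and de-duplicating shared values) and the `isin` filter stands in for the VLOOKUP in step 6. Unlike a join, it keeps each table's matching rows in its original layout and writes them to separate sheets. The function names and sheet names are hypothetical, and writing .xlsx assumes openpyxl is installed.

```python
import pandas as pd


def rows_with_shared_numbers(df1, df2, key="NUMBER"):
    """Keep, in each table, only the rows whose key value also occurs in the other table."""
    shared = set(df1[key]) & set(df2[key])  # distinct values present in both columns
    return df1[df1[key].isin(shared)], df2[df2[key].isin(shared)]


def write_to_sheets(df1, df2, out_path, key="NUMBER"):
    """Write the matching rows of each table to its own sheet of one workbook."""
    part1, part2 = rows_with_shared_numbers(df1, df2, key)
    with pd.ExcelWriter(out_path) as writer:
        part1.to_excel(writer, sheet_name="from_file_1", index=False)
        part2.to_excel(writer, sheet_name="from_file_2", index=False)
```

Usage would look like `write_to_sheets(pd.read_excel('1.xlsx'), pd.read_excel('2.xlsx'), 'out.xlsx')`.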