Java
blacksan, 2018-07-05 15:55:20

Parsing a photo hosting site. How does Yandex parse Google Drive docs?

Hello. There is a photo hosting site; after a photo is uploaded, it becomes available at a link of the form https://fotohost.ru/image/xxxxxxxx, where each x is a random character from the set A-Za-z0-9.
I want to parse photos from it using these public links.
Is there any way to get valid links other than brute-forcing xxxxxxxx and checking the response code?
Is this legal?
From recent news I learned that Yandex included Google Docs documents that are accessible by link in its search results. Does Yandex also go through all possible links, or is there another way to discover all the URLs on a domain?


7 answer(s)
Dmitry Alexandrov, 2018-07-05
@blacksan

Yandex can do this because:
1) they collect a lot of metrics from all users, so they can learn about links right away
2) their robots follow links, i.e. document X contains a link to document Y, and document Y contains a link to document Z
Your option, enumerating xxxxxxxx and checking the response code, will be the only one available to you unless you find some hole, for example in the API.
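To get a feel for what plain enumeration would cost, here is a minimal sketch that only counts the candidates; it makes no requests. It assumes the identifier is exactly 8 characters from A-Za-z0-9, as described in the question. The request rate of 1000 per second is a made-up figure for illustration, not something known about the site.

```python
import string

# Alphabet from the question: A-Za-z0-9 (62 symbols).
ALPHABET = string.ascii_letters + string.digits
ID_LENGTH = 8  # length of the xxxxxxxx part of the link


def keyspace_size(alphabet_size, length):
    """Number of distinct identifiers of the given length."""
    return alphabet_size ** length


def days_to_enumerate(total, requests_per_second):
    """Days needed to try every identifier at a constant request rate."""
    return total / requests_per_second / 86_400


total = keyspace_size(len(ALPHABET), ID_LENGTH)
print(total)  # 62 ** 8 candidates
# 1000 requests per second is a hypothetical rate, chosen only for scale.
print(days_to_enumerate(total, 1000))
```

With about 2.2 * 10^14 candidates, a full pass at that rate would take thousands of years, so any other source of links (links from other pages, an API) is worth looking for first.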

Victor P., 2018-07-05
@Jeer

Search engines are, in essence, parsers.
As for legality: if a link is publicly accessible, you may view it. In theory, you may not use the content in your own products, that is, pass off resources found this way as your own; you need to credit the source ("taken from such-and-such site").
As for how to obtain the links other than by enumeration: I believe Yandex itself has a list of the pages it has indexed for a specific site. Perhaps this information is available only to counter owners; dig in this direction.

Vladlen Grachev, 2014-11-03
@gwer

An example of one of the options is in the screenshot (Python).
In words, it goes something like this:
1. Flatten the original array into a one-dimensional one.
2. Sort by both coordinates (with priority given to X).
3. Turn the resulting one-dimensional array back into a two-dimensional one and sort the elements of each row by Y.
4. Transpose.
I sketched the algorithm offhand, in the most straightforward way. It seems to satisfy the stated conditions. That shows the principle; you can work out the details (sort order, etc.) yourself.
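The screenshot itself is not available here, so the following is only a possible reading of the four steps, not the author's code. It assumes the matrix is a list of rows of (x, y) tuples, and that the intermediate two-dimensional array is built from chunks whose length equals the number of rows, so that the transposed result has the original shape. The name grid_sort is made up for this example.

```python
def grid_sort(matrix):
    """Arrange points so x grows along each row and y grows down each column."""
    rows, cols = len(matrix), len(matrix[0])

    # 1. Flatten the matrix into a one-dimensional list of (x, y) points.
    flat = [point for row in matrix for point in row]

    # 2. Sort by both coordinates; tuple comparison gives priority to x.
    flat.sort()

    # 3. Cut into `cols` groups of `rows` points (assumed chunk size)
    #    and sort each group by y.
    groups = [
        sorted(flat[i * rows:(i + 1) * rows], key=lambda p: p[1])
        for i in range(cols)
    ]

    # 4. Transpose: each group becomes a column of the result.
    return [list(row) for row in zip(*groups)]


points = [[(5, 5), (1, 9), (3, 2)],
          [(0, 4), (4, 8), (2, 1)]]
for row in grid_sort(points):
    print(row)
```

Each group holds points with similar x, so after the transpose every column stays within a narrow x range and is ordered by y.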

SHVV, 2014-11-03
@SHVV

Is it just me, or are you trying to reinvent Delaunay triangulation?
I would like to know more about the source data and the goal.

Vitaly, 2014-11-06
@vipuhoff

In theory I don't see anything complicated. You can try this: find the center of mass of the points, then convert the coordinates from Cartesian to polar with the origin at the center of mass. Compute the polar coordinates of all points, sort them by angle, take the points in that order, and you get exactly what is needed.
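A minimal sketch of this idea, assuming the points are (x, y) tuples with equal weights, so the center of mass is simply the mean of the coordinates. The function name sort_by_angle is made up for this example; the angle comes from math.atan2, which returns values in the range from -pi to pi.

```python
import math


def sort_by_angle(points):
    """Order points by polar angle around their center of mass."""
    # Center of mass of equally weighted points is the coordinate mean.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)

    # Polar angle of each point relative to the center of mass.
    def angle(point):
        return math.atan2(point[1] - cy, point[0] - cx)

    return sorted(points, key=angle)


ordered = sort_by_angle([(1, 0), (0, 1), (-1, 0), (0, -1)])
print(ordered)
```

The result is a single sequence ordered counterclockwise, starting just after the negative x direction; how to lay that sequence back into a matrix is left open here, as it is in the answer.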

Mrrl, 2014-11-06
@Mrl

The sort result will look nicer if you do it like this.
Recursion:
- divide the longer side of the matrix into two parts that are equal (or differ by 1).
- if the longer side was vertical, sort the elements of the matrix by y so that elements with smaller y end up in the lower half; if it was horizontal, sort by x so that elements with smaller x end up in the left half. We sort not the individual rows or columns, but the matrix as a whole (as a one-dimensional array).
- repeat the procedure for each of the resulting halves.
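The recursion above can be sketched roughly as follows. Several details are assumptions of this example, not part of the answer: points are (x, y) tuples, the "lower half" is taken to be the rows with smaller indices, a square region is split vertically, and the name recursive_grid_sort is made up.

```python
def recursive_grid_sort(points, rows, cols):
    """Place rows * cols points into a grid by recursive halving."""
    grid = [[None] * cols for _ in range(rows)]

    def fill(pts, r0, r1, c0, c1):
        height, width = r1 - r0, c1 - c0
        if height == 1 and width == 1:
            grid[r0][c0] = pts[0]
            return
        if height >= width:
            # Longer side is vertical: sort the whole region by y and
            # give the smaller y values to the rows with smaller indices
            # (assumed meaning of "lower half").
            pts = sorted(pts, key=lambda p: p[1])
            mid = r0 + height // 2
            split = (mid - r0) * width
            fill(pts[:split], r0, mid, c0, c1)
            fill(pts[split:], mid, r1, c0, c1)
        else:
            # Longer side is horizontal: sort by x, smaller x goes left.
            pts = sorted(pts, key=lambda p: p[0])
            mid = c0 + width // 2
            split = (mid - c0) * height
            fill(pts[:split], r0, r1, c0, mid)
            fill(pts[split:], r0, r1, mid, c1)

    fill(list(points), 0, rows, 0, cols)
    return grid


result = recursive_grid_sort([(2, 1), (0, 0), (0, 1), (2, 0)], 2, 2)
print(result)
```

Each call sorts its region as one flat list, which matches the remark that the matrix is sorted as a whole rather than row by row.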

Alexander Khmelev, 2014-11-13
@akhmelev

Slow, but with no extra memory: swaps. That is, if speed is not very important, use a sort such as plain bubble sort, but two-dimensional. We go through all the elements of the matrix, moving along the anti-diagonals: 11 12 21 13 22 31 .... Each element is compared with the 8 elements around it (although, in fact, three are enough: right, lower, and lower-right) to find the best candidate for a swap. The comparison uses the sum x + y or, better, the squared length of the vector (x ^ 2 + y ^ 2). Repeat the pass over the matrix as long as swaps occur. Because the heaviest element sinks to the end of the matrix, each new pass can be made one element shorter.
There are many ways to speed things up with additional arrays and indexes. The simplest way, as noted above, is to unroll the matrix into a one-dimensional array (along the anti-diagonals), sort it with an efficient algorithm (of which there are plenty) by the same criterion x ^ 2 + y ^ 2, and fold it back into a two-dimensional one.
All of this is speculative. It needs to be checked.
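Here is a minimal sketch of the simpler variant only (unroll, sort, fold back), not of the two-dimensional bubble sort. It assumes (x, y) tuples and the anti-diagonal order 11 12 21 13 22 31 from the text, written with zero-based indices; the function names are made up for this example.

```python
def antidiagonal_cells(rows, cols):
    """Cell indices in the order 11 12 21 13 22 31 ... (zero-based)."""
    cells = []
    for d in range(rows + cols - 1):  # d = row index + column index
        for i in range(rows):
            j = d - i
            if 0 <= j < cols:
                cells.append((i, j))
    return cells


def diagonal_sort(matrix):
    """Sort points by x^2 + y^2 and lay them out along anti-diagonals."""
    rows, cols = len(matrix), len(matrix[0])

    # Unroll into one dimension and sort by squared vector length.
    flat = sorted(
        (point for row in matrix for point in row),
        key=lambda p: p[0] ** 2 + p[1] ** 2,
    )

    # Fold back: the k-th smallest point goes to the k-th anti-diagonal cell.
    grid = [[None] * cols for _ in range(rows)]
    for (i, j), point in zip(antidiagonal_cells(rows, cols), flat):
        grid[i][j] = point
    return grid


print(diagonal_sort([[(3, 3), (1, 0)], [(0, 0), (2, 0)]]))
```

The built-in sort does the work here, so no extra passes over the matrix are needed, at the cost of one temporary list.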
