C++ / C#
Justl, 2014-05-29 22:18:37

Monitoring site content changes (C#)

Hello. I am writing a term paper. Which algorithms are best suited to tracking content changes? So far I have studied parsing XML and comparing image hashes. Could you point me to sources on tracking methods and algorithms? A source with examples would be ideal.
Thank you.


2 answers
Ilya, 2014-05-30
@Gorily

A very interesting question. Do you want to detect changes on any site or on a specific one? If on any site, is it a particular block on the page or the entire page?
Pitfalls that come to mind immediately:
1. What about sites where content is loaded dynamically (Ajax)?
2. If the page displays the current time or a state that depends on it (for example, "written 10 minutes ago"), does that count as a change or not? If not, how do you detect and ignore it in a universal way?
etc.
Or are you just describing all possible algorithms in your coursework? If so, you should look at the algorithms used by cache servers.
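One possible way to handle the second pitfall is to strip the volatile fragments before comparing. Below is a minimal sketch in Python, assuming the page text has already been downloaded and extracted. The names `VOLATILE_PATTERNS`, `normalize`, `fingerprint` and `has_changed` are hypothetical, and the two patterns are only examples, so a real site would need its own list.

```python
import hashlib
import re

# Hypothetical examples of volatile fragments; a real site needs its own patterns.
VOLATILE_PATTERNS = [
    # Relative timestamps such as "10 minutes ago" or "1 hour ago".
    re.compile(r"\b\d+\s+(?:second|minute|hour|day)s?\s+ago\b", re.IGNORECASE),
    # Clock times such as 22:18 or 22:18:37.
    re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b"),
]


def normalize(text):
    """Remove volatile fragments, then collapse runs of whitespace."""
    for pattern in VOLATILE_PATTERNS:
        text = pattern.sub("", text)
    return " ".join(text.split())


def fingerprint(text):
    """Return the SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def has_changed(old_text, new_text):
    """Report a change only if the normalized snapshots differ."""
    return fingerprint(old_text) != fingerprint(new_text)
```

Storing only the digest of the previous snapshot is enough to answer "did it change?", but it cannot say what changed. For that you need to keep the previous text itself.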

Qiev, 2014-06-02
@Qiev

You can model it on a ready-made solution implemented in Python:
https://thp.io/2008/urlwatch/
Functionality:
It shows changes on a web page line by line, similar to a diff in a version control system.
It has mechanisms for filtering out dynamic content.
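To illustrate the general idea (this is not urlwatch's own code), here is a rough sketch of a line-by-line diff with a simple filter, using only the Python standard library. The `DYNAMIC_LINE` pattern and the function names are assumptions made for the example.

```python
import difflib
import re

# Hypothetical filter: lines matching this pattern are treated as dynamic
# content (or blank lines) and are left out of the comparison.
DYNAMIC_LINE = re.compile(r"written \d+ \w+ ago|^\s*$")


def filter_lines(text):
    """Split a snapshot into lines and drop the dynamic or blank ones."""
    return [line.rstrip() for line in text.splitlines()
            if not DYNAMIC_LINE.search(line)]


def diff_snapshots(old_text, new_text):
    """Return a unified diff of two filtered snapshots as a list of lines."""
    return list(difflib.unified_diff(
        filter_lines(old_text),
        filter_lines(new_text),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


if __name__ == "__main__":
    old = "Title\nwritten 10 minutes ago\nPrice: 100\n"
    new = "Title\nwritten 25 minutes ago\nPrice: 120\n"
    for line in diff_snapshots(old, new):
        print(line)
```

An empty result means the two snapshots are the same after filtering. Lines prefixed with `-` and `+` show what was removed and added.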
