big data
bsideup, 2015-10-28 21:47:01

Machine learning: how do I derive patterns from a large amount of data?

Hello. I am looking for any materials on detecting URLs of the same type in an incoming data stream.
For example, we receive a large amount of data like:
/api/users/11
/api/users/10
/api/users/10a
/api/users/
/api/users/10/events/99999/
/api/users/abc/events/99999/
The system should eventually be able to categorize them by type:
/api/users/11 -> /api/users/*
/api/users/10 -> /api/users/*
/api/users/10a -> /api/users/*
/api/users/ -> /api/users/
/api/users/10/events/99999/ -> /api/users/*/events/*
/api/users/abc/events/99999/ -> /api/users/*/events/*
Thanks!

1 answer(s)
Roman Mirilaczvili, 2015-10-31
@2ord

A URL path can be thought of as a directed graph.
Each part of the path separated by a slash represents a node.
Nodes of the same kind can be merged if they match a certain node pattern and each such node occurs only once (the URL of a given product on a site is unique, even though requests for it repeat in the logs). Examples are purely numeric values (/1/, /2/, /999/) or nodes generated for permalinks (/kakoe-to-nazvanie-statii-bloga/).
For further reading, look into graph clustering and community detection.
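
To make the idea more concrete, here is a minimal sketch in Python, not a finished solution. It treats every path as a chain of nodes and replaces a node with `*` when it matches the numeric pattern mentioned above. To also catch values like `10a` or `abc` from the question, it adds one heuristic of my own: if more than `max_distinct` different nodes appear under the same parent, all of them are treated as variable. The names `split_path`, `derive_patterns` and `max_distinct`, the threshold value, and the choice to drop trailing slashes are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Node pattern from the answer: purely numeric values such as /1/, /2/, /999/.
NUMERIC = re.compile(r"^\d+$")


def split_path(url):
    """Split a URL path into nodes, ignoring empty parts and a trailing slash."""
    return [part for part in url.strip().split("/") if part]


def derive_patterns(urls, max_distinct=3):
    """Map each URL to a generalized pattern, processing one depth level at a time.

    max_distinct is a hypothetical threshold: more distinct nodes than this
    under the same (already generalized) parent means the position is variable.
    """
    paths = {url: split_path(url) for url in urls}
    prefixes = {url: [] for url in urls}
    depth = 0
    while any(depth < len(nodes) for nodes in paths.values()):
        # Collect the distinct nodes seen at this depth under each parent.
        siblings = defaultdict(set)
        for url, nodes in paths.items():
            if depth < len(nodes):
                siblings[tuple(prefixes[url])].add(nodes[depth])
        # Decide for every node whether it stays literal or becomes "*".
        for url, nodes in paths.items():
            if depth < len(nodes):
                node = nodes[depth]
                group = siblings[tuple(prefixes[url])]
                variable = NUMERIC.match(node) or len(group) > max_distinct
                prefixes[url].append("*" if variable else node)
        depth += 1
    return {url: "/" + "/".join(prefix) for url, prefix in prefixes.items()}


URLS = [
    "/api/users/11",
    "/api/users/10",
    "/api/users/10a",
    "/api/users/",
    "/api/users/10/events/99999/",
    "/api/users/abc/events/99999/",
]

for url, pattern in derive_patterns(URLS).items():
    print(url, "->", pattern)
```

On a real stream the threshold would have to be much higher and tuned on the data, and the sibling sets would need to be bounded (for example by counting only up to the threshold) so that memory does not grow with every new identifier.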
