linux
nirvimel, 2017-02-02 21:45:19

How to organize system-wide monitoring of reads of certain files, including the volume of data read?

There is a large file store of movies, clips and music, shared on the local network via Samba. People drop in, watch, listen... To clean out the junk, I need to collect viewing/listening statistics: how often each file was accessed, and what percentage of the file was read per access session on average (did the person watch the clip to the end, to the middle, or close it as soon as they got the gist). A further problem is that simply entering the network directory in Dolphin, Nautilus or any graphical viewer generates thumbnails, which is read access just like opening the file in a player. I plan to handle this by filtering the logs by ratio = read_size / total_file_size, which again requires knowing how much was read in one session.
To clarify: the goal is not to control user access, but to manage resources by collecting statistics on demand for them.
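A minimal sketch of the ratio filter described above, assuming per-session totals are already available as (path, bytes_read, file_size) tuples. The function name `summarize` and the 5% cut-off are assumptions for illustration only; the threshold would need tuning against real thumbnailer traffic.

```python
from collections import defaultdict

# Assumed cut-off: a session that read less than 5% of the file is treated
# as thumbnail generation or probing. This value is a guess, not a measurement.
THUMBNAIL_RATIO = 0.05


def summarize(sessions, threshold=THUMBNAIL_RATIO):
    """Aggregate access sessions per file.

    sessions: iterable of (path, bytes_read, file_size), one tuple per session.
    Returns {path: (session_count, mean_ratio)}, skipping sessions whose
    ratio = bytes_read / file_size falls below the threshold.
    """
    ratios = defaultdict(list)
    for path, bytes_read, file_size in sessions:
        if file_size <= 0:
            continue  # no meaningful ratio for empty files
        ratio = min(bytes_read / file_size, 1.0)  # re-reads can exceed 100%
        if ratio >= threshold:
            ratios[path].append(ratio)
    return {p: (len(r), sum(r) / len(r)) for p, r in ratios.items()}
```

Files that only ever show up in sub-threshold sessions drop out of the result entirely, which is probably what you want for the "was it ever really watched" question.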
At first I considered writing my own FUSE driver (something I haven't done before) to pass requests through transparently while logging them. But questions remain about the performance of such a solution and the pitfalls of running Samba on top of FUSE.
Then I thought about using auditd and writing a script to parse /var/log/audit/audit.log (more precisely, a single system-wide process can be attached to the dispatcher in auditd.conf). But first, this is very ugly: any change to the monitoring parameters requires root, and the script itself runs as root (very bad). The main problem, though, is that the syscall log sits at too low a level of abstraction: you have to track every open descriptor and every file pointer yourself in order to get a log like:

Reading from: "./movie.mkv"; offset: 0x78563; length: 0x89332
You have to keep your own copy of the file pointer for every descriptor of every process and advance it after each operation (besides read, also account for write, lseek, maybe something else), and you cannot skip a single syscall (an unrecognized one, for example). There would be too many rules for computing the file pointer and tracking descriptor changes (duplication / closing / reopening), and I doubt I can cover all the edge cases.
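To make the bookkeeping concrete, here is a rough sketch of such a tracker. It assumes the audit records have already been parsed into hypothetical (pid, name, args, result) events with simplified arguments; the class and function names are made up for illustration. It handles only open, read, write, pread64, lseek, dup and close, and ignores fork, dup2, O_APPEND, sendfile and mmap, which is exactly the kind of edge case that worries me.

```python
class OffsetTracker:
    """Replays syscall events and reconstructs (path, offset, length) reads.

    Hypothetical event shape: feed(pid, name, args, result), where args is
    (path,) for open, (fd,) for read/write/lseek/dup/close and (fd, offset)
    for pread64, and result is the syscall return value.
    """

    def __init__(self):
        self.fds = {}    # (pid, fd) -> shared {"path", "pos"} description
        self.reads = []  # reconstructed (path, offset, length) records

    def feed(self, pid, name, args, result):
        if result < 0:
            return  # a failed call changes no state
        if name == "open":
            self.fds[(pid, result)] = {"path": args[0], "pos": 0}
            return
        f = self.fds.get((pid, args[0]))
        if f is None:
            return  # descriptor we never saw being opened
        if name == "read":
            self.reads.append((f["path"], f["pos"], result))
            f["pos"] += result
        elif name == "write":
            f["pos"] += result  # writes move the pointer too
        elif name == "pread64":
            self.reads.append((f["path"], args[1], result))  # pointer untouched
        elif name == "lseek":
            f["pos"] = result  # lseek returns the new absolute offset
        elif name == "dup":
            self.fds[(pid, result)] = f  # duplicates share one file pointer
        elif name == "close":
            del self.fds[(pid, args[0])]


def bytes_covered(ranges):
    """Size of the union of (offset, length) ranges, so re-reads count once."""
    total, end = 0, 0
    for off, length in sorted(ranges):
        stop = off + length
        if stop > end:
            total += stop - max(off, end)
            end = stop
    return total
```

Each entry in `reads` corresponds to one line of the log format shown above, and `bytes_covered` turns the entries for one file into the read_size needed for the ratio.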
Perhaps I'm reinventing the wheel and don't know about the existence of a ready-made solution?
In which direction should I dig? What do you advise, gentlemen?


2 answer(s)
Alexey Cheremisin, 2017-02-02
@leahch

Linux has a wonderful mechanism for monitoring files and directories: inotify. So you don't have to move away from Samba or write your own filesystem on FUSE.
https://ru.m.wikipedia.org/wiki/Inotify
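
As a rough illustration of what inotify gives you, the sketch below decodes the raw event buffer that is read from an inotify file descriptor and counts opens per file name. The mask values and the `struct inotify_event` layout come from `<sys/inotify.h>`; the helper names `decode_events` and `count_opens` are made up for this example. Note that inotify events carry no offsets or byte counts, so this covers access frequency only, not the volume read.

```python
import struct
from collections import Counter

# Mask bits from <sys/inotify.h>.
IN_ACCESS = 0x00000001         # file was read
IN_CLOSE_NOWRITE = 0x00000010  # file not opened for writing was closed
IN_OPEN = 0x00000020           # file was opened

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
_HEADER = struct.Struct("iIII")


def decode_events(buf):
    """Yield (wd, mask, name) for each event in a buffer read from an inotify fd."""
    pos = 0
    while pos + _HEADER.size <= len(buf):
        wd, mask, _cookie, name_len = _HEADER.unpack_from(buf, pos)
        pos += _HEADER.size
        # The name field is NUL-padded to name_len bytes.
        raw = buf[pos:pos + name_len].split(b"\0", 1)[0]
        pos += name_len
        yield wd, mask, raw.decode(errors="replace")


def count_opens(buf):
    """Count IN_OPEN events per file name in one buffer."""
    return Counter(name for _wd, mask, name in decode_events(buf) if mask & IN_OPEN)
```

In real use the buffer would come from reading the descriptor returned by inotify_init(), with one watch added per directory, since inotify watches are not recursive.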

Oleg Batalov, 2017-02-03
@badmilkman

Try Samba's own VFS modules; at the very least you can collect file-popularity statistics from their logs.
