*nix-like systems
Vincent1, 2021-11-20 11:03:03

How to count the number of lines in a file by condition?

We need to count the lines added to the log in the last 24 hours that contain `router.php`.
Currently I do it like this:
cat example.com.log | grep "router" | wc -l
But this does not take the date into account. How can I add the date to the search criteria?
Example log entry:

66.249.70.79 - - [17/Nov/2021:15:15:56 -0500] "GET /router.php HTTP/1.0" 301 274 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


3 answer(s)
xotkot, 2021-11-20
@Vincent1

If you know the exact start and end timestamps, it is easy (note that an awk range only opens on a line that matches the start pattern exactly):

awk '/router.php/' example.com.log | awk '/\[16\/Nov\/2021:15:15:56 -0500\]/,/\[17\/Nov\/2021:15:15:56 -0500\]/' | wc -l

P.S.
If you work with real dates instead of simply treating them as strings (the string approach is easier, but you need to know the specific range), there are some nuances to take into account. One is the locale: you, or whoever else runs this command, may have a different date locale than the one used in the log itself.
Compare:
$ date -d now
$ LANG=ru_RU.UTF-8 date -d now
$ LANG=en_US.UTF-8 date -d now

The locales themselves must, of course, be available on the system; you can check this with the localectl utility:
$ localectl list-locales
en_US.UTF-8
ru_RU.UTF-8

The +0300 is also there for a reason: it is the time zone offset. If you run the script in a different time zone, the time may shift in one direction or the other.
So, adjusting for the time zone (%z) and the locale, we get:
LANG=en_US.UTF-8 date -d 'now-24hours' +'[%d/%b/%Y:%H:%M:%S %z]'
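One detail worth adding here: LC_ALL takes precedence over LANG, so if LC_ALL is already set in the environment, the LANG= prefix changes nothing. A small sketch of a helper that forces the C locale instead (the name log_stamp is made up for this example, and GNU date is assumed because of -d):

```shell
# Format a moment in time the way the access log writes it.
# LC_ALL=C forces English month abbreviations regardless of LANG or LC_TIME.
# Assumes GNU date (for the -d option).
log_stamp() {
    LC_ALL=C date -d "$1" +'[%d/%b/%Y:%H:%M:%S %z]'
}

# The moment 24 hours ago, in the current time zone.
log_stamp 'now-24hours'
```

To get the offset of the server that wrote the log rather than your own, set TZ as well before calling it.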

But the easiest way, in my opinion, is to convert the date to a Unix timestamp (the number of seconds since the start of the Unix epoch). Then there is no need to bother with the locale and time zone: the resulting number of seconds is absolute and can be compared directly with the timestamp of any period we need.
awk -F'[][/:]' '/router\.php/{cmd="date +%s -d \""$2"-"$3"-"$4" "$5":"$6":"$7"\""; cmd | getline z; close(cmd); print z" "$0}' example.ru.log

Here we rearranged the date into a format that date understands, converted it to Unix time (+%s), and prepended the result to the line. The close(cmd) call ends each date process, so that a long log does not run out of open pipes and every line reads a fresh value.
Result:

1637320437 212.193.33.123 - - [19/Nov/2021:14:13:57 +0300] "GET /router.php HTTP/1.0" 301 445 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637320437 212.193.33.123 - - [19/Nov/2021:14:13:57 +0300] "GET /router.php HTTP/1.0" 301 445 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637320438 212.193.33.123 - - [19/Nov/2021:14:13:58 +0300] "GET /router.php HTTP/1.0" 301 449 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637321647 212.193.33.123 - - [19/Nov/2021:14:34:07 +0300] "GET /router.php HTTP/1.0" 301 449 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637321647 212.193.33.123 - - [19/Nov/2021:14:34:07 +0300] "GET /router.php HTTP/1.0" 301 447 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637321648 212.193.33.123 - - [19/Nov/2021:14:34:08 +0300] "GET /router.php HTTP/1.0" 301 446 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637321650 212.193.33.123 - - [19/Nov/2021:14:34:10 +0300] "GET /router.php HTTP/1.0" 301 445 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637321650 212.193.33.123 - - [19/Nov/2021:14:34:10 +0300] "GET /router.php HTTP/1.0" 301 451 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637321651 212.193.33.123 - - [19/Nov/2021:14:34:11 +0300] "GET /router.php HTTP/1.0" 301 445 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637324092 212.193.33.123 - - [19/Nov/2021:15:14:52 +0300] "GET /router.php HTTP/1.0" 301 449 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637326108 212.193.33.123 - - [19/Nov/2021:15:48:28 +0300] "GET /router.php HTTP/1.0" 301 447 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637327153 212.193.33.123 - - [19/Nov/2021:16:05:53 +0300] "GET /router.php HTTP/1.0" 301 446 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
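To see what the field separator does to a single entry, here is the same conversion for one line, as a sketch. The helper name entry_epoch is made up; it assumes GNU date and an IPv4 client address (an IPv6 address contains ":", which would shift the field numbers).

```shell
# Convert the timestamp of one access-log line to Unix seconds.
# With -F'[][/:]' the line splits on "[", "]", "/" and ":", so that
#   $2 = day, $3 = month name, $4 = year, $5 = hour, $6 = minute,
#   $7 = seconds plus the zone offset, e.g. "57 +0300".
entry_epoch() {
    stamp=$(printf '%s\n' "$1" |
        awk -F'[][/:]' '{ print $2 "-" $3 "-" $4 " " $5 ":" $6 ":" $7 }')
    LC_ALL=C date -d "$stamp" +%s
}

line='212.193.33.123 - - [19/Nov/2021:14:13:57 +0300] "GET /router.php HTTP/1.0" 301 445'
entry_epoch "$line"   # prints 1637320437
```

The intermediate string here is 19-Nov-2021 14:13:57 +0300, and because it carries its own offset, the result does not depend on the time zone of the machine running the command.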

The date 1 day (24 hours) ago can be found like this:
date +%s -d 'now-1day'
or even
echo $(( $(date +%s) - 24*60*60 ))
simply subtracting the required number of seconds from the current time.
We end up with:
awk -F'[][/:]' '/router\.php/{cmd="date +%s -d \""$2"-"$3"-"$4" "$5":"$6":"$7"\""; cmd | getline z; close(cmd); print z" "$0}' example.ru.log | awk -v t="$(( $(date +%s) - 24*60*60 ))" '$1>=t'

Now, if desired, you can easily extend this to a search range from t1 to t2:
awk -F'[][/:]' '/router\.php/{cmd="date +%s -d \""$2"-"$3"-"$4" "$5":"$6":"$7"\""; cmd | getline z; close(cmd); print z" "$0}' example.ru.log | awk -v t1="$(date +%s -d '19-Nov-2021 15:00:00 +0300')" -v t2="$(date +%s -d '19-Nov-2021 16:00:00 +0300')" '$1>=t1 && $1<=t2'

Here we search within a one-hour range,
from 19-Nov-2021 15:00:00 +0300
to 19-Nov-2021 16:00:00 +0300
Result:
1637324092 212.193.33.123 - - [19/Nov/2021:15:14:52 +0300] "GET /router.php HTTP/1.0" 301 449 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
1637326108 212.193.33.123 - - [19/Nov/2021:15:48:28 +0300] "GET /router.php HTTP/1.0" 301 447 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
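The commands above start one date process per matching line, which can get slow on a big log. A possible alternative, as a sketch only and not what the commands above do: compute the Unix time inside awk with plain calendar arithmetic and print just the count the question asks for. The function name count_since and its argument order are made up; it assumes timestamps in the [dd/Mon/yyyy:HH:MM:SS +zzzz] form shown above, English month names, an IPv4 client address, and a search string without backslashes (awk -v would interpret them).

```shell
# count_since FILE NEEDLE CUTOFF
# Print how many lines of FILE contain the fixed string NEEDLE and carry a
# timestamp at or after CUTOFF (Unix seconds).
count_since() {
    awk -F'[][/:]' -v needle="$2" -v cutoff="$3" '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        for (i = 1; i <= n; i++) month[name[i]] = i
    }
    # Days between 1970-01-01 and the Gregorian date y-m-d
    # (Julian day number minus 2440588, the day number of 1970-01-01).
    function days(y, m, d,    a, yy, mm) {
        a = int((14 - m) / 12)
        yy = y + 4800 - a
        mm = m + 12 * a - 3
        return d + int((153 * mm + 2) / 5) + 365 * yy \
               + int(yy / 4) - int(yy / 100) + int(yy / 400) - 32045 - 2440588
    }
    index($0, needle) {
        split($7, t, " ")          # t[1] = seconds, t[2] = offset such as +0300
        off = substr(t[2], 2, 2) * 3600 + substr(t[2], 4, 2) * 60
        if (substr(t[2], 1, 1) == "-") off = -off
        epoch = days($4, month[$3], $2) * 86400 + $5 * 3600 + $6 * 60 + t[1] - off
        if (epoch >= cutoff) count++
    }
    END { print count + 0 }
    ' "$1"
}
```

For the original question it could be called as count_since example.com.log router.php "$(date -d '-24 hours' +%s)". Since every line is compared as a number, it does not matter whether an entry exists at the exact cutoff second.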

SOTVM, 2021-11-20
@sotvm

The same way, but add one more grep for the date to the pipeline, and only then count.
Like this:
cat example.com.log | grep "17/Nov/2021" | grep "router" | wc -l
Do you want multiple dates? Use grep -E:
egrep -w 'first_DATE|second_DATE' or, equivalently, grep -Ew 'first_word|second_word'
Do not forget to escape special characters.
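For example, the multiple-date variant could be wrapped like this. It is only a sketch: the function name count_on_days is made up, and the days are expected exactly as they appear in the log.

```shell
# count_on_days FILE NEEDLE DAY...
# Count lines of FILE that contain the fixed string NEEDLE and were logged on
# any of the given days (written as in the log, e.g. 16/Nov/2021).
count_on_days() {
    file=$1 needle=$2
    shift 2
    # Join the remaining arguments with "|" to build an alternation
    # such as 16/Nov/2021|17/Nov/2021.
    days=$(IFS='|'; printf '%s' "$*")
    # -F treats the needle literally, so the dot in router.php needs no escaping;
    # -c prints the number of matching lines instead of the lines themselves.
    grep -F -- "$needle" "$file" | grep -cE "\[($days):"
}
```

A call such as count_on_days example.com.log router.php 16/Nov/2021 17/Nov/2021 then counts both days at once. Anchoring the date between "[" and ":" keeps it from matching elsewhere in the line, for instance inside a URL.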

Saboteur, 2021-11-22
@saboteur_kiev

Get the date one day ago with the required precision:
olddate=$(date -d "-24 hours" "+%d/%b/%Y:%H:%M:%S")
But there is a problem. If the log has no entry for exactly that second, we will miss the starting point. So let's truncate to the hour at least, although there is still a risk: if the application was off for that hour, we will not find the starting position at all. I don't know how to solve this in general; it depends on whether the file always has entries or not, and if not, everything becomes more complicated. But let's start simple and just truncate to the hour:
olddate=$(date -d "-24 hours" "+%d/%b/%Y:%H")
Then, using sed, you can select the text from that date to the end of the file and filter for router right away:
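sed -nE "\#$olddate#,\${/router/p}" file.log | wc -l
(The \#...# form makes # the address delimiter, because the date itself contains slashes and would otherwise end the /.../ address early.)
Or as a one-liner:

sed -nE "\#$(date -d "-24 hours" "+%d/%b/%Y:%H")#,\${/router/p}" file.log | wc -l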
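One possible way around the missing-hour problem, as a sketch only: start at the hour 24 hours ago and move forward one hour at a time until an hour is found that actually occurs in the log, then count from there. The function name is made up; GNU date is assumed, and so is a log written in the local time zone of the machine running the command.

```shell
# count_from_first_present_hour FILE NEEDLE [HOURS]
# Count lines containing the fixed string NEEDLE, starting at the first hour
# within the last HOURS hours (default 24) that has any entry in FILE.
count_from_first_present_hour() {
    file=$1 needle=$2 h=${3:-24}
    while [ "$h" -ge 0 ]; do
        stamp=$(LC_ALL=C date -d "-$h hours" '+%d/%b/%Y:%H')
        if grep -qF "[$stamp:" "$file"; then
            # \# makes # the regex delimiter, so the slashes in the date
            # need no escaping; the range runs to the last line ($).
            sed -n "\#\[$stamp:#,\$p" "$file" | grep -cF -- "$needle"
            return
        fi
        h=$((h - 1))
    done
    echo 0   # no entry at all within the window
}
```

This still works at hour granularity, so entries from the first part of the starting hour can be slightly older than 24 hours. If that matters, comparing Unix timestamps line by line is the more exact route.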
