bash
Ivan, 2020-09-14 10:47:41

How do I replace a garbled character in grep output when writing it to a file?

Good afternoon. I'm very new to Bash; could you please help?
I have 29 document.xml files in subdirectories. Some of them are in 1251 encoding and some are in UTF. I extract each field in turn like this:

grep -h -r -m1 xdms:number */*document.xml | cut -f 2 -d '>' | cut -f 1 -d '<'
А26-55550000
А26-98765
А26-98765
А26-1111
А26-6666
А26-2222
А26-7777
А26-8989
�48-9999
�48-10101010
�48-111111000
�48-1212000
�48-10


grep -h -r -m1 xdms:date */*document.xml | cut -f 2 -d '>' | cut -f 1 -d '<'
2020-07-10
2020-07-10
2020-07-10
2020-07-10
2020-07-10
2020-07-10
2020-07-10
2020-07-10
2020-07-14
2020-07-14
2020-07-14
2020-07-14
2020-07-14


grep -h -r xdms:header */*document.xml | grep uid= | cut -f 3 -d '=' | cut -f 2 -d '"'
fe042606-0147-4bdd-9cfd-598e2694dda9
fe042606-0147-4bdd-9cfd-598e2694dda9
fe042606-0147-4bdd-9cfd-598e2694dda9
fe042606-0147-4bdd-9cfd-598e2694dda9
fe042606-0147-4bdd-9cfd-598e2694dda9
fe042606-0147-4bdd-9cfd-598e2694dda9
fe042606-0147-4bdd-9cfd-598e2694dda9
fe042606-0147-4bdd-9cfd-598e2694dda9
197C0D87-15CD-4019-B7B0-ED101EDC75A1
066677AA-EB8B-45FB-A4CD-99B2D2239F33
51429AE7-3A66-4713-804B-82FB8D760731
EA50943C-AA47-4978-A5C9-A6A6583EF65C
22331428-64B9-40E7-A6E0-A4A488280C9D


How can I write these values to a file as columns, side by side, and replace the symbol � with the letter П, so that it looks something like this:

П48-1212000; 2020-07-14; EA50943C-AA47-4978-A5C9-A6A6583EF65C
and so on for all 29 lines?

UPD: I now write the output of each grep to its own file and then combine them:

paste number.txt date.txt uid.txt > all.txt

This gives:
�48-104875; 2020-07-14; B2E2BF37-7F61-4BD1-A3BA-59A5C870847F;
�48-104888; 2020-07-14; AE3DC4F7-4A76-49EC-8ED8-219A92596667;
�48-104912; 2020-07-14; 481D61F3-733C-48FB-9C50-4DE447078473;
�48-104916; 2020-07-14; 79784F13-9B7B-4E8D-B49A-C932B3808E69;

How can I now catch this character from 1251 and replace it with the letter П?
sed -i 's/�/П/g' all.txt
does nothing. If you open all.txt in the mc editor, this character is shown as a dot.
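
A likely reason the sed call matches nothing: the � on screen is only what the terminal draws for a byte it cannot decode as UTF-8, while the file itself holds the raw WINDOWS-1251 byte (0xCF for П). The sketch below shows one way to inspect the bytes and replace that single byte. The function names show_bytes and fix_pe are made up for illustration, and the approach assumes П is the only 1251 letter in the file.

```sh
#!/bin/sh
# Dump the first line of a file as hex bytes to see what is really stored
# where the terminal shows the replacement character.
show_bytes() {
    head -n 1 "$1" | od -A n -t x1
}

# Replace the single byte 0xCF (П in WINDOWS-1251) with the two bytes
# 0xD0 0x9F (П in UTF-8). LC_ALL=C makes sed work on bytes, not characters.
# Assumption: 0xCF does not occur inside any valid UTF-8 sequence in the file.
fix_pe() {
    bad=$(printf '\317')
    good=$(printf '\320\237')
    LC_ALL=C sed "s/$bad/$good/g" "$1"
}
```

For example, `fix_pe all.txt > allutf.txt`. This handles only one letter; if other 1251 letters can appear, converting the 1251 files with iconv is the safer route.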


3 answer(s)
Ivan, 2020-09-14
@vanchezz

In the end I did this:

#!/bin/sh

# Extract each field into its own file, one value per document.
grep -h -r -m1 xdms:number */*document.xml | cut -f 2 -d '>' | cut -f 1 -d '<' > number.txt
grep -h -r -m1 xdms:date */*document.xml | cut -f 2 -d '>' | cut -f 1 -d '<' > date.txt
grep -h -r xdms:header */*document.xml | grep uid= | cut -f 3 -d '=' | cut -f 2 -d '"' > uid.txt
# Join the three columns with ';'.
paste -d";" number.txt date.txt uid.txt > all.txt
# Convert everything from WINDOWS-1251 to UTF-8; lines that were already UTF-8 get double-encoded.
iconv -f WINDOWS-1251 -t UTF-8 all.txt > allutf.txt
# Undo the double encoding of the letter А on those lines.
sed -i 's/Рђ/А/g' allutf.txt

Output:
А26-6666;2020-07-10;fe042606-0147-4bdd-9cfd-598e2694dda9
А26-2222;2020-07-10;fe042606-0147-4bdd-9cfd-598e2694dda9
А26-7777;2020-07-10;fe042606-0147-4bdd-9cfd-598e2694dda9
А26-8989;2020-07-10;fe042606-0147-4bdd-9cfd-598e2694dda9
П48-9999;2020-07-14;197C0D87-15CD-4019-B7B0-ED101EDC75A1
П48-10101010;2020-07-14;066677AA-EB8B-45FB-A4CD-99B2D2239F33
П48-111111000;2020-07-14;51429AE7-3A66-4713-804B-82FB8D760731
П48-1212000;2020-07-14;EA50943C-AA47-4978-A5C9-A6A6583EF65C
П48-10;2020-07-14;22331428-64B9-40E7-A6E0-A4A488280C9D
П48-104852;2020-07-14;3D596B50-750D-47BE-ABAC-E1FAC09C7802
П48-104855;2020-07-14;6F9150F0-98E7-49FB-B340-CAF572235782
П48-104875;2020-07-14;B2E2BF37-7F61-4BD1-A3BA-59A5C870847F
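
An alternative sketch that avoids the final sed step: convert each file to UTF-8 before extracting values, instead of converting the combined output. It assumes every file is either valid UTF-8 or WINDOWS-1251, and that iconv exits with a non-zero status on invalid input (glibc iconv does). The helper names to_utf8, first_value and build_table are made up for illustration; the cut pipelines are the ones from the script above.

```sh
#!/bin/sh
# Print a file as UTF-8: pass it through if it already is valid UTF-8,
# otherwise assume WINDOWS-1251 (an assumption taken from the question).
to_utf8() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        cat "$1"
    else
        iconv -f WINDOWS-1251 -t UTF-8 "$1"
    fi
}

# Read XML from stdin; print the text between '>' and '<' on the first
# line that contains the tag name given as $1.
first_value() {
    grep -m1 "$1" | cut -f 2 -d '>' | cut -f 1 -d '<'
}

# Print one "number;date;uid" line per document.xml found one level down.
build_table() {
    for f in */*document.xml; do
        [ -f "$f" ] || continue
        number=$(to_utf8 "$f" | first_value 'xdms:number')
        date=$(to_utf8 "$f" | first_value 'xdms:date')
        uid=$(to_utf8 "$f" | grep 'xdms:header' | grep -m1 'uid=' \
              | cut -f 3 -d '=' | cut -f 2 -d '"')
        printf '%s;%s;%s\n' "$number" "$date" "$uid"
    done
}
```

Run from the directory that holds the subdirectories, for example `build_table > allutf.txt`. Because each file is decoded with its own encoding, lines that were already UTF-8 are not double-encoded.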

Viktor Taran, 2020-09-14
@shambler81

Post an example file; that would make it easier to help.

Saboteur, 2020-09-15
@saboteur_kiev

You could go for something convoluted like this:

(while true;do read A;read B;read C;[ -z "$A" ] && break;echo "$A;$B;$C"; done)<<<"$(find . -name "*document.xml" -exec grep -oPm3 '(header.*uid="\K[A-Fa-f0-9-]*|date>\K[0-9-]*|<xdms:number>\K[^<]*)' {} \;)" | sed -e 's/�/П/g'
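
The read loop in that one-liner only glues every three lines together with ';'. A shorter way to sketch the same step, assuming the extracted values arrive on standard input three lines per document, is paste with three '-' operands. The name join_triples is made up for illustration.

```sh
#!/bin/sh
# Join every three consecutive input lines into one ';'-separated record.
# Each '-' tells paste to take one more line from standard input per row.
join_triples() {
    paste -d ';' - - -
}
```

It can take the place of the while loop, for example `find . -name "*document.xml" -exec grep -oPm3 '...' {} \; | join_triples`, with the same grep pattern as above. The fields come out in the order in which the lines appear in each file.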
