Using grep/sed/awk to identify log file entries

I need to process a large log file with many lines in various formats.

My goal is to extract unique line entries that share the same starting pattern, for example '^2011-02-21.*MyKeyword.*Error', effectively getting a list of samples for each line pattern and thereby identifying the patterns.

I only know a few patterns so far, and reading through the file by hand is definitely not an option.

Note that besides the known patterns there are a number of unknown ones as well, and I would like to automate extracting those too.

What is the best way to do this? I know regular expressions fairly well, but have not done much work with awk/sed, which I imagine will be needed at some point in this process.

Answers: 1

If I understand correctly, you have a number of patterns, and you want to extract one match per pattern. The following awk script should do the job. It prints the first occurrence of a given pattern and records that the pattern has been seen so as not to print subsequent occurrences.

awk '
# known pattern: print only its first occurrence, then skip to the next line
/^2011-02-21.*MyKeyword.*Error/ {
    if (!seen["^2011-02-21.*MyKeyword.*Error"]++) print;
    next;
}
# also print the first line that matches none of the known patterns
1 {if (!seen[""]++) print}
'
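
To run it, you can save the program between the single quotes to a file and pass the log on the command line; the filenames here are just placeholders:

awk -f extract-samples.awk server.log > pattern-samples.txt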

Here is a version that keeps one MyKeyword.*Error line per day.

awk '
/^[0-9]{4}-[0-9]{2}-[0-9]{2}.*MyKeyword.*Error/ {
    # key on the leading date (first 10 characters) so only the first
    # matching line of each day is printed
    if (!seen[substr($0,1,10) "MyKeyword.*Error"]++) print;
    next;
}
'
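
The question also mentions unknown patterns. One rough way to surface those with the same seen-array trick is to normalize each line into a signature and keep one sample per signature. This is only a minimal sketch, under the assumption that collapsing runs of digits is a good enough normalization for your formats; the server.log filename is again a placeholder:

awk '
{
    # build a rough signature: collapse runs of digits so lines that differ
    # only in timestamps, counters or ids map to the same key
    sig = $0;
    gsub(/[0-9]+/, "N", sig);
    # print one sample line per distinct signature
    if (!seen[sig]++) print;
}
' server.log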