This is a perfect use case for a length-limited collections.deque:
from collections import deque

line_history = deque(maxlen=25)
with open(file) as input:
    for line in input:
        if "error code" in line:
            print(*line_history, line, sep='')
            # Clear history so if two errors are seen in close proximity,
            # we don't echo some lines twice
            line_history.clear()
        else:
            # When deque reaches 25 lines, it automatically evicts the oldest
            line_history.append(line)
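To see the eviction behaviour in isolation, here is a minimal sketch with a tiny buffer. The maxlen of 3 and the sample letters are made up for illustration; they stand in for the 25 log lines above.

```python
from collections import deque

# Hypothetical tiny buffer: maxlen=3 stands in for the 25 used above
recent = deque(maxlen=3)
for ch in "abcde":
    recent.append(ch)  # once the deque is full, each append drops the oldest item

snapshot = list(recent)  # only the three most recently appended items remain
print(snapshot)

recent.clear()  # same call the loop above makes after echoing an error
print(len(recent))
```

No manual bookkeeping is needed: the bound is enforced by the deque itself on every append.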
Complete explanation of why I chose this approach (skip if you don't really care):
This isn't solvable in a good/safe way using for/range, because indexing only makes sense if you load the whole file into memory; the file on disk has no idea where lines begin and end, so you can't just ask for "line #357 of the file" without reading it from the beginning to find lines 1 through 356. You'd either end up repeatedly rereading the file, or slurping the whole file into an in-memory sequence (e.g. list/tuple) to have indexing make sense.
For a log file, you have to assume it could be quite large (I regularly deal with multi-gigabyte log files), to the point where loading it into memory would exhaust main memory, so slurping is a bad idea, and rereading the file from scratch each time you hit an error is almost as bad (it's slow, but it's reliably slow I guess?). The deque-based approach means your peak memory usage is based on roughly the 26 longest lines in the file (the 25 buffered lines plus the one being processed), rather than the total file size.
A naïve solution with nothing but built-ins could be as simple as:
with open(file) as input:
    lines = tuple(input)  # Slurps all lines from file
for i, line in enumerate(lines):
    if "error code" in line:
        print(*lines[max(i-25, 0):i], line, sep='')
but like I said, this requires enough memory to hold your entire log file in memory at once, which is a bad thing to count on. It also repeats lines when two errors occur in close proximity, because unlike deque, you don't get an easy way to empty your recent memory; you'd have to manually track the index of the last print to restrict your slice.
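For completeness, the manual index tracking mentioned above might look something like this. It is only a sketch: the function name context_slices, its parameters, and the sample lines are all hypothetical, and it still assumes the whole file has been slurped into a sequence.

```python
def context_slices(lines, needle="error code", history=25):
    """Yield (context_lines, error_line) pairs without repeating any line."""
    last_printed = -1  # index of the most recent line already echoed
    for i, line in enumerate(lines):
        if needle in line:
            # Never reach back past a line that was already echoed
            start = max(i - history, last_printed + 1)
            yield lines[start:i], line
            last_printed = i


# Made-up sample data: two errors in close proximity
sample = ["a\n", "b\n", "error code 1\n", "c\n", "error code 2\n"]
results = list(context_slices(sample))
for context, error_line in results:
    print(*context, error_line, sep='')
```

This reproduces what line_history.clear() does in the deque version, at the cost of one extra variable and the same whole-file memory requirement.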
Note that even then, I didn't use range; range is a crutch a lot of people coming from C backgrounds rely on, but it's usually the wrong way to solve a problem in Python. In cases where an index is needed (it usually isn't), you usually need the value too, so enumerate-based solutions are superior; most of the time, you don't need an index at all, so direct iteration (or paired iteration with zip or the like) is the correct solution.
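A small illustration of that point, using made-up sample lists (names and ages are hypothetical data, not part of the log problem):

```python
names = ["sara", "john", "ron"]
ages = [10, 20, 10]

# Index and value together: enumerate instead of indexing via range(len(names))
numbered = [f"{i}: {name}" for i, name in enumerate(names)]

# Two sequences in lockstep: zip, with no index at all
paired = [f"{name} is {age}" for name, age in zip(names, ages)]

print(numbered)
print(paired)
```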
The hold buffer is good for storing a line (or group of lines) until some later test proves true. In other words, it is good for handling sequences of data which you want sequential but are not yet sequential - because it enables you to stick them together. But it also requires a lot of copies between the two buffers. This isn't so bad if you're building up a series of lines with Hold commands - just appending - but every time you exchange buffers you copy the whole of one to the other and vice-versa.
When you're working with a series of lines which are already sequential, and you want to prune them based on context, then the better way to go is with look-ahead - as opposed to the hold-buffer's look-behind. cuonglm does this for the second half of his answer already - but you can use that logic for either form.
sed '$!N;/\nage.*: 10/P;D' outfile
See, that will append the Next input line following an embedded \newline delimiter to the current pattern space on every line which is !not the $last. It then checks if the line just pulled matches a pattern and, if so, it Prints only up to the first \newline in pattern space - so only the preceding line. Last, it Deletes up to the first \newline in pattern space and starts the cycle again. So throughout the file you maintain a one-line look-ahead without unnecessarily swapping buffers.
If I alter the command only a little you can see specifically how it works - by sliding over the file with a two-line window throughout. I'll add a look command just before the D:
sed '$!N;/\nage.*: 10/P;l;D'
Name is : sara
Name is : sara\nage is : 10$
age is : 10\nName is : john$
Name is : john\nage is : 20$
age is : 20\nName is : Ron$
Name is : Ron
Name is : Ron\nage is : 10$
age is : 10\nName is : peggy$
Name is : peggy\nage is : 30$
age is : 30$
That's its output. The lines which end in $ are the result of the look command - which renders an escaped version of pattern space to stdout. The lines which do not end in $ are those which would otherwise be Printed. As you can see, the previous line is only Printed when the second line in pattern space - the Next line as just pulled in and which follows the \newline in pattern space - matches your pattern.
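For readers who find the two-line window easier to follow outside sed, the same look-ahead logic can be sketched in Python. This is an illustration rather than a replacement for the sed command: the function name is hypothetical, the input lines are assumed to have had their trailing newlines stripped, and the sample data mirrors the output shown above.

```python
import re

def names_before_age_10(lines):
    """Yield each line whose immediate successor matches 'age.*: 10'."""
    previous = None
    for current in lines:
        # The window is (previous, current); test the newer line,
        # but emit the older one, like P after N in the sed version.
        if previous is not None and re.match(r"age.*: 10", current):
            yield previous
        previous = current  # slide the window forward, like D


sample = [
    "Name is : sara", "age is : 10",
    "Name is : john", "age is : 20",
    "Name is : Ron", "age is : 10",
    "Name is : peggy", "age is : 30",
]
matches = list(names_before_age_10(sample))
print(*matches, sep="\n")
```

Only one earlier line is ever kept, which is the same memory profile as sed's two-line pattern space.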
Besides the solutions already offered, here is another way you might go about printing only the Name lines which precede an age line that does not end in 10:
sed -n '/^Name/N;/ 10$/!s/\nage.*//p'
...which only appends a \newline followed by the Next input line if pattern space begins with the string Name, and only prints a line to output if pattern space does not end with the string 10 and if sed can successfully s///ubstitute away a \newline followed by the string age and all that follows until the tail of pattern space. Because there cannot be a \newline in pattern space except as the result of an edit command - such as Next - this ensures that the only Name lines printed are those immediately preceding an age line which does not end in the string 10.
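The same filter can be sketched in Python to make the control flow explicit. As before this is only an illustration under assumptions: the function name is hypothetical, lines are assumed to arrive without trailing newlines, and the sample data is the Name/age input used earlier.

```python
def names_before_other_ages(lines):
    """Yield Name lines directly followed by an age line not ending in ' 10'."""
    stream = iter(lines)
    for line in stream:
        if not line.startswith("Name"):
            continue  # no embedded newline to substitute away, so nothing prints
        following = next(stream, None)  # like N: pull in the next input line
        if following is None:
            break  # a trailing Name line has no partner; nothing is emitted
        if following.startswith("age") and not following.endswith(" 10"):
            yield line


sample = [
    "Name is : sara", "age is : 10",
    "Name is : john", "age is : 20",
    "Name is : Ron", "age is : 10",
    "Name is : peggy", "age is : 30",
]
kept = list(names_before_other_ages(sample))
print(*kept, sep="\n")
```

Note that, like the sed command, this consumes the line after every Name line whether or not it turns out to be an age line.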
All of the syntax used in the above answer is POSIX standard - it should work as written with any sed which supports the standard.