I'm... not sure that's the kind of Friday-at-2AM embarrassing mistake you want to put on your blog if you sell yourself as an elite consultant?
Not using `-iname`, using `-print0` and being surprised to see NULs appearing, the weird pipe + xargs instead of just `-exec`, using some hyper-convoluted way of replacing the NULs instead of just reading `man find`... that's probably not the best advertisement for “decades of consulting experience”.
> the weird pipe + xargs instead of just `-exec`
I think that part makes sense, though, as it's simply the older idiom from before `-exec ... +` was widely available. (That's the reason why both find and xargs have specific flags for NUL-delimited filenames, `-print0` and `-0`, that are basically counterparts to each other.)
Also, shouldn't the two be roughly equal in efficiency? To my knowledge, xargs (without -i) does the same command aggregation that -exec ... + does.
So the number of "grep" processes spawned by the two commands should be roughly the same, I think.
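One way to check the batching claim is to replace grep with a tiny `sh -c` command that just reports how many file arguments each invocation received. This is a sketch in POSIX sh using a throwaway temp directory; the three file names are made up. With only three short paths, both pipelines should report a single batch of 3.

```shell
# Throwaway sandbox with three hypothetical Python files.
dir=$(mktemp -d)
printf 'x\n' > "$dir/a.py"
printf 'x\n' > "$dir/b.py"
printf 'x\n' > "$dir/c.py"

# Each spawned sh prints $#, the number of paths it was handed.
# -exec ... + packs as many paths as fit into one command line.
exec_batches=$(find "$dir" -name '*.py' -exec sh -c 'echo "$#"' sh {} +)

# xargs (without -I/-i) packs NUL-delimited input the same way.
xargs_batches=$(find "$dir" -name '*.py' -print0 | xargs -0 sh -c 'echo "$#"' sh)

printf 'find -exec +: %s\n' "$exec_batches"
printf 'xargs -0:     %s\n' "$xargs_batches"

rm -rf "$dir"
```

By contrast, `-exec grep ... {} \;` (or `xargs -I {}`) starts one process per file.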
Being an experienced consultant doesn't mean you know everything about everything.
Hell, I put myself in that camp, and I am perfectly capable of hacks, kludges, and mistakes.
Be compassionate.
To be fair, he is a mathematical consultant who uses computer tools, not a specialist in computer tools.
I thought exactly the same. Those who can do, do. Those who cannot do, teach. Those who cannot teach, consult.
But instead of dwelling on prejudices I decided to try my own solution. See https://news.ycombinator.com/item?id=42163286
Why insult teachers?
Find is one of those tools I use seldom enough that I completely forget how to use it, but is also complex enough that when I do need it I have to spend way too long studying the man page to figure out the right incantations.
In almost all cases I just want something simple, like finding a file somewhere on disk based on a partial filename.
Now there are probably some nice, more modern tools made for this, but usually when I need it, it's on some system where they're not present and I can't just install random stuff from the interwebs. So find it is...
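For the "partial filename" case, `-iname` with wildcards on both sides is usually the whole incantation. Below is a minimal sketch; the sandbox directory and the file names in it are hypothetical stand-ins for "somewhere on disk".

```shell
# Hypothetical sandbox standing in for "somewhere on disk".
root=$(mktemp -d)
mkdir -p "$root/projects/reports"
: > "$root/projects/reports/Quarterly_Report_2023.txt"
: > "$root/projects/notes.txt"

# -iname matches the basename case-insensitively; the surrounding * make it
# a "contains" match. -type f skips directories whose names also match
# (here the "reports" directory itself).
matches=$(find "$root" -type f -iname '*report*')
printf '%s\n' "$matches"

rm -rf "$root"
```

On a real system the same shape works from `/`, e.g. `find / -type f -iname '*report*' 2>/dev/null` to silence permission errors.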
You're using -print0 and surprised that its output has NUL characters between the names?
Was puzzled about that too, especially since his solution "find ... -print0 | strings" undoes the advantage that -print0 gives you, i.e. safe handling of filenames with newlines in them (and his "sed" solution straight-up undoes the -print0 completely).
So with all due respect to the author, I wonder if he was just using -print0 after rote-learning it as part of the find command (or having had some tutor implore "ALWAYS use -print0"), without knowing what it does.
> There may be better solutions [1], but my solution was to insert a call to strings in the pipeline
The "right" answer is to switch to using -print rather than -print0
-print delimits the values with a newline character (\n); -print0 delimits them with a null character (\0).
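The difference is easy to see by dumping the raw bytes. This sketch (temp directory and file names assumed) pipes both forms through `od -c`, then counts the delimiters with `tr -cd`, which deletes every byte except the one named. Note that `-print0` output contains no newline at all, which is why grep sees the whole result as a single "line".

```shell
dir=$(mktemp -d)
: > "$dir/a.py"
: > "$dir/b.py"

# od -c shows newline bytes as \n and NUL bytes as \0.
find "$dir" -name '*.py' -print  | od -c
find "$dir" -name '*.py' -print0 | od -c

# Count delimiter bytes; tr -d ' ' strips padding some wc versions add.
print_nl=$(find "$dir" -name '*.py' -print   | tr -cd '\n'   | wc -c | tr -d ' ')
print0_nul=$(find "$dir" -name '*.py' -print0 | tr -cd '\000' | wc -c | tr -d ' ')
print0_nl=$(find "$dir" -name '*.py' -print0 | tr -cd '\n'   | wc -c | tr -d ' ')

printf 'newlines from -print:  %s\n' "$print_nl"
printf 'NULs from -print0:     %s\n' "$print0_nul"
printf 'newlines from -print0: %s\n' "$print0_nl"

rm -rf "$dir"
```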
Not always perfectly right, because a filename containing a space character will be interpreted as two arguments.
No it won’t, because none of the output is interpreted as an argument. It’s passed as lines to grep. The second invocation correctly uses -print0 and pairs it with xargs -0 to handle this.
Now, it does fail with filenames that have newlines in them, but who would do such a thing!
I wrote "Not always perfectly right" thinking about all cases, not this particular one: in nearly all cases (unless you are absolutely sure there is no blank character anywhere) -print0 (and therefore xargs -0) seems better to me, and it has saved me on many occasions. Better to let find do all the work it can, including filtering filenames.
Certainly, which is why I put quotes around right, but for this usage, it's not an issue. Find prints the whole path on a single line (including the spaces) and grep (by default) prints the full matched line, so you'll still get the full file path regardless of how many spaces are in it.
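A quick sketch of both halves of this argument, with hypothetical file names in a temp directory: a space survives the line-based `find -print | grep` pipeline intact, while a (degenerate) newline in a name turns one path into two lines.

```shell
dir=$(mktemp -d)
: > "$dir/my file.py"                    # space in the name
: > "$dir/$(printf 'odd\nname.py')"     # newline in the name (degenerate)

# Each path is one line, so grep returns the full path, space included.
space_hit=$(find "$dir" -name '*.py' -print | grep -i 'my file')
printf 'grep match: %s\n' "$space_hit"

# Two files, but three lines: the newline splits one path in two.
line_count=$(find "$dir" -name '*.py' -print | wc -l | tr -d ' ')
printf 'lines for 2 files: %s\n' "$line_count"

rm -rf "$dir"
```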
Am I the only one who has gone all in on using "-exec +"?
find . -name '*.py' -type f -exec grep -il "$PATTERN" {} +
I've switched away from find entirely, and now use "fd" whose exec functionality is quite straightforward to use.
That solves only the second part of their task, the part which they actually had no problem with. But I agree the exec + solution feels better than the xargs -0 solution.
Agreed. The first part of that task just seemed to be a misunderstanding of what -print0 is, and using `strings` as the fix is weird. I'm surprised they didn't suggest `tr '\0' '\n'`... :-)
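For what it's worth, both parts of the task fit in a single find call: `-iname` for the name filter and `-exec ... +` for the content search. This is only a sketch; `name_part`, `text`, and the sample files are hypothetical, since the original script's exact patterns aren't shown here.

```shell
dir=$(mktemp -d)
printf 'import os\n' > "$dir/Utils_Helper.py"
printf 'print(1)\n'  > "$dir/helper_test.py"
printf 'import os\n' > "$dir/main.py"

# Hypothetical inputs: part of the file name, and text to look for inside.
name_part=helper
text='import os'

# -iname filters names case-insensitively; -exec ... + batches the grep call.
# grep -l lists matching files, -i ignores case, -F treats text literally.
hits=$(find "$dir" -type f -iname "*${name_part}*.py" -exec grep -liF -- "$text" {} +)
printf '%s\n' "$hits"

rm -rf "$dir"
```

Here `helper_test.py` passes the name filter but not the content search, and `main.py` never reaches grep.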
I recommend giving ripgrep a try (it's been around a while now): https://github.com/BurntSushi/ripgrep
It's not compatible with grep though. How do you search for a square bracket?
$ grep '[][]' </dev/null
$ rg '[][]' </dev/null
rg: regex parse error:
(?:[][])
^^
error: unclosed character class
$
And why does it search the current directory when its input is redirected from /dev/null? What other surprises are there?

It's compatible, or close enough, with more modern regex syntaxes, which are probably familiar to a lot more people than grep's. Want to search for square brackets? Then escape them (or do a string literal search with -F).
So much faster than grep for these things! Love ripgrep! I also use it to rip apart directories of log files. Super convenient
The first line they start with is utter nonsense. find -print0 will not produce lines, but records (or strings) separated by NUL. But grep is a tool working with lines (separated by LF). No wonder it doesn't work.
Using -print0 is necessary if you have filenames containing LF chars. Otherwise just use -print and grep and everything should be fine.
Now how do we handle NUL-separated records? That required a bit of thinking; the Unix world is so heavily based on lines. Without extensive testing, the following awk program seems to work:
BEGIN { RS = "\0" }
$0 ~ regexp
Call with awk -v 'regexp=what I search for'
In their script that would be awk -v "regexp=$1"
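If GNU grep (or another grep with `-z`/`--null-data`) is available, it can filter NUL-separated records directly, which avoids relying on how a particular awk treats `RS = "\0"` (that behaviour may not be portable across awk implementations). A minimal sketch, with a hypothetical search term and sample files:

```shell
dir=$(mktemp -d)
: > "$dir/Parser.py"
: > "$dir/$(printf 'weird\nparser_old.py')"   # newline inside the name
: > "$dir/main.py"

old=$PWD
cd "$dir" || exit 1

# grep -z reads and writes NUL-terminated records, so the newline stays
# inside one record. xargs -0 then hands each record over as one argument.
find . -name '*.py' -print0 | grep -zi 'parser' | xargs -0 printf '[%s]\n'

# For display only, tr '\0' '\n' would convert back to lines (losing safety).
count=$(find . -name '*.py' -print0 | grep -zi 'parser' | tr -cd '\000' | wc -c | tr -d ' ')
printf 'matching records: %s\n' "$count"

cd "$old" || exit 1
rm -rf "$dir"
```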
Edit: Credits for s/whitespace/LF chars/ go to user hnfong.

When grepping for filenames, -print0 is needed only when the files have newlines in them (which is quite degenerate). grep works fine with spaces and tabs on stdin.
Thanks! Updated.
Instead of `find -name '*.py' | grep -i "$PATTERN"` you can use `find -iname "*${PATTERN}*.py"` for case-insensitive glob-matched filenames, or mess around with regexes on the whole path with `find -iregex "$REGEX"`.
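The two alternatives side by side, as a sketch with a hypothetical `PATTERN` value and sample files. It assumes `PATTERN` is a plain word with no glob or regex metacharacters. `-iregex` must match the entire path, hence the leading `.*/`.

```shell
dir=$(mktemp -d)
mkdir "$dir/pkg"
: > "$dir/pkg/Config_Loader.py"
: > "$dir/pkg/loader.txt"
: > "$dir/pkg/main.py"

PATTERN=loader   # hypothetical search term, assumed to be a plain word

# Case-insensitive glob on the basename.
by_name=$(find "$dir" -iname "*${PATTERN}*.py")

# Case-insensitive regex on the whole path; [^/]* keeps the match
# within the last path component.
by_regex=$(find "$dir" -iregex ".*/[^/]*${PATTERN}[^/]*\.py")

printf '%s\n' "$by_name" "$by_regex"

rm -rf "$dir"
```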
And yeah, why would you ASCII NUL terminate each filename output by `find` by using `-print0`? I mean, who adds quotes, backslashes or whitespace to their Python source file names?
Why not just use globstar in the first place? grep foo **/*ham*py (in bash, after shopt -s globstar)
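A sketch of the globstar route, assuming bash 4 or later is installed. `bash -O globstar` turns the option on just for that invocation; the directory layout is hypothetical. One behavioural difference from find: the glob skips hidden files and directories unless `dotglob` is also set.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/src/deep"
printf 'foo\n' > "$dir/src/deep/ham_utils.py"
printf 'bar\n' > "$dir/src/hamster.py"
printf 'foo\n' > "$dir/eggs.py"

# ** matches across directory levels once globstar is enabled.
# grep -l lists only the files whose contents match.
hits=$(cd "$dir" && bash -O globstar -c 'grep -l foo -- **/*ham*py')
printf '%s\n' "$hits"

rm -rf "$dir"
```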
Pretty convoluted, no?
I would likely use -exec:
$ mkdir dir.py
$ echo blah >> blah.py
$ find . -type f -name "*.py" -exec grep -i BLAH {} \;
blah
$
Edit: Ah, right, he's filtering on filenames. That's what -iname is for. The man page is quite good.

You keep using that -print0. I do not think it means what you think it means.