Regex discrepancy help

I’m pulling my hair out trying to understand why this happens … (and, BTW, the regex pattern is just an example.)

➜  ~ BASE_PATH="/home/martin/Desktop/Larkin Poe [a2487986] (Copy)/"   
➜  ~ SEARCH_PATTERN=".(_.*|DS_Store)*"
➜  ~ find $BASE_PATH -type f -regextype egrep -regex ".*/$SEARCH_PATTERN"
/home/martin/Desktop/Larkin Poe [a2487986] (Copy)/._.DS_Store
/home/martin/Desktop/Larkin Poe [a2487986] (Copy)/._something
/home/martin/Desktop/Larkin Poe [a2487986] (Copy)/.DS_Store
➜  ~ for FILE_PATH in $BASE_PATH${~SEARCH_PATTERN}; do echo $FILE_PATH; done
/home/martin/Desktop/Larkin Poe [a2487986] (Copy)/._.DS_Store
/home/martin/Desktop/Larkin Poe [a2487986] (Copy)/.DS_Store
➜  ~ 

Can anyone cast some light on this? Why does the same pattern give two results? I’m guessing it’s got something to do with the _.* being interpreted differently, i.e., parameter expansion is at work. However, I need a way for the same pattern to be interpreted the same in both find and for.

Incidentally, I’m using -regextype egrep so the search pattern doesn’t need escaping. Likewise, I’m prepending .*/ to the pattern in find because a match is on the whole path.

In regex, a dot matches any single character, not just the literal dot. That also seems to be the case with ‘egrep’ type:

posix-egrep regular expression syntax (GNU Findutils 4.10.0).

That’s probably why the ‘._something’ is included only in the first case. Try escaping the dots with ‘find’.
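
A minimal sketch of what escaping the leading dot changes, assuming POSIX sh and using grep -E -x as a stand-in for find's whole-string matching (matches_regex is a hypothetical helper, not something find provides). For these sample names, escaping only affects names that do not start with a literal dot; ._something matches either way, because as a regex _.* means an underscore followed by anything.

```shell
# Hypothetical helper: succeed if NAME matches the extended regex RE in full.
# grep -x stands in for find's whole-string matching (an assumption of this sketch).
matches_regex() {
  printf '%s\n' "$1" | grep -Eqx -- "$2"
}

unescaped='.(_.*|DS_Store)*'    # leading dot matches any single character
escaped='\.(_.*|DS_Store)*'     # leading dot matches only a literal dot

for name in .DS_Store ._something xDS_Store; do
  for re in "$unescaped" "$escaped"; do
    if matches_regex "$name" "$re"; then
      echo "match     $re  $name"
    else
      echo "no match  $re  $name"
    fi
  done
done
```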

Thanks for your help, @Marian.

I understand that escaping is sometimes necessary, but I don’t think that is the whole picture here (I am using egrep, not posix-egrep), and it doesn’t explain the variations.

My theory is that expanding ${~expression} treats the substrings ._. and ._ the same, and resets after finding the first match. I’ll have to do more experimenting, or approach the problem another way.

Essentially, I have a script that passes the expression to both find and for constructs, e.g., `script.sh --path --regex`, and the expression should work in both scenarios.

egrep regular expression syntax (GNU Findutils 4.10.0)

Try without expansions and see if that fixes it.

For the for command, the shell applies pathname expansion (globbing) to your pattern, which is not as powerful as regular expressions; at least, this is the area I would concentrate my investigation on.
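
To illustrate the difference, here is a hedged sketch assuming POSIX sh or bash, where an unquoted variable in a case pattern is treated as a glob (zsh needs ${~var} for the same effect, as in the for loop in the question). Both helper names are hypothetical. It suggests that _.* read as a glob means the literal characters _. followed by anything, whereas read as a regex it means an underscore followed by anything, which would explain why ._something drops out of the for loop.

```shell
# Hypothetical helpers for comparing the two readings of one pattern.
glob_matches() {
  # Unquoted $2 is used as a shell pattern (glob) in sh and bash.
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

regex_matches() {
  # grep -E -x: extended regex that must match the whole name.
  printf '%s\n' "$1" | grep -Eqx -- "$2"
}

pattern='_.*'
for name in _something _.DS_Store; do
  if glob_matches "$name" "$pattern"; then
    echo "glob:  $name matches"
  else
    echo "glob:  $name does not match"
  fi
  if regex_matches "$name" "$pattern"; then
    echo "regex: $name matches"
  else
    echo "regex: $name does not match"
  fi
done
```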


You’re absolutely right in suspecting that the discrepancy comes
from how the shell interprets the pattern in the for loop versus
how find interprets it.

Here’s a reusable Zsh function that ensures regex-based matching
using find, avoids glob expansion quirks, and handles paths with
spaces or special characters safely:

filter_files_by_regex() {
  local base_path="$1"
  local regex_pattern="$2"
  # Append a trailing slash if it is missing
  [[ "$base_path" != */ ]] && base_path="${base_path}/"
  # -regex matches against the whole path, hence the leading .*/
  find "$base_path" -type f -regextype egrep -regex ".*/$regex_pattern" | while IFS= read -r file_path; do
    echo "$file_path"
  done
}

Usage:
BASE_PATH="/home/martin/Desktop/Larkin Poe [a2487986] (Copy)/"
SEARCH_PATTERN=".(._.*|DS_Store)*"

filter_files_by_regex "$BASE_PATH" "$SEARCH_PATTERN"


I think this is the point. I can’t control what the shell is doing. I’ve tried just about every way, but can’t get parity between two commands.

Thanks, @PerttiS. As you probably guessed, I’m using globbing to parse files and folders. find is used to test that at least one file is present before starting the loop.

This method was fine with simple expressions, e.g., *.FLAC, but adding functionality means I need to use regex. I’m not keen on rewriting my scripts, which is why I’d like to find a way that each gives the same results.

Funny thing is I already use a similar loop function elsewhere. I’ll give this a shot, and update you on my progress later.

It sounds like you’re transitioning from simple glob patterns to more complex regex-based file matching while wanting to preserve your current script structure—completely reasonable, especially if you have working code you don’t want to disrupt.

To keep your find logic and still match regex patterns, you might try something like this:

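setopt null_glob   # zsh; in bash use: shopt -s nullglob
re='\.FLAC$'       # keep the regex in a variable so the dot stays escaped
for file in *; do
  if [[ "$file" =~ $re ]]; then
    echo "Matched: $file"
  fi
done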

Or, if using find and grep for more flexibility with regex:

find . -type f | grep -E '\.FLAC$'

This way, you’re bridging the gap without a full rewrite.

@PerttiS, I appreciate your help. I’ve adjusted my code with minimal changes, allowing simple matches, e.g., *.flac, or more complex regex using positional parameters from the command line. Globbing no more!

files=$(find "$base_path" -type f -regextype egrep -regex ".*$regex_pattern")
if [[ -n $files ]]; then
  # print -r avoids echo's backslash-escape processing in zsh
  print -r -- "$files" | while IFS= read -r file_path; do
    folder=$file_path:h     # directory part; avoids reusing the regex as a glob
    filename=$file_path:t   # last path component
    echo "folder[$folder]"
    echo "file_path[$file_path]"
    echo "filename[$filename]"
  done
else
  echo "No matches"
fi

I’m sure I would have got there in the end, but not this evening. Thanks.