Archive for September, 2011

findstr may not always work as expected

I was a big fan of good old findstr until I spent around 2 hours on build issues after doing a massive find and replace on our source depot.

The issue was the following:

Consider a simple text document like the ones listed below. Assume everything in your sources is always in the Windows 1252 character set. One may not think about encoding too much for source files, but it matters. If you save the sources in an 8-bit encoding (Windows 1252) or in UTF-8, findstr has no trouble finding the string you are searching for (as long as it’s in Windows 1252 encoding). However, if you save the sources in Unicode encoding (code page 1200, which is UTF-16 little endian) or Unicode Big Endian encoding (code page 1201, which is UTF-16 big endian), findstr will not find the string in those files.

G:\temp>dir /b
file1 - unicode - big.txt
file1 - unicode.txt
file1 - utf8.txt
file1.txt
G:\temp>findstr /i example *
file1 - utf8.txt:This is an example.
file1.txt:This is an example.

I also had GNU grep for win32 that I downloaded from http://gnuwin32.sourceforge.net/packages.html. It has the same problem:

G:\temp>"c:\Program Files\GnuWin32\bin\grep.exe" -R example *
file1 - utf8.txt:This is an example.
file1.txt:This is an example.

I installed the latest version of Cygwin, and its grep had the same problem:

/cygdrive/g/temp
$ grep -R example *
file1 - utf8.txt:?This is an example.
file1.txt:This is an example.

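Comparing the files byte by byte makes it obvious: if a text reader doesn’t understand Unicode, it has absolutely no chance of finding the searched text in these files. In the two Unicode files every character of the sentence takes two bytes, so the plain 8-bit byte sequence for "example" never appears in them.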
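The difference can be made visible with a few lines of Python. This is a minimal sketch, not a reconstruction of the original files: the sample sentence is taken from the output above, while the mapping of file names to codecs and the assumption that every file except the plain one starts with a byte order mark are mine. The helper names encode_like_file and naive_match are hypothetical.

```python
import codecs

TEXT = "This is an example."
# What a byte-oriented tool effectively looks for: the 8-bit bytes of the word.
NEEDLE = "example".encode("cp1252")


def encode_like_file(text, codec, bom=b""):
    """Return the bytes a file would contain: optional BOM plus encoded text."""
    return bom + text.encode(codec)


# Assumed mapping from the four test files to encodings and byte order marks.
FILES = {
    "file1.txt": encode_like_file(TEXT, "cp1252"),
    "file1 - utf8.txt": encode_like_file(TEXT, "utf-8", codecs.BOM_UTF8),
    "file1 - unicode.txt": encode_like_file(TEXT, "utf-16-le", codecs.BOM_UTF16_LE),
    "file1 - unicode - big.txt": encode_like_file(TEXT, "utf-16-be", codecs.BOM_UTF16_BE),
}


def naive_match(data, needle=NEEDLE):
    """Byte-wise substring search that ignores the file's encoding."""
    return needle in data


if __name__ == "__main__":
    for name, data in FILES.items():
        # Show the first 12 bytes in hex next to the search result.
        head = " ".join(f"{byte:02x}" for byte in data[:12])
        print(f"{name:28} {head}  match={naive_match(data)}")
```

Only the first two entries report a match, which mirrors the findstr and grep output above: in the UTF-16 variants a zero byte sits between every pair of letters, so the 8-bit pattern never lines up.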

 

Solutions

Visual Studio

The best solution, I think, is using Visual Studio.

Pros:
  • It’s free (any free Express edition has this functionality).
  • It has good regex functionality (like grep, unlike findstr).
Cons:
  • It’s not a command-line tool.
Steps:
  1. From the Edit -> Find and Replace menu, select Find in Files.
  2. To configure which path you want to search, click the [...] button.
  3. Navigate to the top-level folder you want to search and use the chevrons to add it to the selected folders.
  4. Hit OK and you’re back at the Find and Replace screen. You may choose to display file names only (good for checking out the files first or for some other processing).
  5. The results show up in the Find Results pane.
Find and Replace with Regex is also possible with Visual Studio:

 

PowerShell

PowerShell is aware of text encoding, and a combination of 2 commands will give you the same functionality as grep:

Get-ChildItem -recurse -include * | Select-String -CaseSensitive "example"
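For comparison, here is a rough sketch in Python of what an encoding-aware search has to do. It is an illustration under my own assumptions, not a description of how Select-String is implemented: the helper names sniff_encoding and grep_file are hypothetical, and files without a byte order mark are simply assumed to be Windows 1252.

```python
import codecs
import os
import tempfile

# Byte order marks to look for, paired with Python codec names.
# The "utf-16" codec reads the BOM itself to decide on the byte order.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(raw, default="cp1252"):
    """Pick a codec from the BOM; fall back to an assumed default."""
    for bom, codec in BOMS:
        if raw.startswith(bom):
            return codec
    return default


def grep_file(path, needle):
    """Return the lines of `path` that contain `needle` (case-sensitive)."""
    with open(path, "rb") as handle:
        raw = handle.read()
    text = raw.decode(sniff_encoding(raw), errors="replace")
    return [line for line in text.splitlines() if needle in line]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Write the same sentence once as 8-bit text and once as UTF-16 with BOM.
        for name, codec in [("file1.txt", "cp1252"), ("file1 - unicode.txt", "utf-16")]:
            with open(os.path.join(folder, name), "w", encoding=codec) as out:
                out.write("This is an example.\n")
        for name in sorted(os.listdir(folder)):
            print(name, grep_file(os.path.join(folder, name), "example"))
```

Decoding before matching is the whole point: once the bytes are turned into text, the search no longer depends on how the file was saved.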

 
