[Gllug] Find non-7-bit characters in files

Russell Howe rhowe at siksai.co.uk
Thu Jun 16 20:23:45 UTC 2005


On Thu, Jun 16, 2005 at 06:02:44PM +0100, Richard Jones wrote:
> Here's a small Thursday afternoon puzzler for everyone.
> 
> I have a large number of files (HTML files in fact, not that it
> matters).  A clueless^Wevil web monkey^Wdesigner has hidden bytes in
> them that are in the range 0x80 - 0xff, so the files aren't valid
> UTF-8.
> 
> I want to find those characters.  Preferably quickly from the command
> line.

Assuming these are windows-1252 files, why not just do this?

find /somewhere -type f -name '*.html' -exec recode windows1252..utf8 {} +

?

Then you even get the UTF-8 version of whatever character said monkey
used...
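The original question was about *finding* the stray bytes rather than converting them, so here is a minimal sketch of that step. It assumes GNU find and grep (for `-H`) and an iconv, such as glibc's, that exits non-zero on invalid input; the function names `find_high_bytes` and `not_utf8` are made up for illustration, and `/somewhere` is the same placeholder as above.

```shell
# find_high_bytes DIR  (hypothetical helper name)
# Print file:line:text for every line of every *.html file under DIR that
# contains a byte in the range 0x80-0xff. Under LC_ALL=C each byte is one
# character, [:cntrl:] covers 0x00-0x1f and 0x7f, and [:print:] covers
# 0x20-0x7e, so the negated bracket expression matches only 0x80-0xff.
# The C locale also stops GNU grep from treating the high bytes as
# encoding errors and reporting "binary file matches" instead of lines.
find_high_bytes() {
    find "$1" -type f -name '*.html' \
        -exec env LC_ALL=C grep -Hn '[^[:cntrl:][:print:]]' {} +
}

# not_utf8 DIR  (hypothetical helper name)
# Print the name of every *.html file under DIR that iconv rejects as
# invalid UTF-8. Pure 7-bit files pass, since ASCII is valid UTF-8.
# Assumes file names contain no newlines.
not_utf8() {
    find "$1" -type f -name '*.html' | while IFS= read -r f; do
        iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || printf '%s\n' "$f"
    done
}
```

For example, `find_high_bytes /somewhere` shows exactly which lines contain the offending bytes, and running `not_utf8 /somewhere` after the recode step should print nothing if every file converted cleanly.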

-- 
Russell Howe       | Why be just another cog in the machine,
rhowe at siksai.co.uk | when you can be the spanner in the works?
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug