[Gllug] Find non-7-bit characters in files
Russell Howe
rhowe at siksai.co.uk
Thu Jun 16 20:23:45 UTC 2005
On Thu, Jun 16, 2005 at 06:02:44PM +0100, Richard Jones wrote:
> Here's a small Thursday afternoon puzzler for everyone.
>
> I have a large number of files (HTML files in fact, not that it
> matters). A clueless^Wevil web monkey^Wdesigner has hidden bytes in
> them that are in the range 0x80 - 0xff, so the files aren't valid
> UTF-8.
>
> I want to find those characters. Preferably quickly from the command
> line.
Assuming these are windows-1252 files, why not just do this?
find /somewhere -type f -name '*.html' -exec recode windows1252..utf8 {} \;
Then you even get the UTF8 version of whatever character said monkey
used...
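
If you would rather see which files are affected before converting
anything, here is a rough sketch of one way to do it. It assumes a POSIX
shell with tr, wc and find; the function names count_high_bytes and
list_high_byte_files are made up for illustration. It only counts and
lists, it doesn't tell you which line the byte is on.

```shell
# count_high_bytes FILE
# Print how many bytes in FILE fall in the range 0x80-0xff.
# (Hypothetical helper name, not part of any tool.)
count_high_bytes() {
    # In the C locale tr works byte by byte. Delete everything in
    # 0x00-0x7f (octal 000-177) and count what is left over.
    # The final tr strips the padding some wc implementations add.
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# list_high_byte_files DIR
# Print the name of each *.html file under DIR containing such bytes.
# Assumes file names do not contain newlines.
list_high_byte_files() {
    find "$1" -type f -name '*.html' | while IFS= read -r f; do
        if [ "$(count_high_bytes "$f")" -gt 0 ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like "list_high_byte_files /somewhere" then gives you the list
of files to look at (or to feed to recode) without touching the rest.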
--
Russell Howe | Why be just another cog in the machine,
rhowe at siksai.co.uk | when you can be the spanner in the works?
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug