[Gllug] Finding files with names that differ by case

Richard Russell richard.a.russell at gmail.com
Thu Sep 20 14:02:27 UTC 2007


I would suggest using your favourite scripting language to do something
like this (it may not be optimal, but it should work):

---
iterate over all filenames, for each one do:
  find the canonical name (use a regex to get the file part, then shift it
  to lowercase - eg /one/two/Three becomes three)
  add it to a hashmap with the key being the canonical name, and the value
  being an array (or list) of the actual filenames

now iterate over the list of keys, for each one do:
  if there's only 1 entry in the array, delete that key from the hashmap
  if there's more, do whatever you wanted to do with the files
    (I'd also check that the actual filenames are in the same directory -
    /one/two/Three and /one/Three would otherwise clash)
---
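
For what it's worth, here is a minimal Python sketch of the steps above. The
names group_case_clashes and walk_files are made up for illustration, and the
key includes the directory as well as the lowercased file part. That is my
assumption, so that /one/two/Three and /one/Three never end up in the same
group.

```python
import os
from collections import defaultdict


def group_case_clashes(paths):
    """Group paths whose file part differs only by case.

    Hypothetical helper, not from any library. Returns a dict mapping
    (directory, lowercased file name) to the sorted list of actual paths,
    keeping only the keys that have more than one path.
    """
    groups = defaultdict(list)
    for path in paths:
        directory, name = os.path.split(path)
        # Canonical name: the file part shifted to lowercase. The directory
        # is part of the key (an assumption) so only siblings can clash.
        groups[(directory, name.lower())].append(path)
    # Drop every key that has only one actual filename.
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


def walk_files(root):
    """Yield the path of every plain file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Something like group_case_clashes(walk_files("/one")) should then give one
entry per clash. Directories whose names differ only by case are not handled
here.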

alternatively, use a series of shell commands and files to do a similar
thing (less efficient, but no development really needed):

----
find to generate the list of files in a file
sed to prepend the canonical name to each line, with a comma separator
sort to group them by canonical name
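uniq -d to find duplicated names only (uniq compares whole lines by
default, so the comparison has to be restricted to the canonical name)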
awk to turn this into a list of commands like "echo /one/two/Three >> three"
----

... now you have the same data structure as before (the keys are now the
names of the files you created, and each file holds the list of actual
filenames), so you can loop over each key and process its contents however
you like (maybe using head, basename and dirname).
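
As one possible way of processing each key's list, here is a rough Python
sketch. It assumes you want to keep the first filename in each list and
delete the others only if their contents really are identical. The name
redundant_copies is made up, and the list must not be empty.

```python
import filecmp


def redundant_copies(paths):
    """Return the paths that duplicate the first path in the list.

    Hypothetical helper: keeps paths[0] and returns every other path whose
    contents match it byte for byte. Paths whose contents differ are left
    out, so they can be inspected by hand.
    """
    keep, *others = paths
    # shallow=False forces a comparison of the file contents rather than
    # only the os.stat() signatures.
    return [path for path in others if filecmp.cmp(keep, path, shallow=False)]
```

It only returns the candidates. Whether to pass them to os.remove is a
separate decision, and printing them first is probably wise.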

Slightly more convoluted, but ... :-)


Hope this helps.

Cheers

Richard


On 20/09/2007, Jon Dye <jon at pecorous.co.uk> wrote:
>
> Hi,
>
> I have a directory tree full of files and I'd like to find all the files
> whose names only differ in case.  Where files differ by case they
> should be in the same directory as each other, e.g.
>
> /one/two/three
> /one/two/Three
> /one/four
> /one/Four
>
> Does anyone have any suggestions as to how I can achieve this?
>
> This is a result of transferring the files to windows and back and I
> want to delete the duplicates so I only have one copy of each file.
>
> The contents should be identical if that's any help.
>
> JD
>
> --
> "The NRA says guns don't kill people... people kill people. I think the
> gun helps!"
>                 - Eddie Izzard
>
>
> --
> Gllug mailing list  -  Gllug at gllug.org.uk
> http://lists.gllug.org.uk/mailman/listinfo/gllug
>
>
>