Sunday, February 24, 2013

PowerShell: Get-Hash to filter file duplicates

You can find duplicate files and filter one copy of each set into a new directory, leaving all the extra copies behind.  This is possible thanks to the PowerShell Community Extensions (PSCX) module's Get-Hash cmdlet.

(Note: I've created a more extensive script utilizing Get-Hash and TagLib.  Click here for a more comprehensive music filter technique.)

1. Download the PSCX zipped folder and copy its contents into your PowerShell modules folder (one of the paths listed in $env:PSModulePath).  I used PSCX 3.0 to run in PowerShell 3.0.

2. Import-Module Pscx, but be careful using help.  I've had my shell lock up several times when asking for command help -- no matter the command.  Run Update-Help to fix this problem.
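For reference, the import and help refresh look like this (a sketch; Update-Help may need an elevated prompt, and the -ea switch is just my habit for quieting modules without updatable help):

Import-Module Pscx
Update-Help -ea 0   #rebuild the local help files so Get-Help stops hanging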

3.  Go to a populated folder and type Get-Hash .\*.* to ensure the module and the Get-Hash command are working.  You'll see Path and HashString properties for each file.
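For example, a quick sanity check that shows only the two properties used later in the pipeline:

Get-Hash .\*.* | Format-Table Path, HashString -AutoSize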

4.  Create a destination folder for your filtered files (e.g. e:\test\hash)

5. Run the following command from the root of the folder structure you wish to examine.

get-childitem -file -recurse|get-hash|sort hashstring -unique|%{move $_.path e:\test\hash}

#BTW the -file option for GCI is new in PowerShell 3.0; otherwise you'd
#have to put a |?{!$_.psiscontainer}| section after the gci to exclude
#subfolders
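So on PowerShell 2.0 the equivalent pipeline (a sketch; I only ran the 3.0 version) would be:

get-childitem -recurse | ?{!$_.psiscontainer} | get-hash | sort hashstring -unique | %{move $_.path e:\test\hash}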

You now have one copy of each file in your new filtered folder.  All other copies are left in their original locations.

Obviously you can alter the command line to create subfolders, move based on various properties, etc., but basically your work is done.
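As one sketch of that kind of tweak -- the extension-based subfolders here are just an illustration, not something from the command above:

get-childitem -file -recurse | get-hash | sort hashstring -unique | %{
    $dest = "e:\test\hash\$((get-item $_.path).Extension.TrimStart('.'))"
    md $dest -ea 0 | out-null   #create the extension subfolder if it isn't there yet
    move $_.path $dest
}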

If you run into duplicate file names when moving files to the new folder, you can have each colliding file create a subfolder named after its HashString -- which will be unique -- and get moved into that subfolder instead.

if ((gci -ea 0 e:\test\hash\$($_.path.split('\')[-1]))){md -ea 0 e:\test\hash\$($_.hashstring);move $_.path e:\test\hash\$($_.hashstring)}
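Put together with the main pipeline, the whole thing looks roughly like this (a sketch assuming the same e:\test\hash destination):

get-childitem -file -recurse | get-hash | sort hashstring -unique | %{
    if (gci -ea 0 e:\test\hash\$($_.path.split('\')[-1])) {
        #name collision: park this copy in a subfolder named after its unique hash
        md -ea 0 e:\test\hash\$($_.hashstring) | out-null
        move $_.path e:\test\hash\$($_.hashstring)
    } else {
        move $_.path e:\test\hash
    }
}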
