Saturday, March 2, 2013

PowerShell: Music Files, Get-Hash, and creating filtered folders

As a music collection grows, two problems tend to show up: different versions of a song can share the exact same file name, and exact duplicates of a song can sit under different file names.  I wrote a script that handles both cases.  It moves one copy of each unique file (identified by its file hash) into a new folder structure based on the artist name, leaving hash duplicates in the original folders.  If a file with the same name but a different hash is about to move into the artist's folder, the script creates a subfolder named after the file's hash and moves the file there instead.  
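
To make the two rules concrete before walking through the real script, here is a minimal sketch that only plans destinations in memory and never touches the disk. The function name Get-MovePlan, the sample records, and their Hash, Name and Artist properties are hypothetical and exist only for this illustration. The sketch also processes files in input order and tracks claimed paths in a table, whereas the actual script sorts by hash and checks the destination folder with test-path.

```powershell
# Hypothetical helper: plans where each file would go, without moving anything.
function Get-MovePlan {
    param(
        [object[]]$Files,            # objects with Hash, Name and Artist properties
        [string]$Root = 'e:\hash3\'  # destination root with trailing backslash
    )
    $seenHashes = @{}   # hashes that already have a planned copy
    $takenPaths = @{}   # destination paths that are already claimed
    foreach ($f in $Files) {
        if ($seenHashes.ContainsKey($f.Hash)) {
            # Exact duplicate: no target, the file stays in its original folder
            [pscustomobject]@{ Name = $f.Name; Hash = $f.Hash; Target = $null }
            continue
        }
        $seenHashes[$f.Hash] = $true
        $target = $Root + $f.Artist + '\' + $f.Name
        if ($takenPaths.ContainsKey($target)) {
            # Same name, different content: use a hash-named subfolder
            $target = $Root + $f.Artist + '\' + $f.Hash + '\' + $f.Name
        }
        $takenPaths[$target] = $true
        [pscustomobject]@{ Name = $f.Name; Hash = $f.Hash; Target = $target }
    }
}

# Made-up sample: one duplicate (same hash) and one name clash (different hash)
$sample = @(
    [pscustomobject]@{ Name = 'song.mp3'; Hash = 'AAA'; Artist = 'Band' }
    [pscustomobject]@{ Name = 'copy.mp3'; Hash = 'AAA'; Artist = 'Band' }
    [pscustomobject]@{ Name = 'song.mp3'; Hash = 'BBB'; Artist = 'Band' }
)
$plan = @(Get-MovePlan -Files $sample)
$plan | Format-Table -AutoSize
```

In this sample the first file is planned for e:\hash3\Band\song.mp3, the second gets no target because its hash was already seen, and the third lands in e:\hash3\Band\BBB\song.mp3.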

I've only tested this script using PowerShell 3.0.
Disclaimer: This script will create new folders and move files from their original location to new ones.  Script errors can result in misnamed files.

I'll explain more as we go over the script:

#Taglib is described in my previous blog
#Click here to learn how to install this ID3 tag utility

$taglib = "C:\PowerShell\taglib\libraries\taglib-sharp.dll"
[system.reflection.assembly]::loadfile($taglib)|out-null

#Select the destination root folder where you want 
#your processed files to go. Make sure to leave the
#trailing backslash

$destination = "e:\hash3\"

#The $hashtable variable gathers all music files from the 
#current folder and its subfolders and computes a hash for each.
#Despite its name, it holds a list of Get-Hash results, not a PowerShell hashtable.
#Click here to learn how to install Get-Hash

$hashtable = gci -file -recurse -include *.flac,*.ape,*.ra,*.m4a,*.mp2,*.wma,*.mp3|get-hash

#We add an Artist property to each entry in $hashtable and
#assign an empty string "" as a placeholder

$hashtable|add-member -membertype noteproperty -name Artist -value ""

#Now we go through each file in $hashtable and resolve the 
#artist's name.  If the file is not a valid music file, we
#write its name to the console and append it to a
#text file listing the invalid files.

foreach ($a in $hashtable){

#Reset the per-file variables so no value carries over from the previous file
$media = $null
$fileperformers = $null
$filealbumartists = $null
$fileartist = $null
$albumartist = $null

#Here's the try/catch around the taglib call that loads
#the ID3 tag values.  If taglib hits any errors, the file is moved to 
#an "Invalid Audio File"\hash-string subfolder.


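try{
$media = [taglib.file]::create($a.path)
}
catch [exception]{
write-host -foregroundcolor "yellow" ($a.path + " - invalid audio file")
$a.path|out-file c:\powershell\hasherrors.txt -append
$fileartist = "Invalid Audio File"
if (!(test-path ($destination+$fileartist+"\"+$a.hash))){md ($destination+$fileartist+"\"+$a.hash+"\")}
#-literalpath keeps [ and ] in file names from being read as wildcards
move-item -literalpath $a.path -destination ($destination+$fileartist+"\"+$a.hash+"\")
#Record the folder name and skip the tag processing, since no tags were loaded
$a.Artist = $fileartist
continue
}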

#We pull the Artist name and strip brackets and other unusual characters



$fileartist = $media.tag.performers
if ($fileartist){
$fileartist = $fileartist -replace ("\[","")
$fileartist = $fileartist -replace ("\]","")
$fileartist = $fileartist -replace ("[^0-9a-zA-Z-&\']"," ")
}

#Next we pull the Album Artist name and process it

$albumartist = $media.tag.albumartists
if ($albumartist){
$albumartist = $albumartist -replace ("\[","")
$albumartist = $albumartist -replace ("\]","")
$albumartist = $albumartist -replace ("[^0-9a-zA-Z-&\']"," ")
}

#We see if the performers tag was a real value
#if not, we try to use the album artist value
#if no performer or album artist value, we use "Unknown"

if (!$fileartist){$fileartist = $albumartist}
if (!$fileartist){$fileartist = "Unknown"}

#We store the result in the Artist property we added earlier

$a.Artist = $fileartist
}

#We sort the list by hash and keep one entry per unique hash.
#Duplicates drop out of the list, so those files stay behind
#in their original folders

$hashtable = $hashtable|sort hash -unique


#We create the artist's destination folder if needed, then check whether
#a file with the same name is already there.  If so, we create a hash-named
#subfolder and move the file into it; otherwise the file goes into the artist folder.
#We also skip error files, which have already been moved

foreach ($h in $hashtable){
#-literalpath keeps [ and ] in file names from being read as wildcards
if (!(test-path -literalpath $h.path)){continue}
if (!(test-path ($destination+$h.artist))){md ($destination+$h.artist)}

#Same file name already in the artist folder: use a hash-named subfolder
if (test-path -literalpath ($destination+$h.artist+"\"+$h.path.split('\')[-1])){
md ($destination+$h.artist+"\"+$h.hash)
move-item -literalpath $h.path -destination ($destination+$h.artist+"\"+$h.hash)
}else{
move-item -literalpath $h.path -destination ($destination+$h.artist)}
}
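
The artist clean-up step in the first loop can be tried on its own before running the script against real files. The sketch below wraps the same three -replace patterns in a helper. The function name ConvertTo-ArtistFolderName is hypothetical, and the final Trim() is my own addition (an assumption that leading and trailing spaces are unwanted in folder names); the script itself does not trim.

```powershell
# Hypothetical helper: applies the script's three -replace steps to a performer list.
function ConvertTo-ArtistFolderName {
    param([string[]]$Name)

    # No usable tag value: fall back to the same "Unknown" label the script uses
    if (-not $Name) { return 'Unknown' }

    # Drop square brackets, then turn every character outside
    # 0-9, a-z, A-Z, hyphen, ampersand and apostrophe into a space
    $clean = $Name -replace '\[','' -replace '\]','' -replace "[^0-9a-zA-Z-&\']",' '

    # Several performers are joined with a space; Trim() is an addition, not in the script
    ($clean -join ' ').Trim()
}

ConvertTo-ArtistFolderName 'AC/DC [Live]'      # AC DC Live
ConvertTo-ArtistFolderName "Guns N' Roses"     # Guns N' Roses
ConvertTo-ArtistFolderName                     # Unknown
```

Because only unaccented Latin letters, digits, hyphen, ampersand and apostrophe are kept, other characters in an artist name become spaces, so it is worth checking a few names from your own collection this way first.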