So I pulled up my PowerShell 3 console and started hacking. The website returns download links relative to the domain, as /Download/.... Fortunately the URI class is built for this. If you New-Object a URI and pass it an existing URI and a relative URL, the returned URI is the fully qualified (absolute) version of the relative URL. URI also has a Segments property that splits the URL into its parts, so I can directly access the download name.
$clnt = new-object System.Net.WebClient
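$URL = "http://www.archive.org/details/MyFavoriteHusband_866"
$path = "$home\Downloads\MFH\"   # this folder must already exist
# Fetch the page HTML and find every quoted link ending in .mp3
$t = $clnt.Downloadstring($url)
$regex = [regex] '"([^"]*[.]mp3)">'
$fileList = $regex.Matches($t)
$URI = [system.URI] $URL
Clear
$i = 0
# Groups[1] is the link without the surrounding quotes
$fileList | % { $_.Groups[1].Value } | %{
    # Combine the page URI with the relative link to get an absolute URI
    $URI2 = new-object "System.URI" -argument $URI,$_
    $DownLoadName = $Path + $URI2.Segments[-1]; $DownLoadName
    $i++;$i
    $clnt.DownloadFile($URI2.AbsoluteUri,$DownLoadName)
}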
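The URI behaviour described above can be tried in isolation. This is only a small sketch: the relative link is the sample match quoted in this post, and the variable names ($base, $relative, $full) are made up for the illustration.

```powershell
# Base page URI, as used in the script
$base = [System.Uri] "http://www.archive.org/details/MyFavoriteHusband_866"

# A domain-relative link of the kind the page returns (sample from this post)
$relative = "/download/MyFavoriteHusband_866/Mfh1951-03-24124IrisLizsEaster.mp3"

# Uri(Uri, String) resolves the relative link against the base URI
$full = New-Object System.Uri -ArgumentList $base, $relative

$full.AbsoluteUri    # fully qualified URL on www.archive.org
$full.Segments[-1]   # last path segment, i.e. the file name
```

Because the link starts with a slash, it replaces the whole path of the base URI rather than being appended to /details/....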
The site contained two links for every MP3, so I needed the Regex to pick up only one of them. That meant grabbing extra data, which made my ForEach a bit more complicated, as each match looked something like this (the quotes were in the result):
"/download/MyFavoriteHusband_866/Mfh1951-03-24124IrisLizsEaster.mp3">
Had I my trusty cheat sheet, I would have done the Regex as:
$regex = [regex] '(?<=")([^"]*[.]mp3)(?=">)'
The changes are a "zero-width positive lookbehind assertion" and a "zero-width positive lookahead assertion". In short, they say: look for this, but don't return it as part of the match. http://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.85).aspx
With that, the match itself is the link, so the loop can use $_.Value directly:
$fileList | %{
    $URI2 = new-object "System.URI" -argument $URI,$_.Value
    # ...rest of the loop body as before
}
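To see what the lookaround assertions change, here is a rough side-by-side of the two patterns. The $sample line is a hypothetical bit of HTML built around the link quoted earlier, not the page's actual markup, and the variable names are made up for the comparison.

```powershell
# Hypothetical anchor tag wrapping the sample link from this post
$sample = '<a href="/download/MyFavoriteHusband_866/Mfh1951-03-24124IrisLizsEaster.mp3">'

$capturing  = [regex] '"([^"]*[.]mp3)">'            # first attempt
$lookaround = [regex] '(?<=")([^"]*[.]mp3)(?=">)'   # cheat-sheet version

$m1 = $capturing.Match($sample)
$m2 = $lookaround.Match($sample)

$m1.Value            # whole match, still wrapped in the leading " and trailing ">
$m1.Groups[1].Value  # the link only, which is why Groups[1] was needed
$m2.Value            # the link only, straight from the match
```

Both patterns find the same link; the difference is only in whether the surrounding quote characters count as part of the match.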
It is amazing, but not surprising, how a wrong Regex may still work yet make everything that follows more complicated.
To make sure I could "see it work", I added an $i counter and printed the name of each file as it came down. Yes, I should have made this a progress bar. Maybe later!
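For what it's worth, the progress bar could look roughly like this with Write-Progress. This is just a sketch, not what I actually ran: the $names list is a placeholder standing in for the matched links, and the download call is left as a comment.

```powershell
# Placeholder file names standing in for the real matches
$names = 'one.mp3', 'two.mp3', 'three.mp3'

$i = 0
foreach ($name in $names) {
    $i++
    # Percentage of files handled so far, as a whole number
    $percent = [int](100 * $i / $names.Count)
    Write-Progress -Activity "Downloading MP3s" `
                   -Status "$i of $($names.Count): $name" `
                   -PercentComplete $percent
    # $clnt.DownloadFile(...) would go here
}

# Remove the progress bar once everything is done
Write-Progress -Activity "Downloading MP3s" -Completed
```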
- Josh
-----------
Update: Fixed Typos
Update2: Added note on the Regex Change