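This is mainly an exercise in learning how to process XML with Python.
First, a pure-Python version. It has a problem: some songs cannot be downloaded and return a 404 error. Still to be solved.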
2010-12-15
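I finally found the cause of the 404s. When I got back this morning and ran the script again, I noticed that every file that downloaded successfully had no spaces in its name, while every file that returned 404 did. URL-encoding the file name solved it. In addition, one song title contains a Chinese (full-width) single quote, so the path has to be encoded as gbk. That is when I found that the Python 2.4.3 I had been using did not support gbk, so the final program uses Python 2.7.1 in its shebang line. The revised program is at the bottom of the page.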
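To make the fix concrete, here is a minimal sketch of what the percent-encoding does to a URL containing a space and to a full-width quote encoded as gbk. The `example.com` URL is a made-up placeholder, not one of the real song paths, and the compatibility import is only there so the snippet runs on both Python 2 and Python 3.

```python
# Sketch of the URL-encoding fix; runs on Python 2 and Python 3.
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

# Hypothetical URL, used only for illustration.
raw_url = u"http://example.com/music/my song.mp3"

# Percent-encode the URL but leave ":" and "/" alone so the scheme
# and the path separators survive.
encoded_url = quote(raw_url.encode("gbk"), safe=":/")
print(encoded_url)

# A full-width right single quote (U+2019) is two bytes in gbk,
# so it turns into two percent-escapes.
encoded_quote = quote(u"\u2019".encode("gbk"))
print(encoded_quote)
```

The space becomes `%20`, which is the part that made the 404s go away.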
[python]
#!/bin/env python
# Python 2 script: download every song listed in the player's XML playlist.
import os
import time
import urllib
from xml.dom import minidom

# Save into a dated directory such as ./wxf20101214
download_time = time.strftime('%Y%m%d', time.localtime())
download_path = os.path.join(os.getcwd(), "wxf" + download_time)
if not os.path.isdir(download_path):
    os.mkdir(download_path)

# Fetch and parse the playlist
usock = urllib.urlopen('http://www.wangxiaofeng.net/mp3player.xml')
xmldoc = minidom.parse(usock)
usock.close()

# Each <song> element carries "path" (the URL) and "title" attributes
for song in xmldoc.getElementsByTagName('song'):
    url = song.attributes["path"].value.encode('utf-8')
    name = song.attributes["title"].value
    # The URL is used as-is here, so paths containing spaces return 404
    urllib.urlretrieve(url, os.path.join(download_path, name + ".mp3"))
[/python]
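The parsing step can be tried without touching the network. The sketch below feeds `minidom` an inline sample instead of the real playlist. The sample XML is an assumption modeled on the two attributes the script reads (`path` and `title`); the root element name `player`, the URLs, and the helper `list_songs` are made up for illustration, and the real `mp3player.xml` may look different.

```python
from xml.dom import minidom

# Made-up playlist that mimics the attributes the script reads.
SAMPLE_XML = """<player>
  <song path="http://example.com/a.mp3" title="First Song"/>
  <song path="http://example.com/b c.mp3" title="Second Song"/>
</player>"""


def list_songs(xml_text):
    """Return (title, path) pairs for every <song> element."""
    doc = minidom.parseString(xml_text)
    return [(song.getAttribute("title"), song.getAttribute("path"))
            for song in doc.getElementsByTagName("song")]


songs = list_songs(SAMPLE_XML)
for title, path in songs:
    print(title + " -> " + path)
```

`getAttribute("path")` returns the same string as `attributes["path"].value` in the script above; it is just a slightly shorter spelling.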
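An improved version, combined with bash: the Python script writes the song URLs to mp3list.txt, and a bash script downloads them with wget.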
[python]
#!/bin/env python
# Python 2 script: write every song URL to mp3list.txt for wget to fetch.
import os
import time
import urllib
from xml.dom import minidom

# Create a dated directory such as ./wxf20101214
download_time = time.strftime('%Y%m%d', time.localtime())
download_path = os.path.join(os.getcwd(), "wxf" + download_time)
if not os.path.isdir(download_path):
    os.mkdir(download_path)

# Fetch and parse the playlist
usock = urllib.urlopen('http://www.wangxiaofeng.net/mp3player.xml')
xmldoc = minidom.parse(usock)
usock.close()

# One URL per line, taken from the "path" attribute of each <song>
f = open('mp3list.txt', 'w')
for song in xmldoc.getElementsByTagName('song'):
    url = song.attributes["path"].value.encode('utf-8')
    f.write(url + "\n")
f.close()
[/python]
[bash]
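#!/bin/bash
# Download every URL listed in mp3list.txt into a dated directory.
download_dir=/home/python_code/wxf20101214
mkdir -p "${download_dir}"
# "cd !$" depends on history expansion, which is off in scripts
cd "${download_dir}" || exit 1
while read -r line
do
    wget "${line}"
done < /home/python_code/mp3list.txt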
[/bash]
The program with URL encoding applied:
[python]
#!/usr/local/python271/bin/python
# Python 2.7 script: same download loop, with the song URL percent-encoded.
import os
import time
import urllib
from xml.dom import minidom

# Save into a dated directory such as ./wxf20101215
download_time = time.strftime('%Y%m%d', time.localtime())
download_path = os.path.join(os.getcwd(), "wxf" + download_time)
if not os.path.isdir(download_path):
    os.mkdir(download_path)

# Fetch and parse the playlist
usock = urllib.urlopen('http://www.wangxiaofeng.net/mp3player.xml')
xmldoc = minidom.parse(usock)
usock.close()

for song in xmldoc.getElementsByTagName('song'):
    # Encode the path as gbk, then percent-encode all but ":" and "/"
    url = urllib.quote(song.attributes["path"].value.encode('gbk'), safe=":/")
    name = song.attributes["title"].value
    urllib.urlretrieve(url, os.path.join(download_path, name + ".mp3"))
[/python]