Batch-downloading music from a Douban site (豆瓣小站) with Python

The idea is simple: a quick look at the HTML of a Douban site page shows that every song's title and download address are already written into it; it's just that the address differs on each page load. Read the page with urllib, parse out the titles and addresses, and download them.
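
To make the idea concrete, here is a tiny sketch of that pattern run against a made-up fragment shaped like the data the page embeds (the field names "name" and "rawUrl" are the ones the real pages use, as the full script below shows; the song title and URL here are placeholders):

import re

# A made-up fragment in the same shape as the embedded playlist data
html = 'song_records = [{"name":"Song A","rawUrl":"http:\\/\\/example.com\\/a.mp3"}];'
for chunk in re.findall(r'song_records\s*=\s*\[(.*?)\];', html):
	names = re.findall(r'"name":"(.*?)"', chunk)
	urls = [u.replace('\\', '') for u in re.findall(r'"rawUrl":"(.*?)"', chunk)]
	print(list(zip(names, urls)))  # [('Song A', 'http://example.com/a.mp3')]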

First, a screenshot of the result: the command-line output is at the top, the code on the left, and the downloaded files on the right. Very satisfying!

[Screenshot - 061013 - 21:53:00]

Finally, the Python 3.2 sample code:

import os
import re
import urllib.request
from urllib.request import urlopen

url = input('Please enter the URL of the douban site.\n(E.g., http://site.douban.com/quinta-diminuita/)\n')
content = urlopen(url).read().decode('utf-8')
# Parse the musician's name from the page title and create a directory for it
site_name = ''.join(re.findall(r'<title>\n\s*(.*?)</title>', content, re.DOTALL))
artist = site_name.replace('的小站\n(豆瓣音乐人)', '')
print('Preparing to download all the songs by ' + artist + '...')
if not os.path.exists('./' + artist):
	os.mkdir('./' + artist)
# Parse out the URLs of all the rooms on the site
roomlist = re.findall(r'<div class="nav-items">(.*?)</div>', content, re.DOTALL)
roomurl = re.findall(r'href="(.*?)"', roomlist[0])
# For each room, if it contains playlists, parse each song's title and URL and download it
for h in roomurl:
	content = urlopen(h).read().decode('utf-8')
	playlist = re.findall(r'song_records\s=\s\[(.*?)\];\n', content, re.DOTALL)
	if playlist:
		for i in playlist:
			playlist_spl = re.split(r'},{', i)
			for j in playlist_spl:
				# Strip escaping backslashes from the title and replace '/' so it makes a valid filename
				song_title = ''.join(re.findall(r'"name":"(.*?)"', j))
				song_title = song_title.replace('\\', '').replace('/', '.')
				song_url = re.findall(r'"rawUrl":"(.*?)"', j)
				song_url_str = ''.join(song_url).replace('\\', '')
				filepath = './' + artist + '/' + song_title + '.mp3'
				# Skip songs that have already been downloaded
				if not os.path.isfile(filepath):
					print('Downloading ' + song_title + '...')
					urllib.request.urlretrieve(song_url_str, filepath)
				else:
					print(song_title + ' already exists. Skipped.')
print('All the songs have been downloaded (if there are any). Enjoy!')
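
A note on the parsing: the song_records array the script slices up with re.split is essentially JSON, so if the '},{' splitting ever chokes on an edge case, a more robust variant (just a sketch, assuming the array body is valid JSON; extract_songs is a hypothetical helper, not part of the script above) would be to let the json module do the work:

import json
import re

def extract_songs(page_html):
	# Pull out the raw array text and parse it as JSON instead of splitting on '},{'
	m = re.search(r'song_records\s=\s(\[.*?\]);\n', page_html, re.DOTALL)
	if not m:
		return []
	return [(rec['name'], rec['rawUrl']) for rec in json.loads(m.group(1))]

Each returned pair is a (title, raw URL) tuple for one room page, ready to be handed to urllib.request.urlretrieve.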
