Development tools
Python version: 3.6.5
Required modules:
requests module
parsel module
Target site
https://www.tianyabook.com/shu/3801.html
Get the link to each chapter
import requests
import parsel

url = 'https://www.tianyabook.com/shu/3801.html'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
# Fetch the table-of-contents page for the novel
response = requests.get(url=url, headers=headers)
selector = parsel.Selector(response.text)
# Collect the href of every chapter link in the chapter list
page_urls = selector.css('.panel-body dd a::attr(href)').getall()
Get the title and content of each chapter
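# page_url is one entry of page_urls from the previous step; run this code
# once per entry, for example inside "for page_url in page_urls:"
new_url = 'https://www.tianyabook.com' + page_url
response = requests.get(url=new_url, headers=headers)
# Let requests detect the encoding from the page bytes so the Chinese text decodes correctly
response.encoding = response.apparent_encoding
selector = parsel.Selector(response.text)
# Text nodes of the chapter body, and the chapter heading
content = selector.css('#htmlContent::text').getall()
title = selector.css('.page-header h1::text').get()
html_data = ''.join(content)
html_content = html_data.strip()
print(html_content)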
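The string concatenation above only works if every href is a site-relative path beginning with a slash, which is an assumption here, since the actual href values are not shown. A slightly more forgiving sketch uses urljoin from the standard library, which also copes with hrefs that are already absolute. The helper name build_chapter_url and the sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin

# Table-of-contents URL from the tutorial, used as the base for resolution
BASE_URL = 'https://www.tianyabook.com/shu/3801.html'


def build_chapter_url(href, base=BASE_URL):
    """Resolve a chapter href against the table-of-contents URL."""
    return urljoin(base, href)


# Hypothetical hrefs, only for illustration
print(build_chapter_url('/shu/3801/1.html'))
print(build_chapter_url('https://www.tianyabook.com/shu/3801/2.html'))
```

In the chapter step, new_url = build_chapter_url(page_url) could then replace the manual concatenation.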
Save the novel to a local txt file
# Append mode ('a'), so each chapter is added to the end of the same file
with open('金瓶梅.txt', mode='a', encoding='utf-8') as f:
    f.write(title)
    f.write('\n')
    f.write(html_content)
    f.write('\n')
print('{} downloaded'.format(title))
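One thing to watch in the save step: parsel's .get() returns None when the selector matches nothing, and f.write(None) raises a TypeError. A minimal sketch of a helper that guards against this is shown below. The name save_chapter, the 'Untitled' fallback, and the demo data are assumptions for illustration, not part of the original script.

```python
import os
import tempfile


def save_chapter(path, title, text):
    """Append one chapter (title line, then body) to a UTF-8 text file."""
    with open(path, mode='a', encoding='utf-8') as f:
        # title may be None if the heading selector matched nothing
        f.write((title or 'Untitled') + '\n')
        f.write(text + '\n')


# Usage with made-up chapter data, written to a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), 'novel.txt')
save_chapter(demo_path, 'Chapter 1', 'First chapter text.')
save_chapter(demo_path, None, 'Text with a missing title.')

with open(demo_path, encoding='utf-8') as f:
    saved = f.read()
print(saved)
```

In the scraper itself, this could be called as save_chapter('金瓶梅.txt', title, html_content) once per chapter.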