
Using Python to Dig into the Classic Novel Jin Ping Mei (《金瓶梅》)

Published: 2024-10-19


Development tools

Python version: 3.6.5

Required modules:

requests

parsel

Target website

https://www.tianyabook.com/shu/3801.html

Get the link to each chapter

import requests
import parsel

# Table-of-contents page for the novel
url = 'https://www.tianyabook.com/shu/3801.html'
# Send a browser-like user-agent with every request
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
selector = parsel.Selector(response.text)
# Collect the href of every chapter link in the chapter list
page_urls = selector.css('.panel-body dd a::attr(href)').getall()
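The selector returns whatever each href attribute contains, which may be a site-relative path or a full URL. The following is a minimal sketch, not the author's method, of resolving such links with the standard library's urllib.parse.urljoin instead of string concatenation. The helper name to_absolute and the sample hrefs are assumptions made for illustration; the real values depend on the page.

```python
from urllib.parse import urljoin

# Page the links were collected from
BASE_URL = 'https://www.tianyabook.com/shu/3801.html'


def to_absolute(base, hrefs):
    """Resolve each href against the page it was found on."""
    return [urljoin(base, href) for href in hrefs]


# Hypothetical hrefs for illustration only: one site-relative, one already absolute
sample_hrefs = [
    '/shu/3801/1.html',
    'https://www.tianyabook.com/shu/3801/2.html',
]
chapter_urls = to_absolute(BASE_URL, sample_hrefs)
print(chapter_urls)
```

urljoin leaves an already absolute URL unchanged, so the same call is safe for both kinds of link.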

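Get each chapter's title and content

for page_url in page_urls:
    # Chapter links are joined to the site root
    new_url = 'https://www.tianyabook.com' + page_url
    response = requests.get(url=new_url, headers=headers)
    # Use the encoding detected from the response body to avoid garbled text
    response.encoding = response.apparent_encoding
    selector = parsel.Selector(response.text)
    # Text nodes of the chapter body, and the chapter heading
    content = selector.css('#htmlContent::text').getall()
    title = selector.css('.page-header h1::text').get()
    html_data = ''.join(content)
    html_content = html_data.strip()
    print(html_content)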
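Joining the text nodes with an empty string runs paragraphs together if the nodes carry their own indentation or stray line breaks. The following is a small sketch, under the assumption that the fragments look roughly like the hypothetical sample below, of cleaning each fragment before joining. The helper name clean_fragments is made up for this example.

```python
def clean_fragments(fragments):
    """Strip each text node, drop empty ones, and join the rest with newlines."""
    lines = (fragment.strip() for fragment in fragments)
    return '\n'.join(line for line in lines if line)


# Hypothetical fragments, shaped like what a ::text selector might return
sample = [
    '\r\n    ',
    '\xa0\xa0\xa0\xa0First paragraph.',
    '\r\n',
    '\xa0\xa0\xa0\xa0Second paragraph.\r\n',
]
cleaned = clean_fragments(sample)
print(cleaned)
```

Python's str.strip() also removes non-breaking spaces (\xa0), which are commonly used for paragraph indentation on web pages.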

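Save the novel to a local txt file

    # Run this inside the per-chapter loop so that every chapter is appended
    with open('金瓶梅.txt', mode='a', encoding='utf-8') as f:
        f.write(title)
        f.write('\n')
        f.write(html_content)
        f.write('\n')
    print('{} downloaded'.format(title))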
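Because the file is opened with mode='a', running the script a second time appends every chapter again. One possible alternative, sketched here with a hypothetical helper save_chapters and made-up chapter data, is to collect the chapters first and write the file once in 'w' mode.

```python
import os
import tempfile


def save_chapters(path, chapters):
    """Write (title, text) pairs to one UTF-8 file, overwriting any old copy."""
    with open(path, mode='w', encoding='utf-8') as f:
        for title, text in chapters:
            f.write(title + '\n')
            f.write(text + '\n')


# Hypothetical chapter data for illustration only
demo_chapters = [('Chapter 1', 'text one'), ('Chapter 2', 'text two')]
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

save_chapters(demo_path, demo_chapters)
save_chapters(demo_path, demo_chapters)  # a second run overwrites instead of duplicating
```

The trade-off is that all chapters are held in memory until the write, whereas the append approach saves progress chapter by chapter.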
