Scraping and downloading 文库吧 novels and packaging them as EPUB

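It uses the whole-book download endpoint, so no login is required (though you do need to know the novel's ID), and it splits the text into chapters with regular expressions.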
The code is open source on GitHub: https://github.com/apachecn/Book ... dTool/lightnovel.py
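The regex-based chapter split can be tried on its own. What follows is a minimal sketch, not the project's code. `split_chapters` is a hypothetical helper that assumes the same layout convention as the post's `format_text`: unindented lines are chapter titles and indented lines are paragraphs. Rather than inserting a marker and splitting on it, it uses `re.split` with a capturing group so the title lines are kept in the result.

```python
import re

# Hypothetical convention (same as the post's format_text): a line that
# starts with a non-whitespace character is a chapter title; indented
# lines are paragraphs of the current chapter.
TITLE_RE = re.compile(r'^(\S.*)$', re.M)


def split_chapters(text):
    """Split plain novel text into [{'title': ..., 'paragraphs': [...]}]."""
    text = text.replace('\r\n', '\n')
    # With a capturing group, re.split keeps the titles:
    # [preamble, title1, body1, title2, body2, ...]
    parts = TITLE_RE.split(text)
    chapters = []
    for title, body in zip(parts[1::2], parts[2::2]):
        # Drop blank lines and strip each paragraph's indentation
        paragraphs = [ln.strip() for ln in body.split('\n') if ln.strip()]
        chapters.append({'title': title.strip(), 'paragraphs': paragraphs})
    return chapters


sample = "Chapter 1\n    Para one.\n\n    Para two.\nChapter 2\n    Para three.\n"
print(split_chapters(sample))
# → [{'title': 'Chapter 1', 'paragraphs': ['Para one.', 'Para two.']}, {'title': 'Chapter 2', 'paragraphs': ['Para three.']}]
```

Python's `\s` and `str.strip()` both treat the full-width space U+3000 as whitespace. As a result, paragraphs indented with full-width spaces are also recognized as body text.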
Key code:
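```python
import re
from os import path

from pyquery import PyQuery as pq

# request_retry, safe_mkdir, fname_escape, gen_epub and default_hdrs are
# helpers from the BookerDownloadTool project; they are not shown in this post.


def format_text(text):
    # Collapse runs of line breaks into a single one
    text = re.sub(r'(\r\n)+', '\r\n', text)
    # Drop the first two lines
    text = re.sub(r'^.+?\r\n.+?\r\n', '', text)
    # Drop the last two lines
    text = re.sub(r'\r\n.+?\r\n.+?$', '', text)

    # Tag titles and paragraphs. The HTML tags and the split marker were lost
    # when this post was scraped; <p>, <h1> and <!--split--> are a
    # reconstruction. Paragraph lines are assumed to be indented by four
    # characters, which s[4:] removes.
    def rep_func(m):
        s = m.group(1).rstrip('\r')  # '.' also matches the '\r' of '\r\n'
        if s.startswith('    '):
            return '<p>' + s[4:] + '</p>'
        return '<!--split--><h1>' + s + '</h1>'

    text = re.sub(r'^(.+?)$', rep_func, text, flags=re.M)
    # Split into chapters, dropping empty pieces
    chs = filter(None, text.split('<!--split-->'))

    # Turn each chapter into a title and its content
    def map_func(x):
        return {
            'title': re.search(r'<h1>(.+?)</h1>', x).group(1),
            'content': re.sub(r'<h1>.+?</h1>', '', x),
        }

    return list(map(map_func, chs))


def get_info(html):
    root = pq(html)
    # Selectors are tied to the book page's layout; [5:] drops the first five
    # characters (the field label) before the date and the author
    dt = root('#content > div:nth-child(1) > table:nth-child(1) tr:nth-child(2) > td:nth-child(4)').text()[5:].replace('-', '') or 'UNKNOWN'
    url = root('#content > div:nth-child(1) > div:nth-child(6) > div > span:nth-child(1) > fieldset > div > a').attr('href')
    title = root('#content > div:nth-child(1) > table:nth-child(1) tr:nth-child(1) > td > table tr > td:nth-child(1) > span > b').text()
    author = root('#content > div:nth-child(1) > table:nth-child(1) tr:nth-child(2) > td:nth-child(2)').text()[5:]
    return {'dt': dt, 'url': url, 'title': fname_escape(title), 'author': fname_escape(author)}


def download_ln(args):
    book_id = args.id
    save_path = args.save_path
    headers = default_hdrs.copy()
    headers['Cookie'] = args.cookie

    # Book info page (GBK-encoded)
    url = f'https://www.wenku8.net/book/{book_id}.htm'
    html = request_retry('GET', url, headers=headers).content.decode('gbk')
    info = get_info(html)
    print(info['title'], info['author'], info['dt'])

    ofname = f"{save_path}/{info['title']} - {info['author']} - {info['dt']}.epub"
    if path.exists(ofname):
        print('Already exists')
        return
    safe_mkdir(save_path)

    # First entry is a title page listing the author (作者 = "author")
    articles = [{
        'title': info['title'],
        'content': f"作者：{info['author']}",
    }]
    # Whole-book download endpoint, plain text in UTF-8
    url = f'http://dl.wenku8.com/down.php?type=utf8&id={book_id}'
    text = request_retry('GET', url, headers=headers).content.decode('utf-8')
    articles += format_text(text)
    gen_epub(articles, {}, None, ofname)
```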
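The post does not show `gen_epub`. The sketch below shows, under stated assumptions, what packaging the `articles` list into an EPUB 3 file involves using only the standard library. `write_epub` and `xhtml_page` are hypothetical stand-ins, not the project's `gen_epub`. The sketch assumes that each article's `content` is already a well-formed XHTML fragment, as `format_text` produces, and that the first article's title is the book title, matching the title page built in `download_ln`. It applies the standard EPUB container rules: `mimetype` comes first and is stored uncompressed, `META-INF/container.xml` points to the package document, and the package includes a navigation document.

```python
import uuid
import zipfile
from datetime import datetime, timezone
from html import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def xhtml_page(title, body):
    """Wrap an XHTML fragment in a minimal EPUB 3 content document."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{escape(title)}</title></head>
<body>{body}</body>
</html>"""


def write_epub(articles, ofname, lang='zh'):
    """Hypothetical stand-in for gen_epub: articles is a list of
    {'title': str, 'content': XHTML fragment}; the first title names the book."""
    book_title = articles[0]['title'] if articles else 'Untitled'
    modified = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
    manifest, spine, toc = [], [], []
    with zipfile.ZipFile(ofname, 'w', zipfile.ZIP_DEFLATED) as zf:
        # 'mimetype' must be the first entry and must not be compressed
        zf.writestr(zipfile.ZipInfo('mimetype'), 'application/epub+zip',
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr('META-INF/container.xml', CONTAINER_XML)
        # One XHTML file per article, listed in manifest, spine and TOC
        for i, art in enumerate(articles):
            name = f'ch{i:04d}.xhtml'
            title = escape(art['title'])
            body = f"<h1>{title}</h1>{art['content']}"
            zf.writestr(f'OEBPS/{name}', xhtml_page(art['title'], body))
            manifest.append(f'<item id="ch{i}" href="{name}" media-type="application/xhtml+xml"/>')
            spine.append(f'<itemref idref="ch{i}"/>')
            toc.append(f'<li><a href="{name}">{title}</a></li>')
        # EPUB 3 navigation document
        nav = '<nav epub:type="toc" id="toc"><h1>Contents</h1><ol>' + ''.join(toc) + '</ol></nav>'
        zf.writestr('OEBPS/nav.xhtml', xhtml_page('Contents', nav))
        # Package document: metadata, manifest and reading order
        opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{escape(book_title)}</dc:title>
    <dc:language>{lang}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    {''.join(manifest)}
  </manifest>
  <spine>{''.join(spine)}</spine>
</package>"""
        zf.writestr('OEBPS/content.opf', opf)
```

Usage would be along the lines of `write_epub(articles, ofname)` in place of `gen_epub(...)`. Chapter text containing a raw `&` or `<` would need escaping before it is wrapped in `<p>` tags, because EPUB content documents must be well-formed XML.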
The tool has been published to PyPI and can be installed with a single command:
```shell
pip install BookerDownloadTool
dl-tool ln
```