Here's the code:
```python
# Imports
import re
import requests

# Variables
url = 'https://pan.baidu.com/component/view?id='
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
}
# Keyword to filter on: pages whose title contains it are membership promotions;
# there are plenty of other links that have nothing to do with them
member = "会员狂欢季"

# Candidate id values; there probably aren't any pages past 5000
for i in range(2000, 5000):
    # Build the full URL
    urls = url + str(i)
    # We don't need the body yet, so a HEAD request is enough to check the status
    response = requests.head(urls, headers=headers, timeout=10)
    if response.status_code == 200:
        content = requests.get(urls, headers=headers, timeout=10).text
        # Extract the text of the <title> element
        title = re.findall('<title>(.*?)</title>', content)
        if member in str(title):
            # Print the title and the link
            print(title, urls)
```
For example, it prints:

```
['百度网盘 | 中国联通 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2002
['百度网盘 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2104
['百度网盘 | 车来了 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2109
['百度网盘 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2172
['百度网盘 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2321
['百度网盘 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2325
['百度网盘 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2440
['百度网盘 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2441
['百度网盘 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2444
['百度网盘 - 会员狂欢季'] https://pan.baidu.com/component/view?id=2449
['百度网盘 | 有驾 - 会员狂欢季'] https://pan.baidu.com/component/view?id=3412
['百度网盘 | 知乎 - 会员狂欢季'] https://pan.baidu.com/component/view?id=3490
['百度网盘 | 魅族 - 会员狂欢季'] https://pan.baidu.com/component/view?id=3498
```
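If you want to tweak the scan, one option is to split it into small functions so the title extraction and the keyword filter can be checked without touching the network. This is a minimal sketch under some assumptions: the names `extract_title`, `fetch_page`, `scan` and the `delay` parameter are hypothetical, not part of the script above; the page title is assumed to sit in a `<title>` element; and instead of HEAD followed by GET it does a single GET per id with `allow_redirects=False`, mirroring `requests.head`, which doesn't follow redirects by default. Whether one GET beats HEAD+GET depends on how many ids actually hit.

```python
import html
import re
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple

BASE_URL = "https://pan.baidu.com/component/view?id="
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
KEYWORD = "会员狂欢季"

# Non-greedy, case-insensitive, and DOTALL in case the title spans several lines
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(page: str) -> Optional[str]:
    """Return the unescaped, stripped text of the first <title>, or None."""
    match = TITLE_RE.search(page)
    return html.unescape(match.group(1)).strip() if match else None


def fetch_page(page_id: int, timeout: float = 10.0) -> Optional[str]:
    """Hypothetical fetcher: GET one page, return its HTML on HTTP 200, else None."""
    import requests  # third-party; imported here so the pure helpers work without it

    try:
        resp = requests.get(BASE_URL + str(page_id), headers=HEADERS,
                            timeout=timeout, allow_redirects=False)
    except requests.RequestException:
        return None  # network error: treat the id as a miss
    return resp.text if resp.status_code == 200 else None


def scan(ids: Iterable[int], fetch: Callable[[int], Optional[str]],
         keyword: str = KEYWORD, delay: float = 0.0) -> Iterator[Tuple[str, str]]:
    """Yield (title, url) for every id whose page title contains keyword."""
    for page_id in ids:
        page = fetch(page_id)
        if page is not None:
            title = extract_title(page)
            if title and keyword in title:
                yield title, BASE_URL + str(page_id)
        if delay:
            time.sleep(delay)  # optional pause between requests
```

The actual scan would then be `for title, url in scan(range(2000, 5000), fetch_page, delay=0.2): print(title, url)`. The 0.2 s delay is an arbitrary courtesy to the server, not something the original script does.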
My image host won't let me upload pictures for some reason. Frustrating.