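So I wrote this. It uses XPath to read each user's post count and points, and then computes
Water King index (水王指数) = post count / points.
Here "water" (水) is forum slang for padding a board with low-effort posts.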
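Each post is worth 0.3 points, and the displayed score drops the fractional part, so small post counts distort the ratio. For example, 9 posts earn 2.7 points, which is displayed as 2, giving an index of 9/2 = 4.5.
In other words, the larger the index, the larger the share of a user's points that comes from water posts.
When the post count is large, the larger the index (it approaches 3.3333, i.e. 1/0.3), the more of a water poster the user is.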
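The arithmetic above can be checked with a small sketch. It assumes a hypothetical user whose points come only from posts, and that the displayed score is the truncated value of 0.3 points per post; the function names are made up for illustration and are not part of the script.

```python
def displayed_points(posts: int) -> int:
    """Points shown on the profile, assuming 0.3 points per post with the
    fractional part dropped. Integer arithmetic avoids float rounding."""
    return posts * 3 // 10


def water_index(posts: int) -> float:
    """Post count divided by displayed points, for a posts-only user."""
    points = displayed_points(posts)
    if points == 0:
        # Fewer than 4 posts display as 0 points, so the ratio is undefined.
        raise ValueError("displayed points is 0; index is undefined")
    return posts / points


if __name__ == "__main__":
    print("%.2f" % water_index(9))      # 9 / 2     -> 4.50
    print("%.4f" % water_index(10000))  # 10000 / 3000 -> 3.3333
```

Under these assumptions the index sits at or above 3.3333 and settles toward it as the post count grows; points earned in other ways push a real user's index lower.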
# wangzhi=https://www.52pojie.cn/home.php?mod=space&uid=1530891&do=profile&from=space
import requests
from lxml import etree
from time import sleep

# Naming: wangzhi = URL, tou = request headers, yonghuming = username,
# jifen = points, huitie = reply count, zhuti = thread count.
# Single-profile example; the loop below overwrites this variable.
wangzhi=r"https://www.52pojie.cn/home.php?mod=space&uid=1530890&do=profile&from=space"
tou={"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    # "br" removed: requests only decodes Brotli when the optional brotli package is installed.
    "Accept-Encoding":"gzip, deflate",
    "Accept-Language":"zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Connection":"keep-alive",
    "Cookie":"",  # left empty in the post; fill in your own session cookie
    "Host":"www.52pojie.cn",
    "Sec-Fetch-Dest":"document",
    "Sec-Fetch-Mode":"navigate",
    "Sec-Fetch-Site":"same-origin",
    "Sec-Fetch-User":"?1",
    "Upgrade-Insecure-Requests":"1",
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
}
for wangzhi in ["https://www.52pojie.cn/home.php?mod=space&uid=%s&do=profile&from=space"%i for i in range(500000,500030)]:
    xy=requests.get(wangzhi, headers=tou, timeout=10)
    html=etree.HTML(xy.text)
    try:
        yonghuming=html.xpath("/html/body/div[7]/div[1]/div/a[2]/text()")
        jifen=html.xpath("/html/body/div[7]/div[4]/div/div[2]/div/div[1]/div[4]/ul/li[2]/text()")
        huitie=html.xpath("/html/body/div[7]/div[4]/div/div[2]/div/div[1]/div[1]/ul[3]/li/a[2]/text()")
        zhuti=html.xpath("/html/body/div[7]/div[4]/div/div[2]/div/div[1]/div[1]/ul[3]/li/a[3]/text()")
        # Each XPath returns a list of text nodes; the counts are the last space-separated token.
        yonghuming,jifen,huitie,zhuti=yonghuming[0],int(jifen[0]),int(huitie[0].split(" ")[-1]),int(zhuti[0].split(" ")[-1])
        print(yonghuming,"Water King index: %.2f"%((zhuti+huitie)/jifen) )
    except Exception:
        # The user has been deregistered or purged (or has 0 points): skip silently.
        pass
    finally:
        # Pause between requests so the forum is not hammered.
        sleep(3)
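The conversion step in the script (taking the last space-separated token of each text node and dividing) can be pulled out into helpers that are testable without touching the network. This is only a sketch: `trailing_int` and `index_from_counts` are hypothetical names, and the sample strings assume the link text ends with the number, as the `split(" ")[-1]` call in the script implies.

```python
import re
from typing import Optional


def trailing_int(text: str) -> Optional[int]:
    """Return the integer at the end of `text`, or None if there is none."""
    match = re.search(r"(\d+)\s*$", text)
    return int(match.group(1)) if match else None


def index_from_counts(replies: int, threads: int, points: int) -> Optional[float]:
    """(replies + threads) / points, or None when points is not positive."""
    if points <= 0:
        return None
    return (replies + threads) / points


if __name__ == "__main__":
    # Assumed shape of the text nodes returned by the XPath queries.
    replies = trailing_int("回帖数 90")
    threads = trailing_int("主题数 10")
    print(index_from_counts(replies, threads, 40))  # (90 + 10) / 40 -> 2.5
```

Returning None instead of raising lets the caller skip deleted or zero-point accounts with a plain `if` rather than a broad `except`.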
I crawled the 100 users with uid from 498888 to 498988; with the banned accounts removed, the results are as follows:
Well, that's as far as I can crawl; the forum doesn't allow scraping.