Lanzou Cloud Direct-Link Extraction Tutorial, Part 2 (Password-Protected Links) (with Python Source)

Author: baipiao520
Recap
Part 1: Lanzou Cloud Direct-Link Extraction Tutorial (with Python Source)
Last time we analyzed a single-file share link that had no access password.
That post relied mainly on the re library, i.e. regular expressions, to pull the parameters out of the page. A few readers suggested bs4 instead, but bs4 parses HTML tags and doesn't reach into inline JavaScript, so this post sticks with regex.
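A quick aside on why: the values we need are assigned to JavaScript variables inside a <script> block. bs4 can hand you the script's raw text, but you would still need a regex (or a JS parser) to dig the value out, so the regex alone is simpler. A minimal sketch, using the skdklds variable we will meet below (the 'abc123' value is illustrative):

import re

# bs4 would only get us as far as the <script> text; the regex does the real work.
html = "<script>var skdklds = 'abc123';</script>"
print(re.search(r"var\s+skdklds\s*=\s*'([^']*)';", html).group(1))  # abc123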
Preparation
A browser
A Python environment
Starting the Analysis
With last time's experience in hand, we open a password-protected share link directly and watch what the browser does.
Before entering the password:

[screenshot]

After entering the password:

[screenshot]

Meanwhile, in the dev tools' network tab:

[screenshot]

This time, it turns out, there are no nested doll-within-doll requests; the page gets there in one step.
So let's fire off the request directly.
import requests

url = "https://wwt.lanzouu.com/iW5jF1s99k6j"
password = 6666
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
response = requests.get(url, headers=headers)
print(response.text)
Looking at the page we got back, it's very similar to last time,
except that this time the ajax call lives inside a down_p() function,
which actually saves us a few steps compared with last time.
So let's get straight to extracting the parameters!
import re

# The ajax endpoint baked into down_p(), e.g. /ajaxm.php?file=123456
url_pattern = re.compile(r"url\s*:\s*'(/ajaxm\.php\?file=\d+)'")
url_match = url_pattern.search(response.text).group(1)
# The page-level signing token, posted later as 'sign'
skdklds_pattern = re.compile(r"var\s+skdklds\s*=\s*'([^']*)';")
skdklds_match = skdklds_pattern.search(response.text).group(1)
print(url_match, skdklds_match)
Since the only thing we ever need from the Match object is group(1), this time I call group(1) right where the variable is assigned, which also makes later use simpler. Optimized versions of both this post's and last post's code are given at the end.
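One caveat with chaining .search(...).group(1): if Lanzou changes the page and a pattern stops matching, .search() returns None and the chained .group(1) raises an AttributeError. A minimal sketch of a safer variant (the extract() helper is my own addition, not part of the tutorial's code):

import re

def extract(pattern, text):
    # Return group(1) of the first match, or None instead of raising.
    m = re.search(pattern, text)
    return m.group(1) if m else None

# e.g. url_match = extract(r"url\s*:\s*'(/ajaxm\.php\?file=\d+)'", response.text)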
Next, simulate the POST request:
data = {
    'action': 'downprocess',
    'sign': skdklds_match,
    'p': password,
}
headers = {
    "Referer": url,
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
# re_domain() strips a URL down to its bare domain; it is defined in the
# complete program at the end of this post.
response2 = requests.post(f"https://{re_domain(url)}{url_match}", headers=headers, data=data)
print(response2.text)
password can be a str or an int; either way it gets converted to a string when the form data is encoded, so it's down to personal preference.
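You can check that claim with requests' own request preparation; this is just a quick demonstration (example.com is a placeholder), not part of the extractor:

import requests

# Prepare (but don't send) a POST and inspect the encoded body.
req = requests.Request("POST", "https://example.com", data={"p": 6666}).prepare()
print(req.body)  # p=6666 – the int is form-encoded as a plain string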
And bearing last time's lesson in mind, don't forget to put Referer in the request headers.
From here on, everything is exactly the same as before:
import json

data = json.loads(response2.text)
dom = data['dom']
url = data['url']
full_url = dom + "/file/" + url
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "cookie": "down_ip=1"
}
# The direct link comes back as a redirect, so don't follow it –
# just read the Location header.
response3 = requests.get(full_url, headers=headers, allow_redirects=False)
print(response3.headers['Location'])
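If you want to go one step further and actually save the file, the Location URL can be fetched directly. A minimal sketch, reusing response3 and headers from above (the output filename is illustrative):

# Stream the direct link to disk rather than loading it all into memory.
direct_url = response3.headers['Location']
with requests.get(direct_url, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open("download.bin", "wb") as f:  # illustrative filename
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)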
Complete program (with password)
import requests
import re
import json

def re_domain(url):
    """Extract the bare domain (e.g. wwt.lanzouu.com) from a URL."""
    pattern_domain = r"https?://([^/]+)"
    match = re.search(pattern_domain, url)
    if match:
        return match.group(1)
    return None

url = "https://wwt.lanzouu.com/iW5jF1s99k6j"
password = "6666"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
response = requests.get(url, headers=headers)

# Pull the ajax endpoint and the signing token out of the page's JavaScript.
url_pattern = re.compile(r"url\s*:\s*'(/ajaxm\.php\?file=\d+)'")
url_match = url_pattern.search(response.text).group(1)
skdklds_pattern = re.compile(r"var\s+skdklds\s*=\s*'([^']*)';")
skdklds_match = skdklds_pattern.search(response.text).group(1)
print(url_match, skdklds_match)

data = {
    'action': 'downprocess',
    'sign': skdklds_match,
    'p': password,
}
headers = {
    "Referer": url,
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
response2 = requests.post(f"https://{re_domain(url)}{url_match}", headers=headers, data=data)

data = json.loads(response2.text)
dom = data['dom']
url = data['url']
full_url = dom + "/file/" + url
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "cookie": "down_ip=1"
}
response3 = requests.get(full_url, headers=headers, allow_redirects=False)
print(response3.headers['Location'])
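As an aside, re_domain() could just as well lean on the standard library instead of a hand-rolled regex; a drop-in sketch:

from urllib.parse import urlparse

def re_domain(url):
    # urlparse("https://wwt.lanzouu.com/iW5jF1s99k6j").netloc == "wwt.lanzouu.com"
    return urlparse(url).netloc or None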
How to tell whether a link needs a password
The two pages are actually quite different, and there are several ways to tell them apart (a combined helper sketch follows this list):
1. With a password, the page title is just 文件; without one it is 文件名 - 蓝奏云. (The forum's HTML filter stripped the tags from this section of the original post, so treat the exact strings below as a best-effort reconstruction.)
   Concretely:
   if "<title>文件</title>" in response.text:
       print("password protected")
   else:
       print("no password")
2. The password page contains a number of <input> elements (the password box); the no-password page has none.
   Concretely:
   if "<input" in response.text:
       print("password protected")
   else:
       print("no password")
3. Many of the functions further down the page only exist in the password variant; I won't demonstrate those here.
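Method 1, packed into the small helper promised above (a hedged sketch: the title string reconstructs the filter-eaten markup, so verify it against a live page):

import requests

def needs_password(share_url):
    # Heuristic: password-protected share pages carry a bare "文件" title.
    headers = {"User-Agent": "Mozilla/5.0"}
    html = requests.get(share_url, headers=headers).text
    return "<title>文件</title>" in html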
Complete program
import requests
import re
import json

def re_domain(url):
    """Extract the bare domain from a URL."""
    pattern_domain = r"https?://([^/]+)"
    match = re.search(pattern_domain, url)
    if match:
        return match.group(1)
    return None

def getwithp(url, password):
    """Resolve a password-protected single-file share to its direct link."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response = requests.get(url, headers=headers)
    url_pattern = re.compile(r"url\s*:\s*'(/ajaxm\.php\?file=\d+)'")
    url_match = url_pattern.search(response.text).group(1)
    skdklds_pattern = re.compile(r"var\s+skdklds\s*=\s*'([^']*)';")
    skdklds_match = skdklds_pattern.search(response.text).group(1)
    data = {
        'action': 'downprocess',
        'sign': skdklds_match,
        'p': password,
    }
    headers = {
        "Referer": url,
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response2 = requests.post(f"https://{domain}{url_match}", headers=headers, data=data)
    data = json.loads(response2.text)
    full_url = data['dom'] + "/file/" + data['url']
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
        "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"Windows\"",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "cookie": "down_ip=1"
    }
    response3 = requests.get(full_url, headers=headers, allow_redirects=False)
    return response3.headers['Location']

def getwithoutp(url):
    """Resolve a no-password single-file share (the flow from part 1)."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response = requests.get(url, headers=headers)
    # The original pattern was eaten by the forum's HTML filter; this is an
    # approximate reconstruction that captures the src of the download iframe.
    iframe_pattern = re.compile(r'<iframe[^>]*src="(/fn\?[^"]+)"')
    matches = iframe_pattern.findall(response.text)
    response2 = requests.get(f"https://{domain}{matches[1]}", headers=headers)
    pattern = r"'sign'\s*:\s*'([^']+)'"
    sign = re.search(pattern, response2.text).group(1)
    pattern2 = r"url\s*:\s*'([^']+)'"
    url2 = re.search(pattern2, response2.text).group(1)
    data = {
        'action': 'downprocess',
        'signs': '?ctdf',
        'sign': sign,
        'websign': '',
        'websignkey': 'bL27',
        'ves': 1
    }
    headers = {
        "Referer": matches[1],
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response3 = requests.post(f"https://{domain}{url2}", headers=headers, data=data)
    data = json.loads(response3.text)
    full_url = data['dom'] + "/file/" + data['url']
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
        "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"Windows\"",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "cookie": "down_ip=1"
    }
    response4 = requests.get(full_url, headers=headers, allow_redirects=False)
    return response4.headers['Location']

url = "https://wwt.lanzouu.com/iW5jF1s99k6j"
password = "6666"
domain = re_domain(url)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
response = requests.get(url, headers=headers)
# The tags in this check were also stripped by the forum; the <title>
# comparison reconstructs the intent described in method 1 above.
if "<title>文件</title>" in response.text:
    print("password protected")
    result = getwithp(url, password)
else:
    print("no password")
    result = getwithoutp(url)
print(result)
Closing remarks
This tutorial is only meant to demonstrate an approach; the page can change at any time, so nothing here is permanently reliable.
Multi-file (folder) shares will be covered in the next part.


wzvideni

Following your tutorial, I tried a password-protected, folder-style Lanzou link myself. I can already get the JSON for the page you see after entering the password, but requesting an individual file comes back empty and I can't work out why. On the web, once you've entered the folder's password you don't have to enter a password again to open individual files, so maybe that's related, but adding a Referer header makes no difference either.
Could you take a look when you have time?
Code below:
import json
import re
import requests

def re_domain(url):
    pattern_domain = r"https?://([^/]+)"
    match = re.search(pattern_domain, url)
    if match:
        # Note: group() keeps the scheme, e.g. "https://wwur.lanzout.com"
        return match.group()
    return None

url = "https://wwur.lanzout.com/b01rs66mb"
password = "xfgc"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
response = requests.get(url, headers=headers)
url_match = re.search(r"url\s*:\s*'(/filemoreajax\.php\?file=\d+)'", response.text).group(1)
file_match = re.search(r"\d+", url_match).group()
t_match = re.search(r"var\s+ib\w+\s*=\s*'([^']*)';", response.text).group(1)
k_match = re.search(r"var\s+_h\w+\s*=\s*'([^']*)';", response.text).group(1)
print(url_match)
print(file_match)
print(t_match)
print(k_match)
# print(response.text)
data = {
    'lx': 2,
    'fid': file_match,
    'uid': '1674564',
    'pg': 1,
    'rep': '0',
    't': t_match,
    'k': k_match,
    'up': 1,
    'ls': 1,
    'pwd': password
}
headers = {
    "Referer": url,
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
print(f"{re_domain(url)}{url_match}")
response2 = requests.post(f"{re_domain(url)}{url_match}", headers=headers, data=data)
# print(response2.text)
data = json.loads(response2.text)
# print(data)
text_list = data['text']
headers = {
    "Referer": url,
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "cookie": "down_ip=1"
}
for text in text_list:
    print(text['name_all'])
    file_url = f"{re_domain(url)}/{text['id']}"
    print(file_url)
    response3 = requests.get(file_url, headers=headers, allow_redirects=False)
    print(response3)
    print(response3.text)
    # print(response3.headers['Location'])
    break
baipiao520 (OP)

Reply to wzvideni (2024-3-24 09:33, "Following your tutorial, I tried a password-protected, folder-style Lanzou link..."):

The file_url you end up with is simply a no-password share page, like the ones in my first post, so you can just call my existing function on it:
import json
import re
import requests

def re_domain(url):
    pattern_domain = r"https?://([^/]+)"
    match = re.search(pattern_domain, url)
    if match:
        # group() keeps the scheme, so domain is e.g. "https://wwur.lanzout.com"
        return match.group()
    return None

def getwithoutp(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response = requests.get(url, headers=headers)
    # The original pattern was eaten by the forum's HTML filter; this is an
    # approximate reconstruction that captures the src of the download iframe.
    iframe_pattern = re.compile(r'<iframe[^>]*src="(/fn\?[^"]+)"')
    matches = iframe_pattern.findall(response.text)
    response2 = requests.get(f"{domain}{matches[1]}", headers=headers)
    pattern = r"'sign'\s*:\s*'([^']+)'"
    sign = re.search(pattern, response2.text).group(1)
    pattern2 = r"url\s*:\s*'([^']+)'"
    url2 = re.search(pattern2, response2.text).group(1)
    data = {
        'action': 'downprocess',
        'signs': '?ctdf',
        'sign': sign,
        'websign': '2',
        'websignkey': 'xLG2',
        'ves': 1
    }
    headers = {
        "Referer": matches[1],
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response3 = requests.post(f"{domain}{url2}", headers=headers, data=data)
    data = json.loads(response3.text)
    full_url = str(data['dom']) + "/file/" + str(data['url'])
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
        "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"Windows\"",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "cookie": "down_ip=1"
    }
    response4 = requests.get(full_url, headers=headers, allow_redirects=False)
    return response4.headers['Location']

url = "https://wwur.lanzout.com/b01rs66mb"
password = "xfgc"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
response = requests.get(url, headers=headers)
url_match = re.search(r"url\s*:\s*'(/filemoreajax\.php\?file=\d+)'", response.text).group(1)
file_match = re.search(r"\d+", url_match).group()
t_match = re.search(r"var\s+ib\w+\s*=\s*'([^']*)';", response.text).group(1)
k_match = re.search(r"var\s+_h\w+\s*=\s*'([^']*)';", response.text).group(1)
domain = re_domain(url)
print(url_match)
print(file_match)
print(t_match)
print(k_match)
# print(response.text)
data = {
    'lx': 2,
    'fid': file_match,
    'uid': '1674564',
    'pg': 1,
    'rep': '0',
    't': t_match,
    'k': k_match,
    'up': 1,
    'ls': 1,
    'pwd': password
}
print(f"{domain}{url_match}")
response2 = requests.post(f"{domain}{url_match}", headers=headers, data=data)
# print(response2.text)
data = json.loads(response2.text)
# print(data)
text_list = data['text']
for text in text_list:
    print(text['name_all'])
    print(text)
    file_url = f"{domain}/{text['id']}"
    print(file_url)
    print(getwithoutp(file_url))
    break
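A side note on folders: the POST above sends pg: 1, so long folder listings are presumably paginated. A hedged sketch of walking the pages until the listing runs dry, reusing domain, url_match, headers and the extracted fields from the script above (the stop condition is an assumption, not confirmed against the API):

import time

# Rebuild the form (the script above reused the name `data` for the JSON reply).
form = {
    'lx': 2, 'fid': file_match, 'uid': '1674564', 'rep': '0',
    't': t_match, 'k': k_match, 'up': 1, 'ls': 1, 'pwd': password,
}
all_files = []
pg = 1
while True:
    form['pg'] = pg
    resp = requests.post(f"{domain}{url_match}", headers=headers, data=form)
    page = json.loads(resp.text).get('text')
    if not isinstance(page, list) or not page:
        break  # empty page or error payload – assume we're done
    all_files.extend(page)
    pg += 1
    time.sleep(1)  # be polite between pages
print(len(all_files), "files collected")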
m96118

Very well explained, thanks for sharing.
sai609

First: for Lanzou Cloud, the browser's built-in downloader is already enough and downloads instantly.
Second: for 123pan, Tianyi, Baidu Netdisk, Quark and Aliyundrive, once a direct link is extracted, is there any way to download without registering and logging in?
PS: I don't want to use my own account, for fear of a ban.
tsanye

Thanks 🙏 for sharing, learning.
jm1jm1

Very detailed explanation, thanks for sharing; digesting it slowly.
shallies

Learned a lot, thanks to the OP for sharing the technique.
saccsf

Thanks to the OP for the technical analysis.
BBA119

Could you talk about what this is actually useful for? The explanation is detailed, but I can't follow it. Thanks for sharing.