Automatically scanning GitHub repositories to find leaked AI API keys

Author: gaoxiaotian  Published: 2025-10-18 10:00:39

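Building an automated GitHub scanner for leaked AI API keys
While using various AI services recently, I noticed that many developers accidentally commit their API keys (OpenAI, Anthropic Claude, and others) to GitHub, which can lead to:
1. The API key being stolen and used by others
2. Huge bills (some people have lost several thousand dollars this way)
3. Account security risks
So I decided to build an automated scanning tool that can find API keys exposed in other people's repositories.
Project implementation
1. Runs on GitHub Actions
2. Supports multiple AI API key formats and reduces false positives
3. Generates a complete scan report with file paths, line numbers, and confidence ratings
4. Automatically creates an Issue as a notification when a problem is found
5. Completely free, relying on the GitHub Actions free tier
Technical architecture
The project uses a modular design and consists mainly of the following core modules: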

[Image: 111.png, architecture diagram of the core modules]

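Core implementation
1. Sensitive information detector
This is the core of the whole project; it identifies API keys in code. I defined more than 30 detection patterns, covering:
OpenAI API keys (sk-..., sk-proj-...)
Anthropic Claude (sk-ant-...)
Google AI/Gemini (AIza...)
Environment variable assignments (OPENAI_API_KEY = "...")
Object properties (apiKey: "...")
Key code: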
# config.py - a subset of the detection patterns
SENSITIVE_PATTERNS = [
    # OpenAI API key formats
    r'sk-[a-zA-Z0-9]{32,}',
    r'sk-proj-[a-zA-Z0-9_-]{32,}',

    # Anthropic API key format
    r'sk-ant-[a-zA-Z0-9_-]{32,}',

    # Google AI (Gemini) API key format
    r'AIza[a-zA-Z0-9_-]{35}',

    # Environment variable assignments
    r'OPENAI_API_KEY[\s]*=[\s]*["\']?([a-zA-Z0-9_-]{20,})["\']?',
    r'ANTHROPIC_API_KEY[\s]*=[\s]*["\']?([a-zA-Z0-9_-]{20,})["\']?',

    # camelCase property assignments
    r'apiKey[\s]*:[\s]*["\']([a-zA-Z0-9_-]{20,})["\']',
    r'openaiApiKey[\s]*[:=][\s]*["\']([a-zA-Z0-9_-]{20,})["\']',

    # ... (the post shows only part of the list)
]
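To illustrate how patterns like these are applied, here is a minimal sketch that compiles a few of them and runs them against sample lines. The `find_matches` helper and the sample strings are hypothetical and not part of the project, and the key values are made-up placeholders.

```python
import re
from typing import List

# A small subset of the patterns listed above.
PATTERNS = [
    r'sk-ant-[a-zA-Z0-9_-]{32,}',
    r'sk-proj-[a-zA-Z0-9_-]{32,}',
    r'sk-[a-zA-Z0-9]{32,}',
    r'AIza[a-zA-Z0-9_-]{35}',
]
COMPILED = [re.compile(p) for p in PATTERNS]


def find_matches(line: str) -> List[str]:
    """Return every substring of `line` matched by any pattern (hypothetical helper)."""
    return [m.group(0) for p in COMPILED for m in p.finditer(line)]


fake_openai = "sk-" + "a" * 40    # made-up value, not a real key
fake_gemini = "AIza" + "B" * 35   # made-up value, not a real key

openai_hits = find_matches(f'client = OpenAI(api_key="{fake_openai}")')
gemini_hits = find_matches(f"GOOGLE_KEY={fake_gemini}")
no_hits = find_matches("print('hello world')")
```

Compiling the patterns once up front avoids recompiling them for every line of every file.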
# secret_detector.py - core detection logic
import re
from typing import Dict, List

from config import SENSITIVE_PATTERNS


class SecretDetector:
    def __init__(self):
        # Assumption: the constructor is not shown in the post. The patterns must be
        # compiled, because the code below calls pattern.finditer and pattern.pattern.
        self.patterns = [re.compile(p) for p in SENSITIVE_PATTERNS]

    def detect_secrets_in_text(self, text: str, file_path: str = "") -> List[Dict]:
        """Detect sensitive information in a block of text."""
        findings = []
        lines = text.split('\n')

        for line_num, line in enumerate(lines, 1):
            for pattern in self.patterns:
                for match in pattern.finditer(line):
                    secret = match.group(0)

                    # Filter out example code
                    if self._is_likely_example(line, secret):
                        continue

                    findings.append({
                        'file_path': file_path,
                        'line_number': line_num,
                        'line_content': line.strip(),
                        'secret': secret,
                        'pattern': pattern.pattern,
                        'confidence': self._calculate_confidence(secret, line)
                    })

        return findings

    def _is_likely_example(self, line: str, secret: str) -> bool:
        """Decide whether the line is probably example code."""
        line_lower = line.lower()
        example_keywords = [
            'example', 'sample', 'demo', 'test', 'placeholder',
            'your_api_key', 'xxx', 'todo', 'replace', 'change_me'
        ]
        return any(keyword in line_lower for keyword in example_keywords)

    def _calculate_confidence(self, secret: str, line: str) -> str:
        """Compute a confidence rating."""
        stripped = line.strip()

        # High confidence: complete key format and not inside a comment
        if (secret.startswith('sk-') and len(secret) > 40 and
                not stripped.startswith('#') and
                not stripped.startswith('//')):
            return 'high'

        # Medium confidence: matches the basic pattern
        if len(secret) >= 30:
            return 'medium'

        return 'low'
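One detail worth noting: `match.group(0)` returns the whole match, so for the assignment-style patterns the recorded `secret` includes the variable name (for example `OPENAI_API_KEY = "..."`), and the length checks in `_calculate_confidence` then measure that whole string. A possible refinement, sketched here as an assumption rather than the project's actual behavior, is to prefer the capture group when the pattern defines one. The `extract_secret` helper is hypothetical.

```python
import re

ENV_PATTERN = re.compile(r'OPENAI_API_KEY[\s]*=[\s]*["\']?([a-zA-Z0-9_-]{20,})["\']?')
BARE_PATTERN = re.compile(r'sk-[a-zA-Z0-9]{32,}')


def extract_secret(match: re.Match) -> str:
    """Prefer the first capture group if one matched, else the whole match."""
    if match.lastindex:           # at least one group took part in the match
        return match.group(1)
    return match.group(0)


value = "k" * 24                  # made-up value, not a real key
line = f'OPENAI_API_KEY = "{value}"'

m = ENV_PATTERN.search(line)
whole = m.group(0)                # includes the variable name and quotes
captured = extract_secret(m)      # only the key material

bare_value = "sk-" + "z" * 36     # made-up value, not a real key
bare = extract_secret(BARE_PATTERN.search(f"key: {bare_value}"))
```

With this approach, masking and confidence scoring would operate on the key itself instead of on the surrounding assignment.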
2. Smart scanning strategy
To avoid repeated scans and improve efficiency, I implemented scan history management:
# scanner.py - main scanning logic
from datetime import datetime


class CloudScanner:
    # self.scan_history, self.github_scanner, self.report_generator and
    # self.skip_scanned are set up elsewhere (not shown in the post).

    def scan_ai_projects(self, max_repos: int = 50) -> str:
        """Automatically search for and scan AI-related projects."""
        print("🚀 Starting automatic search for AI-related projects")
        scan_start_time = datetime.now()

        # Filter function: check whether a repository has already been scanned
        def is_scanned(repo_full_name: str) -> bool:
            return self.scan_history.is_scanned(repo_full_name)

        # Search repositories, filtering out already-scanned ones on the fly
        repos_to_scan = self.github_scanner.search_ai_repos(
            max_repos=max_repos,
            skip_filter=is_scanned if self.skip_scanned else None
        )

        # Scan every repository
        all_findings = []
        for idx, repo in enumerate(repos_to_scan, 1):
            print(f"🔍 [{idx}/{len(repos_to_scan)}] Scanning repository: {repo['full_name']}")
            findings = self._scan_repository(repo, scan_type="auto:ai-projects")
            all_findings.extend(findings)

        # Generate the report
        report_path = self.report_generator.generate_report(
            all_findings, scan_start_time, scan_type="auto:ai-projects"
        )

        return report_path
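The `skip_filter` callback is what lets the search drop already-scanned repositories while it is still collecting results, so the `max_repos` budget is spent only on new repositories. `search_ai_repos` itself is not shown in the post; the following is a minimal sketch of that idea over an in-memory candidate list, where `collect_repos` and the sample data are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional


def collect_repos(candidates: Iterable[Dict],
                  max_repos: int,
                  skip_filter: Optional[Callable[[str], bool]] = None) -> List[Dict]:
    """Collect up to max_repos candidates, skipping those the filter rejects."""
    selected: List[Dict] = []
    for repo in candidates:
        if len(selected) >= max_repos:
            break
        if skip_filter is not None and skip_filter(repo["full_name"]):
            continue  # already scanned: does not count toward the limit
        selected.append(repo)
    return selected


already_scanned = {"alice/chatbot"}
candidates = [
    {"full_name": "alice/chatbot"},
    {"full_name": "bob/llm-demo"},
    {"full_name": "carol/agent"},
    {"full_name": "dave/rag-app"},
]
picked = collect_repos(candidates, max_repos=2,
                       skip_filter=lambda name: name in already_scanned)
names = [r["full_name"] for r in picked]
```

Filtering during collection, instead of after truncating to `max_repos`, means a run does not come back short just because the first results were repositories it had already seen.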
3. Detailed scan reports
The report generator creates a structured TXT report; its core logic is as follows:
# report_generator.py - core of report generation
import os
from datetime import datetime
from typing import Dict, List


class ReportGenerator:
    # self.output_dir and the _group_by_repo, _write_repo_findings and
    # _write_statistics helpers are defined elsewhere (not shown in the post).

    def generate_report(self, scan_results: List[Dict],
                        scan_start_time: datetime,
                        scan_type: str = "auto") -> str:
        """Generate a scan report."""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"scan_report_{timestamp}.txt"
        filepath = os.path.join(self.output_dir, filename)

        with open(filepath, 'w', encoding='utf-8') as f:
            # Report header
            f.write("╔" + "═" * 78 + "╗\n")
            f.write("║" + "       🔒 InCloud GitHub Cloud Scanner - Scan Report".ljust(78) + "║\n")
            f.write("╚" + "═" * 78 + "╝\n\n")

            # Show results grouped by repository
            results_by_repo = self._group_by_repo(scan_results)
            for repo_url, findings in results_by_repo.items():
                self._write_repo_findings(f, repo_url, findings)

            # Statistics
            self._write_statistics(f, scan_results)

        return filepath

    def _mask_secret(self, secret: str) -> str:
        """Partially hide a secret."""
        # The original post is cut off at this point:
        # if len(secret)
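The `_mask_secret` method is cut off in the post, so its actual rule is unknown. As an illustration only, a masking helper might keep a few leading and trailing characters and hide the rest; the `mask_secret` function, the `visible` parameter, and the short-secret threshold below are all assumptions.

```python
def mask_secret(secret: str, visible: int = 4) -> str:
    """Hide the middle of a secret, keeping `visible` characters at each end."""
    if len(secret) <= visible * 2:
        # Too short to reveal anything safely: mask it entirely.
        return "*" * len(secret)
    hidden = len(secret) - visible * 2
    return secret[:visible] + "*" * hidden + secret[-visible:]


masked = mask_secret("sk-" + "a" * 20 + "WXYZ")   # made-up value, not a real key
short = mask_secret("abc123")
```

Masking matters here because the workflow commits reports to a repository; writing full keys into a report would leak them a second time.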
Sample report:

[Image: 19b1b7c3-c405-4b4f-9831-e0728e71f058.png, sample scan report]

4. GitHub Actions automation
This is one of the project's highlights! Nothing needs to run locally; everything is based on GitHub Actions:
# .github/workflows/manual-scan.yml
name: AI API Key Scanner - Manual Scan
on:
  workflow_dispatch:
    inputs:
      scan_type:
        description: 'Scan type'
        required: true
        type: choice
        options:
          - 'auto - automatically search for AI projects'
          - 'user - scan a specific user'
          - 'org - scan a specific organization'
          - 'repo - scan a single repository'
      max_repos:
        description: 'Maximum number of repositories to scan'
        type: number
        default: 50
permissions:
  contents: write  # allow committing reports
  issues: write    # allow creating issues
jobs:
  manual-scan:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Run scan
        run: |
          python scan_github.py --auto --max-repos ${{ github.event.inputs.max_repos }}

      - name: Commit reports to the repository
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add scan_reports/
          # Commit and push only when there is something new, so the step does not fail on an empty commit
          git diff --cached --quiet || (git commit -m "Add scan report [skip ci]" && git push)

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: scan-report-${{ github.run_number }}
          path: scan_reports/
          retention-days: 90

      - name: Create alert issue
        # Relies on an earlier step with id "analyze" that sets has_findings (not shown in this excerpt)
        if: steps.analyze.outputs.has_findings == 'true'
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: 'Potential key leak detected',
              body: 'See the scan report for details...',
              labels: ['security', 'auto-scan']
            });
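The last step reads `steps.analyze.outputs.has_findings`, but the step with `id: analyze` does not appear in the excerpt. One way such a step could publish the value is by appending a `name=value` line to the file named by the `GITHUB_OUTPUT` environment variable, which is how GitHub Actions collects step outputs. The sketch below assumes the scanner can report a findings count; `write_step_output` is a hypothetical helper.

```python
import os
import tempfile


def write_step_output(name: str, value: str) -> None:
    """Append `name=value` to the file GitHub Actions reads step outputs from."""
    output_path = os.environ.get("GITHUB_OUTPUT")
    if not output_path:
        # Not running inside GitHub Actions: print for local debugging instead.
        print(f"{name}={value}")
        return
    with open(output_path, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")


# Local demonstration: point GITHUB_OUTPUT at a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    demo_path = tmp.name
os.environ["GITHUB_OUTPUT"] = demo_path

findings_count = 3  # placeholder for the number of findings in a scan
write_step_output("has_findings", "true" if findings_count > 0 else "false")

with open(demo_path, encoding="utf-8") as f:
    written = f.read()
os.remove(demo_path)
```

In the workflow, a step that runs such a script would need `id: analyze` for the `if:` condition on the issue step to see the output.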
Core features
1. Multiple scan modes
Four scan modes are supported:
# Automatically search for AI-related projects
python scan_github.py --auto --max-repos 50
# Scan all public repositories of a given user
python scan_github.py --user username
# Scan a given organization
python scan_github.py --org organization
# Scan a single repository
python scan_github.py --repo owner/repo_name
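The argument parsing inside `scan_github.py` is not shown in the post. As a rough sketch of how four mutually exclusive modes like these can be modeled with `argparse`, consider the following; the `build_parser` function, the help texts, and the default of 50 for `--max-repos` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser in which exactly one scan mode must be chosen."""
    parser = argparse.ArgumentParser(prog="scan_github.py")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--auto", action="store_true",
                      help="search for AI-related projects")
    mode.add_argument("--user", metavar="USERNAME",
                      help="scan a user's public repositories")
    mode.add_argument("--org", metavar="ORGANIZATION",
                      help="scan an organization")
    mode.add_argument("--repo", metavar="OWNER/REPO",
                      help="scan a single repository")
    parser.add_argument("--max-repos", type=int, default=50,
                        help="maximum number of repositories to scan")
    return parser


args_auto = build_parser().parse_args(["--auto", "--max-repos", "10"])
args_repo = build_parser().parse_args(["--repo", "owner/repo_name"])
```

A mutually exclusive group makes `argparse` reject ambiguous invocations such as `--auto --user username` before any scanning starts.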
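2. Smart filtering
Automatically filters out example code (lines containing keywords such as example, demo, placeholder)
Skips binary and media files
Excludes directories such as node_modules and .git
Confidence rating (high/medium/low)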
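The file-level filtering described above can be sketched as a single path check. This is an illustration under assumptions: the `should_scan` helper and the extension set are hypothetical, and only `node_modules` and `.git` come from the post.

```python
from pathlib import PurePosixPath

EXCLUDED_DIRS = {"node_modules", ".git"}  # named in the list above
# Assumed set of binary/media extensions; the project's real list is not shown.
SKIPPED_EXTENSIONS = {".png", ".jpg", ".gif", ".mp4", ".zip", ".exe", ".pdf"}


def should_scan(path: str) -> bool:
    """Return False for files in excluded directories or with binary/media extensions."""
    p = PurePosixPath(path)
    if any(part in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False
    return p.suffix.lower() not in SKIPPED_EXTENSIONS


paths = [
    "src/app.py",
    "node_modules/lib/index.js",
    "docs/logo.PNG",
    ".git/config",
    "config/.env",
]
scannable = [p for p in paths if should_scan(p)]
```

Checking the path before downloading file contents saves API calls as well as scan time.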
3. Scan history management
Avoids repeated scans and improves efficiency:
# scan_history.py
from datetime import datetime


class ScanHistory:
    # self.history (a dict keyed by repository name) and _save_history()
    # are defined elsewhere (not shown in the post).

    def mark_as_scanned(self, repo_name: str, findings_count: int, scan_type: str):
        """Mark a repository as scanned."""
        self.history[repo_name] = {
            'last_scan': datetime.now().isoformat(),
            'findings_count': findings_count,
            'scan_type': scan_type
        }
        self._save_history()

    def is_scanned(self, repo_name: str) -> bool:
        """Check whether a repository has already been scanned."""
        return repo_name in self.history
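How the history is loaded and saved is not shown in the post. A minimal sketch, assuming it is kept as a JSON file, could look like this; the `JsonScanHistory` class name and the file location are hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime
from typing import Dict


class JsonScanHistory:
    """Scan history persisted as a JSON file (hypothetical stand-in for ScanHistory)."""

    def __init__(self, path: str):
        self.path = path
        self.history: Dict[str, Dict] = self._load_history()

    def _load_history(self) -> Dict[str, Dict]:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def _save_history(self) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.history, f, ensure_ascii=False, indent=2)

    def mark_as_scanned(self, repo_name: str, findings_count: int, scan_type: str) -> None:
        self.history[repo_name] = {
            "last_scan": datetime.now().isoformat(),
            "findings_count": findings_count,
            "scan_type": scan_type,
        }
        self._save_history()

    def is_scanned(self, repo_name: str) -> bool:
        return repo_name in self.history


history_file = os.path.join(tempfile.mkdtemp(), "scan_history.json")
first = JsonScanHistory(history_file)
first.mark_as_scanned("bob/llm-demo", findings_count=2, scan_type="auto:ai-projects")

# A new instance (for example, the next run) sees the saved entry.
second = JsonScanHistory(history_file)
seen = second.is_scanned("bob/llm-demo")
unseen = second.is_scanned("carol/agent")
```

Because GitHub-hosted runners start from a clean machine each time, the history file only helps if it is stored somewhere persistent between runs, for example by committing it back to the repository.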
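4. Timeout protection
I treat 60 minutes as the run limit for GitHub Actions (GitHub's default job timeout is 360 minutes; a 60-minute cap would come from `timeout-minutes` or a self-imposed budget), so I added timeout protection: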
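import time

# Method excerpt from the scanner class; self._is_timeout() and
# self.scan_start_time (a time.time() timestamp) are defined elsewhere.
def _check_timeout(self, current_idx: int, total_repos: int) -> bool:
    """Check whether the scan has run out of time."""
    if self._is_timeout():
        elapsed_minutes = (time.time() - self.scan_start_time) / 60
        print(f"Scan timed out (ran for {elapsed_minutes:.1f} minutes)")
        print(f"Scanned {current_idx}/{total_repos} repositories")
        print("Scan data saved; remaining repositories will be processed in the next scan")
        return True
    return False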
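The `_is_timeout` check that this method relies on is not shown in the post. One plausible shape, sketched here under assumptions, compares elapsed time against the limit minus a safety margin so that the report can still be written and committed; the `TimeBudget` class, the 5-minute margin, and the injectable clock are hypothetical.

```python
import time


class TimeBudget:
    """Tracks elapsed time against a limit, leaving a safety margin for cleanup."""

    def __init__(self, limit_minutes: float = 60, margin_minutes: float = 5,
                 clock=time.time):
        self._clock = clock
        self.start = clock()
        self.budget_seconds = (limit_minutes - margin_minutes) * 60

    def is_timeout(self) -> bool:
        return self._clock() - self.start >= self.budget_seconds


# Demonstration with a fake clock so the example runs instantly.
fake_now = [0.0]
budget = TimeBudget(limit_minutes=60, margin_minutes=5, clock=lambda: fake_now[0])

before = budget.is_timeout()      # at t = 0
fake_now[0] = 54 * 60
almost = budget.is_timeout()      # 54 minutes elapsed
fake_now[0] = 55 * 60
after = budget.is_timeout()       # 55 minutes elapsed: budget reached
```

Checking the budget between repositories, as `_check_timeout` does, lets the scan stop at a clean boundary instead of being killed partway through a repository.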
Full project on GitHub: https://github.com/gaocaipeng/InCloudGitHub
If you find it useful, Stars and Forks are welcome!
