Batch-remove blank pages from PDF files (with a GUI and a packaged build)


Author: 泡泡汽水 | Posted: 2025-1-27 14:08:18

[Python]
import threading
import tkinter as tk
from tkinter import filedialog, messagebox
from tkinter.ttk import Progressbar

import pdfplumber
from PyPDF2 import PdfReader, PdfWriter


def is_blank_page(page):
    """Return True if a pdfplumber page has no text and no embedded images."""
    text = page.extract_text()
    if not text or text.isspace():
        return len(page.images) == 0
    return False


def remove_blank_pages(input_pdf_path, output_pdf_path, state):
    """Write all non-blank pages to output_pdf_path; runs in a worker thread."""
    try:
        reader = PdfReader(input_pdf_path)
        writer = PdfWriter()
        total_pages = len(reader.pages)
        with pdfplumber.open(input_pdf_path) as pdf:
            for i in range(total_pages):
                # pdfplumber inspects page i, PyPDF2 copies the same page.
                if not is_blank_page(pdf.pages[i]):
                    writer.add_page(reader.pages[i])
                # Only record progress here; widgets are updated in the Tk thread.
                state["progress"] = (i + 1) / total_pages * 100
        with open(output_pdf_path, "wb") as output_pdf:
            writer.write(output_pdf)
    except Exception as e:
        state["error"] = e


def select_input_file():
    file_path = filedialog.askopenfilename(filetypes=[("PDF files", "*.pdf")])
    if file_path:
        input_entry.delete(0, tk.END)
        input_entry.insert(0, file_path)


def select_output_path():
    file_path = filedialog.asksaveasfilename(
        defaultextension=".pdf", filetypes=[("PDF files", "*.pdf")]
    )
    if file_path:
        output_entry.delete(0, tk.END)
        output_entry.insert(0, file_path)


def process_pdf():
    input_pdf_path = input_entry.get()
    output_pdf_path = output_entry.get()
    if not input_pdf_path or not output_pdf_path:
        messagebox.showerror("Error", "Please choose both an input file and an output path.")
        return
    progress_var.set(0)  # reset the progress bar
    state = {"progress": 0.0, "error": None}
    # Run the PDF work in a thread so the GUI does not freeze.
    thread = threading.Thread(
        target=remove_blank_pages,
        args=(input_pdf_path, output_pdf_path, state),
        daemon=True,
    )
    thread.start()

    def check_thread():
        # Polled from the Tk main loop, so touching widgets here is safe.
        progress_var.set(state["progress"])
        if thread.is_alive():
            root.after(100, check_thread)  # keep checking
        elif state["error"] is not None:
            messagebox.showerror("Error", f"An error occurred: {state['error']}")
        else:
            messagebox.showinfo("Done", "Blank pages removed successfully!")

    root.after(100, check_thread)


# Create the main window
root = tk.Tk()
root.title("PDF Blank Page Remover")
# Input file selection
input_label = tk.Label(root, text="PDF file to process:")
input_label.pack(pady=5)
input_entry = tk.Entry(root, width=50)
input_entry.pack(pady=5)
input_button = tk.Button(root, text="Browse...", command=select_input_file)
input_button.pack(pady=5)
# Output file selection
output_label = tk.Label(root, text="Save location:")
output_label.pack(pady=5)
output_entry = tk.Entry(root, width=50)
output_entry.pack(pady=5)
output_button = tk.Button(root, text="Browse...", command=select_output_path)
output_button.pack(pady=5)
# Progress bar
progress_var = tk.DoubleVar()
progress_bar = Progressbar(root, variable=progress_var, maximum=100)
progress_bar.pack(pady=20, fill=tk.X)
# Start button
process_button = tk.Button(root, text="Remove blank pages", command=process_pdf)
process_button.pack(pady=20)
# Run the main loop
root.mainloop()
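One thing worth knowing about is_blank_page above: it only looks at text and embedded images, so a page that holds nothing but vector drawings (for example a cell border exported from Excel) would be treated as blank. Below is a minimal sketch of a stricter check, assuming such pages should be kept. The name is_blank_page_strict and the fake_page stand-in are hypothetical and only there so the logic can be tried without a PDF; images, lines, rects and curves are the object lists a pdfplumber page exposes.

```python
from types import SimpleNamespace


def is_blank_page_strict(page):
    """Blank means: no visible text, no images, and no vector drawings.

    `page` is expected to look like a pdfplumber page: an extract_text()
    method plus `images`, `lines`, `rects` and `curves` lists.
    """
    text = page.extract_text()
    if text and not text.isspace():
        return False
    # getattr with a default keeps the check usable on objects that lack
    # one of the attributes (such as the stand-ins below).
    for attr in ("images", "lines", "rects", "curves"):
        if getattr(page, attr, []):
            return False
    return True


def fake_page(text=None, **objects):
    """Hypothetical stand-in for a pdfplumber page, for demonstration only."""
    return SimpleNamespace(extract_text=lambda: text, **objects)


empty = fake_page(text="  \n")
only_image = fake_page(images=[{"x0": 0}])
only_border = fake_page(rects=[{"x0": 10, "x1": 200}])

print(is_blank_page_strict(empty))        # True
print(is_blank_page_strict(only_image))   # False
print(is_blank_page_strict(only_border))  # False
```

Whether a page with only a stray border should count as blank depends on the documents, so this is a judgment call rather than a fix.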
Screenshot after running: 微信截图_20241226092042.png (attachment uploaded 2024-12-26 09:52)

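One field is for the path of the PDF to process, the other for where to save the result.
Background: I had a large number of Excel files to print. I merged them and generated a PDF to check whether the formatting had changed, and the merged PDF turned out to contain piles of blank pages, far too many to delete by hand. Most of the methods I found online only explain how to delete pages one by one in a preview, so I wrote a small tool that removes blank pages in bulk. A colleague happened to need it too, but she has no Python environment, so I added a simple UI with tkinter. The packaged file is a bit large (about 60 MB) and I have not optimized it. It is good enough to use!
Download: https://wwww.lanzoue.com/iSbaB2j25jyh Password: ar92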

 L57860598   2025-1-27 14:09:14

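Thanks for sharing, OP!
For anyone who likes to keep a copy, here are cloud-drive mirrors that can be saved with one click.
File shared via cloud drive: PDF 空白页移除工具 (PDF blank page removal tool)
Link: https://pan.baidu.com/s/1otAkxRN6yqIt1pRHDOvjOg?pwd=52pj
Link: https://pan.quark.cn/s/d7f90a6df0e5
Extraction code: Y8mw
https://drive.uc.cn/s/86393ff5f5234
Password: 2VpK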

泡泡汽水
OP
  2025-1-27 14:10:09

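djc82 posted on 2025-1-6 17:51
I scanned about 50 pages of scrap documents, with a dozen or so blank pages among them. Take a look when you have time: http://4275.com/f2rw3o
Scanned PDFs usually consist of images rather than text, so the text-based approach with extract_text() may fail to detect their blank pages.
I rewrote the check so that each page is converted to a grayscale image and then thresholded (binarized). If a page is almost entirely white or near-white, it is treated as blank.
It uses Python libraries such as Pillow (PIL) for the image handling, combined with PyMuPDF (fitz) or PyPDF2 to iterate over the PDF pages.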
[Python]
import fitz  # PyMuPDF
import numpy as np
from PIL import Image


def is_blank_image(page_image, threshold=0.95):
    """Determine if an image is blank based on the percentage of white pixels."""
    img = np.array(page_image)
    gray = np.dot(img[..., :3], [0.2989, 0.5870, 0.1140])  # Convert to grayscale
    white_pixels_ratio = np.mean(gray >= 250)  # Share of almost-white pixels
    return white_pixels_ratio > threshold


def remove_blank_pages_from_scanned_pdf(input_pdf_path, output_pdf_path):
    doc = fitz.open(input_pdf_path)
    non_blank_pages = []
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        pix = page.get_pixmap(alpha=False)  # render without alpha so samples are RGB
        img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
        if not is_blank_image(img):
            non_blank_pages.append(page_num)

    # Create a new PDF with only non-blank pages
    writer = fitz.open()
    for page_num in non_blank_pages:
        writer.insert_pdf(doc, from_page=page_num, to_page=page_num)
    writer.save(output_pdf_path)
    writer.close()
    doc.close()
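To get a feel for how the two numbers in the check interact (the 250 brightness cutoff and the 0.95 ratio threshold), here is a minimal sketch in plain Python that runs the same arithmetic on synthetic pixel lists instead of a rendered page. The helper names (luminance, white_ratio, is_blank) and the sample "pages" are made up for illustration; only the weights, the cutoff and the threshold come from the NumPy version above.

```python
def luminance(r, g, b):
    """Grayscale value using the same weights as the NumPy version above."""
    return 0.2989 * r + 0.5870 * g + 0.1140 * b


def white_ratio(pixels, cutoff=250):
    """Fraction of (r, g, b) pixels whose luminance is at least `cutoff`."""
    if not pixels:
        return 0.0
    white = sum(1 for r, g, b in pixels if luminance(r, g, b) >= cutoff)
    return white / len(pixels)


def is_blank(pixels, threshold=0.95, cutoff=250):
    """A page counts as blank when more than `threshold` of it is near-white."""
    return white_ratio(pixels, cutoff) > threshold


WHITE, BLACK = (255, 255, 255), (0, 0, 0)

clean = [WHITE] * 1000                        # perfectly white page
specks = [WHITE] * 970 + [BLACK] * 30         # 3% dust or scanner noise
text_page = [WHITE] * 880 + [BLACK] * 120     # 12% ink
yellowed = [(240, 235, 220)] * 1000           # off-white paper, no ink

print(is_blank(clean))                 # True
print(is_blank(specks))                # True
print(is_blank(text_page))             # False
print(is_blank(yellowed))              # False
print(is_blank(yellowed, cutoff=230))  # True
```

The last two lines show the main thing to watch for with real scans: paper that comes out slightly gray or yellow never reaches a brightness of 250, so an empty sheet is kept. Lowering the cutoff (230 here is just an example value) or raising the scan contrast are two possible ways around that.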

alancxl   2025-1-27 14:10:48

Practical.

 青春莫相随   2025-1-27 14:11:42

Thanks for sharing.

 ybsypy   2025-1-27 14:12:25

Tried it and it works well. It comes in handy now and then.

 zlzx01   2025-1-27 14:13:08

Looks good.

 啥叫破解   2025-1-27 14:14:02

Saving this for later.

 yigaosoft   2025-1-27 14:14:45

Nice, bookmarked. Apart from the large file size, no complaints at all.

 gaoxiaoao   2025-1-27 14:15:45

Thanks for sharing, OP. Can it remove blank pages from scanned documents?
