Local TTS generation with the latest Kokoro model (great in every way, except it doesn't support Chinese yet)

Author: pyjiujiu   Posted: 2025-1-27 14:08:30

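Preface: I saw in the news that a strong, lightweight TTS model had come out, so I gave it a try. The results really are good, and I think I no longer need to freeload on edge-tts.
This post is mainly to introduce it to everyone, and to swap notes.
Notes:
1. For now it supports English only (Chinese is not supported yet and is still in development).
2. Model page: https://hf-mirror.com/hexgrad/Kokoro-82M (a mirror that is directly accessible)
3. It uses a new third-party helper library, kokoro-onnx. Repository: https://github.com/thewh1teagle/kokoro-onnx
4. The model file kokoro-v0_19.onnx is 329 MB (the fp32 version); download links can be found on Hugging Face or GitHub.
The model should also be quantizable, e.g. to fp16 or int8, so smaller versions are something to look forward to.
5. There is also a voices.json file. This is kokoro-onnx's own addition: the voicepacks released with the model, converted to JSON (download it from GitHub).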
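A quick sanity check on the 329 MB figure in note 4: fp32 stores each weight in 4 bytes, so about 82M parameters (the "82M" in Kokoro-82M) comes to roughly 328 MB. The sketch below scales that to fp16 and int8. It counts weights only and ignores graph metadata and any layers a quantizer leaves at higher precision, so real quantized files would likely come out somewhat larger.

```python
# Bytes per parameter for common weight precisions
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def estimated_size_mb(num_params: int, precision: str) -> float:
    """Rough file size in MB (10**6 bytes), counting weights only."""
    return num_params * BYTES_PER_PARAM[precision] / 1_000_000


if __name__ == "__main__":
    params = 82_000_000  # "82M" from the model name Kokoro-82M (rounded)
    for p in ("fp32", "fp16", "int8"):
        print(f"{p}: ~{estimated_size_mb(params, p):.0f} MB")
```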
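---Divider---
This started as a test, but AI assistance makes things easy, so I also wrote a quick GUI (it's crude, please bear with me; it's only for testing).
* Install kokoro-onnx first (the script also imports soundfile):
pip install kokoro-onnx soundfile
* Just put the two files in the same directory as the script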
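If the script loads the two files by bare file name, "same directory as the script" only holds when it is also launched from that directory. A minimal sketch of checking for them up front; the file names come from notes 4 and 5, while `find_model_files` is a hypothetical helper, not part of kokoro-onnx.

```python
from pathlib import Path

MODEL_FILE = "kokoro-v0_19.onnx"  # note 4
VOICES_FILE = "voices.json"       # note 5


def find_model_files(base_dir: Path) -> tuple[Path, Path]:
    """Return (model_path, voices_path) inside base_dir, or raise if either is missing."""
    model_path = base_dir / MODEL_FILE
    voices_path = base_dir / VOICES_FILE
    missing = [p.name for p in (model_path, voices_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {base_dir}: {', '.join(missing)}")
    return model_path, voices_path
```

For example, `find_model_files(Path(__file__).resolve().parent)` looks next to the script regardless of the working directory, and the two returned paths can then be passed (as strings) to `Kokoro(...)`.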

[Image: mulu.PNG (directory layout)]

* A simple interface:

[Image: screenshot01.PNG (GUI screenshot)]

---Code---
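import tkinter as tk
from tkinter import ttk, scrolledtext, messagebox
from kokoro_onnx import Kokoro
import soundfile as sf
import threading  # For running TTS in a separate thread
import time
from functools import wraps
from datetime import datetime

# Get the current time
now = datetime.now()
formatted_time = now.strftime('%Y%m%d_%H%M')


def timeit(func):
    """
    A decorator that measures how long a function takes to run.
    Args:
        func: the function to decorate.
    Returns:
        A function that wraps func with timing.
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        execution_time = end_time - start_time
        print(f"Function '{func.__name__}' ran in {execution_time:.2f} s")
        return result
    return wrapper


VOICE_NAME = [
    'af',  # Default voice is a 50-50 mix of Bella & Sarah
    'af_bella', 'af_sarah', 'am_adam', 'am_michael',
    'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis',
    'af_nicole', 'af_sky',
]
# For now, English only
LANG_NAME = [
    "en-us",  # English
    "en-gb",  # English (British)
    "fr-fr",  # French
    "ja",     # Japanese
    "ko",     # Korean
    "cmn",    # Mandarin Chinese
]


class TTSApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Kokoro TTS GUI")
        self.kokoro = None  # Initialize Kokoro instance
        self.create_widgets()

    def create_widgets(self):
        # --- Text Area ---
        ttk.Label(self.root, text="Text to Speak:").grid(row=0, column=0, sticky="w", padx=5, pady=5)
        self.text_area = scrolledtext.ScrolledText(self.root, wrap=tk.WORD, width=60, height=10)
        self.text_area.grid(row=1, column=0, columnspan=3, padx=5, pady=5)
        # --- Voice Parameter ---
        ttk.Label(self.root, text="Voice:").grid(row=2, column=0, sticky="w", padx=5, pady=5)
        self.voice_var = tk.StringVar(value="af")  # Default value
        self.voice_combobox = ttk.Combobox(self.root, textvariable=self.voice_var, values=VOICE_NAME)
        self.voice_combobox.grid(row=2, column=1, sticky="ew", padx=5, pady=5)
        # --- Speed Parameter ---
        ttk.Label(self.root, text="Speed (0.5-2.0):").grid(row=3, column=0, sticky="w", padx=5, pady=5)
        self.speed_var = tk.DoubleVar(value=1.0)  # Default speed
        self.speed_entry = ttk.Entry(self.root, textvariable=self.speed_var)
        self.speed_entry.grid(row=3, column=1, sticky="ew", padx=5, pady=5)
        # --- Language Parameter ---
        ttk.Label(self.root, text="Language:").grid(row=4, column=0, sticky="w", padx=5, pady=5)
        self.lang_var = tk.StringVar(value="en-us")
        self.lang_entry = ttk.Entry(self.root, textvariable=self.lang_var)
        self.lang_entry.grid(row=4, column=1, sticky="ew", padx=5, pady=5)
        # --- Run Button ---
        self.run_button = ttk.Button(self.root, text="Generate Speech", command=self.run_tts)
        self.run_button.grid(row=5, column=0, columnspan=3, pady=10)
        self.root.columnconfigure(1, weight=1)  # Make column expand

    def run_tts(self):
        text = self.text_area.get("1.0", "end-1c").strip()
        voice = self.voice_var.get()
        try:
            speed = float(self.speed_var.get())
            if not 0.5 <= speed <= 2.0:
                raise ValueError("speed out of range")
        except (ValueError, tk.TclError):  # TclError: the entry does not hold a number
            messagebox.showerror("Error", "Speed must be a number between 0.5 and 2.0.")
            return
        # NOTE: the original post is cut off at this point. Everything below
        # (the rest of run_tts, _generate and the __main__ block) is a hedged
        # reconstruction (assumption), not the author's code.
        lang = self.lang_var.get().strip()
        if not text:
            messagebox.showwarning("Warning", "Please enter some text.")
            return
        self.run_button.config(state="disabled")
        threading.Thread(target=self._generate, args=(text, voice, speed, lang), daemon=True).start()

    @timeit
    def _generate(self, text, voice, speed, lang):
        # Runs in a worker thread; dialogs and widget changes are scheduled on the Tk main loop via root.after()
        try:
            if self.kokoro is None:
                # Assumes the two files from notes 4 and 5 are in the working directory
                self.kokoro = Kokoro("kokoro-v0_19.onnx", "voices.json")
            samples, sample_rate = self.kokoro.create(text, voice=voice, speed=speed, lang=lang)
            out_path = f"output_{datetime.now():%Y%m%d_%H%M%S}.wav"  # hypothetical name, timestamped per call
            sf.write(out_path, samples, sample_rate)
            self.root.after(0, lambda: messagebox.showinfo("Done", f"Saved to {out_path}"))
        except Exception as exc:
            msg = str(exc)  # exc is unbound once the except block ends, so capture the text now
            self.root.after(0, lambda: messagebox.showerror("Error", msg))
        finally:
            self.root.after(0, lambda: self.run_button.config(state="normal"))


if __name__ == "__main__":
    root = tk.Tk()
    app = TTSApp(root)
    root.mainloop()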
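The comment on `'af'` in `VOICE_NAME` says the default voice is a 50-50 mix of Bella and Sarah. Assuming each voice is stored as an array of style numbers (like the voicepacks mentioned in note 5), such a mix can be read as a weighted element-wise average. The sketch below uses plain Python lists and a hypothetical `blend_styles` helper; it only illustrates the idea and is not how kokoro-onnx itself loads or combines voices.

```python
def blend_styles(a: list[float], b: list[float], weight_a: float = 0.5) -> list[float]:
    """Weighted element-wise average of two equal-length style vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("style vectors must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    return [weight_a * x + (1.0 - weight_a) * y for x, y in zip(a, b)]


if __name__ == "__main__":
    bella = [0.25, -0.5, 1.0]   # made-up numbers, not real voice data
    sarah = [0.75, 0.0, -1.0]
    print(blend_styles(bella, sarah))  # → [0.5, -0.25, 0.0]
```

A weight other than 0.5 would lean the mix toward one of the two voices.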


zhengzhenhui945 2025-1-27 14:09:22

The model's English sounds really polished; too bad there's no Chinese yet. Can't wait.

没方向感 2025-1-27 14:09:52

Added a playback feature:
pip install kokoro-onnx soundfile pygame
Download the model data:
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.bin
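Where `wget` isn't available (a stock Windows install, for instance), the same two files can be fetched with Python's standard library. A minimal sketch, assuming the release URLs from the `wget` commands above; `release_url` and `download_all` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

# Same release location as the wget commands above
RELEASE_BASE = "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files"
FILES = ["kokoro-v0_19.onnx", "voices.bin"]


def release_url(filename: str) -> str:
    """Build the download URL for one release asset."""
    return f"{RELEASE_BASE}/{filename}"


def download_all(dest: Path) -> None:
    """Download each file into dest, skipping files that already exist."""
    dest.mkdir(parents=True, exist_ok=True)
    for name in FILES:
        target = dest / name
        if target.exists():
            print(f"skip {name} (already present)")
            continue
        print(f"downloading {name} ...")
        urllib.request.urlretrieve(release_url(name), target)
```

Running `download_all(Path("."))` from the script's folder fetches whatever is missing; the .onnx file is a few hundred MB, so expect it to take a while.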
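import tkinter as tk
from tkinter import ttk, scrolledtext, messagebox
from kokoro_onnx import Kokoro
import soundfile as sf
import threading  # For running TTS in a separate thread
import pygame
import time
from functools import wraps
from datetime import datetime

# Get the current time
now = datetime.now()
formatted_time = now.strftime('%Y%m%d_%H%M')

VOICE_NAME = [
    'af',  # Default voice is a 50-50 mix of Bella & Sarah
    'af_bella', 'af_sarah', 'am_adam', 'am_michael',
    'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis',
    'af_nicole', 'af_sky',
]
# For now, English only
LANG_NAME = [
    "en-us",  # English
    "en-gb",  # English (British)
    "fr-fr",  # French
    "ja",     # Japanese
    "ko",     # Korean
    "cmn",    # Mandarin Chinese
]


def play_mp3(file_path):
    # Despite the name, pygame.mixer.music also plays WAV files such as the ones sf.write produces
    pygame.mixer.init()  # Initialize the mixer
    pygame.mixer.music.load(file_path)  # Load the audio file
    pygame.mixer.music.play()  # Start playback
    while pygame.mixer.music.get_busy():  # Wait for playback to finish
        time.sleep(0.1)  # Sleep instead of spinning, so the wait doesn't max out a CPU core
    pygame.mixer.music.unload()  # Unload the track (pygame 2.0+)
    pygame.mixer.quit()  # Shut down the mixer


class TTSApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Kokoro TTS GUI")
        self.kokoro = None  # Initialize Kokoro instance
        self.create_widgets()

    def create_widgets(self):
        # --- Text Area ---
        ttk.Label(self.root, text="Text to read:").grid(row=0, column=0, sticky="w", padx=5, pady=5)

        self.text_area = scrolledtext.ScrolledText(self.root, wrap=tk.WORD, width=60, height=10)
        self.text_area.grid(row=1, column=0, columnspan=3, padx=5, pady=5)

        # --- Voice Parameter ---
        ttk.Label(self.root, text="Voice:").grid(row=2, column=0, sticky="w", padx=5, pady=5)
        self.voice_var = tk.StringVar(value="af")  # Default value
        self.voice_combobox = ttk.Combobox(self.root, textvariable=self.voice_var, values=VOICE_NAME)
        self.voice_combobox.grid(row=2, column=1, sticky="ew", padx=5, pady=5)

        # --- Speed Parameter ---
        ttk.Label(self.root, text="Speed (0.5-2.0):").grid(row=3, column=0, sticky="w", padx=5, pady=5)
        self.speed_var = tk.DoubleVar(value=1.0)  # Default speed
        self.speed_entry = ttk.Entry(self.root, textvariable=self.speed_var)
        self.speed_entry.grid(row=3, column=1, sticky="ew", padx=5, pady=5)

        # --- Language Parameter ---
        ttk.Label(self.root, text="Language:").grid(row=4, column=0, sticky="w", padx=5, pady=5)
        self.lang_var = tk.StringVar(value="en-us")
        self.lang_entry = ttk.Entry(self.root, textvariable=self.lang_var)
        self.lang_entry.grid(row=4, column=1, sticky="ew", padx=5, pady=5)

        # --- Information ---
        self.info = ttk.Label(self.root, text="")
        self.info.grid(row=5, column=1, sticky="ew", padx=5, pady=5, columnspan=10)

        # --- Run Button ---
        self.run_button = ttk.Button(self.root, text="Generate Speech", command=self.run_tts)
        self.run_button.grid(row=6, column=0, columnspan=3, pady=10)
        self.root.columnconfigure(1, weight=1)  # Make column expand

    def run_tts(self):
        text = self.text_area.get("1.0", "end-1c").strip()
        voice = self.voice_var.get()
        try:
            speed = float(self.speed_var.get())
            if not 0.5 <= speed <= 2.0:
                raise ValueError("speed out of range")
        except (ValueError, tk.TclError):  # TclError: the entry does not hold a number
            messagebox.showerror("Error", "Speed must be a number between 0.5 and 2.0.")
            return
        # NOTE: the original reply is cut off at this point. Everything below
        # (the rest of run_tts, _generate and the __main__ block) is a hedged
        # reconstruction (assumption), not the poster's code.
        lang = self.lang_var.get().strip()
        if not text:
            messagebox.showwarning("Warning", "Please enter some text.")
            return
        self.run_button.config(state="disabled")
        self.info.config(text="Generating...")
        threading.Thread(target=self._generate, args=(text, voice, speed, lang), daemon=True).start()

    def _generate(self, text, voice, speed, lang):
        # Runs in a worker thread; widget updates are scheduled on the Tk main loop via root.after()
        try:
            if self.kokoro is None:
                # voices.bin is the file fetched by the wget commands above
                self.kokoro = Kokoro("kokoro-v0_19.onnx", "voices.bin")
            start_time = time.time()
            samples, sample_rate = self.kokoro.create(text, voice=voice, speed=speed, lang=lang)
            out_path = f"output_{datetime.now():%Y%m%d_%H%M%S}.wav"  # hypothetical name, timestamped per call
            sf.write(out_path, samples, sample_rate)
            elapsed = time.time() - start_time
            self.root.after(0, lambda: self.info.config(text=f"Saved {out_path} in {elapsed:.2f} s, playing..."))
            play_mp3(out_path)
            self.root.after(0, lambda: self.info.config(text=f"Finished playing {out_path}"))
        except Exception as exc:
            msg = str(exc)  # exc is unbound once the except block ends, so capture the text now
            self.root.after(0, lambda: messagebox.showerror("Error", msg))
        finally:
            self.root.after(0, lambda: self.run_button.config(state="normal"))


if __name__ == "__main__":
    root = tk.Tk()
    app = TTSApp(root)
    root.mainloop()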
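Both scripts import `threading` "for running TTS in a separate thread". Tkinter widgets are generally only safe to touch from the main thread, so one common pattern is to have the worker put its result on a `queue.Queue` and let the GUI poll that queue with `root.after`. A minimal, stdlib-only sketch of the worker side; `run_in_background` and `drain_results` are hypothetical names, and the Tk wiring is only described in the note that follows.

```python
import queue
import threading
from typing import Any, Callable

# Results from worker threads land here; only the GUI thread reads it
results: "queue.Queue[tuple[str, Any]]" = queue.Queue()


def run_in_background(job: Callable[[], Any]) -> threading.Thread:
    """Run job() in a daemon thread and post ("ok", value) or ("error", message)."""
    def worker() -> None:
        try:
            results.put(("ok", job()))
        except Exception as exc:  # report the failure instead of losing it inside the thread
            results.put(("error", str(exc)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def drain_results() -> list[tuple[str, Any]]:
    """Collect everything posted so far without blocking (call from the GUI thread)."""
    items = []
    while True:
        try:
            items.append(results.get_nowait())
        except queue.Empty:
            return items
```

In `TTSApp`, the button handler would pass the Kokoro call plus `sf.write` to `run_in_background`, and a method scheduled with `self.root.after(100, ...)` would call `drain_results()`, update `self.info` or show a `messagebox`, and then reschedule itself.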

pyjiujiu (OP) 2025-1-27 14:10:48

zhangsan2022 posted on 2025-1-21 09:53
When will Chinese be supported? Looking forward to it.
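Here's an update first:
1-21: According to the model's author, the next version will be released before the end of this month.
The not-yet-released 0.23 version (an intermediate experimental build) can be tried on Hugging Face. Its Chinese is about on par with edge-tts, but it can't read letters and digits mixed into the text, and it still has plenty of flaws.
Link: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
---Divider---
According to Hugging Face discussion #36 (1-12):
The released model is version 0.19 (put out in December). The author, hexgrad, has actually already trained 0.23 but isn't ready to release it (it reportedly supports Chinese); he plans to keep training,
because the community keeps supplying richer data, and processing that data takes time.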
- If successful, you should expect the next-gen Kokoro model to ship with more voices and languages, also under an Apache 2.0 license, with a similar 82M parameter architecture.
- If unsuccessful, it would most likely be because the model does not converge, i.e. loss does not go down. That could be because of data quality issues, architecture limitations, overfitting on old data,
underfitting on new data, etc. Rollbacks and model collapse are not unheard of in ML, but fingers crossed it does not happen here—or if they do, that I can address such issues should they come up.
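According to another thread, the 0.19 architecture lacks an encoder (its prototype is StyleTTS 2). A later release will add one, and the author has stated explicitly that he intends to support voice cloning (which you would need to post-train yourself).
Because the base model has so few parameters, the author is confident this will be the simplest voice-cloning implementation around.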

52wjj 2025-1-27 14:11:20

Impressive! Bookmarked right away, thanks for sharing.

Do_zh 2025-1-27 14:12:16

Hoping Chinese support arrives soon.

buybuy 2025-1-27 14:13:12

No Chinese, so I'll hold off on trying it for now.

kongson 2025-1-27 14:13:46

This is handy, saving it for later. Thanks.

SherlockProel 2025-1-27 14:14:31

Nice, I'll grab it and play around with it.

13534870834 2025-1-27 14:15:08

TTS services all charge money these days.
