Support kokoro model #4192
Pull request overview
This PR adds initial Kokoro TTS support to OVMS’ MediaPipe TTS node by extending request/options handling (language/speed/voice), adapting speaker-embedding loading to the pipeline’s expected shape, and updating build images/dependencies for Kokoro-related requirements.
Changes:
- Extend TTS calculator options and request parsing to support `language`, `speed`, and updated voice handling.
- Load speaker embeddings using the pipeline-reported embedding shape and introduce Kokoro-specific WAV output preparation.
- Update Docker/Make build flow to optionally install `espeak-ng` and to build against a non-default OpenVINO GenAI source.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| tts_asr_roundtrip.py | Adds a standalone TTS→ASR roundtrip script for endpoint validation. |
| src/audio/text_to_speech/t2s_servable.hpp | Refactors includes/forward decls; adds members needed for pipeline/voices concurrency. |
| src/audio/text_to_speech/t2s_servable.cpp | Loads speaker embeddings using the pipeline’s expected shape. |
| src/audio/text_to_speech/t2s_calculator.proto | Adds language and speed options to TTS node configuration. |
| src/audio/text_to_speech/t2s_calculator.cc | Parses voice/language/speed from JSON and forwards them into TTS generation; switches audio output writer. |
| src/audio/text_to_speech/BUILD | Adds GenAI dependency for the TTS calculator target. |
| src/audio/speech_to_text/s2t_servable.cpp | Minor whitespace cleanup. |
| src/audio/audio_utils.hpp | Declares Kokoro-specific audio output helper. |
| src/audio/audio_utils.cpp | Implements Kokoro WAV writer (24kHz/float) and updates headers. |
| Makefile | Adds ESPEAK build arg plumbing (default enabled). |
| Dockerfile.ubuntu | Installs espeak-ng optionally; switches GenAI clone to a fork/branch variable. |
| Dockerfile.redhat | Installs espeak-ng optionally; switches GenAI clone to a fork/branch variable. |
| def tts_request(endpoint: str, model: str, voice: str, prompt: str, language: str) -> bytes: | ||
| url = endpoint.rstrip("/") + "/audio/speech" | ||
| payload = { | ||
| "model": model, | ||
| "voice": voice, | ||
| "input": prompt, | ||
| } |
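The excerpt above accepts a `language` argument, but the visible part of the payload does not include it. A minimal sketch of how the optional fields might be attached, assuming the server reads them from JSON keys named `language` and `speed` (the key names and the `build_tts_payload` / `speech_url` helpers are hypothetical and not taken from the PR):

```python
from typing import Optional


def build_tts_payload(
    model: str,
    voice: str,
    prompt: str,
    language: Optional[str] = None,
    speed: Optional[float] = None,
) -> dict:
    """Build the JSON body for a speech request.

    The "language" and "speed" keys are assumptions; check the calculator's
    request parsing for the names it actually expects.
    """
    payload = {"model": model, "voice": voice, "input": prompt}
    # Only send optional fields when the caller set them, so the server
    # falls back to its configured defaults otherwise.
    if language:
        payload["language"] = language
    if speed is not None:
        payload["speed"] = speed
    return payload


def speech_url(endpoint: str) -> str:
    """Same URL construction as the excerpt: strip a trailing slash, then append."""
    return endpoint.rstrip("/") + "/audio/speech"
```

Leaving unset fields out of the body keeps the request valid against servers that do not know the new options.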
| def split_text_into_chunks(text: str, max_chars: int) -> list[str]: | ||
| if max_chars <= 0: | ||
| return [text] | ||
| text = text.strip() | ||
| if len(text) <= max_chars: | ||
| return [text] | ||
| sentences = [] | ||
| buf = [] | ||
| for ch in text: | ||
| buf.append(ch) | ||
| if ch in "。!?;\n": | ||
| sentence = "".join(buf).strip() | ||
| if sentence: | ||
| sentences.append(sentence) | ||
| buf = [] | ||
| if buf: | ||
| sentence = "".join(buf).strip() | ||
| if sentence: | ||
| sentences.append(sentence) | ||
| chunks = [] | ||
| current = "" | ||
| for s in sentences: | ||
| if not current: | ||
| current = s | ||
| continue | ||
| if len(current) + len(s) <= max_chars: | ||
| current += s | ||
| else: | ||
| chunks.append(current) | ||
| current = s | ||
| if current: | ||
| chunks.append(current) | ||
| if not chunks: | ||
| chunks = [text[i : i + max_chars] for i in range(0, len(text), max_chars)] | ||
| return chunks |
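Tracing the excerpt above: a single sentence longer than `max_chars` ends up as an oversize chunk, and the final `if not chunks` fallback looks unreachable, since any non-empty text yields at least one sentence. The following is a sketch of one way to enforce the limit, not the PR's code. It keeps the same sentence terminators and greedy packing, and adds a hard-split step for oversize sentences (that step is the only addition):

```python
def split_text_into_chunks(text: str, max_chars: int, terminators: str = "。!?;\n") -> list[str]:
    """Greedy sentence packing; no returned chunk exceeds max_chars (when max_chars > 0)."""
    if max_chars <= 0:
        return [text]
    text = text.strip()
    if len(text) <= max_chars:
        return [text]

    # 1. Split into sentences, keeping each terminator with its sentence.
    sentences, buf = [], []
    for ch in text:
        buf.append(ch)
        if ch in terminators:
            sentence = "".join(buf).strip()
            if sentence:
                sentences.append(sentence)
            buf = []
    tail = "".join(buf).strip()
    if tail:
        sentences.append(tail)

    # 2. Hard-split any sentence that is longer than the limit on its own
    #    (this step is not in the PR excerpt).
    pieces = []
    for s in sentences:
        pieces.extend(s[i : i + max_chars] for i in range(0, len(s), max_chars))

    # 3. Greedily pack pieces into chunks without exceeding the limit.
    chunks, current = [], ""
    for p in pieces:
        if current and len(current) + len(p) > max_chars:
            chunks.append(current)
            current = p
        else:
            current += p
    if current:
        chunks.append(current)
    return chunks
```

Because every piece is at most `max_chars` long before packing, the explicit fallback at the end of the original function is no longer needed.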
| if (streamIt != payload.parsedJson->MemberEnd()) { | ||
| return absl::InvalidArgumentError("streaming is not supported"); | ||
| } | ||
| SPDLOG_LOGGER_DEBUG(t2s_calculator_logger, "1"); |
| ov::Tensor speakerEmbedding; | ||
| std::string selectedVoice = "af_alloy"; | ||
| if (voiceName.has_value()) { | ||
| generatedSpeech = pipe->ttsPipeline->generate(inputIt->value.GetString(), pipe->voices[voiceName.value()]); | ||
| } else { | ||
| generatedSpeech = pipe->ttsPipeline->generate(inputIt->value.GetString()); | ||
| selectedVoice = voiceName.value(); |
| if (speakerIt != pipe->voices.end()) { | ||
| speakerEmbedding = speakerIt->second; | ||
| } |
| ARG ov_genai_branch=kokoro_tts | ||
| ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git | ||
| WORKDIR /openvino_genai/ | ||
| # hadolint ignore=DL3003 | ||
| RUN if [ "$ov_use_binary" == "0" ]; then \ | ||
| git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \ | ||
| git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \ | ||
| cmake -DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE -DCMAKE_CXX_FLAGS=" ${SDL_OPS} " -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DENABLE_SYSTEM_ICU="True" -DBUILD_TOKENIZERS=OFF -DENABLE_SAMPLES=OFF -DENABLE_TOOLS=OFF -DENABLE_TESTS=OFF -DENABLE_XGRAMMAR=ON -S ./ -B ./build/ && \ |
| WORKDIR /openvino_genai/ | ||
| ARG ov_genai_branch=master | ||
| ARG ov_genai_org=openvinotoolkit | ||
| ARG ov_genai_branch=kokoro_tts | ||
| ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git | ||
| # hadolint ignore=DL3003 | ||
| RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \ | ||
| git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \ | ||
| git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \ | ||
| cmake -DCMAKE_BUILD_TYPE=$CMAKE_BUILD_TYPE -DCMAKE_CXX_FLAGS=" ${SDL_OPS} ${LTO_CXX_FLAGS} " -DCMAKE_SHARED_LINKER_FLAGS="${LTO_LD_FLAGS}" -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DENABLE_SYSTEM_ICU="True" -DBUILD_TOKENIZERS=OFF -DENABLE_SAMPLES=OFF -DENABLE_TOOLS=OFF -DENABLE_TESTS=OFF -DENABLE_XGRAMMAR=ON -S ./ -B ./build/ && \ |
| RUN_GPU_TESTS ?= | ||
| GPU ?= 0 | ||
| NPU ?= 0 | ||
| ESPEAK ?= 1 |
| ARG ov_genai_branch=kokoro_tts | ||
| ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git | ||
| WORKDIR /openvino_genai/ | ||
| # hadolint ignore=DL3003 | ||
| RUN if [ "$ov_use_binary" == "0" ]; then \ | ||
| git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \ | ||
| git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \ |
| WORKDIR /openvino_genai/ | ||
| ARG ov_genai_branch=master | ||
| ARG ov_genai_org=openvinotoolkit | ||
| ARG ov_genai_branch=kokoro_tts | ||
| ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git | ||
| # hadolint ignore=DL3003 | ||
| RUN if [ "$ov_use_binary" == "0" ]; then true ; else exit 0 ; fi ; \ | ||
| git clone https://github.com/$ov_genai_org/openvino.genai /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \ | ||
| git clone $ov_genai_repo /openvino_genai && cd /openvino_genai && git checkout $ov_genai_branch && git submodule update --init --recursive && \ |
mzegla left a comment
Do we have documentation updated/ready as well?
| srcs = ["t2s_calculator.cc", | ||
| "tts_node_initializer.cpp"], | ||
| deps = [ | ||
| "//third_party:genai", |
Why is it added now when it was not needed before?
| SPDLOG_LOGGER_DEBUG(t2s_calculator_logger, "T2sCalculator [Node: {}] Open start", cc->NodeName()); | ||
| const auto& options = cc->Options<mediapipe::T2sCalculatorOptions>(); | ||
| if (options.has_language() && !options.language().empty()) { | ||
| defaultLanguage = options.language(); |
Not sure if we should override default values. I would remove the `default` prefix here and just have variables `language` and `speed`.
| absl::Status Open(CalculatorContext* cc) final { | ||
| SPDLOG_LOGGER_DEBUG(t2s_calculator_logger, "T2sCalculator [Node: {}] Open start", cc->NodeName()); | ||
| const auto& options = cc->Options<mediapipe::T2sCalculatorOptions>(); | ||
| if (options.has_language() && !options.language().empty()) { |
Should we have it in calculator options? Is it pipeline level configuration?
| for (const auto& [name, _] : pipe->voices) { | ||
| if (!available.empty()) | ||
| available += ", "; | ||
| available += name; |
Wouldn't we start with ',' in that case? For example, after the loop executes we would have `available == ", voice1, voice2, voice3"`?
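For reference, the separator in the excerpt is only appended when `available` is already non-empty, so the first name should be added without a leading comma. Below is a line-by-line Python transcription of that loop, assuming the excerpt shows the whole loop body (`join_voice_names` is a hypothetical name used only for this illustration):

```python
def join_voice_names(names: list[str]) -> str:
    """Mirror of the C++ loop: add ", " only when something is already accumulated."""
    available = ""
    for name in names:
        if available:  # corresponds to `if (!available.empty())`
            available += ", "
        available += name
    return available
```

With three voices this produces `voice1, voice2, voice3`, and an empty voice map produces an empty string.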
| // pass the requested name through to the pipeline with an empty embedding. | ||
| } else if (!pipe->voices.empty()) { | ||
| // No voice in the request - pick a default from the voices loaded by the servable. | ||
| auto preferredIt = pipe->voices.find("af_alloy"); |
Move `af_alloy` to a constant and keep it higher in the code?
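A rough sketch of what the suggestion could look like, written in Python for brevity. The selection rules are an assumption pieced together from the excerpt's comments: a requested voice is passed through even if no embedding is loaded for it, and otherwise the preferred default is used when available. Falling back to the first loaded voice is a guess, and `select_voice` and `DEFAULT_VOICE` are hypothetical names:

```python
from typing import Optional

# Single named constant instead of a string literal repeated in the logic.
DEFAULT_VOICE = "af_alloy"


def select_voice(requested: Optional[str], voices: dict) -> tuple:
    """Return (voice_name, embedding_or_None) for a request."""
    if requested is not None:
        # Pass the requested name through; the embedding is None when the
        # servable has not loaded a matching voice.
        return requested, voices.get(requested)
    if voices:
        if DEFAULT_VOICE in voices:
            return DEFAULT_VOICE, voices[DEFAULT_VOICE]
        # Assumption: fall back to the first loaded voice.
        first = next(iter(voices))
        return first, voices[first]
    # No voice requested and none loaded.
    return None, None
```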
| return elementsCount; | ||
| } | ||
| static ov::Tensor read_speaker_embedding(const std::filesystem::path& file_path, const ov::Shape& expectedShape) { |
| size_t num_floats = buffer_size / sizeof(float); | ||
| if (num_floats != 512) { | ||
| throw std::runtime_error("File must contain speaker embedding including 512 32-bit floats."); | ||
| const size_t numFloats = buffer_size / sizeof(float); |
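The change above replaces the hard-coded 512-float check with one driven by `expectedShape`. The rest of the check is not visible in the excerpt, so the following is only a sketch of the idea, assuming the file is a raw little-endian float32 dump and that the element count must equal the product of the expected dimensions (`parse_speaker_embedding` is a hypothetical helper):

```python
import math
import struct


def parse_speaker_embedding(raw: bytes, expected_shape: tuple[int, ...]) -> list[float]:
    """Validate a raw float32 buffer against an expected tensor shape."""
    float_size = struct.calcsize("<f")  # 4 bytes
    if len(raw) % float_size != 0:
        raise ValueError("Buffer size is not a multiple of 4 bytes.")
    num_floats = len(raw) // float_size
    expected = math.prod(expected_shape)
    if num_floats != expected:
        raise ValueError(
            f"Expected {expected} floats for shape {expected_shape}, got {num_floats}."
        )
    # Decode as little-endian 32-bit floats.
    return list(struct.unpack(f"<{num_floats}f", raw))
```

Reporting both the expected and the actual count in the error makes a mismatch between a voice file and the loaded model easier to diagnose than a fixed "512 floats" message.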
| format.bitsPerSample = 32; | ||
| drwav wav; | ||
| auto status = drwav_init_memory_write(&wav, ppData, &pDataSize, &format, nullptr); |
Don't we need to check ppData validity?
| } | ||
| hfSettings.exportSettings.modelType = modelType; | ||
| if (result->count("language")) { | ||
| textToSpeechGraphSettings.language = result->operator[]("language").as<std::string>(); |
Shouldn't we check if the language field is actually convertible to string?
| ARG ov_genai_branch=kokoro_tts | ||
| ARG ov_genai_repo=https://github.com/RyanMetcalfeInt8/openvino.genai.git |
This will need to be reverted before merge.