Conversation

@songguocola
Contributor

No description provided.

@gemini-code-assist

Summary of Changes

Hello @songguocola, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fundamentally changes how audio services, particularly speech synthesis, establish and manage their network connections. By introducing a dedicated WebSocket protocol layer, it aims to provide a more robust and efficient mechanism for real-time audio data exchange. This refactoring centralizes WebSocket logic, making the SpeechSynthesizerV2 more modular and testable, and sets the foundation for improved streaming capabilities.

Highlights

  • New WebSocket Protocol Implementation: Introduced AudioWebsocketCallback and AudioWebsocketRequest classes to establish and manage WebSocket connections for audio-related services, providing a standardized way to handle real-time audio data.
  • SpeechSynthesizerV2 Integration: The SpeechSynthesizerV2 class has been refactored to utilize this new WebSocket protocol for real-time speech synthesis, handling connection, message exchange, and event processing for TTS tasks.
  • Enhanced Testing for Streaming: A new unit test, TestTtsV2SpeechSynthesizerV2, was added to validate the streaming speech synthesis functionality using a mock WebSocket server, ensuring the reliability and correctness of the new protocol integration.

Changelog
  • src/main/java/com/alibaba/dashscope/audio/protocol/AudioWebsocketCallback.java
    • Added a new interface defining callbacks for WebSocket events (onOpen, onMessage, onError, onClose).
  • src/main/java/com/alibaba/dashscope/audio/protocol/AudioWebsocketRequest.java
    • Added a new class extending WebSocketListener to handle WebSocket connection lifecycle, message sending (text and binary), and status management.
    • Includes methods for connecting, closing, and sending messages, with timeout handling for connection.
  • src/main/java/com/alibaba/dashscope/audio/ttsv2/SpeechSynthesizerV2.java
    • Added a new class implementing AudioWebsocketCallback for speech synthesis.
    • Integrates AudioWebsocketRequest to manage the WebSocket connection for TTS.
    • Implements methods for starting, continuing, and finishing speech synthesis tasks via WebSocket messages.
    • Handles incoming text and binary messages from the WebSocket, processing speech synthesis results and audio frames.
  • src/test/java/com/alibaba/dashscope/TestTtsV2SpeechSynthesizerV2.java
    • Added a new JUnit 5 test class to verify the SpeechSynthesizerV2's streaming capabilities.
    • Uses MockWebServer to simulate WebSocket server behavior, sending task-started, task-finished, and binary audio messages.
    • Asserts that audio frames are correctly received and accumulated during a streaming call.
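
A minimal sketch of the callback shape described in the changelog above. The summary names the four events (onOpen, onMessage, onError, onClose) but not their parameters, so the interface name `AudioCallbackSketch`, every signature, and the `CollectingCallback` class are assumptions for illustration, not the PR's actual code. The text payload format is also assumed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical interface: the summary names the four events but not their parameters.
interface AudioCallbackSketch {
  void onOpen();
  void onMessage(String text);
  void onMessage(ByteBuffer bytes);
  void onError(Throwable cause);
  void onClose(int code, String reason);
}

// Collects binary audio frames and tracks task state from text events.
class CollectingCallback implements AudioCallbackSketch {
  final ByteArrayOutputStream audio = new ByteArrayOutputStream();
  boolean opened, started, finished, closed;
  Throwable error;

  @Override public void onOpen() { opened = true; }

  @Override public void onMessage(String text) {
    // Assumed payload: the event name appears verbatim in the text message.
    if (text.contains("task-started")) started = true;
    if (text.contains("task-finished")) finished = true;
  }

  @Override public void onMessage(ByteBuffer bytes) {
    byte[] chunk = new byte[bytes.remaining()];
    bytes.get(chunk);
    audio.write(chunk, 0, chunk.length);
  }

  @Override public void onError(Throwable cause) { error = cause; }

  @Override public void onClose(int code, String reason) { closed = true; }

  public static void main(String[] args) {
    // Replays the event order the mock server test is described as sending.
    CollectingCallback cb = new CollectingCallback();
    cb.onOpen();
    cb.onMessage("{\"event\":\"task-started\"}");
    cb.onMessage(ByteBuffer.wrap(new byte[] {1, 2}));
    cb.onMessage(ByteBuffer.wrap(new byte[] {3}));
    cb.onMessage("{\"event\":\"task-finished\"}");
    cb.onClose(1000, "done");
    System.out.println("bytes received: " + cb.audio.size());
  }
}
```

Keeping the callback free of any WebSocket types is one way to make the synthesizer testable without a live connection.
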
Activity
  • The pull request was created by songguocola.
  • No additional comments or reviews have been made on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
--- | --- | ---
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new WebSocket protocol for CosyVoice speech synthesis. While a security audit found no vulnerabilities of medium severity or higher, the code review identified several areas for improvement. These include concerns regarding thread safety, exception handling (e.g., potential null pointer exceptions and improper InterruptedException handling), and critical flaws in the test implementation. Addressing these issues will enhance the robustness and reliability of the new functionality.

public void onClosed(WebSocket webSocket, int code, String reason) {
  isOpen.set(false);
  isClosed.set(true);
  connectLatch.get().countDown();

high

The call to connectLatch.get() could result in a NullPointerException if onClosed is called before connect() has initialized the latch. It's safer to add a null check, similar to what's done in onFailure.

Suggested change
- connectLatch.get().countDown();
+ if (connectLatch.get() != null) {
+   connectLatch.get().countDown();
+ }
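
A variation on the null check, sketched under the assumption that connectLatch is an AtomicReference<CountDownLatch> set by connect(). Reading the reference once into a local means the null check and the countDown() call act on the same object, even if another thread replaces the latch in between. The class name `LatchHolder` is a hypothetical stand-in; only the latch handling is modeled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the listener under review; only the latch handling is modeled.
class LatchHolder {
  // Assumed shape: the latch lives in an AtomicReference and is set by connect().
  final AtomicReference<CountDownLatch> connectLatch = new AtomicReference<>();

  void connect() {
    connectLatch.set(new CountDownLatch(1));
  }

  // Read the reference once so the null check and countDown() see the same object.
  void onClosed() {
    CountDownLatch latch = connectLatch.get();
    if (latch != null) {
      latch.countDown();
    }
  }

  public static void main(String[] args) {
    LatchHolder early = new LatchHolder();
    early.onClosed(); // no latch yet: must not throw

    LatchHolder connected = new LatchHolder();
    connected.connect();
    connected.onClosed();
    System.out.println(connected.connectLatch.get().getCount()); // prints 0
  }
}
```
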

Comment on lines +472 to +474
      } catch (InterruptedException ignored) {
        log.error("Interrupted while waiting for streaming complete");
      }

high

Catching InterruptedException and not restoring the thread's interrupted status can lead to subtle and hard-to-debug issues, especially in concurrent code. The interrupted flag should be restored by calling Thread.currentThread().interrupt(). This is handled correctly in other parts of this class, like in streamingCall.

      } catch (InterruptedException e) {
        log.error("Interrupted while waiting for streaming complete", e);
        Thread.currentThread().interrupt();
      }
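
A small self-contained sketch of why restoring the flag matters. CountDownLatch.await clears the thread's interrupted status when it throws InterruptedException, so without the interrupt() call the caller would have no way to see that an interrupt happened. The class and method names are hypothetical and not taken from the PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class InterruptDemo {
  // Waits for the latch; on interruption, restores the flag so callers can still observe it.
  static boolean awaitRestoringInterrupt(CountDownLatch latch, long timeoutMs) {
    try {
      return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt(); // simulate an interrupt arriving before the wait
    boolean completed = awaitRestoringInterrupt(new CountDownLatch(1), 1000);
    System.out.println("completed=" + completed); // prints completed=false
    // Thread.interrupted() reads and clears the flag.
    System.out.println("flag preserved=" + Thread.interrupted()); // prints flag preserved=true
  }
}
```
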

* Accumulates audio data to audioStream. Reuses existing buffer when possible to minimize
* allocations.
*/
private void accumulateAudioData(ByteBuffer frame) throws Exception {

medium

The method signature declares throws Exception, which is too broad. The only checked exception that can be thrown from the method body is IOException from audioStream.write(). It's better to declare the specific exception, throws IOException, for better error handling and code clarity.

Suggested change
- private void accumulateAudioData(ByteBuffer frame) throws Exception {
+ private void accumulateAudioData(ByteBuffer frame) throws java.io.IOException {
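
A sketch of what the narrowed signature could look like in context. The method body is not shown in the review, so the body here is an assumption: it copies the frame's readable bytes into a ByteArrayOutputStream through write(byte[]), which is inherited from OutputStream and declares IOException. The class name `AudioAccumulator` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

class AudioAccumulator {
  private final ByteArrayOutputStream audioStream = new ByteArrayOutputStream();

  // Copies the readable bytes of the frame without moving the caller's position.
  void accumulateAudioData(ByteBuffer frame) throws IOException {
    ByteBuffer view = frame.duplicate();
    byte[] chunk = new byte[view.remaining()];
    view.get(chunk);
    audioStream.write(chunk); // OutputStream.write(byte[]) declares IOException
  }

  int size() {
    return audioStream.size();
  }

  public static void main(String[] args) throws IOException {
    AudioAccumulator acc = new AudioAccumulator();
    ByteBuffer frame = ByteBuffer.wrap(new byte[] {1, 2, 3});
    acc.accumulateAudioData(frame);
    acc.accumulateAudioData(frame);
    System.out.println(acc.size()); // prints 6
  }
}
```

With the specific exception declared, callers can catch IOException directly instead of a blanket Exception that would also swallow unrelated failures.
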

  callback.onComplete();
}
if (audioStream != null) {
  audioStream.reset(); // 重置 ByteArrayOutputStream,清空数据但保留缓冲区

medium

The comment is in Chinese, while the rest of the code and comments are in English. For consistency and to make it understandable for a wider audience, please translate the comment to English.

Suggested change
- audioStream.reset(); // 重置 ByteArrayOutputStream,清空数据但保留缓冲区
+ audioStream.reset(); // Reset ByteArrayOutputStream, clear data but keep the buffer

public void onMessage(WebSocket webSocket, String string) {
  System.out.println("mock server recv: " + string);
  JsonObject req = JsonUtils.parse(string);
  if (task_id == "") {

medium

Use task_id.isEmpty() instead of task_id == "" to check for an empty string. In Java, == compares object references rather than string contents, so the current check can be false even when task_id is empty.

Suggested change
- if (task_id == "") {
+ if (task_id.isEmpty()) {
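
A short illustration of the difference between the two checks. The class and method names are hypothetical. An empty string created at runtime is a distinct object from the "" literal, so the reference comparison fails while isEmpty() succeeds.

```java
class EmptyCheckDemo {
  // Reference comparison: true only if s is the very same object as the "" literal.
  static boolean byReference(String s) {
    return s == "";
  }

  // Content comparison: true for any string of length zero.
  static boolean byContent(String s) {
    return s.isEmpty();
  }

  public static void main(String[] args) {
    String fresh = new String(""); // empty, but a distinct object from the literal
    System.out.println(byReference(fresh)); // prints false
    System.out.println(byContent(fresh));   // prints true
    System.out.println(byReference(""));    // prints true: literals are interned
  }
}
```

Note that isEmpty() throws a NullPointerException on a null reference, so it only suits fields that are initialized to a non-null value.
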

@songguocola songguocola closed this Feb 5, 2026
@songguocola songguocola deleted the dev/cosyvoice_v2 branch February 5, 2026 03:59