Claude/fix health check ap3 de #88

Open
mirai-gpro wants to merge 70 commits into aigc3d:master from mirai-gpro:claude/fix-health-check-ap3De

Conversation

@mirai-gpro

No description provided.

Test scripts to verify A2E (Audio2Expression) lip sync quality
with Japanese audio input, before investing in ZIP motion replacement
or VHAP Japanese FLAME params.

Includes:
- generate_test_audio.py: EdgeTTS Japanese/English/Chinese audio samples
- test_a2e_cpu.py: A2E model loading, Wav2Vec2 feature extraction, ZIP validation
- save_a2e_output.py: Capture A2E 52-dim ARKit blendshape output
- analyze_blendshapes.py: Lip sync quality scoring and language comparison
- setup_oac_env.py: Auto-detect known OpenAvatarChat issues (CPU mode, deps, config)
- chat_with_lam_jp.yaml: Corrected config (Gemini API + EdgeTTS ja-JP-NanamiNeural)
- run_all_tests.py: Master test runner
- TEST_PROCEDURE.md: Step-by-step test procedure

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Fix RuntimeError: Input data type <class 'list'> is not supported.
- diagnose_onnx_error.py: Tests SileroVAD ONNX, SenseVoice, data flow
- patch_vad_handler.py: Fixes timestamp[0] NoneType bug, adds defensive
  numpy type checking on ONNX inputs, handles 2/3-output model variants
- setup_oac_env.py: Adds VAD handler bug detection (check 7/7)

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Simple test script that verifies environment, model files,
data_bundle.py fix, Wav2Vec2 loading, and A2E module import.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Gemini's OpenAI-compatible API sometimes returns delta.content as dict/list
instead of string, causing TypeError in set_main_data(). This patch script
detects and safely converts non-string content before passing to data_bundle.
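A minimal sketch of the conversion described above (a hypothetical helper, not the actual patch script): anything non-string coming in as `delta.content` is JSON-encoded before it reaches `set_main_data()`.

```python
import json

def coerce_content_to_str(content):
    """Defensively convert a streaming delta's `content` field to str.

    Gemini's OpenAI-compatible endpoint occasionally emits dict or list
    payloads where a plain string is expected; JSON-encode those so the
    downstream data bundle never sees a non-string.
    """
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, (dict, list)):
        return json.dumps(content, ensure_ascii=False)
    return str(content)
```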

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
gemini-2.0-flash returns 404 "no longer available to new users".
The error dict then cascades into the set_main_data TypeError.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
SenseVoice auto-detection defaults to Chinese (<|zh|>), causing
Japanese speech to be misrecognized as Chinese text. This patch
forces language="ja" in the generate() call.

- patch_asr_language.py: Auto-patches asr_handler_sensevoice.py
- chat_with_lam_jp.yaml: Added language: "ja" to SenseVoice config
- TEST_PROCEDURE.md: Added Step 4.5 for patch application

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Instead of creating a separate config file, this script patches
the existing working config/chat_with_lam.yaml with 3 changes:
1. TTS voice → ja-JP-NanamiNeural
2. LLM system_prompt → Japanese
3. ASR language → ja

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Root cause analysis from production logs:
- 1st ASR call: rtf=0.629 (1.25s) - OK
- 2nd ASR call: rtf=15.027 (29.83s) - GPU memory exhausted, CPU fallback
- fastrtc 60s timeout triggers, resets frame pipeline → system unresponsive

Fix: Add torch.cuda.empty_cache() + gc.collect() after each SenseVoice
and LAM inference to free GPU memory between calls. Also adds startup
wrapper with PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
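The cleanup pattern can be sketched as a decorator (an illustrative wrapper, not the shipped patch). `torch` is imported lazily so the sketch also runs on CPU-only machines.

```python
import functools
import gc

def free_gpu_after(fn):
    """Run gc.collect() and torch.cuda.empty_cache() after each call,
    so cached CUDA blocks are released between SenseVoice / LAM
    inference calls instead of accumulating until exhaustion."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        finally:
            gc.collect()
            try:
                import torch
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()
            except ImportError:
                pass  # CPU-only environment: nothing to release
    return wrapper
```

Wrapping each inference entry point (e.g. `free_gpu_after(model.generate)`) keeps the fix localized rather than scattering cleanup calls through the handlers.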

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Create the missing Audio2Expression inference service that bridges
gourmet-support backend (which already has A2E hooks in /api/tts/synthesize)
with the actual Wav2Vec2 + LAM A2E decoder pipeline.

Services:
- audio2exp-service: Flask API accepting MP3 audio, returning 52-dim
  ARKit blendshape coefficients at 30fps. Includes Wav2Vec2 feature
  extraction and fallback mode when A2E decoder is unavailable.
- Frontend ExpressionManager: Maps A2E blendshapes to GVRM bone system,
  syncing with audio playback via currentTime.

Architecture: TTS → MP3 → audio2exp-service → 52-dim blendshapes → frontend

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
The a2e_engine now searches multiple patterns for the checkpoint:
- models/LAM_audio2exp_streaming.tar (flat, user's actual layout)
- models/LAM_audio2exp/pretrained_models/*.tar (OpenAvatarChat layout)
- models/LAM_audio2exp/*.tar (intermediate layout)
Falls back to rglob search if none match.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Full drop-in replacement for gourmet-sp's concierge-controller.ts with
Audio2Expression integration applied. Key changes marked with ★ comments:
- ExpressionManager import and initialization
- session_id added to /api/tts/synthesize requests
- A2E expression data used for lip sync when available
- FFT-based lip sync preserved as fallback
- Proper cleanup in stopAvatarAnimation() and dispose()

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Replaces the scaffold version with the real concierge-controller.ts from
gourmet-sp (claude/test-concierge-modal-rewGs branch). A2E integration is
already built-in via applyExpressionFromTts() + lamAvatarController.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
uvicorn is an ASGI server (FastAPI/Starlette) and cannot serve Flask
(WSGI). This caused the Cloud Run container to fail to start and listen
on the port, resulting in deployment timeout.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Covers all components: backend (gourmet-support), frontend (gourmet-sp),
audio2exp-service, A2E frontend patches, official HF Spaces ZIP generation
procedure, test suite, deployment config, and end-to-end data flow diagrams.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
The audio2exp-service returns frames as arrays of numbers (number[][]),
but applyExpressionFromTts expected objects with a .weights property
({weights: number[]}[]), causing TypeError and empty frame buffer.

Changed f.weights[i] to frameData[i] to match the actual backend format.
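The accepted-format logic (later hardened to handle both shapes) looks like this as a Python analog of the TypeScript check `Array.isArray(f) ? f : f.weights`:

```python
def frame_weights(frame):
    """Accept a frame as either a raw list of floats (the number[][]
    format audio2exp-service actually returns) or a {"weights": [...]}
    object (the shape the frontend originally expected)."""
    if isinstance(frame, list):
        return frame
    return frame.get("weights", [])
```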

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…AvatarController)

The previous implementation used window.lamAvatarController which doesn't
exist in this codebase, causing lip sync to completely fail (buffer=0,
jaw=0, mouth=0). Additionally, the data format was wrong (f.weights[i]
vs the actual number[][] response).

Now uses ExpressionManager (vrm-expression-manager.ts) which:
- Correctly handles the number[][] frame format from audio2exp-service
- Syncs to audioElement.currentTime for accurate lip sync timing
- Maps ARKit blendshapes (jawOpen, mouthFunnel, etc.) to GVRM bone system
- Calls renderer.updateLipSync() directly

Changes:
- Import ExpressionManager and initialize in init()
- Replace lamAvatarController dependency with ExpressionManager
- Add expressionManager.stop() in stopAvatarAnimation()
- All 5 call sites (speakTextGCP, speakResponseInChunks x2, shop TTS x2)
  now correctly drive lip sync through ExpressionManager

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
The import '../avatar/vrm-expression-manager' caused a Vite build error
because that file doesn't exist in gourmet-sp's src/scripts/avatar/.

Solution: inline the ExpressionManager class directly into
concierge-controller.ts. This eliminates the need to copy a separate
file into gourmet-sp and avoids import resolution issues.

The ARKIT_INDEX map is trimmed to only the 7 mouth-related blendshapes
actually used for lip sync (jawOpen, mouthFunnel, mouthPucker, etc.)

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Root cause: this.guavaRenderer doesn't exist on CoreController.
LAMAvatar.astro has its own animation loop with buffer/ttsActive state.
The ExpressionManager approach was completely wrong architecture.

Correct approach: use window.lamAvatarController exposed by LAMAvatar.astro
- setExternalTtsPlayer(): links ttsPlayer so LAMAvatar can track playback
- queueExpressionFrames(): feeds A2E frames into LAMAvatar's buffer
- clearFrameBuffer(): clears buffer on stop/new segment

Changes:
- Remove inlined ExpressionManager class (120 lines of dead code)
- Restore lamAvatarController.setExternalTtsPlayer() with retry (500ms x 20)
- applyExpressionFromTts: convert number[][] → {name: value}[] and queue
- stopAvatarAnimation: call clearFrameBuffer() to close mouth

Console should now show:
- "[Concierge] ✅ Linked ttsPlayer with LAMAvatar controller"
- "[Concierge] A2E: N frames queued @ 30fps"
- LAM Health: buffer>0, ttsActive=true during speech

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
… code

Read the ACTUAL LAMAvatar.astro, lam-websocket-manager.ts, and
audio-sync-player.ts from gourmet-sp to understand the real architecture.

Key findings:
- LAMAvatar.getExpressionData() is called at 60fps by renderer
- It reads frameBuffer[floor(ttsPlayer.currentTime * frameRate)]
- Requires: externalTtsPlayer linked, frameBuffer filled, ttsActive=true
- ttsActive is set by play event (requires setExternalTtsPlayer first)

4 chains must ALL work for lip sync:
  Chain1: Backend must return expression data (needs AUDIO2EXP_SERVICE_URL)
  Chain2: setExternalTtsPlayer must link ttsPlayer with LAMAvatar
  Chain3: applyExpressionFromTts must convert & queue frames
  Chain4: LAMAvatar renders from frameBuffer synced to currentTime

Added diagnostic logs at each chain point:
  [A2E Chain1] expression received or null (backend config issue)
  [A2E Chain2] setExternalTtsPlayer success or LAMAvatar not found
  [A2E Chain3] frames queued with jawOpen sample value

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…meBuffer, support both frame formats

Compared with the ORIGINAL gourmet-sp concierge-controller.ts (from
claude/test-concierge-modal-rewGs branch) and found 2 bugs:

1. stopAvatarAnimation() called clearFrameBuffer() which resets
   fadeOutStartTime=null, breaking LAMAvatar's graceful 200ms fade-out.
   The ORIGINAL code trusts LAMAvatar's own ended event handler.
   → Removed clearFrameBuffer() from stopAvatarAnimation()

2. Frame data format mismatch:
   - Original gourmet-sp: f.weights[i] (expects {weights: number[]}[])
   - audio2exp-service: number[][] (raw arrays)
   → Now supports BOTH formats: Array.isArray(f) ? f : f.weights

Key fact: before A2E changes, lip sync was working via the renderer's
built-in FFT analysis. The A2E code path was dead code (AUDIO2EXP_SERVICE_URL
not set). These changes ensure A2E is a pure overlay that doesn't break
the existing FFT lip sync.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Root cause: When AUDIO2EXP_SERVICE_URL is set, the backend returns
expression data. The original code's applyExpressionFromTts used
f.weights[i] on raw number[] arrays, causing TypeError → caught by
outer try/catch → isAISpeaking=false → STT worked (lucky bug).

My both-format fix removed this error, so audio playback proceeds.
But if the browser blocks autoplay (fires play then immediate pause),
onended never fires → playPromise never resolves → initializeSession
hangs → buttons never enabled → STT completely broken.

Fix: Add onpause deadlock prevention to ALL 8 play-and-wait patterns,
matching the existing pattern in ack playback (line 588):
  this.ttsPlayer.onpause = () => {
    if (this.ttsPlayer.currentTime < 0.1) done();
  };

This detects "play then immediate pause" (autoplay block) and resolves
the promise, preventing deadlock. Normal mid-playback pauses (currentTime
> 0.1) are not affected.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Minimize the diff against the original gourmet-sp concierge-controller.ts.
The only substantive change is the applyExpressionFromTts method:
- Frame format: f.weights[i] → Array.isArray(f) ? f : (f.weights || [])
  (supports the number[][] format from audio2exp-service)
- Errors are handled as non-fatal via try/catch
- All other methods (speakTextGCP, STT, sendMessage, etc.) are identical to the original

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…ration

Previous patches removed all GVRM renderer integration (import, guavaRenderer,
setupAudioAnalysis, startLipSyncLoop) and replaced with non-existent
window.lamAvatarController calls, causing all A2E data to be silently dropped
and lip sync to degrade to basic jaw flapping.

This rewrite is based on the actual production concierge-controller.ts with
minimal A2E additions:
- Restore GVRM import, guavaRenderer, setupAudioAnalysis, startLipSyncLoop
- Add a2eFrames/a2eFrameRate/a2eNames properties for expression storage
- Add setA2EFrames() to store expression data from TTS response
- Add computeMouthOpenness() to convert 52-dim ARKit blendshapes to scalar
- Modify startLipSyncLoop() to use A2E frames when available, FFT as fallback
- Override speakTextGCP() with inline fetch to include session_id
- Add session_id to ALL TTS requests (ack, chunks, shop flow)

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…t GVRM)

Root cause: The patch was based on gourmet-support's concierge-controller.ts
which uses GVRM renderer, but the actual deployed frontend (gourmet-sp) uses
LAMAvatar.astro with a completely different rendering pipeline.

Previous patch problems:
- Added GVRM import/renderer that doesn't exist in gourmet-sp
- Missing linkTtsPlayer() - LAMAvatar never received ttsPlayer reference
  -> ttsActive=false, buffer=0, lip sync completely dead
- Added setupAudioAnalysis()/startLipSyncLoop() for FFT - unnecessary with LAMAvatar
- Called clearFrameBuffer() in stopAvatarAnimation() - breaks LAMAvatar fade-out

Fix: Use the exact gourmet-sp version which correctly:
- Links ttsPlayer to LAMAvatar via setExternalTtsPlayer() in init()
- Sends A2E frames via applyExpressionFromTts() -> lamAvatarController.queueExpressionFrames()
- Lets LAMAvatar handle all lip sync rendering internally
- Does NOT call clearFrameBuffer() in stopAvatarAnimation()

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
…rpolate frames

Changes to applyExpressionFromTts():
1. Mouth blendshape amplification: Scale jawOpen (1.4x), mouthFunnel/Pucker (1.5x),
   mouthSmile (1.3x), mouthStretch (1.2x) etc. for more visible Japanese vowel
   distinctions (あ/い/う/え/お)
2. Frame interpolation: 30fps→60fps via linear interpolation between consecutive
   frames, matching the renderer's ~60fps render loop for smoother animation
3. Diagnostic logging: jawOpen/mouthFunnel/mouthSmile max/avg values logged per
   expression segment for live quality monitoring
4. LinkTtsPlayer retry: Multiple retry attempts (500ms, 1s, 2s, 4s) with logging
   to reliably connect ttsPlayer to LAMAvatar even with async initialization

Quality context: A2E streaming model (wav2vec2-base-960h, no transformer) produces
subtle Japanese phoneme variations. Frontend amplification makes these visible.
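The interpolation step (change 2 above) can be sketched per-channel; the real patch lives in `applyExpressionFromTts()`, and for N input frames this produces roughly `factor * N` output frames (exactly `factor * (N - 1) + 1`).

```python
def interpolate_frames(frames, factor=2):
    """Linearly interpolate between consecutive blendshape frames,
    e.g. 30fps -> 60fps with factor=2, to match a ~60fps render loop.
    Each frame is a list of blendshape values."""
    if len(frames) < 2 or factor < 2:
        return list(frames)
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, factor):
            t = k / factor
            out.append([x + (y - x) * t for x, y in zip(a, b)])
    out.append(frames[-1])
    return out
```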

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
… objects)

The user rewrote audio2exp-service with a2e_engine.py (Flask) which returns
frames as plain arrays [[0.1, ...], ...] instead of the old FastAPI format
[{"weights": [0.1, ...]}, ...].

Frontend now detects both formats: Array.isArray(f) ? f : f.weights

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Step 1: Add __testLipSync() diagnostic to concierge-controller.ts patch
  - Generates 5 Japanese vowel patterns (あいうえお) with known ARKit values
  - Creates silent WAV audio, queues frames to LAMAvatar, plays through ttsPlayer
  - Verifies whether renderer supports full 52-dim blendshapes

Step 3: Fix a2e_engine.py to use the proper LAM INFER pipeline
  - Restore LAM_Audio2Expression module (engines, models, utils, configs)
  - Rewrite _load_a2e_decoder → _try_load_infer_pipeline using INFER.build()
  - Use infer_streaming_audio() with context for chunked processing
  - Includes full postprocessing: smooth_mouth, frame_blending, savitzky_golay,
    symmetrize, eye_blinks
  - Falls back to Wav2Vec2 energy-based approximation when INFER unavailable
  - Add librosa, scipy, addict to requirements.txt
  - Add libsndfile to Dockerfile

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Three issues fixed during local testing:
1. transformers v5.x requires ignore_mismatched_sizes=True and
   attn_implementation="eager" for Wav2Vec2Model.from_pretrained()
2. HuggingFace checkpoint is double-wrapped (tar.gz containing
   pretrained_models/lam_audio2exp_streaming.tar) - auto-extract
3. Bare except in infer.py swallowed tracebacks and crashed on
   uninitialized output_dict - now logs actual error and recovers

Result: audio2exp-service starts with mode="infer" and produces
52-dim ARKit blendshapes from audio input.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Exclude downloaded model weights (wav2vec2, LAM checkpoint ~1.1GB)
from version control.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
Flask's app.run() auto-loads .env files, which crashes with
UnicodeDecodeError if a non-UTF-8 .env exists in the path.
Pass load_dotenv=False since env vars are set externally.

https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
claude and others added 30 commits February 24, 2026 01:24
pucker reaching 1.0 (raw ~0.35 × 2.5x boost) caused FLAME LBS
numerical overflow (jaw=1.56e+23), destroying the face mesh mid-speech.

Changes:
- Add BLENDSHAPE_SAFE_MAX=0.7 clamp for all amplified channels
- Reduce pucker boost from 2.5→1.0 (raw already sufficient at ~0.35)
- Reduce funnel boost from 2.5→2.0 (prevent approaching limit)

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
…en motion

A2E model outputs weak jawOpen (avg~0.05) but excessive mouthLowerDown
(raw~0.84), causing "lower lip pull" instead of natural jaw opening.

Changes:
- jawOpen: 0.85→1.0 (restore to let jaw drive mouth opening)
- mouthLowerDown: 0.75→0.35 (suppress dominant lip-pull artifact)
- mouthUpperUp: 0.85→0.5 (suppress similarly excessive upper lip)

This shifts visual motion from "lip pulling" to "jaw opening" which
is anatomically correct for speech.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
__testLipSync() diagnostic proved LAMAvatar.astro only passes jawOpen
and mouthLowerDown to the WebGL renderer - all other 50 blendshapes
(funnel, pucker, smile, stretch) are silently ignored.

- Reorganize MOUTH_AMPLIFY with clear sections: rendered vs pending-patch
- Adjust mouthLowerDown 0.35→0.5 (one of only 2 rendered values)
- Add LAMAVATAR_PATCH.md: documents the getExpressionData() fix needed
  in gourmet-sp repo to pass full 52-dim blendshape dict

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
The closed-source SDK (gaussian-splat-renderer-for-lam) only uses jawOpen
and mouthLowerDown for FLAME mesh deformation — all other 50 blendshapes
are silently ignored. LAMAvatar.astro was already returning full 52-dim
data; the bottleneck is the SDK, not our code.

Added remapForSdkLimitation() to synthesize composite jawOpen/lowerDown
from the full blendshape data:
- jawOpen += smile*0.5 + funnel*0.35 + pucker*0.2 + stretch*0.3
- lowerDown += smile*0.6 + stretch*0.4 - pucker*0.15
This encodes vowel-specific mouth shapes into the 2 available channels.
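As a scalar-per-frame sketch (channel extraction from the 52-dim vector omitted; this version was later removed once the SDK turned out to support all channels):

```python
def remap_for_sdk_limitation(jaw, lower, smile, funnel, pucker, stretch):
    """Fold the vowel-shaping channels into the 2 channels the SDK was
    then believed to render, using the composite weights quoted above."""
    jaw_out = jaw + smile * 0.5 + funnel * 0.35 + pucker * 0.2 + stretch * 0.3
    lower_out = lower + smile * 0.6 + stretch * 0.4 - pucker * 0.15
    return jaw_out, lower_out
```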

Also enhanced diagnostic logging (health check + TTS-Sync) to show
funnel, smile, pucker, stretch alongside jaw/mouth values.

Updated LAMAVATAR_PATCH.md to correct the initial hypothesis and
document the actual SDK limitation and composite workaround.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Two fixes for LAMAvatar.astro:

1. SDK feedback loop explosion (jaw=500 million):
   - Root cause: SDK writes FLAME LBS overflow values back to the
     returned expressionData object (passed by reference)
   - Fix: sanitizeExpressionData() at frame start breaks feedback loop
   - Fix: safeReturnExpression() returns shallow copy to prevent
     SDK from polluting internal state

2. Mouth over-opening (mouth=0.58 for natural speech):
   - Composite weights were too aggressive for real A2E output
   - Reduced: smile*0.5→0.3, funnel*0.35→0.2, stretch*0.3→0.2 (jaw)
   - Reduced: smile*0.6→0.35, stretch*0.4→0.25, pucker*0.15→0.1 (mouth)
   - Expected mouth range: 0.3-0.4 (vs 0.5-0.6 before)
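A Python sketch of the sanitize-plus-copy idea from fix 1 (the real fix is in LAMAvatar.astro; the 0.7 clamp is the BLENDSHAPE_SAFE_MAX from the earlier overflow commit):

```python
import math

BLENDSHAPE_SAFE_MAX = 0.7

def sanitize_expression(data):
    """Zero out non-finite entries and clamp values to a safe range,
    breaking the SDK write-back feedback loop. Returns a new dict
    (shallow copy) so the renderer cannot mutate internal state."""
    clean = {}
    for name, value in data.items():
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            value = 0.0
        clean[name] = max(0.0, min(float(value), BLENDSHAPE_SAFE_MAX))
    return clean
```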

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
…suppress

A2E model outputs weak jawOpen (avg~0.055) but excessive lowerDown (raw~0.87).
Previous 1.0x/0.5x amplify still resulted in mouth dominating jaw by 1.8-3.3x.

Changes:
- concierge-controller.ts: jawOpen 1.0→2.5, lowerDown 0.5→0.25
- LAMAvatar.astro remap: jaw composite weights increased (smile 0.3→0.35,
  funnel 0.2→0.25, stretch 0.2→0.25), mouth composite decreased
  (smile 0.35→0.2, stretch 0.25→0.15)

Expected: jaw > mouth in most frames (natural jaw-led lip movement)

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
jawOpen 2.5x caused frequent 0.7 clamp hits (mouth opening too wide).
1.5x keeps jaw > mouth ratio while staying in natural range (~0.3-0.55).
Composite jaw weights reverted to conservative: smile*0.3, funnel*0.2, stretch*0.2.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
…ition fade-out bug

Issue 1: Mouth movement too large
- jawOpen amplify: 1.5→1.0 (no boost, let JAW_MAX cap handle peaks)
- Added JAW_MAX=0.30 and MOUTH_MAX=0.20 explicit caps in remapForSdkLimitation()
- Previous approach (amplify tuning) couldn't prevent peaks from exceeding 0.5+
- Cap approach: preserves natural dynamic range for moderate frames, clips peaks

Issue 2: Lip sync stopping during TTS chunk transitions
- Root cause: clearFrameBuffer() sets ttsActive=false, but old audio's ended=true
  remains → getExpressionData() triggers premature fade-out to neutral
- Fix: added ttsTransitioning flag set in clearFrameBuffer(), cleared in play handler
- Fade-out logic now checks !ttsTransitioning to avoid false triggers

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Original TEST_PROCEDURE.md assumed the rendering pipeline would use all 52
ARKit blendshapes. Live testing revealed the SDK only uses jawOpen +
mouthLowerDown (2 channels). A2E data quality for Japanese is sufficient;
the bottleneck is rendering.

Revised plan focuses on:
- Phase 0: SDK internal investigation (shader intercept, npm decompile)
- Phase 1B: WebGL shader patch to enable 52-channel rendering
- Phase 2: Alternative renderer (Three.js + custom FLAME) as fallback

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
…-channel limitation findings

- Clarify WebGL 2.0 (not WebGPU), device GPU (not CPU-only)
- Add section 1.4: SDK 2-channel limitation details and verification history
- Add section 4.5: lip sync tuning rounds (4 iterations) and TTS bug fix
- Add section 7.2: SDK internal structure investigation results
- Update section 5.1: SDK 2-channel breakthrough as top priority
- Update section 8: Phase 0 shader intercept as next action
- Update section 9: commit history for tuning and investigation phases

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Monkey-patch WebGL2RenderingContext methods BEFORE SDK import to capture:
- shaderSource: all vertex/fragment shader source code
- createShader: shader type tracking
- transformFeedbackVaryings: TF program output variables (key for FLAME deformation)
- linkProgram: shader-to-program mapping
- texImage2D: expression texture candidates (52-dim related sizes)
- getUniformLocation/getAttribLocation: expression-related uniform/attribute names

Features:
- Auto-classifies shaders by expression/blendshape keywords
- Highlights Transform Feedback programs (where FLAME LBS happens)
- Flags potential expression textures (52-channel data)
- Full shader source dump for expression-related shaders
- window.__LAM_SHADER_INTERCEPT.analyze() for structured analysis in DevTools
- All data accessible via window.__LAM_SHADER_INTERCEPT global

Purpose: Identify which blendshape indices the SDK reads in its TF vertex
shader, confirming the 2-channel (jawOpen + mouthLowerDown) limitation
and evaluating feasibility of a 52-channel shader patch (Phase 1B).

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
…boneTexture writes

Add hooks for uniform1i, uniform1f, and texSubImage2D to determine:
- bsCount runtime value (how many blendshapes the shader processes)
- gaussianSplatCount value
- boneTexture weight data at texel 20+ (where BS weights are packed)

This is critical for Phase 0 SDK investigation: the shader's blendshape
loop is generic (for i < bsCount), so the 2-channel limitation must be
in the JS code that sets bsCount and packs weights into boneTexture.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Full 26,292-char vertex shader (#2) captured via WebGL intercept.
Contains the critical blendshape loop: for(i < bsCount) that proves
the shader supports N channels, not hardcoded to 2.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
…endshapes

SDK source analysis revealed the 2-channel limitation was a misdiagnosis:
- SDK's setExpression() copies ALL expression data to splatMesh.bsWeight
- morphTargetDictionary maps ALL blendshape names to boneTexture indices
- Shader's for(i < bsCount) loop applies ALL blendshapes from the model

Changes:
- Remove remapForSdkLimitation() calls (was degrading quality by
  compressing 52 channels into jaw/lowerDown)
- Add logSdkInternals() to verify morphTargetDictionary, bsCount,
  useFlame, gaussianSplatCount at runtime
- Update MOUTH_AMPLIFY to natural values (no more extreme suppression
  of lowerDown or extreme boost of smile)
- Remove JAW_MAX/MOUTH_MAX caps (no longer needed without remapping)

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
…akthrough

- Correct 2-channel misdiagnosis: SDK actually supports all 52 blendshapes
- Document complete data flow: getExpressionData → updateBS → setExpression
  → bsWeight → boneTexture → shader for(i<bsCount) loop
- Update next actions: focus on quality evaluation and MOUTH_AMPLIFY tuning
- Mark Phase 0 SDK investigation as complete
- Add commit history for SDK analysis phase

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
…pam)

Only log float uniforms once (headBoneIndex, splatScale) instead of
every frame. Removes visibleRegionFadeStartRadius and other per-frame
uniforms that cause console spam.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Phase 0 investigation confirmed bsCount=51 at runtime - SDK supports
all 52 ARKit blendshape channels. The shader intercept is no longer
needed and was causing console log spam (visibleRegionFadeStartRadius).

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
A2E outputs raw mouthLowerDown ~0.84, which even after SAFE_MAX clamp
(0.7) causes unnaturally wide mouth opening. jawOpen effective max is
only ~0.42 (0.28*1.5), creating an imbalance.

Reduce mouthLowerDown amplify from 1.0 to 0.5 so effective max becomes
~0.42, matching jawOpen for natural lip sync balance.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
jawOpen (amp 1.5x) spikes to 0.48-0.60 while mouthLowerDown adds
0.17-0.22 on top. Combined 0.6-0.7 creates unnatural wide mouth.
Reducing to 1.0x keeps raw max ~0.40, giving natural jaw+mouth sum.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
mouthLowerDown was dominating all phonemes, making every sound look
like "あ" (mouth open). Reduce lowerDown 0.5→0.3x and boost vowel
channels: funnel 1.5→2.5 (う/お), pucker 1.0→1.5 (う), smile 2.0→3.5
(い), stretch 1.5→2.0 (え). This increases the relative visibility
of vowel-specific mouth shapes.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
…lip sync

Two issues fixed:
1. A2E output has "dead zones" (values drop to 0.001) during speech,
   causing mumbling effect. Per-segment dynamic range compression
   boosts weak frames toward segment mean (40% blend).
2. Symmetric EMA allowed rapid mouth closing during brief zero frames.
   Asymmetric EMA: fast attack (α=0.82) for crisp openings, slow
   decay (α=0.45) to prevent jarring closures.
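The asymmetric EMA from fix 2 can be sketched per channel (the exact update form `y = α·x + (1−α)·y_prev` is an assumption about the implementation):

```python
def asymmetric_ema(values, attack=0.82, decay=0.45):
    """Asymmetric exponential smoothing for one mouth channel: fast
    attack when the value rises (crisp openings), slow decay when it
    falls (no snap-shut during brief zero frames)."""
    y = 0.0
    out = []
    for x in values:
        alpha = attack if x > y else decay
        y = alpha * x + (1.0 - alpha) * y
        out.append(y)
    return out
```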

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
The mouth movements were too small (mumbling) because jawOpen
and lowerDown were over-reduced. Now safe to increase them because
bidirectional dynamic range compression controls peaks while boosting
valleys. Changes:
- jawOpen: 1.0→2.0 (compression tames raw 0.40→0.70 peaks to ~0.35)
- lowerDown: 0.3→0.5 (compression brings raw 0.84 peaks to ~0.30)
- Compression: one-sided→bidirectional, factor 0.4→0.5
  Both peaks AND valleys compressed toward segment mean

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Per-channel compression (compressing each channel independently toward
its mean) was destroying vowel shape differences - "あ" and "い" ended
up looking the same. New approach: normalize total mouth energy per
frame while preserving relative channel proportions (= phoneme shape).

- Energy floor 0.35: weak frames get proportionally scaled up
- Energy ceiling 1.8: peak frames get proportionally scaled down
- Channel RATIOS preserved: jaw/smile/pucker proportions stay intact
- Result: consistent amplitude WITH phoneme differentiation
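A sketch of the per-frame energy normalization (treating a frame's mouth channels as a list and total energy as their sum, which is an assumption about the metric):

```python
def normalize_mouth_energy(frame, floor=0.35, ceiling=1.8):
    """Uniformly scale a frame's mouth channels so total energy stays in
    [floor, ceiling]. Because every channel is scaled by the same factor,
    channel RATIOS (the phoneme shape) are preserved."""
    energy = sum(frame)
    if energy <= 0.0:
        return list(frame)
    if energy < floor:
        scale = floor / energy
    elif energy > ceiling:
        scale = ceiling / energy
    else:
        scale = 1.0
    return [v * scale for v in frame]
```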

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Reduce exaggeration and snappiness for natural-looking speech:
- jawOpen: 2.0→1.5, lowerDown: 0.5→0.4 (less extreme opening)
- smile: 3.5→3.0, funnel: 2.5→2.0, stretch: 2.0→1.8 (softer shapes)
- Energy floor: 0.35→0.15 (allow natural quiet moments)
- Energy ceiling: 1.8→1.2 (reduce peak exaggeration)
- EMA attack: 0.82→0.60 (smoother transitions, less snapping)
- EMA decay: 0.45→0.50 (slightly more responsive closing)

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
The previous round (c1cb895) was too exaggerated/snappy; the one before (c03e410) was too mumbly.
Tuned to midpoint values between the two:
- jawOpen: 1.5→1.7, lowerDown: 0.4→0.45
- smile: 3.0→3.2, funnel: 2.0→2.2, stretch: 1.8→1.9
- energyFloor: 0.15→0.25, energyCeiling: 1.2→1.5
- attackAlpha: 0.60→0.70, decayAlpha: 0.50→0.48

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Even the midpoint values still looked mumbly → insufficient amplitude was the main cause.
The earlier "too snappy" problem turned out to be caused by EMA speed, not amplitude.
→ Raise amplitude to 90% of the exaggerated version while keeping EMA at 65%,
  aiming for clearly visible movement with natural transitions.

- jawOpen: 1.7→1.9, lowerDown: 0.45→0.48
- smile: 3.2→3.4, funnel: 2.2→2.4, stretch: 1.9→2.0
- energyFloor: 0.25→0.30, energyCeiling: 1.5→1.6
- attackAlpha: 0.70→0.75 (more conservative than the snappy version's 0.82)
- decayAlpha: 0.48→0.47

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Root cause: energyFloor is applied before the EMA, but with decayAlpha=0.47,
3-4 consecutive weak frames let the EMA drag the value back down,
canceling out the floor's effect.

Countermeasures:
- decayAlpha: 0.47→0.35 (core fix: slow mouth closing dramatically so the
  previous frame's mouth shape is held through A2E dead zones)
- jawOpen: 1.9→2.2 (beyond the exaggerated version, to compensate for A2E's low raw output)
- energyFloor: 0.30→0.45 (lift weak frames more aggressively)
- energyCeiling: 1.6→1.8, attackAlpha: 0.75→0.78
- smile: 3.4→3.5, funnel: 2.4→2.5, lowerDown: 0.48→0.50

Simulation after the decay reduction (jaw=0.2 → 4 weak frames):
old: 0.13→0.09→0.07→0.06 (mumbling)
new: 0.16→0.13→0.11→0.10 (mouth shape held)

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3
Previous problem: decayAlpha=0.35 (slow decay) fixed the dead-zone mumbling,
but the previous vowel's shape lingered too long, desynchronizing vowel transitions.

Fundamentally, decay speed alone cannot achieve both "no mumbling" and
"vowel sync" at the same time.

New approach: two-stage floor
1. Step 2.3: pre-EMA floor (0.45) — amplitude boost
2. Step 2.5: EMA smoothing — decay=0.50 tracks vowel transitions
3. Step 2.7: post-EMA floor (0.18) — rescues dead zones after the EMA
   → guarantees minimum energy while preserving channel ratios, so vowel shapes are intact

Restoring EMA decay=0.50 recovers vowel-switch timing, while the post-EMA
floor solves the "mouth closes completely after decay" problem.

https://claude.ai/code/session_01TUEGRBQaaga67AXVGbNUs3