Claude/poc testing support continuation ja dm o by mirai-gpro · Pull Request #90 · aigc3d/LAM

mirai-gpro · 2026-02-25T05:13:08Z

No description provided.

Test scripts to verify A2E (Audio2Expression) lip sync quality with Japanese audio input, before investing in ZIP motion replacement or VHAP Japanese FLAME params. Includes:
- generate_test_audio.py: EdgeTTS Japanese/English/Chinese audio samples
- test_a2e_cpu.py: A2E model loading, Wav2Vec2 feature extraction, ZIP validation
- save_a2e_output.py: Capture A2E 52-dim ARKit blendshape output
- analyze_blendshapes.py: Lip sync quality scoring and language comparison
- setup_oac_env.py: Auto-detect known OpenAvatarChat issues (CPU mode, deps, config)
- chat_with_lam_jp.yaml: Corrected config (Gemini API + EdgeTTS ja-JP-NanamiNeural)
- run_all_tests.py: Master test runner
- TEST_PROCEDURE.md: Step-by-step test procedure
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Fix RuntimeError: Input data type <class 'list'> is not supported.
- diagnose_onnx_error.py: Tests the Silero VAD ONNX model, SenseVoice, and the data flow
- patch_vad_handler.py: Fixes the timestamp[0] NoneType bug, adds defensive numpy type checking on ONNX inputs, and handles 2- and 3-output model variants
- setup_oac_env.py: Adds VAD handler bug detection (check 7/7)
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
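A minimal sketch of the defensive handling described above, assuming numpy and ONNX Runtime-style inputs. The helper names (`to_onnx_input`, `unpack_vad_outputs`, `first_timestamp`) are hypothetical, and mapping 3 outputs to `(prob, h, c)` and 2 outputs to `(prob, state)` is an assumption about the Silero VAD model variants, not code taken from patch_vad_handler.py.

```python
import numpy as np


def to_onnx_input(x, dtype=np.float32):
    """Coerce VAD/ASR input to a contiguous numpy array.

    ONNX Runtime rejects plain Python lists ("Input data type <class 'list'>
    is not supported"), so lists and scalars are converted here.
    """
    return np.ascontiguousarray(np.asarray(x, dtype=dtype))


def unpack_vad_outputs(outputs):
    """Split VAD session outputs into (speech_prob, recurrent_state).

    Assumption: 3-output models return (prob, h, c) and 2-output models
    return (prob, state).
    """
    if len(outputs) == 3:
        prob, h, c = outputs
        return prob, (h, c)
    if len(outputs) == 2:
        prob, state = outputs
        return prob, (state,)
    raise ValueError(f"unexpected number of VAD outputs: {len(outputs)}")


def first_timestamp(timestamp):
    """Return timestamp[0], or None when timestamp is None or empty."""
    if timestamp is None or len(timestamp) == 0:
        return None
    return timestamp[0]
```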

Simple test script that verifies environment, model files, data_bundle.py fix, Wav2Vec2 loading, and A2E module import. https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Gemini's OpenAI-compatible API sometimes returns delta.content as a dict or list instead of a string, causing a TypeError in set_main_data(). This patch script detects non-string content and safely converts it before passing it to data_bundle. https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
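A sketch of the kind of coercion described above, assuming `delta.content` may be a string, a list of text parts, or a dict. `content_to_text` and the logger name are hypothetical; treating an unrecognized dict (such as an error payload) as empty text is a design choice made here, not necessarily what the patch script does.

```python
import logging
from typing import Any

logger = logging.getLogger("llm_stream")  # hypothetical logger name


def content_to_text(content: Any) -> str:
    """Coerce an OpenAI-compatible streaming delta.content value to str."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Assumed shape: strings or {"type": "text", "text": ...} parts.
        return "".join(content_to_text(part) for part in content)
    if isinstance(content, dict):
        text = content.get("text")
        if isinstance(text, str):
            return text
        # Anything else (e.g. an error payload) is logged, not spoken.
        logger.warning("dropping non-text delta content: %r", content)
        return ""
    return str(content)
```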

gemini-2.0-flash returns 404 "no longer available to new users". The error dict then cascades into the set_main_data TypeError. https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

SenseVoice auto-detection defaults to Chinese (<|zh|>), causing Japanese speech to be misrecognized as Chinese text. This patch forces language="ja" in the generate() call.
- patch_asr_language.py: Auto-patches asr_handler_sensevoice.py
- chat_with_lam_jp.yaml: Added language: "ja" to SenseVoice config
- TEST_PROCEDURE.md: Added Step 4.5 for patch application
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Instead of creating a separate config file, this script patches the existing working config/chat_with_lam.yaml with 3 changes:
1. TTS voice → ja-JP-NanamiNeural
2. LLM system_prompt → Japanese
3. ASR language → ja
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Root cause analysis from production logs:
- 1st ASR call: rtf=0.629 (1.25s) - OK
- 2nd ASR call: rtf=15.027 (29.83s) - GPU memory exhausted, CPU fallback
- The fastrtc 60s timeout triggers and resets the frame pipeline → system unresponsive
Fix: Add torch.cuda.empty_cache() + gc.collect() after each SenseVoice and LAM inference to free GPU memory between calls. Also adds a startup wrapper that sets PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
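A minimal sketch of freeing memory after each inference call. `releases_gpu_memory` and `run_asr` are hypothetical names, and the sketch tolerates a missing PyTorch install so it can run anywhere. `torch.cuda.empty_cache()` only returns cached, unreferenced blocks, so `gc.collect()` runs first to drop lingering Python references. The allocator setting must be in the environment before CUDA is initialized, which is why the commit places it in a startup wrapper.

```python
import functools
import gc
import os

# Only effective if set before PyTorch initializes CUDA.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

try:
    import torch
except ImportError:  # lets the sketch run without PyTorch installed
    torch = None


def free_gpu_memory() -> None:
    """Drop unreachable Python objects, then return cached CUDA blocks."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def releases_gpu_memory(fn):
    """Decorator: call free_gpu_memory() after fn returns or raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        finally:
            free_gpu_memory()
    return wrapper


@releases_gpu_memory
def run_asr(audio):
    # Hypothetical stand-in for the SenseVoice / LAM inference call.
    return f"transcribed {len(audio)} samples"
```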

Create the missing Audio2Expression inference service that bridges the gourmet-support backend (which already has A2E hooks in /api/tts/synthesize) with the actual Wav2Vec2 + LAM A2E decoder pipeline.
Services:
- audio2exp-service: Flask API accepting MP3 audio and returning 52-dim ARKit blendshape coefficients at 30fps. Includes Wav2Vec2 feature extraction and a fallback mode when the A2E decoder is unavailable.
- Frontend ExpressionManager: Maps A2E blendshapes to the GVRM bone system, syncing with audio playback via currentTime.
Architecture: TTS → MP3 → audio2exp-service → 52-dim blendshapes → frontend
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

The a2e_engine now searches multiple patterns for the checkpoint:
- models/LAM_audio2exp_streaming.tar (flat, user's actual layout)
- models/LAM_audio2exp/pretrained_models/*.tar (OpenAvatarChat layout)
- models/LAM_audio2exp/*.tar (intermediate layout)
Falls back to an rglob search if none match.
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
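The search order above can be sketched with `pathlib`. `find_checkpoint` is a hypothetical name, and the `*.tar` pattern for the rglob fallback is an assumption; the commit does not say which pattern it uses.

```python
from pathlib import Path
from typing import Optional

# Patterns from the commit message, most specific first, relative to models/.
CHECKPOINT_PATTERNS = [
    "LAM_audio2exp_streaming.tar",            # flat layout
    "LAM_audio2exp/pretrained_models/*.tar",  # OpenAvatarChat layout
    "LAM_audio2exp/*.tar",                    # intermediate layout
]


def find_checkpoint(models_dir: Path) -> Optional[Path]:
    """Return the first checkpoint matching the known layouts, else any .tar."""
    for pattern in CHECKPOINT_PATTERNS:
        matches = sorted(models_dir.glob(pattern))
        if matches:
            return matches[0]
    # Fallback: any .tar anywhere below models_dir (pattern is an assumption).
    found = sorted(models_dir.rglob("*.tar"))
    return found[0] if found else None
```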

Full drop-in replacement for gourmet-sp's concierge-controller.ts with Audio2Expression integration applied. Key changes marked with ★ comments:
- ExpressionManager import and initialization
- session_id added to /api/tts/synthesize requests
- A2E expression data used for lip sync when available
- FFT-based lip sync preserved as fallback
- Proper cleanup in stopAvatarAnimation() and dispose()
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Replaces the scaffold version with the real concierge-controller.ts from gourmet-sp (claude/test-concierge-modal-rewGs branch). A2E integration is already built-in via applyExpressionFromTts() + lamAvatarController. https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

uvicorn is an ASGI server (FastAPI/Starlette) and cannot serve Flask (WSGI). This caused the Cloud Run container to fail to start and listen on the port, resulting in deployment timeout. https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Covers all components: backend (gourmet-support), frontend (gourmet-sp), audio2exp-service, A2E frontend patches, official HF Spaces ZIP generation procedure, test suite, deployment config, and end-to-end data flow diagrams. https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

The audio2exp-service returns frames as arrays of numbers (number[][]), but applyExpressionFromTts expected objects with a .weights property ({weights: number[]}[]), causing TypeError and empty frame buffer. Changed f.weights[i] to frameData[i] to match the actual backend format. https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

…AvatarController) The previous implementation used window.lamAvatarController, which doesn't exist in this codebase, causing lip sync to fail completely (buffer=0, jaw=0, mouth=0). Additionally, the data format was wrong (f.weights[i] vs the actual number[][] response).
Now uses ExpressionManager (vrm-expression-manager.ts), which:
- Correctly handles the number[][] frame format from audio2exp-service
- Syncs to audioElement.currentTime for accurate lip sync timing
- Maps ARKit blendshapes (jawOpen, mouthFunnel, etc.) to the GVRM bone system
- Calls renderer.updateLipSync() directly
Changes:
- Import ExpressionManager and initialize it in init()
- Replace the lamAvatarController dependency with ExpressionManager
- Add expressionManager.stop() in stopAvatarAnimation()
- All 5 call sites (speakTextGCP, speakResponseInChunks x2, shop TTS x2) now correctly drive lip sync through ExpressionManager
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

The import '../avatar/vrm-expression-manager' caused a Vite build error because that file doesn't exist in gourmet-sp's src/scripts/avatar/. Solution: inline the ExpressionManager class directly into concierge-controller.ts. This eliminates the need to copy a separate file into gourmet-sp and avoids import resolution issues. The ARKIT_INDEX map is trimmed to only the 7 mouth-related blendshapes actually used for lip sync (jawOpen, mouthFunnel, mouthPucker, etc.) https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Root cause: this.guavaRenderer doesn't exist on CoreController. LAMAvatar.astro has its own animation loop with buffer/ttsActive state, so the ExpressionManager approach was the wrong architecture entirely.
Correct approach: use window.lamAvatarController, exposed by LAMAvatar.astro:
- setExternalTtsPlayer(): links ttsPlayer so LAMAvatar can track playback
- queueExpressionFrames(): feeds A2E frames into LAMAvatar's buffer
- clearFrameBuffer(): clears the buffer on stop/new segment
Changes:
- Remove the inlined ExpressionManager class (120 lines of dead code)
- Restore lamAvatarController.setExternalTtsPlayer() with retry (500ms x 20)
- applyExpressionFromTts: convert number[][] → {name: value}[] and queue
- stopAvatarAnimation: call clearFrameBuffer() to close the mouth
Console should now show:
- "[Concierge] ✅ Linked ttsPlayer with LAMAvatar controller"
- "[Concierge] A2E: N frames queued @ 30fps"
- LAM Health: buffer>0, ttsActive=true during speech
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
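The `number[][] → {name: value}[]` conversion can be sketched in Python as follows (the frontend itself is TypeScript). It assumes the caller has the channel names in the same order as the service's output; `frames_to_named` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def frames_to_named(frames: Sequence[Sequence[float]],
                    names: Sequence[str]) -> List[Dict[str, float]]:
    """Turn each raw frame into {blendshape_name: value}.

    `names` must follow the service's channel order (assumed to be supplied
    alongside the frames); a length mismatch raises instead of truncating.
    """
    out: List[Dict[str, float]] = []
    for i, frame in enumerate(frames):
        if len(frame) != len(names):
            raise ValueError(f"frame {i}: {len(frame)} values for {len(names)} names")
        out.append({n: float(v) for n, v in zip(names, frame)})
    return out
```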

… code Read the ACTUAL LAMAvatar.astro, lam-websocket-manager.ts, and audio-sync-player.ts from gourmet-sp to understand the real architecture. Key findings:
- LAMAvatar.getExpressionData() is called at 60fps by the renderer
- It reads frameBuffer[floor(ttsPlayer.currentTime * frameRate)]
- Requires: externalTtsPlayer linked, frameBuffer filled, ttsActive=true
- ttsActive is set by the play event (requires setExternalTtsPlayer first)
All 4 chains must work for lip sync:
- Chain1: Backend must return expression data (needs AUDIO2EXP_SERVICE_URL)
- Chain2: setExternalTtsPlayer must link ttsPlayer with LAMAvatar
- Chain3: applyExpressionFromTts must convert & queue frames
- Chain4: LAMAvatar renders from frameBuffer synced to currentTime
Added diagnostic logs at each chain point:
- [A2E Chain1] expression received or null (backend config issue)
- [A2E Chain2] setExternalTtsPlayer success or LAMAvatar not found
- [A2E Chain3] frames queued with jawOpen sample value
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
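The frame lookup in the key findings amounts to indexing the buffer by playback time. A Python sketch with hypothetical names; returning `None` past the end of the buffer (so the caller can fade the mouth out) is an assumption, not LAMAvatar's confirmed behavior.

```python
import math
from typing import List, Optional, Sequence


def frame_at(
    frame_buffer: Sequence[List[float]],
    current_time: float,
    frame_rate: float = 30.0,
    tts_active: bool = True,
) -> Optional[List[float]]:
    """Return the frame for the player's current time, or None if nothing applies."""
    if not tts_active or not frame_buffer or current_time < 0:
        return None
    index = math.floor(current_time * frame_rate)
    if index >= len(frame_buffer):
        return None  # past the end: let the caller fade the mouth closed
    return frame_buffer[index]
```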

…meBuffer, support both frame formats Compared with the ORIGINAL gourmet-sp concierge-controller.ts (from the claude/test-concierge-modal-rewGs branch) and found 2 bugs:
1. stopAvatarAnimation() called clearFrameBuffer(), which resets fadeOutStartTime=null, breaking LAMAvatar's graceful 200ms fade-out. The ORIGINAL code trusts LAMAvatar's own ended event handler.
   → Removed clearFrameBuffer() from stopAvatarAnimation()
2. Frame data format mismatch:
   - Original gourmet-sp: f.weights[i] (expects {weights: number[]}[])
   - audio2exp-service: number[][] (raw arrays)
   → Now supports BOTH formats: Array.isArray(f) ? f : f.weights
Key fact: before the A2E changes, lip sync was working via the renderer's built-in FFT analysis. The A2E code path was dead code (AUDIO2EXP_SERVICE_URL not set). These changes ensure A2E is a pure overlay that doesn't break the existing FFT lip sync.
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Root cause: When AUDIO2EXP_SERVICE_URL is set, the backend returns expression data. The original code's applyExpressionFromTts used f.weights[i] on raw number[] arrays, causing a TypeError → caught by the outer try/catch → isAISpeaking=false → STT worked (a lucky bug). My both-format fix removed this error, so audio playback proceeds. But if the browser blocks autoplay (firing play and then an immediate pause), onended never fires → playPromise never resolves → initializeSession hangs → buttons are never enabled → STT is completely broken.
Fix: Add onpause deadlock prevention to ALL 8 play-and-wait patterns, matching the existing pattern in ack playback (line 588):
this.ttsPlayer.onpause = () => { if (this.ttsPlayer.currentTime < 0.1) done(); };
This detects "play then immediate pause" (autoplay block) and resolves the promise, preventing the deadlock. Normal mid-playback pauses (currentTime >= 0.1) are not affected.
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Minimized the diff against the original gourmet-sp concierge-controller.ts. The only substantive change is in the applyExpressionFromTts method:
- Frame format: f.weights[i] → Array.isArray(f) ? f : (f.weights || []) (supports the number[][] format returned by audio2exp-service)
- Errors are caught with try/catch and treated as non-fatal
- All other methods (speakTextGCP, STT, sendMessage, etc.) are identical to the original
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

…ration Previous patches removed all GVRM renderer integration (import, guavaRenderer, setupAudioAnalysis, startLipSyncLoop) and replaced it with calls to the non-existent window.lamAvatarController, causing all A2E data to be silently dropped and lip sync to degrade to basic jaw flapping.
This rewrite is based on the actual production concierge-controller.ts with minimal A2E additions:
- Restore GVRM import, guavaRenderer, setupAudioAnalysis, startLipSyncLoop
- Add a2eFrames/a2eFrameRate/a2eNames properties for expression storage
- Add setA2EFrames() to store expression data from the TTS response
- Add computeMouthOpenness() to convert 52-dim ARKit blendshapes to a scalar
- Modify startLipSyncLoop() to use A2E frames when available, with FFT as fallback
- Override speakTextGCP() with an inline fetch to include session_id
- Add session_id to ALL TTS requests (ack, chunks, shop flow)
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

…t GVRM) Root cause: The patch was based on gourmet-support's concierge-controller.ts, which uses the GVRM renderer, but the actual deployed frontend (gourmet-sp) uses LAMAvatar.astro with a completely different rendering pipeline.
Previous patch problems:
- Added a GVRM import/renderer that doesn't exist in gourmet-sp
- Missing linkTtsPlayer() - LAMAvatar never received the ttsPlayer reference -> ttsActive=false, buffer=0, lip sync completely dead
- Added setupAudioAnalysis()/startLipSyncLoop() for FFT - unnecessary with LAMAvatar
- Called clearFrameBuffer() in stopAvatarAnimation() - breaks the LAMAvatar fade-out
Fix: Use the exact gourmet-sp version, which correctly:
- Links ttsPlayer to LAMAvatar via setExternalTtsPlayer() in init()
- Sends A2E frames via applyExpressionFromTts() -> lamAvatarController.queueExpressionFrames()
- Lets LAMAvatar handle all lip sync rendering internally
- Does NOT call clearFrameBuffer() in stopAvatarAnimation()
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

…rpolate frames Changes to applyExpressionFromTts():
1. Mouth blendshape amplification: Scale jawOpen (1.4x), mouthFunnel/Pucker (1.5x), mouthSmile (1.3x), mouthStretch (1.2x), etc. for more visible Japanese vowel distinctions (あ/い/う/え/お, i.e. a/i/u/e/o)
2. Frame interpolation: 30fps→60fps via linear interpolation between consecutive frames, matching the renderer's ~60fps render loop for smoother animation
3. Diagnostic logging: jawOpen/mouthFunnel/mouthSmile max/avg values logged per expression segment for live quality monitoring
4. linkTtsPlayer retry: Multiple retry attempts (500ms, 1s, 2s, 4s) with logging to reliably connect ttsPlayer to LAMAvatar even with async initialization
Quality context: The A2E streaming model (wav2vec2-base-960h, no transformer) produces only subtle variation between Japanese phonemes. Frontend amplification makes these visible.
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
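A sketch of steps 1 and 2 in Python (the real code is TypeScript inside applyExpressionFromTts()). Applying the mouthSmile and mouthStretch gains to both the Left and Right ARKit channels, and clamping to [0, 1], are assumptions. Upsampling N frames this way yields 2N - 1 frames.

```python
from typing import Dict, List, Sequence

# Gains from the commit message. Applying mouthSmile/mouthStretch to both
# Left and Right ARKit channels is an assumption; unlisted channels stay at 1.0.
MOUTH_AMPLIFY: Dict[str, float] = {
    "jawOpen": 1.4,
    "mouthFunnel": 1.5,
    "mouthPucker": 1.5,
    "mouthSmileLeft": 1.3,
    "mouthSmileRight": 1.3,
    "mouthStretchLeft": 1.2,
    "mouthStretchRight": 1.2,
}


def amplify(frame: Sequence[float], names: Sequence[str],
            gains: Dict[str, float] = MOUTH_AMPLIFY) -> List[float]:
    """Scale each channel by its gain and clamp to [0, 1] (clamp is an assumption)."""
    return [min(1.0, max(0.0, v * gains.get(n, 1.0))) for v, n in zip(frame, names)]


def upsample_2x(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """30fps -> 60fps: insert the linear midpoint between consecutive frames."""
    out: List[List[float]] = []
    for a, b in zip(frames, frames[1:]):
        out.append(list(a))
        out.append([(x + y) / 2 for x, y in zip(a, b)])
    if frames:
        out.append(list(frames[-1]))
    return out
```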

… objects) The user rewrote audio2exp-service with a2e_engine.py (Flask) which returns frames as plain arrays [[0.1, ...], ...] instead of the old FastAPI format [{"weights": [0.1, ...]}, ...]. Frontend now detects both formats: Array.isArray(f) ? f : f.weights https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
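The two-format check can be mirrored in Python as below; `normalize_frames` is a hypothetical name. It follows the `Array.isArray(f) ? f : (f.weights || [])` expression, with frames that are neither lists nor dicts mapped to an empty list as an extra assumption.

```python
from typing import Any, List, Sequence


def normalize_frames(frames: Sequence[Any]) -> List[List[float]]:
    """Accept [[0.1, ...], ...] (a2e_engine.py) and [{"weights": [...]}, ...] (old FastAPI)."""
    out: List[List[float]] = []
    for f in frames:
        if isinstance(f, (list, tuple)):
            weights = f
        elif isinstance(f, dict):
            weights = f.get("weights") or []
        else:
            weights = []  # unknown shape: treat as an empty frame
        out.append([float(v) for v in weights])
    return out
```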

Step 1: Add __testLipSync() diagnostic to concierge-controller.ts patch
- Generates 5 Japanese vowel patterns (あいうえお) with known ARKit values
- Creates silent WAV audio, queues frames to LAMAvatar, plays through ttsPlayer
- Verifies whether the renderer supports full 52-dim blendshapes
Step 3: Fix a2e_engine.py to use the proper LAM INFER pipeline
- Restore LAM_Audio2Expression module (engines, models, utils, configs)
- Rewrite _load_a2e_decoder → _try_load_infer_pipeline using INFER.build()
- Use infer_streaming_audio() with context for chunked processing
- Includes full postprocessing: smooth_mouth, frame_blending, savitzky_golay, symmetrize, eye_blinks
- Falls back to a Wav2Vec2 energy-based approximation when INFER is unavailable
- Add librosa, scipy, addict to requirements.txt
- Add libsndfile to Dockerfile
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
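A sketch of the Step 1 test inputs using only the standard library: a silent 16-bit mono WAV plus held vowel frames. The per-vowel blendshape values, the 16 kHz sample rate, and the 0.4s hold are illustrative placeholders, not the values used by __testLipSync().

```python
import io
import wave
from typing import Dict, List, Sequence


def silent_wav(duration_s: float, sample_rate: int = 16000) -> bytes:
    """Return a mono, 16-bit PCM WAV file containing silence."""
    n_samples = int(round(duration_s * sample_rate))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_samples)
    return buf.getvalue()


# Placeholder targets per vowel (あ/い/う/え/お); not the real test values.
VOWELS: Dict[str, Dict[str, float]] = {
    "a": {"jawOpen": 0.6},
    "i": {"jawOpen": 0.15, "mouthStretchLeft": 0.5, "mouthStretchRight": 0.5},
    "u": {"jawOpen": 0.1, "mouthPucker": 0.6, "mouthFunnel": 0.3},
    "e": {"jawOpen": 0.35, "mouthStretchLeft": 0.3, "mouthStretchRight": 0.3},
    "o": {"jawOpen": 0.4, "mouthFunnel": 0.5},
}


def vowel_frames(names: Sequence[str], hold_s: float = 0.4,
                 frame_rate: int = 30) -> List[List[float]]:
    """Hold each vowel's pose for hold_s seconds, in the order a-i-u-e-o."""
    per_vowel = int(round(hold_s * frame_rate))
    frames: List[List[float]] = []
    for vowel in "aiueo":
        pose = [VOWELS[vowel].get(n, 0.0) for n in names]
        frames.extend(list(pose) for _ in range(per_vowel))
    return frames
```

The audio length should match the frames, e.g. `silent_wav(len(frames) / 30)`.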

Three issues fixed during local testing:
1. transformers v5.x requires ignore_mismatched_sizes=True and attn_implementation="eager" for Wav2Vec2Model.from_pretrained()
2. The HuggingFace checkpoint is double-wrapped (tar.gz containing pretrained_models/lam_audio2exp_streaming.tar) - auto-extract
3. A bare except in infer.py swallowed tracebacks and crashed on an uninitialized output_dict - now logs the actual error and recovers
Result: audio2exp-service starts with mode="infer" and produces 52-dim ARKit blendshapes from audio input.
https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM
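Issue 2 can be handled with `tarfile`, as sketched below. `resolve_checkpoint` is a hypothetical name, and the sketch assumes the inner member is stored under exactly `pretrained_models/lam_audio2exp_streaming.tar` (no leading `./`). Extracting only when the inner file is missing lets the same code run at image build time and be a no-op at startup.

```python
import tarfile
from pathlib import Path

# Inner member path as described in the commit (assumed to have no "./" prefix).
INNER_NAME = "pretrained_models/lam_audio2exp_streaming.tar"


def resolve_checkpoint(path: Path, extract_dir: Path) -> Path:
    """Return the real checkpoint, unwrapping a .tar.gz download if needed."""
    if not path.name.endswith(".tar.gz"):
        return path  # already the inner checkpoint
    inner = extract_dir / INNER_NAME
    if not inner.exists():  # extract once, e.g. during docker build
        with tarfile.open(path, "r:gz") as outer:
            member = outer.getmember(INNER_NAME)  # KeyError if layout differs
            outer.extractall(extract_dir, members=[member])
    return inner
```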

Exclude downloaded model weights (wav2vec2, LAM checkpoint ~1.1GB) from version control. https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Flask's app.run() auto-loads .env files, which crashes with UnicodeDecodeError if a non-UTF-8 .env exists in the path. Pass load_dotenv=False since env vars are set externally. https://claude.ai/code/session_01RyVVZ8QGYAn4hoWN6YBteM

Cloud Run injects PORT=8080 but gunicorn was hardcoded to bind 8081. Changed to shell-form CMD so $PORT is expanded at runtime. Root cause of: "container failed to start and listen on PORT=8080" https://claude.ai/code/session_011UtE7wm23psDQFEy7rcJBp
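An alternative to the shell-form CMD is a gunicorn config file: gunicorn evaluates it as Python, so environment variables can be read directly. This is a sketch, not the project's configuration; the 300s timeout default and the single worker are assumptions (GUNICORN_TIMEOUT itself appears in a later commit in this PR). It would be used as `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a placeholder module path.

```python
# gunicorn.conf.py: gunicorn executes this file as Python at startup.
import os

# Cloud Run injects PORT (8080 by default), so read it instead of hardcoding.
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

# CPU inference can be slow; the 300s fallback value is an assumption.
timeout = int(os.environ.get("GUNICORN_TIMEOUT", "300"))

# A single worker keeps one copy of the model in memory (assumption).
workers = 1
```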

Model loading (PyTorch + Wav2Vec2 + A2E) blocks gunicorn from binding the port, causing Cloud Run startup timeout. Now gunicorn starts immediately and the engine loads in a background thread. /health returns 200 with status=loading until ready. /api/audio2expression returns 503 until engine is loaded. https://claude.ai/code/session_011UtE7wm23psDQFEy7rcJBp
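A framework-agnostic sketch of the loading pattern above: the engine loads in a daemon thread while `health()` and `predict()` return the status codes a Flask route could pass through. `EngineHolder` is hypothetical, and returning 500 when loading fails is an assumption; the commit only specifies 200 with status=loading and 503.

```python
import threading
from typing import Any, Callable, Optional, Tuple


class EngineHolder:
    """Loads a slow engine in a daemon thread so the server can bind at once."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader
        self._engine: Optional[Any] = None
        self._error: Optional[Exception] = None
        self._ready = threading.Event()

    def start(self) -> None:
        threading.Thread(target=self._load, daemon=True).start()

    def wait(self, timeout: Optional[float] = None) -> bool:
        return self._ready.wait(timeout)

    def _load(self) -> None:
        try:
            self._engine = self._loader()
        except Exception as exc:  # keep the failure visible via health()
            self._error = exc
        finally:
            self._ready.set()

    def health(self) -> Tuple[int, dict]:
        if not self._ready.is_set():
            return 200, {"status": "loading"}  # startup probe passes while loading
        if self._error is not None:
            return 500, {"status": "error", "detail": repr(self._error)}
        return 200, {"status": "ready"}

    def predict(self, audio: bytes) -> Tuple[int, dict]:
        if not self._ready.is_set() or self._engine is None:
            return 503, {"error": "engine not loaded"}
        return 200, {"frames": self._engine(audio)}
```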

The model was being downloaded from HuggingFace at container startup, causing the engine to hang indefinitely on Cloud Run. Now the model is downloaded and saved during docker build. https://claude.ai/code/session_011UtE7wm23psDQFEy7rcJBp

- Install torch/torchaudio from the CPU-only index (~200MB vs ~2GB) to dramatically reduce import time (6min → ~30s)
- Add torchaudio to fix the "No module named 'torchaudio'" error, enabling the INFER pipeline instead of fallback mode
- Remove pre-baked wav2vec2 download (models already in build context)
- Move torch/torchaudio out of requirements.txt into the Dockerfile for reliable CPU-only index resolution
https://claude.ai/code/session_011UtE7wm23psDQFEy7rcJBp

Warmup inference was hanging indefinitely on Cloud Run (2 vCPU).
- Add a 120s timeout to warmup so the engine still becomes ready even if warmup is slow
- Add per-stage timing logs in the Wav2Vec2 and Audio2Expression forward passes to identify the bottleneck
https://claude.ai/code/session_011UtE7wm23psDQFEy7rcJBp
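Per-stage timing can be sketched as a small context manager; `stage` and the logger name are hypothetical. Logging a "start" line before each stage means a hung stage shows up as a start with no matching "done", and the elapsed time is still recorded if the stage raises.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator

logger = logging.getLogger("a2e_engine.timing")  # hypothetical logger name


@contextmanager
def stage(name: str, timings: Dict[str, float]) -> Iterator[None]:
    """Log when a stage starts and how long it took; a missing 'done' line points at a hang."""
    logger.info("[timing] %s: start", name)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = elapsed
        logger.info("[timing] %s: done in %.3fs", name, elapsed)
```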

https://claude.ai/code/session_011UtE7wm23psDQFEy7rcJBp

2Gi caused OOM failures on Cloud Run. 4Gi confirmed working. CLAUDE.md added to persist critical decisions across context compression. https://claude.ai/code/session_011UtE7wm23psDQFEy7rcJBp

Current status: deploy completed with 4Gi memory but health check NG. https://claude.ai/code/session_011UtE7wm23psDQFEy7rcJBp

…mer + 5016 identity) Option C: switch from the lightweight Streaming model to the full Non-Streaming model
- a2e_engine.py: checkpoint search prefers the Non-Streaming model; config is detected automatically
- infer.py: add the infer_batch_audio() batch inference method
- a2e_engine.py: switch from streaming chunk inference to batch inference
- Support the WARMUP_TIMEOUT/ENGINE_LOAD_TIMEOUT environment variables
- Dockerfile/start.sh: support the GUNICORN_TIMEOUT environment variable
- DEPLOYMENT_GUIDE.md: update parameters for the full model
Streaming (old): no Transformer, 12 identities, chunked inference
Non-Streaming (new): 6-layer Transformer, 5016 identities, batch inference
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

… post-processing enhancements Because 3DAIGC has not released the full Non-Streaming model (LAM_audio2exp.tar), keep using the Streaming model and improve quality as follows:
1. Batch inference (infer_batch_audio): the whole audio is fed in at once, with no chunking
2. cfg override: force-enable movement_smooth=True and brow_movement=True
3. Full post-processing: smooth_mouth + brow_movement + savgol + symmetrize + eye_blinks
DEPLOYMENT_GUIDE.md: corrected the download instructions for the Streaming model
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

…ion and adding smoothing
- jawOpen amplification: 1.4x → 1.0x (raw A2E output is sufficient)
- mouthLowerDown amplification: 1.3x → 1.0x (reduces downward jaw pull)
- Add jawOpen hard cap at 0.35 (prevents spikes like 0.559/0.632)
- Add mouthLowerDown cap at 0.25
- Add EMA smoothing (alpha=0.4) to dampen sudden jaw spikes
- Enhanced diagnostic logging with lowerDown/pucker/stretch stats
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
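A sketch of the cap-then-EMA step in Python; the real change is TypeScript. Splitting mouthLowerDown into the ARKit Left/Right channels is an assumption, and `cap_and_smooth` is a hypothetical name. The first frame passes through unsmoothed to seed the average.

```python
from typing import Dict, List, Optional, Sequence

# Caps quoted in the commit; mouthLowerDown is assumed to mean both ARKit sides.
CAPS: Dict[str, float] = {
    "jawOpen": 0.35,
    "mouthLowerDownLeft": 0.25,
    "mouthLowerDownRight": 0.25,
}
EMA_ALPHA = 0.4


def cap_and_smooth(
    frames: Sequence[Sequence[float]],
    names: Sequence[str],
    caps: Dict[str, float] = CAPS,
    alpha: float = EMA_ALPHA,
) -> List[List[float]]:
    """Hard-cap selected channels, then smooth with s_t = s_{t-1} + alpha * (x_t - s_{t-1})."""
    out: List[List[float]] = []
    prev: Optional[List[float]] = None
    for frame in frames:
        # Uncapped channels are limited to 1.0, the top of the ARKit range.
        capped = [min(v, caps.get(n, 1.0)) for v, n in zip(frame, names)]
        if prev is None:
            smoothed = capped
        else:
            smoothed = [p + alpha * (x - p) for p, x in zip(prev, capped)]
        out.append(smoothed)
        prev = smoothed
    return out
```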

concierge-controller.ts:
- MOUTH_AMPLIFY: all values reset to 1.0 (no amplification)
- Remove energy normalization (Step 2.3)
- Remove asymmetric EMA smoothing (Step 2.5)
- Remove post-EMA energy floor (Step 2.7)
- Keep BLENDSHAPE_SAFE_MAX=0.7 clamp for FLAME LBS stability
- Keep 30→60fps linear interpolation
- Clean diagnostic logging with all channel stats
LAMAvatar.astro:
- Remove dead remapForSdkLimitation() method (referenced undefined JAW_MAX/MOUTH_MAX)
- Remove commented-out JAW_MAX/MOUTH_MAX constants
Pipeline is now: A2E raw → scale(1.0) → clamp(0.7) → interpolate(60fps) → queue
Ready for clean parameter tuning from the baseline.
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

A2E model output characteristics:
- jawOpen: very weak (avg ~0.05) → 1.8x to prevent mumbling
- mouthLowerDown: very strong (raw ~0.84) → 0.45x to prevent jaw pull
- All other channels: 1.0 (neutral baseline)
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

Previous tuning (jawOpen 1.8x, mouthLowerDown 0.45x) caused lipsync to appear completely stopped. Reverting to 1.0 baseline to restore working state before re-tuning. https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

…ic logs Root cause: INFER.build() hangs indefinitely on Cloud Run CPU, blocking the engine from ever becoming ready. All /api/audio2expression requests return 503, so TTS responses have no expression data → no lipsync.
Changes:
1. a2e_engine.py: wrap _try_load_infer_pipeline() in a timeout thread (INFER_LOAD_TIMEOUT env var, default 600s). On timeout, fall back to Wav2Vec2 mode, which provides approximate lipsync immediately.
2. a2e_engine.py: add timing logs at each step (import, config parse, INFER.build, model.to) to pinpoint the bottleneck.
3. Dockerfile: pre-extract the model tar archive at build time, saving ~7 minutes of runtime extraction on every cold start.
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
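A sketch of change 1: run the slow loader in a thread, wait up to a deadline, and otherwise fall back. `load_with_timeout` is a hypothetical name. A Python thread cannot be forcibly stopped, so on timeout the original INFER.build() keeps running in the background as a daemon thread and still consumes CPU.

```python
import logging
import os
import threading
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger("a2e_engine")

# Default mirrors the 600s in this commit (later raised to 1200s, then 1500s).
INFER_LOAD_TIMEOUT = float(os.environ.get("INFER_LOAD_TIMEOUT", "600"))


def load_with_timeout(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    timeout_s: float = INFER_LOAD_TIMEOUT,
) -> Tuple[str, T]:
    """Return ("infer", model) if primary finishes in time, else ("fallback", ...)."""
    result: Dict[str, object] = {}

    def target() -> None:
        try:
            result["value"] = primary()
        except Exception as exc:
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The thread cannot be killed; it keeps running in the background.
        logger.warning("primary loader exceeded %.0fs; using fallback", timeout_s)
        return "fallback", fallback()
    if "error" in result:
        logger.warning("primary loader failed: %r; using fallback", result["error"])
        return "fallback", fallback()
    return "infer", result["value"]  # type: ignore[return-value]
```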

INFER.build() can complete on CPU given enough time (previous instance logs showed successful weight loading). Default 600s was too short. With tar pre-extraction saving 7 min, INFER.build() needs ~10-15 min. 1200s (20 min) provides sufficient margin. Wav2Vec2 fallback remains as safety net but should not normally be needed. https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

Previous successful deployment used ENGINE_LOAD_TIMEOUT=1500. Match the INFER_LOAD_TIMEOUT default to the same proven value. https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

INFER model outputs jawOpen in 0.13-0.32 range, causing mumbling appearance. Scale all blendshapes by 1.8x (clamped to 0-1) to improve mouth visibility. Tunable via EXPRESSION_SCALE env var without redeploying. https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
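A sketch of the env-tunable global scale; `expression_scale` and `scale_frames` are hypothetical names, and the 1.8 default comes from the commit message. Reading the variable at call time means a changed value takes effect without editing code.

```python
import os
from typing import List, Optional, Sequence


def expression_scale() -> float:
    """Read EXPRESSION_SCALE on every call (default 1.8, per the commit)."""
    return float(os.environ.get("EXPRESSION_SCALE", "1.8"))


def scale_frames(frames: Sequence[Sequence[float]],
                 scale: Optional[float] = None) -> List[List[float]]:
    """Multiply every blendshape by the scale and clamp to [0, 1]."""
    s = expression_scale() if scale is None else scale
    return [[min(1.0, max(0.0, v * s)) for v in frame] for frame in frames]
```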

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

…ode architecture Design document covering the plan to evolve the gourmet-support system into a reusable platform supporting multiple AI application modes (gourmet concierge, customer support, interview) with Gemini Live API integration, while preserving existing endpoints for alpha testing. https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

Root cause: A2E fallback mode outputs noisy per-frame blendshape values (e.g., jaw oscillating 0.5→0.09→0.27 between frames), which are applied directly to the 3D avatar without any frame-to-frame smoothing, causing visible choppy, jerky vibration.
Frontend fix (LAMAvatar.astro):
- Add an exponential moving average (EMA) with alpha=0.35 to getExpressionData()
- Each frame blends smoothly with the previous: smoothed = prev + 0.35*(target-prev)
- At 60fps this gives ~95% convergence in ~130ms, smooth yet responsive
- Reset EMA state on buffer clear and expression reset
Backend fix (a2e_engine.py):
- Upgrade fallback smoothing from a 3-frame uniform filter to a 2-pass filter:
  Pass 1: 5-frame Gaussian-like kernel [0.06, 0.24, 0.40, 0.24, 0.06]
  Pass 2: 3-frame uniform for additional smoothness
- Approximates the INFER pipeline's savitzky_golay post-processing
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ
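A sketch of the backend 2-pass filter, plus a check of the EMA convergence figure. The function names are hypothetical, and edge replication at the ends is an assumption; the commit does not say how boundaries are handled. The kernel sums to 1.0, so constant signals pass through unchanged.

```python
from typing import List, Sequence

GAUSS5 = [0.06, 0.24, 0.40, 0.24, 0.06]  # Pass 1 kernel from the commit
BOX3 = [1 / 3, 1 / 3, 1 / 3]              # Pass 2: 3-frame uniform


def convolve_same(values: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D convolution with edge replication; output length equals input length."""
    half = len(kernel) // 2
    n = len(values)
    out: List[float] = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # replicate edge samples
            acc += w * values[j]
        out.append(acc)
    return out


def smooth_channel(values: Sequence[float]) -> List[float]:
    """Apply the Gaussian-like pass, then the uniform pass, to one blendshape channel."""
    return convolve_same(convolve_same(values, GAUSS5), BOX3)


def ema_frames_to_converge(alpha: float, fraction: float = 0.95) -> int:
    """Frames until an EMA step response reaches `fraction`: 1 - (1 - alpha)^n >= fraction."""
    n = 0
    remaining = 1.0
    while 1.0 - remaining < fraction:
        remaining *= 1.0 - alpha
        n += 1
    return n
```

For alpha=0.35, `ema_frames_to_converge` returns 7, about 117ms at 60fps, roughly consistent with the ~130ms quoted above.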

Reverted all changes from 7 commits made by Claude, fully restoring the state of e36190d, which last passed the health check. Reverted commits:
- 770dfd1 INFER load timeout + Wav2Vec2 fallback
- 84902f6 INFER_LOAD_TIMEOUT 1200s
- 38e9f24 INFER_LOAD_TIMEOUT 1500s
- ce103ad conservative expression scaling
- 2964376 EMA temporal smoothing
- d466f6a revert to Streaming model
- 36bf69b switch to Non-Streaming full model
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

Reverted the changes from 2 commits made by Claude:
- bae5578 reset all parameters to neutral baseline
- 2964376 EMA temporal smoothing
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

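Given the reliability problems with the PLATFORM_DESIGN.md previously written by an AI, this is a brief instructing another AI or engineer to redo the design from scratch. Contents:
- A clear distinction between confirmed facts about the current setup and unverified items
- A reliability assessment of each section of the previous design document
- Requirements for platformization and Live API integration
- A list of repositories, papers, and OSS projects to consult
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ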

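- §0: Purpose of platformization and near-term goals (why we are doing this)
- §0.4: Explains how short-term memory (stt_stream.py) and long-term memory (gourmet-support) were developed separately, and states the requirements for unifying their specification
- §2.3: Reasons for adopting the Live API (solving latency, interruption, and backchannel/aizuchi problems)
- §2.3: Details of the FLASH version's cumulative character limit and the workaround logic (with code line numbers)
- §2.3: Explanation of the Live/REST hybrid approach
- §4: Added "Unified design of the memory feature" to the required sections of the design document
https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ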

claude added 30 commits February 20, 2026 03:00

Add standalone A2E Japanese audio test script

081f904

Update Gemini model to gemini-2.5-flash (2.0-flash deprecated)

b50178e

chore: add .gitignore for audio2exp-service model files

a8a68c3

claude and others added 30 commits February 22, 2026 13:15

fix(docs): correct Cloud Run region from asia-northeast1 to us-central1

2bd1198

https://claude.ai/code/session_011UtE7wm23psDQFEy7rcJBp

fix(deploy): update memory to 4Gi and add CLAUDE.md project notes

81fac9b

docs: update CLAUDE.md as handover document for next session

528dd5f

Add files via upload

dbcd169

Add files via upload

e955778

revert(lipsync): restore MOUTH_AMPLIFY to 1.0 baseline

a6d81a1

fix(audio2exp): match INFER_LOAD_TIMEOUT to proven 1500s

38e9f24

Create stt_stream

d16e30c

Add files via upload

014c1d2

Delete AI_Meeting_App/stt_stream

bce4c6a

docs: update CLAUDE.md - audio2exp-service health check now OK

1e80e3f

https://claude.ai/code/session_01TLsuhpFcQYgv6ijcrhKLfJ

revert(frontend): restore LAMAvatar.astro to original (e955778)

2b8109b


Claude/poc testing support continuation ja dm o#90

mirai-gpro wants to merge 65 commits into aigc3d:master from mirai-gpro:claude/poc-testing-support-continuation-JaDmO

mirai-gpro commented Feb 25, 2026

2 participants
