Andrew Koh dhecloud

👋 Hi, I'm Andrew

I'm a Computer Science PhD specializing in audio processing, multimodal retrieval, and machine learning systems.
These days, I enjoy building AI-powered Telegram bots, working on quantitative trading tools, and exploring computer vision + LLM agent workflows.

I also love photography (Sony A6700), gaming, and flying my drone.
This GitHub is a mix of research code, practical ML pipelines, and personal side projects.

📫 Contact

GitHub: https://github.com/dhecloud
Email: andrewkjj@gmail.com
📄 Check out my CV / Resume

🧠 Research Background

Ph.D. Computer Science – Nanyang Technological University (2020–2024)
Thesis: Audio Captioning and Retrieval with Improved Cross-Modal Objectives

Published research in:

Automated Audio Captioning
Language-based Audio Retrieval
Acoustic Event Detection
Word Sense Disambiguation (BERT)

See publications below 👇

🚀 Personal Projects

🗣️ SakuraSensei — A Japanese Conversational AI Tutor

A context-aware Telegram bot built with LangChain, RAG pipelines, and multilingual embeddings.
Features include:

JLPT grammar/vocab scraping, plus JMdict, Tatoeba, and JaQuAD data
Automated Japanese news explanations via multi-agent workflows
Cloze-question generation from YouTube transcripts
Memory persistence + metadata filtering improvements
👉 Bot: https://t.me/SakuraSenseiNoBot

😆 FaceChangerGIFBot — Real-Time Face Swap Bot

A production-grade Telegram bot that swaps user faces into GIFs, short videos, and “Featured Clips”.
Highlights:

ONNX-based inference for fast face swapping
Usage tracking, quotas, and Stripe premium tier
Watermarking + file size/duration validation
Migrated webhooks from ngrok to Cloudflare Tunnel
👉 Bot: https://t.me/FaceChangerGIFBot

📚 Publications

Language-based Audio Retrieval with Converging Tied Layers and Contrastive Loss (APSIPA 2022)
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning (APSIPA 2022)
Audio Captioning with Reconstruction Latent Space Regularization (ICASSP 2022)
Sound Event Detection with Weakified Strong Labels & Frequency Dynamic Convolution (arXiv 2023)
Adapting BERT for Word Sense Disambiguation with Gloss Selection (EMNLP Findings)

🛠️ Skills

Machine Learning: PyTorch, TensorFlow, ONNX, transformers, LangChain
Domains: Audio Processing, Computer Vision, NLP, Cross-Modal Retrieval
Tools: Docker, Cloudflare Tunnel, Whisper, VAD, BeautifulSoup
Languages: Python, English (Native), Chinese (Conversational), Japanese (Conversational)

Thanks for visiting!
Feel free to explore my projects or reach out if you'd like to collaborate.
