I'm a Computer Science PhD specializing in audio processing, multimodal retrieval, and machine learning systems.
These days, I enjoy building AI-powered Telegram bots, working on quantitative trading tools, and exploring computer vision + LLM agent workflows.
I also love photography (Sony A6700), gaming, and flying my drone.
This GitHub is a mix of research code, practical ML pipelines, and personal side projects.
- GitHub: https://github.com/dhecloud
- Email: andrewkjj@gmail.com
- Check out my CV / Resume
Ph.D. Computer Science, Nanyang Technological University (2020–2024)
Thesis: Audio Captioning and Retrieval with Improved Cross-Modal Objectives
Published research in:
- Automated Audio Captioning
- Language-based Audio Retrieval
- Acoustic Event Detection
- Word Sense Disambiguation (BERT)
See publications below.
A context-aware Telegram bot built with LangChain, RAG pipelines, and multilingual embeddings.
Features include:
- JLPT grammar/vocab scraping + JMDICT + Tatoeba + JaQuAD
- Automated Japanese news explanations via multi-agent workflows
- Cloze-question generation from YouTube transcripts
- Memory persistence + metadata filtering improvements
Bot: https://t.me/SakuraSenseiNoBot
A production-grade Telegram bot that swaps user faces into GIFs, short videos, and "Featured Clips".
Highlights:
- ONNX-based inference for fast face swapping
- Usage tracking, quotas, and Stripe premium tier
- Watermarking + file size/duration validation
- Migrated webhooks from Ngrok to Cloudflare Tunnel
Bot: https://t.me/FaceChangerGIFBot
- Language-based Audio Retrieval with Converging Tied Layers and Contrastive Loss (APSIPA 2022)
- Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning (APSIPA 2022)
- Audio Captioning with Reconstruction Latent Space Regularization (ICASSP 2022)
- Sound Event Detection with Weakified Strong Labels & Frequency Dynamic Convolution (arXiv 2023)
- Adapting BERT for Word Sense Disambiguation with Gloss Selection (EMNLP Findings)
Machine Learning: PyTorch, TensorFlow, ONNX, transformers, LangChain
Domains: Audio Processing, Computer Vision, NLP, Cross-Modal Retrieval
Tools: Docker, Cloudflare Tunnel, Whisper, VAD, BeautifulSoup
Languages: Python, English (Native), Chinese (Conversational), Japanese (Conversational)
Thanks for visiting!
Feel free to explore my projects or reach out if you'd like to collaborate.

