I'm a Senior Java Software Engineer with over 7 years of experience designing and building large-scale, high-concurrency distributed systems with Java microservices.
Currently, I'm expanding my expertise into AI Infrastructure and LLM Applications, applying my deep engineering background to build robust and efficient systems that power intelligent solutions. I'm passionate about creating value at the intersection of solid software architecture and cutting-edge AI.
Currently based in Singapore.
My skills cover the full spectrum from foundational backend architecture to modern AI/ML infrastructure.
| Core Java & Distributed Systems | AI/ML & Big Data |
|---|---|
| Java, Spring Boot, Spring Cloud, MyBatis, MyBatis-Plus | Python, PyTorch, vLLM, DeepSpeed, Triton, aiter, ROCm, Ray, Transformers |
| Microservices, SaaS, PaaS, DDD | LLM Fine-tuning (LoRA), RAG |
| Docker, Kubernetes, DevOps, gRPC, OpenFeign, SkyWalking | Vector DB (FAISS, ElasticSearch) |
| RocketMQ, Pulsar, Kafka, AWS SQS, ZooKeeper, Alibaba Nacos/Dubbo/Canal, etc., WebSocket, Netty | Delta Lake, Apache Flink, Apache Hudi, Apache Iceberg, Kettle, Flink-CDC, Debezium, Prometheus + Grafana + Alertmanager |
| MySQL, MongoDB, Neo4j, PostgreSQL, Elasticsearch, Redis, MinIO, OSS, SolrCloud, HBase | ELK, Flume, ClickHouse, Apache Doris |
| System Design & Scalable Architecture | MLOps & Inference Optimization |
Here are some projects that highlight my capabilities across both domains.
- **RAG System for Domain-Specific Q&A**
  - Engineered a Retrieval-Augmented Generation (RAG) pipeline using `Mistral-7B`, `chatGPT-o4-mini`, and `Elasticsearch` for a specialized knowledge domain.
  - Optimized the system for real-time interaction through efficient data processing and a high-throughput inference server deployment.
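As a rough illustration of the retrieval step in a pipeline like this, the sketch below swaps the Elasticsearch index for a tiny in-memory corpus scored with plain cosine similarity; the tokenizer, scoring, corpus, and prompt template are simplified stand-ins, not the production implementation.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace tokenizer; a real pipeline would use the LLM's tokenizer
    # or an analyzer on the search index side.
    return text.lower().split()

def cosine_sim(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank documents against the query and keep the top-k hits.
    q = Counter(tokenize(query))
    scored = [(cosine_sim(q, Counter(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query, contexts):
    # Retrieved passages are injected ahead of the question before
    # the prompt goes to the generator model.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"

corpus = [
    "Refunds are processed within five business days.",
    "The payment gateway supports credit cards and bank transfers.",
    "Shipping is free for orders above fifty dollars.",
]
contexts = retrieve("how long do refunds take", corpus)
prompt = build_prompt("How long do refunds take?", contexts)
```

The same two-phase shape — retrieve, then generate with the retrieved context in the prompt — carries over directly when the in-memory scorer is replaced by an Elasticsearch query and the prompt is sent to the inference server.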
- **LLM Inference Acceleration & Fine-tuning**
  - Customized open-source LLMs (`Mistral`, `Qwen`, `Llama`) using LoRA fine-tuning techniques on specific datasets.
  - Accelerated model inference significantly using vLLM and FlashAttention, deploying the models as scalable API endpoints on cloud platforms such as GCP and Azure.
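For context, this is the low-rank update at the heart of LoRA, shown on toy NumPy matrices (the shapes, seed, and values are made up; an actual fine-tune would train adapters on real model weights, e.g. via a library such as PEFT): the pretrained weight `W` stays frozen, only the small factors `A` and `B` train, and the adapter can later be merged back into `W`.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4  # toy dimensions and LoRA hyperparameters

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the adapter
                                           # starts as a no-op
scale = alpha / r

def forward(x, W, A, B, scale):
    # Base projection plus the scaled low-rank adapter path.
    return x @ W.T + (x @ A.T) @ B.T * scale

x = rng.standard_normal((1, d_in))

# With B = 0 the adapted forward pass equals the frozen one.
assert np.allclose(forward(x, W, A, B, scale), x @ W.T)

# Once training has updated B, merging W + scale * (B @ A) reproduces
# the base-plus-adapter output in a single matmul (useful for serving).
B = rng.standard_normal((d_out, r))
merged = W + scale * (B @ A)
```

Merging the adapter this way is why a LoRA-tuned model can be served with no extra inference cost once the weights are folded in.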
- **PyTorch, vLLM, DeepSpeed, Transformers & Ray Open-Source Contributions (Ongoing)**
- **Group-Level Multi-Functional Payment Platform**
  - Architected and developed a highly available, enterprise-grade payment center using Domain-Driven Design (DDD) and a robust microservices architecture.
  - The system reliably handles millions of transactions during peak periods and smoothly processes tens of thousands of transactions or more every day, ensuring data consistency and security across various payment channels.
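One common pattern behind consistency guarantees of this kind is an idempotency key, so that a retried request from a payment channel is settled exactly once. The sketch below uses an in-memory store and hypothetical names purely for illustration; the actual platform's design may differ, and a real service would back this with a database unique constraint on the key.

```python
class PaymentService:
    """Toy payment service illustrating idempotency-key deduplication."""

    def __init__(self):
        self._processed = {}  # idempotency_key -> stored result
        self._balance = 0     # stand-in for captured funds

    def charge(self, idempotency_key: str, amount: int) -> dict:
        # Replaying an already-seen key returns the stored result
        # instead of charging the customer a second time.
        if idempotency_key in self._processed:
            return self._processed[idempotency_key]
        self._balance += amount
        result = {"status": "captured", "amount": amount}
        self._processed[idempotency_key] = result
        return result

svc = PaymentService()
first = svc.charge("order-1001", 2500)
retry = svc.charge("order-1001", 2500)  # e.g. a client-side retry after a timeout
```

The retry returns the original result and the balance is charged only once — the property payment channels rely on when network timeouts force a resend.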
- **High-Concurrency Instant Messaging System**
  - Built a distributed IM system from the ground up to support millions of concurrent users.
  - Leveraged a powerful tech stack including `Spring Boot`, `WebSocket`, `Kafka` for message queuing, and `Zookeeper` for service coordination, achieving high throughput and low latency.
- **Enterprise Search & Real-Time Data Platform**
  - Designed a high-performance search engine using `Elasticsearch` and `Flink-CDC`, capable of indexing and searching billions of records with sub-second latency.
  - Built the underlying real-time data synchronization pipeline, providing a unified data backbone for multiple business units.
- **Enterprise Flink Computing Platform**
  - Built a unified, enterprise-level Flink cluster serving as the computing center for the real-time data lake, the statistics center, and ETL pipelines.
- **Microservice Technology System Architecture**
  - The project comprises over 28 microservices, covering users, orders, payments, logistics, merchants, products, instant messaging, basic services, search, gateways, SMS messaging, file services, a computing platform, a data synchronization center, and more.
  - Proposed a "precise and rapid response" approach for the new project and organized the implementation of the project scaffolding, encompassing both Spring Boot + Spring Cloud microservice projects and DDD (Domain-Driven Design) project models.
  - Defined a binary package format and specification for inter-microservice calls, and encapsulated the company's private underlying infrastructure packages on the internal artifact server.
  - Selected Kafka as the business message middleware, MySQL as the relational database, Redis and Elasticsearch as the non-relational stores, and Nacos as the service registry.
  - Introduced core components such as Prometheus + Grafana, Node Exporter, and Alertmanager to provide visualized, multi-faceted monitoring and alerting of data center and cloud server resources, with anomaly alerts pushed via DingTalk and SMS.
  - Introduced Flink-CDC: configured the MySQL binlog and added Flink and the related connector packages to the data synchronization center, with the development team supplying the processing code. The CDC (change data capture) jobs capture real-time changes from monitored business database tables, process the data across dimensions such as grouping and JOINs, and synchronize it to different business destination sinks or write it to Kafka.
  - Order and product data are written to Kafka via the upsert-kafka connector; the search service listens to the relevant Kafka topics, consumes the data, and writes it to Elasticsearch, exposing API interfaces for business services to call and query.
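The upsert-kafka leg of that pipeline can be sketched in Flink SQL roughly as follows — the table names, fields, and connector options below are illustrative stand-ins, not the project's actual DDL:

```sql
-- Hypothetical order table backed by the upsert-kafka connector.
CREATE TABLE orders_upsert (
  order_id STRING,
  amount   DECIMAL(10, 2),
  status   STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'kafka:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);

-- Hypothetical sink into the index consumed by the search service.
CREATE TABLE orders_index (
  order_id STRING,
  amount   DECIMAL(10, 2),
  status   STRING,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://elasticsearch:9200',
  'index' = 'orders'
);

INSERT INTO orders_index
SELECT order_id, amount, status FROM orders_upsert;
```

Because the upsert-kafka connector keys records by primary key, later changes to the same order compact into a single current row on the consuming side.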
  - Chat and payment transaction data are processed with the Flink SQL API and DataStream API in the CDC jobs, synchronized to ClickHouse, and used for analysis and logging of user behavior.
  - Real-time data visualization dashboards, built on Flink stream computing, are provided and displayed through the computing platform service, covering statistics such as order amount, sales volume, and merchant brand ranking.
…and more.
I believe in continuous learning and sharing knowledge. I write about my journey in software architecture, distributed systems, and AI infrastructure on my Medium blog.

