Posts by Collection

portfolio

Shantou AI – AI Agent for Job Application

A Chrome extension was developed to automate job applications by auto-filling forms with high accuracy and integrating an AI agent to generate personalized responses aligned with job descriptions.

Visit Shantou AI

Robot AI Engineer: Dragon Dialogue Navigation Port on Stretch Robot

The Dragon dialogue navigation system was ported to ROS2 Python on the Stretch robot, integrating perception, planning, and dialogue modules to enable seamless voice-driven navigation and fetch commands.

MythoVerse AI – AI-Powered Video Storytelling

MythoVerse AI is an AI-powered storytelling platform that turns text, images, and ideas into videos with voiceovers, subtitles, and sound effects. Its text-to-video generator uses semantic understanding of the input to produce the visuals, and creators can add real-time AI-generated voiceovers and edit individual scenes. The platform supports styles ranging from anime and action sequences to cyberpunk aesthetics and personalized narratives, combining visual storytelling with AI-driven narration.

Visit MythoVerse AI

1Shan AI – Aesthetic Intelligence for Visual Content

1Shan AI is building an AI-powered platform that helps users design and retrieve custom clothing based on their aesthetic preferences. By combining image ranking, enhancement, and retrieval, our model personalizes the design process, making custom apparel more accessible and better matched to individual styles. We fine-tune a text-to-image generation model with reinforcement learning so that the generated visuals align with user preferences. This streamlines custom fashion design and makes it easier to create unique outfits and explore new styles, whether for personal use or fashion e-commerce.

Visit 1Shan AI

YOLO DeepSORT Waste Tracking and Counting

Effective object detection and tracking are critical for a wide range of applications, from industrial automation to environmental conservation. This project focuses on advancing these capabilities by developing algorithms that accurately track and count objects in video footage, as well as enhancing the performance of detection models. By leveraging state-of-the-art methods and optimizing model training, this work aims to improve object detection accuracy and efficiency in real-world scenarios.
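As a rough illustration of the counting step, the sketch below tallies distinct objects per class from tracker output. It is not the project's code: the input format (a list of `(track_id, class_name)` pairs per frame), the function name `count_tracked_objects`, and the `min_hits` threshold are all assumptions made for this example. The idea is that a tracker such as DeepSORT assigns a persistent ID to each object, so counting reduces to counting IDs that persist long enough to be trusted.

```python
from collections import Counter, defaultdict


def count_tracked_objects(frames, min_hits=3):
    """Count distinct objects per class from per-frame tracker output.

    frames:   iterable of lists of (track_id, class_name) tuples, one list
              per video frame (hypothetical format, assumed for this sketch).
    min_hits: a track must be detected at least this many times to be
              counted, which filters out short-lived spurious tracks.
    """
    hits = Counter()               # track_id -> number of detections
    votes = defaultdict(Counter)   # track_id -> class label votes

    for detections in frames:
        for track_id, class_name in detections:
            hits[track_id] += 1
            votes[track_id][class_name] += 1

    counts = Counter()
    for track_id, n in hits.items():
        if n >= min_hits:
            # Use the most frequent label in case the detector flickered.
            label = votes[track_id].most_common(1)[0][0]
            counts[label] += 1
    return dict(counts)
```

For example, calling `count_tracked_objects(frames, min_hits=2)` on three frames in which track 1 is always a bottle, track 2 is a can in two frames, and track 3 appears only once counts one bottle and one can, and discards track 3.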

Automated Codebase Modification with Large Language Model

Large language models are powerful tools for debugging and generating code. However, users still need some programming knowledge to integrate the generated code into a codebase. To address this, my teammate and I investigated whether an LLM can modify a codebase directly based on users' demands.

Keylogger with Large Language Model

The outputs generated by computer recording tools, such as keyloggers, are voluminous and complicated. Authorized users often spend substantial time and effort extracting useful and comprehensive information from these logs. Large language models, known for their robust text analysis capabilities, offer a promising way to improve both the quality and the efficiency of this analysis. This project investigates the potential and efficacy of LLMs in analyzing keylogger results, with the aim of making recorded data easier to interpret and use.

UR3 Robot Arms Tools Recognition and Handling in Collaborative Work Environments

UR3 Robot Arms are increasingly used in various industries due to their versatility, ease of programming, and collaborative nature. Today, they are widely used to assist human work, support laboratory automation, validate product designs, and ensure quality control in manufacturing processes. In this project, we explore one of these use cases: assisting human work.

publications

HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

Published in arXiv preprint, 2024

This paper introduces HEIGHT, a novel navigation policy network that models distinct interactions among humans, robots, and obstacles to enhance robot navigation in dense and constrained environments.

[Paper]

HomE: A Homogeneous Ensemble Framework for Dynamic Hand Gesture Recognition

Published in arXiv preprint, 2025

This paper presents HomE, a homogeneous ensemble framework that leverages clustering and LLM-driven sampling to partition gesture classes, enabling expert learners to improve accuracy and robustness in dynamic hand gesture recognition.

[Paper]

research

HomE: A Homogeneous Ensemble Framework for Dynamic Hand Gesture Recognition

HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture. Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different components in the environment, and propose a heterogeneous spatio-temporal graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous st-graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions among entities through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success and efficiency in challenging navigation scenarios. Furthermore, we demonstrate that our pipeline achieves better zero-shot generalization capability than previous works when the densities of humans and obstacles change.
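To make the idea of attending to different entity types separately more concrete, here is a minimal sketch of type-specific attention pooling. It is not the HEIGHT network: it omits learned weights, the spatio-temporal graph, and the recurrent component, and the function names (`attention_pool`, `robot_context`) and the plain-list vector format are assumptions made for this example. It only shows how a robot state can weight humans and obstacles with separate softmax distributions before the results are combined.

```python
import math


def attention_pool(query, keys, values):
    """Scaled dot-product attention of one query over a non-empty entity set.

    Returns the attention-weighted sum of `values` and the weights used.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    # Softmax with max-subtraction for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return pooled, weights


def robot_context(robot, humans, obstacles):
    """Pool humans and obstacles separately, then concatenate the results.

    Keeping one softmax per entity type means humans never compete with
    obstacles for attention weight (a simplification chosen for this sketch).
    """
    human_ctx, human_w = attention_pool(robot, humans, humans)
    obstacle_ctx, obstacle_w = attention_pool(robot, obstacles, obstacles)
    return human_ctx + obstacle_ctx, human_w, obstacle_w
```

With `robot = [1.0, 0.0]`, the human embedding `[1.0, 0.0]` receives more weight than `[0.0, 1.0]` because it has the larger dot product with the robot state, while a single obstacle always receives weight 1.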

Generative Model for In-car Hand Gesture Video Generation

The demand for advanced video generation techniques has surged with the increasing application of artificial intelligence in multimedia and human-computer interaction systems. In this context, video generation models that can produce high-quality outputs with specific class or textual content are gaining prominence. In this research, we propose novel structured models for generating in-car hand gesture videos based on specific classes and explore the capability of generating new and diverse gesture videos, contributing to the growing field of automated video generation.

High-Performance Fault Tolerant Communication Protocols in Safety-Critical Industry

SmartNICs are increasingly used to support intensive data center operations in the cloud computing industry. Today, they perform high-data-rate and computationally heavy tasks such as load balancing, DNS filtering, and implementing firewalls for intrusion detection. However, SmartNICs have not been explored for the resiliency use cases common in safety-critical industries such as the power grid and industrial automation. This research evaluates the feasibility of using SmartNICs to support safety-critical applications, such as fault-tolerant routing and cryptographic operations.


HaoChen (Simon) Xia
