Shantou AI – AI Agent for Job Application
A Chrome extension was developed to automate job applications by auto-filling forms with high accuracy and integrating an AI agent to generate personalized responses aligned with job descriptions.
The Dragon dialogue navigation system was ported to ROS 2 (Python) on the Stretch robot, integrating perception, planning, and dialogue modules to support voice-driven navigation and fetch commands.
MythoVerse AI is an AI-powered storytelling platform that transforms text, images, and ideas into dynamic, high-quality videos with voiceovers, subtitles, and sound effects. Our text-to-video generator leverages advanced semantic understanding to produce engaging, visually compelling content effortlessly. With real-time AI-generated voiceovers and customizable scene editing, MythoVerse empowers creators to bring their stories, animations, and cinematic visions to life with minimal effort. Whether for anime, action sequences, cyberpunk aesthetics, or personalized narratives, MythoVerse AI redefines content creation by merging visual storytelling and AI-driven narration into a seamless, creative experience.
1Shan AI is building an AI-powered platform that helps users design and retrieve custom clothing based on their aesthetic preferences. By leveraging image ranking, enhancement, and retrieval, our model personalizes the design process, making custom apparel more accessible and tailored to individual styles. We fine-tune a text-to-image generation model with reinforcement learning to ensure high-quality visuals that align with user preferences. This technology streamlines the custom fashion design process, making it easier to create unique outfits and explore new styles—whether for personal use or fashion e-commerce.
Effective object detection and tracking are critical for a wide range of applications, from industrial automation to environmental conservation. This project focuses on advancing these capabilities by developing algorithms that accurately track and count objects in video footage, as well as enhancing the performance of detection models. By leveraging state-of-the-art methods and optimizing model training, this work aims to improve object detection accuracy and efficiency in real-world scenarios.
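As a rough illustration of the counting step, the sketch below counts how many tracked objects cross a virtual horizontal line in the frame. It assumes a tracker has already assigned persistent IDs and per-frame centroids in image coordinates (y grows downward); the `tracks` layout and the function name `count_line_crossings` are hypothetical and not taken from this project's code.

```python
from typing import Dict, List, Tuple

Track = List[Tuple[float, float]]  # per-frame (x, y) centroids for one tracked object


def count_line_crossings(tracks: Dict[int, Track], line_y: float) -> int:
    """Count tracks whose centroid moves across the horizontal line y = line_y.

    Each track ID is counted at most once, so an object that hovers around
    the line is not double-counted. Crossing direction is ignored.
    """
    counted = set()
    for track_id, centroids in tracks.items():
        for (_, y0), (_, y1) in zip(centroids, centroids[1:]):
            # A crossing happens when consecutive centroids lie on different sides.
            if (y0 < line_y) != (y1 < line_y):
                counted.add(track_id)
                break
    return len(counted)


tracks = {
    1: [(0.0, 0.0), (0.0, 5.0), (0.0, 12.0)],  # moves down across y = 10
    2: [(5.0, 2.0), (5.0, 4.0)],               # stays above the line
    3: [(3.0, 15.0), (3.0, 8.0)],              # moves up across y = 10
}
print(count_line_crossings(tracks, line_y=10.0))  # 2
```

Counting by track ID rather than by detection is what keeps the tally stable when the same object appears in many frames.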
Large language models (LLMs) are powerful tools for debugging and generating code. However, users still need some programming knowledge to integrate the generated code into their codebase for a given purpose. To address this problem, my teammate and I investigated whether an LLM can modify a codebase directly based on users' requests.
Nowadays, the outputs generated by computer recording tools, such as keyloggers, are voluminous and complicated. Authorized users often spend substantial time and effort extracting useful and comprehensive information from these logs. Large language models, known for their robust text-analysis capabilities, offer a promising way to improve both the quality and the efficiency of this analysis. This project investigates the potential and efficacy of LLMs for analyzing keylogger output, with the goal of making recorded data easier to interpret and use.
UR3 robot arms are increasingly used across industries due to their versatility, ease of programming, and collaborative nature. Today, they are widely used to assist human work, support laboratory automation, validate product designs, and ensure quality control in manufacturing processes. In this project, we explore one of these use cases: assisting human work.
Published in arXiv preprint, 2024
This paper introduces HEIGHT, a novel navigation policy network that models distinct interactions among humans, robots, and obstacles to enhance robot navigation in dense and constrained environments.
Published in arXiv preprint, 2025
This paper presents HomE, a homogeneous ensemble framework that leverages clustering and LLM-driven sampling to partition gesture classes, enabling expert learners to improve accuracy and robustness in dynamic hand gesture recognition.
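A minimal sketch of the class-partitioning idea, assuming each gesture class is summarized by a single feature vector (for example, a mean embedding) and using plain k-means as a stand-in for the clustering step. The LLM-driven sampling is not modeled, and `kmeans_partition` is a hypothetical helper, not HomE's implementation. Each returned group would then be handled by one expert learner.

```python
from typing import Dict, List


def _sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_partition(class_features: Dict[int, List[float]], k: int, iters: int = 20) -> List[List[int]]:
    """Partition class IDs into k groups with plain k-means.

    class_features maps each gesture class ID to one summary feature vector.
    Assumes at least k classes and iters >= 1. Initial centroids are the
    features of the k smallest class IDs, which keeps the result deterministic.
    """
    ids = sorted(class_features)
    centroids = [list(class_features[i]) for i in ids[:k]]
    assignment: Dict[int, int] = {}
    for _ in range(iters):
        # Assignment step: each class goes to its nearest centroid.
        new_assignment = {
            cid: min(range(k), key=lambda c: _sq_dist(class_features[cid], centroids[c]))
            for cid in ids
        }
        if new_assignment == assignment:
            break  # converged: no class changed group
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member classes.
        for c in range(k):
            members = [class_features[cid] for cid in ids if assignment[cid] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [[cid for cid in ids if assignment[cid] == c] for c in range(k)]


features = {0: [0.0, 0.0], 1: [5.0, 5.0], 2: [0.2, 0.1], 3: [5.1, 4.8]}
print(kmeans_partition(features, k=2))  # [[0, 2], [1, 3]]
```

Grouping similar classes together gives each expert a narrower, more homogeneous set of gestures to separate, which is the intuition the ensemble builds on.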
We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture. Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different components in the environment, and propose a heterogeneous spatio-temporal graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous st-graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions among entities through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success and efficiency in challenging navigation scenarios. Furthermore, we demonstrate that our pipeline achieves better zero-shot generalization capability than previous works when the densities of humans and obstacles change.
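To make the attention idea concrete, the sketch below computes scaled dot-product attention of a robot embedding over human embeddings and, separately, over obstacle embeddings, then concatenates the two context vectors. This is a generic illustration under assumed inputs, not HEIGHT's actual network: the embeddings, the function names, and the use of separate attention per entity type as a stand-in for heterogeneous interactions are all hypothetical, and the recurrent component is omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_weights(query: Vector, keys: List[Vector]) -> List[float]:
    """Softmax of scaled dot products between one query and each key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Return the attention-weighted sum of values and the weights used."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return context, weights


def heterogeneous_context(robot: Vector, humans: List[Vector], obstacles: List[Vector]) -> Vector:
    """Attend to humans and obstacles separately (hypothetical stand-in for
    type-specific interaction modules), then concatenate the two contexts."""
    human_ctx, _ = attend(robot, humans, humans)
    obstacle_ctx, _ = attend(robot, obstacles, obstacles)
    return human_ctx + obstacle_ctx


robot = [1.0, 0.0]
humans = [[1.0, 0.0], [0.0, 1.0]]
obstacles = [[0.5, 0.5]]
_, w = attend(robot, humans, humans)
print(w)  # higher weight on the human whose embedding aligns with the robot's
print(heterogeneous_context(robot, humans, obstacles))  # 4 values: 2 human + 2 obstacle
```

Keeping one attention pass per entity type lets humans and obstacles be weighted independently, rather than competing in a single softmax.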
The demand for advanced video generation techniques has surged with the increasing application of artificial intelligence in multimedia and human-computer interaction systems. In this context, video generation models that can produce high-quality outputs conditioned on a specific class or text prompt are gaining prominence. In this research, we propose novel structured models for generating in-car hand gesture videos conditioned on gesture classes and explore their capability to generate new and diverse gesture videos, contributing to the growing field of automated video generation.
SmartNICs are increasingly used to support intensive data center operations in the cloud computing industry. Today, SmartNICs perform high-data-rate and computationally heavy tasks such as load balancing, DNS filtering, and firewalls for intrusion detection. However, SmartNICs have not been explored for the resiliency use cases often seen in safety-critical industries such as the power grid and industrial automation. This research evaluates the feasibility of using SmartNICs to support safety-critical applications, such as fault-tolerant routing and cryptographic operations.