Seth Z. Zhao

I am a first-year Computer Science PhD student at UCLA, advised by Professor Bolei Zhou and Professor Jiaqi Ma.

Previously, I received my M.S. and B.A. in Computer Science at UC Berkeley where I was fortunate to work with Professor Masayoshi Tomizuka, Professor Allen Yang, and Professor Constance Chang-Hasnain.

Google Scholar  /  LinkedIn  /  Twitter

Research

My research interests focus on Embodied AI, full-stack autonomous driving, and domain-specific Foundation Models. Currently, my research centers on 3D outdoor scene understanding.

Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset
Zixun Huang*, Keling Yao*, Seth Z. Zhao*, Chuanyu Pan*, Tianjian Xu, Weiyu Feng, Allen Y. Yang
arXiv 2023
project page  /  arXiv

A continuation of our previous work, the Digital Twin Tracking Dataset (DTTD). In this work, we extend digital-twin tracking to mobile AR scenarios by creating a comprehensive pose estimation dataset captured with iPhone sensors, along with a depth-robust pose estimator that withstands noisy depth input.

Pre-training on Synthetic Driving Data for Trajectory Prediction
Yiheng Li*, Seth Z. Zhao*, Chenfeng Xu, Chen Tang, Chenran Li, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan
arXiv 2023
arXiv

In this work, we introduce a novel map augmentation and trajectory data generation process to address data scarcity in trajectory forecasting. We comprehensively explore different pretraining strategies and extend the Masked AutoEncoder (MAE) concept to trajectory forecasting. Extensive experiments demonstrate the effectiveness of our data expansion and pretraining strategies, which outperform the baseline by large margins.

Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition
Haoming Guo, Seth Z. Zhao, Jiachen Lian, Gopala Anumanchipalli, Gerald Friedland
ICASSP 2024 Workshop on Self-supervision in Audio, Speech and Beyond
arXiv

In this work, we apply contrastive learning to vocoder training to improve perceptual quality without modifying the architecture or adding more data. Experimental results show that these tasks substantially improve the vocoder's fidelity in data-limited settings.

Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications
Weiyu Feng*, Seth Z. Zhao*, Chuanyu Pan*, Adam Chang, Yichen Chen, Zekun Wang, Allen Y. Yang
CVPR 2023 Workshop on Vision Datasets Understanding (Oral Presentation)
paper  /  project page  /  arXiv  /  YouTube

In this work, we create a novel RGB-D dataset to enable further research on the 3D object tracking problem and to extend potential solutions toward longer ranges and millimeter-level localization accuracy. Through experiments, we demonstrate that our dataset can help researchers develop future object tracking methods and analyze new challenges.

Multimodal Semantic Mismatch Detection in Social Media Posts
Kehan Wang, Seth Z. Zhao, David Chan, Avideh Zakhor, John Canny
MMSP 2022
paper  /  project page

In this work, we focus on the threat scenario in which a post's video, audio, and text description are semantically mismatched to mislead the audience. We develop self-supervised methods to detect semantic mismatches across these three modalities, outperforming state-of-the-art detectors by large margins.

Academic Services

Reviewer for ICRA, T-IV.


