PI: Asim Smailagic
Co-PI(s): Dan Siewiorek
University: Carnegie Mellon University

An estimated 500,000 individuals throughout North America rely on American Sign Language (ASL) as their primary language. With fewer than 15,000 registered ASL interpreters in the United States, access to adequate language services is a persistent struggle for these individuals. Automatic sign recognition and interpretation could help alleviate this shortage, but many grammatical aspects of ASL, particularly those expressed through non-manual features (e.g., facial expressions, body postures), have not been well studied or modeled.

This research will build on the real-time body-tracking system developed by Sign Track, focusing specifically on recognizing questions, hypothetical conditionals, assertions, and negations in order to model meaning in real-time continuous sign sequences. We apply hand-tracking techniques being developed for augmented reality (AR) and virtual reality (VR) applications and tailor them to the set of meaningful handshapes used in ASL. The depth camera we use also provides a field of view wide enough to capture the facial expressions and body postures necessary for full communication in ASL. The combination of depth cameras and generative model-based hand tracking offers several advantages: real-time performance, signer-independent classification, and a pathway to improved performance. A further contribution will be the evaluation of specific system parameters against ASL-task-specific outputs, providing guidelines for improving recognition rates on ASL tasks.
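
To make the intended fusion of manual and non-manual cues concrete, the minimal sketch below shows one way a per-frame feature stream from a depth-camera hand tracker could be combined with facial-expression and body-posture features and fed to a sequence classifier over the four target sentence types. The model, feature dimensions, and frame rate are illustrative assumptions for exposition, not the project's actual architecture.

```python
# Hypothetical sketch: fuse hand-pose and non-manual (face/body) features per frame,
# then classify a continuous sign sequence as question, hypothetical conditional,
# assertion, or negation. All dimensions and the model choice are assumptions.
import torch
import torch.nn as nn

SENTENCE_TYPES = ["question", "hypothetical_conditional", "assertion", "negation"]

class NonManualSequenceClassifier(nn.Module):
    def __init__(self, hand_dim=63, face_dim=40, hidden_dim=128):
        super().__init__()
        # Concatenate hand-pose features (e.g., 21 joints x 3D) with face/body features.
        self.gru = nn.GRU(hand_dim + face_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, len(SENTENCE_TYPES))

    def forward(self, hand_feats, face_feats):
        # hand_feats: (batch, frames, hand_dim); face_feats: (batch, frames, face_dim)
        x = torch.cat([hand_feats, face_feats], dim=-1)
        _, h = self.gru(x)                   # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=-1)  # merge forward and backward states
        return self.head(h)                  # per-sequence logits

# Synthetic frames standing in for depth-camera tracker output (assumed 30 fps, ~3 s).
model = NonManualSequenceClassifier()
hand = torch.randn(1, 90, 63)   # hand-pose estimates per frame (assumed layout)
face = torch.randn(1, 90, 40)   # facial expression / body posture features (assumed)
logits = model(hand, face)
print(SENTENCE_TYPES[logits.argmax(dim=-1).item()])
```

In a real system, the recurrent classifier could equally be replaced by a temporal convolution or transformer; the key design point illustrated here is that sentence-type labels depend jointly on handshape trajectories and non-manual signals rather than on either stream alone.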