I recently graduated from Stanford University, currently do Data Science at Harvard Medical School, and created the hot sauce that made Jennifer Lawrence cry.
An entrepreneur at heart, I spun my love for flavor (and professional background in food science) into a hot sauce company while in college. (You may have seen my sauces on Hot Ones or the shelves of Erewhon.)
In the classroom, I primarily studied ancient languages and linguistics, which introduced me to NLP, AI, and Data Science (what I do now at Harvard Medical School).
I'm particularly passionate about linguistics, learning, entrepreneurship, design, product, AI, and information.
Data Science, Computer Science, Psychology, Survey Design, Marketing, Entrepreneurship, Miscellaneous Languages.
DATASCI 112: Principles of Data Science. Taught Machine Learning/AI, data processing, visualization, & analysis using Python.
CLASSICS 1g, 2g, & 3g: First-year Ancient Greek. Taught grammar, syntax, morphology, and historical linguistics of Ancient Greek.
Cantonese is considered to be among the hardest languages for English speakers to learn due to its linguistic dissimilarities and incredibly complex writing system.
As a passionate learner of Cantonese with the goal of conversational fluency, I wanted to focus all my efforts on the spoken language, rather than memorizing characters.
Since resources for learning Cantonese are sparse and no product existed that matched my learning style, I chose to code it up myself.
Towards the end of high school, I worked as a food scientist specializing in extracts and olfaction. I always had a particular interest in peppers because of their split personality (being simultaneously delicious and deadly).
After experimenting with peppers and craft beer, I launched a line of hops-infused hot sauces that ended up on the shelves of Erewhon and in the lineups for Seasons 21 and 27 of Hot Ones. Steph Curry, Kai Cenat, Jennifer Lawrence, Gal Gadot, Owen Wilson, and many more have all had my sauce!
Many speech-to-text models that claim to support Cantonese output Standard Written Chinese (effectively Mandarin) instead, fundamentally failing at their core task. Fine-tuned models exist, but tend to be less semantically accurate.
By ensembling multiple ASR models, CantoScribe combines the lexical accuracy of fine-tuned Cantonese models with the semantic accuracy of flagship models.
The result? An ASR pipeline that outperforms default Whisper and Scribe in verbatim Cantonese transcription.
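For a rough sense of what ensembling two transcripts can look like, here is a minimal sketch. It is not CantoScribe's actual method, which is not described here: the marker-character list, the `cantonese_marker_ratio` and `pick_transcript` names, and the 0.05 threshold are all hypothetical choices made for illustration. The idea is to keep the flagship model's transcript when it already reads as written Cantonese, and to fall back to the fine-tuned model's transcript when the flagship output looks like Standard Written Chinese.

```python
# Characters common in written Cantonese but rare in Standard Written Chinese.
# Illustrative and far from exhaustive (an assumption, not a vetted lexicon).
CANTONESE_MARKERS = set("嘅咗喺唔佢冇啲嘢咁哋嚟乜睇")


def cantonese_marker_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Cantonese markers."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in CANTONESE_MARKERS for c in chars) / len(chars)


def pick_transcript(flagship: str, finetuned: str, threshold: float = 0.05) -> str:
    """Prefer the flagship transcript if it looks like written Cantonese.

    Otherwise assume it drifted into Standard Written Chinese and
    fall back to the fine-tuned model's transcript.
    """
    if cantonese_marker_ratio(flagship) >= threshold:
        return flagship
    return finetuned


if __name__ == "__main__":
    # Flagship output in Standard Written Chinese vs. fine-tuned Cantonese output.
    print(pick_transcript("他們不在這裡", "佢哋唔喺度"))
```

A character-level heuristic like this only gates between whole transcripts. A fuller ensemble would likely need to align and merge the outputs at the word or phrase level.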