Operated the FinBench Arena, in collaboration with the Chatbot Arena team. Pairwise evaluation of model performance across domain- and vertical- specific tasks.
Proposed the AI Sports Analytics Framework (AI-SAF), an interpretability framework for multi-agent systems, investigating misaligned behaviors with metrics like assists and turnovers.
Led the creation of a 0-to-1 electronic health records system for the Sonar Bangla Foundation, powering the operations of 22 nonprofit kidney and dialysis centers in Bangladesh.
Created an LLM medical intake bot with GPT-3.5, Twilio, and Vocode that you could call, text, or email for a simulated patient registration experience.
Spent a semester as a data consultant for Tencent, doing sentiment analysis for livestreamers on Trovo, a Chinese alternative to Twitch.