Research

Conference Proceedings

Do Large Language Models (LLMs) Understand Chronology?

Large language models have shown great potential as forecasting tools in finance and economics, but backtesting performance is subject to look-ahead bias if the period overlaps with an LLM’s training window. Prompt-based attempts to avoid look-ahead bias require that LLMs understand chronology. We test LLMs’ ability to understand and enforce chronological order in three types of tasks: sorting randomly shuffled historical events; conditional sorting of events defined by some conditions; and anachronism detection based on intersections of multiple timelines. Our experiments use events that we first confirm are known to the LLM; this ensures that we test chronological understanding on an LLM’s pretrained internal knowledge. Across three LLM families, namely GPT-4.1 (standard), GPT-5 (hybrid-reasoning), and Claude 3.7 Sonnet (large-reasoning, with and without Extended Thinking), we find that performance degrades rapidly with problem complexity but improves greatly for reasoning models with test-time extended reasoning. These patterns are important for the real-time application of LLMs in finance.

Working Papers & Preprints

Decoding Human-AI Augmentation: Measuring the Value of LLM Assistance on Professional Tasks Using Simulation

Static LLM leaderboards measure end-to-end task execution, yet many real-world deployments rely on centaur workflows where one model provides structured assistance to a human or another model. We investigate whether a model’s direct task-solving ability (“automation”) correlates with its capacity to guide a weaker worker (“augmentation”), proposing an evaluation framework that treats assistance as a distinct capability. Across diverse professional tasks grounded in O*NET and the Anthropic Economic Index (e.g., operations research, counseling, and creative writing), we compare direct model performance against assistant-to-worker pipelines using a fixed worker (GPT-3.5 Turbo) and varying scaffold models. LLM-based rubric grading reveals that scaffolding reliably improves output structure and usefulness. However, model rankings diverge sharply depending on the task: for open-ended professional tasks, models that are weak standalone solvers often serve as highly effective “teachers.” Expert human validation confirms the directional benefits of scaffolding but highlights calibration gaps in LLM-as-a-judge scoring, emphasizing the need for robust evaluation protocols. Ultimately, our findings demonstrate that supervisory skill is distinct from direct execution capability and represents an economically meaningful dimension of LLM performance missed by conventional benchmarks.
  • Working Paper, February 2026

Contributions as an RA

Calyber: A Ridesharing Game.

INFORMS Transactions on Education.

This case introduces Calyber, a simulation-based game designed to provide a hands-on and engaging experience in developing real-time pricing and matching decisions for shared ride services, where multiple riders are pooled into a single vehicle. Students design and implement dynamic pricing and matching policies using a rich historical ridesharing data set, competing for top performance on a holdout test set. Through this case, students gain practical insight into stochastic dynamic decision making within a modern, relevant, and data-driven context. Results from previous class implementations provide strong evidence of enhanced learning and engagement.
  • 🏆 Runner-up, 2025 INFORMS Case Competition [case]

On-Off Systems with Strategic Customers

Motivated by applications such as urban traffic control and make-to-order systems, we study a fluid model of a single-server, on-off system that can accommodate multiple queues. The server visits each queue in order: when a queue is served, it is "on", and when the server is serving another queue or transitioning between queues, it is "off". Customers arrive over time, observe the state of the system, and decide whether to join. We consider two regimes for the formation of the on and off durations. In the exogenous setting, each queue's on and off durations are predetermined. We explicitly characterize the equilibrium outcome in closed form and give a compact linear program to compute the optimal on-off durations that maximize the total reward collected from serving customers. In the endogenous setting, the durations depend on customers' joining decisions under an exhaustive service policy where the server never leaves a non-empty queue. We show that an optimal policy in this case extends service beyond the first clearance for at most one queue. Using this property, we introduce a closed-form procedure that computes an optimal policy in no more than 2n steps for a system with n queues.
  • Under revision.

Other Research & Awards

Data-Driven Evaluation of Board of Directors Effectiveness: Unsupervised Learning and Predictive Modeling of “Skills Matrices”

  • 🏆 Best Data Visualization Award at the 2025 Berkeley CDSS Data Discovery Symposium

Mixed-Integer Linear Program for Options Pricing and Portfolio Optimization

  • 🏆 1st Runner-Up at the 2025 Wells Fargo & Berkeley IEOR Bay Area Decision Sciences Summit
  • Also presented at the 2025 Berkeley IEOR Community Celebration & Alumni Achievement Ceremony

Works in Progress

Flex or Fast? Incentive-Compatible Demand Allocation in Destination-Mode Ride-Hailing Networks

Please refer to my CV for a more detailed and complete list of research assistantships and publications.

Professional Experience

Please refer to my resume for more recent industry experience.