Research

Conference Proceedings

Do Large Language Models (LLMs) Understand Chronology?

Wongchamcharoen, P. K., & Glasserman, P.

Large language models have shown great potential as forecasting tools in finance and economics, but backtesting performance is subject to look-ahead bias if the period overlaps with an LLM’s training window. Prompt-based attempts to avoid look-ahead bias require that LLMs understand chronology. We test LLMs’ ability to understand and enforce chronological order in three types of tasks: sorting randomly shuffled historical events; conditional sorting of events defined by some conditions; and anachronism detection based on intersections of multiple timelines. Our experiments use events that we first confirm are known to the LLM; this ensures that we test chronological understanding on an LLM’s pretrained internal knowledge. Across three LLM families, namely GPT-4.1 (standard), GPT-5 (hybrid-reasoning), and Claude 3.7 Sonnet (large-reasoning, with and without Extended Thinking), we find that performance degrades rapidly with problem complexity but improves greatly for reasoning models with test-time extended reasoning. These patterns are important for the real-time application of LLMs in finance.

Proceedings of the 2026 AAAI Conference on Artificial Intelligence (AAAI-26) [paper], [code]
🏆 Oral Presentation - Student Abstract & Poster Program (Top 11%) [poster], [talk]
Extended paper also accepted at AI4TS: AI for Time Series Analysis (AAAI-26 Workshop) [extended paper]
Invited presentations at the Yale Undergraduate Research Conference (YURC 2026), the IISE Annual Conference 2026, and the 2026 Berkeley IEOR Annual Advisory Board Meeting
Featured as foundational literature in OpenAI’s “Scaling Social Science Research” (2026) paper - GPT as a measurement tool.

Working Paper & Preprints

CentaurBench: Benchmarking LLM Augmentation on Occupational Tasks

Wongchamcharoen, P. K., Gulati, K., Fong, M. M., & Nagaraj, A.

The best player is not always the best coach. Most LLM benchmarks rank models on a single ability: how well a model completes a task on its own. But that is rarely how AI is used at work. Far more often, a model assists a worker by planning the approach, breaking the task into steps, or prompting self-review while another agent produces the final output. The question that drives model selection is therefore not only which model does the best work, but which model most improves the work of another agent. These are distinct capabilities, and existing benchmarks measure only the first. We introduce a framework that measures both. In automation, a model completes a professional task end-to-end. In augmentation, a model writes a process scaffold - a plan, checklist, and self-review - that is passed to a fixed lower-capacity worker model that produces the output. Holding the worker fixed separates the value of the assistant's guidance from the worker's own ability. Across seven economically grounded tasks spanning structured planning, analytical work, and human-facing services, every output is scored through blind pairwise comparisons by a panel of LLM judges, each barred from grading its own model family, with rankings replicated across three independent runs. Our results suggest that the two regimes rank models very differently. Automation rankings are tight and aligned with conventional capability orderings; the usual frontier models win. Augmentation rankings scatter, no model dominates as an assistant, and several of the strongest solvers are among the weakest coaches. On three of seven tasks, the unaided worker outperforms every assisted condition. For human-in-the-loop and multi-agent systems, “which model is best?” is the wrong question; “best for which role, on which task?” is the right one.

Accepted at the 2026 Wharton Generative AI & Business Conference. Preprint coming soon.

Contributions as an RA

Calyber: A Ridesharing Game.

Shen, Y., Yan, C., & Yan, J.

INFORMS Transactions on Education.

This case introduces Calyber, a simulation-based game designed to provide a hands-on and engaging experience in developing real-time pricing and matching decisions for shared ride services, where multiple riders are pooled into a single vehicle. Students design and implement dynamic pricing and matching policies using a rich historical ridesharing data set, competing for top performance on a holdout test set. Through this case, students gain practical insight into stochastic dynamic decision making within a modern, relevant, and data-driven context. Results from previous class implementations provide strong evidence of enhanced learning and engagement.

🏆 Runner-up, 2025 INFORMS Case Competition [case]

On-Off Systems with Strategic Customers

Sun, Y., Liu, Z., & Yan, C.

Motivated by applications such as urban traffic control and make-to-order systems, we study a fluid model of a single-server, on-off system that can accommodate multiple queues. The server visits each queue in order: when a queue is served, it is "on", and when the server is serving another queue or transitioning between queues, it is "off". Customers arrive over time, observe the state of the system, and decide whether to join. We consider two regimes for the formation of the on and off durations. In the exogenous setting, each queue's on and off durations are predetermined. We explicitly characterize the equilibrium outcome in closed form and give a compact linear program to compute the optimal on-off durations that maximize total reward collected from serving customers. In the endogenous setting, the durations depend on customers' joining decisions under an exhaustive service policy where the server never leaves a non-empty queue. We show that an optimal policy in this case extends service beyond the first clearance for at most one queue. Using this property, we introduce a closed-form procedure that computes an optimal policy in no more than 2n steps for a system with n queues.

Under revision.

Other Research & Awards

Data-Driven Evaluation of Board of Directors Effectiveness: Unsupervised Learning and Predictive Modeling of “Skills Matrices”

Wongchamcharoen, P. K., Stringer, C., Hwang, B., Liu, I., & Li, K. [poster]

🏆 Best Data Visualization Award at the 2025 Berkeley CDSS Data Discovery Symposium

Mixed-Integer Linear Program for Options Pricing and Portfolio Optimization

Bin Abdulla, Q. M., Wongchamcharoen, P. K., Jamari, A., & Lee, J. [code], [poster]

🏆 1st Runner-Up at the 2025 Wells Fargo & Berkeley IEOR Bay Area Decision Sciences Summit
Also presented at the 2025 Berkeley IEOR Community Celebration & Alumni Achievement Ceremony

Works in Progress

Flex or Fast? Incentive-Compatible Demand Allocation in Destination-Mode Ride-Hailing Networks

Please refer to my CV for a more detailed and complete list of research assistantships and publications.

Professional Experience

Please refer to my resume for more recent industry experience.

Kenny Wongchamcharoen
