publications | Andreas Opedal

2025

ICLR
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs

Andreas Opedal*, Haruki Shirakami*, Bernhard Schölkopf, Abulhair Saparov, and Mrinmaya Sachan

In The Thirteenth International Conference on Learning Representations, 2025

Large language models (LLMs) can solve arithmetic word problems with high accuracy, but little is known about how well they generalize to more complex problems. This is difficult to study, as (i) much of the available evaluation data has already been seen by the most capable models during training, and (ii) existing benchmarks do not capture how problem proofs may be arbitrarily complex in various ways. In this paper, we present a data-generation framework for evaluating LLMs on problems with arbitrarily complex arithmetic proofs, called MathGAP. MathGAP generates problem statements and chain-of-thought reasoning traces according to specifications about their arithmetic proof structure, enabling systematic studies on easy-to-hard generalization with respect to complexity of proof trees. Using MathGAP, we find that LLMs show a significant decrease in performance as proofs get deeper and wider. This effect is more pronounced in complex, nonlinear proof structures, which are challenging even for the most capable models. The models are also sensitive to simple changes in sentence ordering. However, they remain capable of solving some complex problems, suggesting that reasoning generalization is noisy.
@inproceedings{opedal2025mathgap,
  title = {Math{GAP}: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs},
  author = {Opedal*, Andreas and Shirakami*, Haruki and Schölkopf, Bernhard and Saparov, Abulhair and Sachan, Mrinmaya},
  year = {2025},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  eprint = {2410.13502},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
  url = {https://arxiv.org/abs/2410.13502},
}

2024

EMNLP
On the Role of Context in Reading Time Prediction

Andreas Opedal, Eleanor Chodroff, Ryan Cotterell, and Ethan Wilcox

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

We present a new perspective on how readers integrate context during real-time language comprehension. Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit (e.g., a word) is an affine function of its in-context information content. We first observe that surprisal is only one out of many potential ways that a contextual predictor can be derived from a language model. Another one is the pointwise mutual information (PMI) between a unit and its context, which turns out to yield the same predictive power as surprisal when controlling for unigram frequency. Moreover, both PMI and surprisal are correlated with frequency. This means that neither PMI nor surprisal contains information about context alone. In response to this, we propose a technique where we project surprisal onto the orthogonal complement of frequency, yielding a new contextual predictor that is uncorrelated with frequency. Our experiments show that the proportion of variance in reading times explained by context is a lot smaller when context is represented by the orthogonalized predictor. From an interpretability standpoint, this indicates that previous studies may have overstated the role that context has in predicting reading times.
@inproceedings{opedal2024role,
  title = {On the Role of Context in Reading Time Prediction},
  author = {Opedal, Andreas and Chodroff, Eleanor and Cotterell, Ryan and Wilcox, Ethan},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  month = nov,
  year = {2024},
  publisher = {Association for Computational Linguistics},
  address = {Miami},
  eprint = {2409.08160},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2409.08160},
}

EMNLP
Generalized Measures of Anticipation and Responsivity in Online Language Processing

Mario Giulianelli, Andreas Opedal, and Ryan Cotterell

In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024

We introduce a generalization of classic information-theoretic measures of predictive uncertainty in online language processing, based on the simulation of expected continuations of incremental linguistic contexts. Our framework provides a formal definition of anticipatory and responsive measures, and it equips experimenters with the tools to define new, more expressive measures beyond standard next-symbol entropy and surprisal. While extracting these standard quantities from language models is convenient, we demonstrate that using Monte Carlo simulation to estimate alternative responsive and anticipatory measures pays off empirically: New special cases of our generalized formula exhibit enhanced predictive power compared to surprisal for human cloze completion probability as well as ELAN, LAN, and N400 amplitudes, and greater complementarity with surprisal in predicting reading times.
@inproceedings{giulianelli2024generalized,
  title = {Generalized Measures of Anticipation and Responsivity in Online Language Processing},
  author = {Giulianelli, Mario and Opedal, Andreas and Cotterell, Ryan},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024},
  month = nov,
  year = {2024},
  publisher = {Association for Computational Linguistics},
  address = {Miami},
  eprint = {2409.10728},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2409.10728},
}

ICML
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?

Andreas Opedal*, Alessandro Stolfo*, Haruki Shirakami, Ying Jiao, Ryan Cotterell, Bernhard Schölkopf, Abulhair Saparov, and Mrinmaya Sachan

In Forty-first International Conference on Machine Learning, Jul 2024

There is increasing interest in employing large language models (LLMs) as cognitive models. For such purposes, it is central to understand which properties of human cognition are well-modeled by LLMs, and which are not. In this work, we study the biases of LLMs in relation to those known in children when solving arithmetic word problems. Surveying the learning science literature, we posit that the problem-solving process can be split into three distinct steps: text comprehension, solution planning and solution execution. We construct tests for each one in order to understand whether current LLMs display the same cognitive biases as children in these steps. We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features. We find evidence that LLMs, with and without instruction-tuning, exhibit human-like biases in both the text-comprehension and the solution-planning steps of the solving process, but not in the final step, in which the arithmetic expressions are executed to obtain the answer.
@inproceedings{opedal2024language,
  title = {Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?},
  author = {Opedal*, Andreas and Stolfo*, Alessandro and Shirakami, Haruki and Jiao, Ying and Cotterell, Ryan and Schölkopf, Bernhard and Saparov, Abulhair and Sachan, Mrinmaya},
  booktitle = {Forty-first International Conference on Machine Learning},
  month = jul,
  year = {2024},
  url = {https://arxiv.org/abs/2401.18070},
}

2023

EMNLP
An Exploration of Left-Corner Transformations

Andreas Opedal*, Eleftheria Tsipidi*, Tiago Pimentel, Ryan Cotterell, and Tim Vieira

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023

The left-corner transformation (Rosenkrantz and Lewis, 1970) is used to remove left recursion from context-free grammars, which is an important step towards making the grammar parsable top-down with simple techniques. This paper generalizes prior left-corner transformations to support semiring-weighted production rules and to provide finer-grained control over which left corners may be moved. Our generalized left-corner transformation (GLCT) arose from unifying the left-corner transformation and speculation transformation (Eisner and Blatz, 2007), originally for logic programming. Our new transformation and speculation define equivalent weighted languages. Yet, their derivation trees are structurally different in an important way: GLCT replaces left recursion with right recursion, and speculation does not. We also provide several technical results regarding the formal relationships between the outputs of GLCT, speculation, and the original grammar. Lastly, we empirically investigate the efficiency of GLCT for left-recursion elimination from grammars of nine languages. Code: https://github.com/rycolab/left-corner
@inproceedings{opedal2023lct,
  title = {An Exploration of Left-Corner Transformations},
  author = {Opedal*, Andreas and Tsipidi*, Eleftheria and Pimentel, Tiago and Cotterell, Ryan and Vieira, Tim},
  booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  month = dec,
  year = {2023},
  address = {Singapore},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.emnlp-main.827},
  pages = {13393--13427},
}

ACL
Efficient Semiring-Weighted Earley Parsing

Andreas Opedal, Ran Zmigrod, Tim Vieira, Ryan Cotterell, and Jason Eisner

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2023

We present Earley’s (1970) context-free parsing algorithm as a deduction system, incorporating various known and new speed-ups. In particular, our presentation supports a known worst-case runtime improvement from Earley’s (1970) O(N³|G||R|), which is unworkable for the large grammars that arise in natural language processing, to O(N³|G|), which matches the complexity of CKY on a binarized version of the grammar G. Here N is the length of the sentence, |R| is the number of productions in G, and |G| is the total length of those productions. We also provide a version that achieves runtime of O(N³|M|) with |M| ≤ |G| when the grammar is represented compactly as a single finite-state automaton M (this is partly novel). We carefully treat the generalization to semiring-weighted deduction, preprocessing the grammar like Stolcke (1995) to eliminate the possibility of deduction cycles, and further generalize Stolcke’s method to compute the weights of sentence prefixes. We also provide implementation details for efficient execution, ensuring that on a preprocessed grammar, the semiring-weighted versions of our methods have the same asymptotic runtime and space requirements as the unweighted methods, including sub-cubic runtime on some grammars.
@inproceedings{opedal-etal-2023-efficient,
  title = {Efficient Semiring-Weighted {E}arley Parsing},
  author = {Opedal, Andreas and Zmigrod, Ran and Vieira, Tim and Cotterell, Ryan and Eisner, Jason},
  booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month = jul,
  year = {2023},
  url = {https://aclanthology.org/2023.acl-long.204},
  doi = {10.18653/v1/2023.acl-long.204},
}

ACL
World Models for Math Story Problems

Andreas Opedal, Niklas Stoehr, Abulhair Saparov, and Mrinmaya Sachan

In Findings of the Association for Computational Linguistics: ACL 2023, Jul 2023

Solving math story problems is a complex task for students and NLP models alike, requiring them to understand the world as described in the story and reason over it to compute an answer. Recent years have seen impressive performance on automatically solving these problems with large pre-trained language models and innovative techniques to prompt them. However, it remains unclear if these models possess accurate representations of mathematical concepts. This leads to lack of interpretability and trustworthiness which impedes their usefulness in various applications. In this paper, we consolidate previous work on categorizing and representing math story problems and develop MathWorld, which is a graph-based semantic formalism specific for the domain of math story problems. With MathWorld, we can assign world models to math story problems which represent the situations and actions introduced in the text and their mathematical relationships. We combine math story problems from several existing datasets and annotate a corpus of 1,019 problems and 3,204 logical forms with MathWorld. Using this data, we demonstrate the following use cases of MathWorld: (1) prompting language models with synthetically generated question-answer pairs to probe their reasoning and world modeling abilities, and (2) generating new problems by using the world models as a design space.
@inproceedings{opedal-etal-2023-world,
  title = {World Models for Math Story Problems},
  author = {Opedal, Andreas and Stoehr, Niklas and Saparov, Abulhair and Sachan, Mrinmaya},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
  month = jul,
  year = {2023},
  address = {Toronto, Canada},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.findings-acl.579},
  doi = {10.18653/v1/2023.findings-acl.579},
  pages = {9088--9115},
}

EACL
On the Intersection of Context-Free and Regular Languages

Clemente Pasti, Andreas Opedal, Tiago Pimentel, Tim Vieira, Jason Eisner, and Ryan Cotterell

In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, May 2023

The Bar-Hillel construction is a classic result in formal language theory. It shows, by a simple construction, that the intersection of a context-free language and a regular language is itself context-free. In the construction, the regular language is specified by a finite-state automaton. However, neither the original construction (Bar-Hillel et al., 1961) nor its weighted extension (Nederhof and Satta, 2003) can handle finite-state automata with ε-arcs. While it is possible to remove ε-arcs from a finite-state automaton efficiently without modifying the language, such an operation modifies the automaton’s set of paths. We give a construction that generalizes the Bar-Hillel construction to the case where the desired automaton has ε-arcs, and further prove that our generalized construction leads to a grammar that encodes the structure of both the input automaton and grammar while retaining the asymptotic size of the original construction.
@inproceedings{pasti-etal-2023-intersection,
  title = {On the Intersection of Context-Free and Regular Languages},
  author = {Pasti, Clemente and Opedal, Andreas and Pimentel, Tiago and Vieira, Tim and Eisner, Jason and Cotterell, Ryan},
  booktitle = {Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics},
  month = may,
  year = {2023},
  url = {https://aclanthology.org/2023.eacl-main.52},
  doi = {10.18653/v1/2023.eacl-main.52},
}

2022

ACL
Slangvolution: A Causal Analysis of Semantic Change and Frequency Dynamics in Slang

Daphna Keidar*, Andreas Opedal*, Zhijing Jin, and Mrinmaya Sachan

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022

Languages are continuously undergoing changes, and the mechanisms that underlie these changes are still a matter of debate. In this work, we approach language evolution through the lens of causality in order to model not only how various distributional factors associate with language change, but how they causally affect it. In particular, we study slang, which is an informal language that is typically restricted to a specific group or social setting. We analyze the semantic change and frequency shift of slang words and compare them to those of standard, nonslang words. With causal discovery and causal inference techniques, we measure the effect that word type (slang/nonslang) has on both semantic change and frequency shift, as well as its relationship to frequency, polysemy and part of speech. Our analysis provides some new insights in the study of language change, e.g., we show that slang words undergo less semantic change but tend to have larger frequency shifts over time.
@inproceedings{keidar-etal-2022-slangvolution,
  title = {Slangvolution: {A} Causal Analysis of Semantic Change and Frequency Dynamics in Slang},
  author = {Keidar*, Daphna and Opedal*, Andreas and Jin, Zhijing and Sachan, Mrinmaya},
  editor = {Muresan, Smaranda and Nakov, Preslav and Villavicencio, Aline},
  booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month = may,
  year = {2022},
  address = {Dublin, Ireland},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2022.acl-long.101},
  doi = {10.18653/v1/2022.acl-long.101},
  pages = {1422--1442},
}