⚖️ ACL 2026 Conference ⚖️

Taming Actor-Observer Asymmetry in Agents
via Dialectical Alignment

Bobo LiRui WuZibo JiMeishan ZhangHao Fei★*
Min ZhangMong-Li LeeWynne Hsu
National University of Singapore  ·  Sichuan University  ·  University of Minnesota Twin Cities
Harbin Institute of Technology, Shenzhen  ·  University of Oxford
Contact: libobo@nus.edu.sg  ·  hao.fei@bdi.ox.ac.uk (Correspondence)

📢 News

🏆2026/04Paper accepted at ACL 2026 Main Conference!
🚀2026/04Code & data released: github.com/unikcc/ReTAS
🌐2026/04Project page live with camera-ready results.

Abstract

TL;DR

LLM agents exhibit Actor-Observer Asymmetry: swapping perspectives flips causal attribution in >20% of cases. We propose ReTAS, a dialectical framework using Thesis-Antithesis-Synthesis reasoning + GRPO. A 4B model achieves the best attribution accuracy across all methods, with F1 competitive to GPT-5.1.

Large Language Model agents have evolved into dynamic systems executing complex autonomous workflows. Multi-agent frameworks assigning specialized roles enable self-reflection and mutual auditing, but we find this role-playing induces Actor-Observer Asymmetry: actors attribute failures to external factors while observers blame internal faults. Our Ambiguous Failure Benchmark (AFB) reveals over 20% AOA attribution across major LLMs. We propose ReTAS (Reasoning via Thesis-Antithesis-Synthesis), which leverages dialectical reasoning to effectively mitigate this bias and enhance task performance.

🔍 Actor-Observer Asymmetry

1. Cognitive Bias in Psychology
Actors blame environment; Observers blame the person. LLMs exhibit this when assigned roles.
2. Ambiguous Failure Benchmark
200 paired scenarios across 10 domains where attribution is genuinely ambiguous.
3. Perspective Flip Confirmed
Swapping roles flips attribution in 6%~33% of cases across GPT-5, Qwen3, DeepSeek.
4. Inverse Scaling
GPT-5.1: 6% flip. Qwen3-4B: 33% flip. Stronger models = less bias, but never zero.
5. Humble Alignment Artifact
Advanced models over-internalize blame (94% Universal Internal for GPT-5.1), a byproduct of RLHF.
10 Domains 2 Scenario Types 200 Paired Samples 6+ LLMs Tested

Figure 1. The same timeout is attributed to a server issue by the Actor but to a logic error by the Observer.

📊 AFB Benchmark Results

Flip = V-AOA + R-AOA (attribution inconsistency). Lower is better.

Human-Agent

ModelV-AOA↓R-AOA↓Int.Ext.Flip↓
GPT-5.1519406
GPT-522172523
GPT-5-mini17179318
DeepSeek-V3.213283215
Qwen3-4B294511633
QwQ-32B18374521

Agent-Agent

ModelV-AOA↓R-AOA↓Int.Ext.Flip↓
GPT-5.1233423226
GPT-52310333433
GPT-5-mini235324028
DeepSeek-V3.2318313039
Qwen3-4B293323632
QwQ-32B254284329

⚖ ReTAS: Dialectical Alignment Framework

Three stages inspired by Fichtean dialectics:

  • Thesis: Role-congruent explanation expressing the agent's expertise.
  • Antithesis: Opposing perspective to expose blind spots.
  • Synthesis: Evidence-grounded conclusion integrating both views.

Trained via SFT + GRPO with three-component reward: format (1/7), attribution (2/7), execution (4/7).

AFB | Diagnostic Benchmark ReTAS | 3-Stage Training SalesArena | Negotiation
<thinking>
  <thesis> [Role-Dependent Bias / Instinct] </thesis>
  <antithesis> [Evidence Re-verification] </antithesis>
  <synthesis> [Objective Convergence] </synthesis>
</thinking>
[Attribution] FalseExt | FalseInt | True
[Action] Search(new_query) | Revise(code) | Confirm()

The structured TAS format for supervised fine-tuning.

Figure 2. ReTAS pipeline: evidence retrieval, dialectical CoT generation, SFT + GRPO training.

🏆 Main Results

⚙️ Settings

1. Datasets
FinQA (hybrid financial reasoning) & Spider (text-to-SQL).
2. Baselines
GPT-5.1, DeepSeek-V3.2, QwQ-32B, Qwen3-30B, GLM-4.6 with prompting, straightforward reflection, and dual-view reflection.
3. Metrics
Attribution Accuracy (Acc), Flip Rate, AOA Score, Retrieval (Ret.), end-to-end F1.
4. Training
Qwen3-4B base, LoRA (r=64), DeepSpeed ZeRO-2, 3 epochs SFT then GRPO.

💡 Key Findings

  1. ReTAS (4B) achieves the best attribution accuracy across all methods, with F1 competitive to 671B DeepSeek-V3.2.
  2. Flip: 22.7% → 12.4% on FinQA. Dialectical training fundamentally reduces perspective bias.
  3. Every reward component matters: removing Rattr or Rexec causes significant drops in both accuracy and consistency.
  4. TAS prompting alone helps: +6.4% Acc on FinQA even without fine-tuning (51.2 → 57.6).
  5. Cross-task generalization: FinQA-trained model transfers to unseen AFB social ambiguity.

Fig 5. Performance across evidence complexity levels.

📊 FinQA & Spider Results

Blue = best, green = second.

MethodSizeFinQASpider
Acc↑Flip↓V-AOA↓F1↑Acc↑Flip↓V-AOA↓F1↑
Prompting
GPT-5.1Closed76.961.5
DeepSeek-V3.2671B76.064.0
QwQ-32B32B68.958.2
Qwen3-30B30B61.060.4
GLM-4.69B60.449.8
Reflection: Single View
QwQ-32B32B53.168.433.857.7
Qwen3-30B30B49.863.647.760.1
GLM-4.69B43.764.935.150.7
Reflection: Dual View
QwQ-32B32B54.918.114.771.034.826.924.260.3
Qwen3-30B30B52.920.113.566.555.625.010.460.9
GLM-4.69B43.152.724.866.334.232.318.354.2
Fine-tuning
ReTAS (Ours)4B71.212.45.472.161.421.910.263.5

🧪 Ablation Study

Framework Comparison

MethodFinQASpider
Acc↑AOA↓F1↑Acc↑AOA↓F1↑
ReTAS71.25.472.161.410.263.5
Qwen3-4B51.262.033.054.7
 + Dual50.022.762.535.422.255.1
 + TAS57.614.167.345.815.659.2

Component Ablation

MethodFinQASpider
Acc↑AOA↓F1↑Acc↑AOA↓F1↑
ReTAS71.25.472.161.410.263.5
 w/o Rattr65.516.869.556.327.259.2
 w/o Rexec68.215.968.358.322.855.6
 w/o GRPO67.712.466.761.210.660.3

🌍 Generalization & SalesArena

🔄 Cross-Task Transfer

FinQA-trained model evaluated on unseen AFB to test whether ReTAS acquires a generalized cognitive stance rather than memorizing training data.

Strong Transfer
Transfers from formal logic (FinQA) to social ambiguity (AFB) with significant gains.
🛡
Overrides Defensive Priors
Prompting is brittle under pressure; ReTAS firmly reduces Flip Rate.
💪
Consistent Across Settings
Works for both Human-Agent and Agent-Agent ambiguity scenarios.

👉 Click arrows to switch between figures.

💰 SalesArena Negotiation

Multi-agent negotiation in a predatory pricing game comparing 4 review mechanisms. TAS achieves highest profit with fewer turns.

ReflectionProfit($)↑Avg Prof($)↑Turns↓
None1571.964.21
Solo1642.055.08
Dual (Debate)1351.695.16
TAS (Ours)1682.104.81

Fig 8. Turn-by-turn average offer price across negotiation sessions.

📝 Citation

@inproceedings{li2026taming,
    title     = {Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment},
    author    = {Li, Bobo and Wu, Rui and Ji, Zibo and Zhang, Meishan and Fei, Hao and Zhang, Min and Lee, Mong-Li and Hsu, Wynne},
    booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics},
    year      = {2026}
}