⚖️ ACL 2026 Conference ⚖️

Taming Actor-Observer Asymmetry in Agents
via Dialectical Alignment

Bobo Li^♠ Rui Wu^♥ Zibo Ji^♦ Meishan Zhang^♣ Hao Fei^★*
Min Zhang^♣ Mong-Li Lee^♠ Wynne Hsu^♠

^♠National University of Singapore · ^♥Sichuan University · ^♦University of Minnesota Twin Cities
^♣Harbin Institute of Technology, Shenzhen · ^★University of Oxford

Contact: libobo@nus.edu.sg · hao.fei@bdi.ox.ac.uk (Correspondence)

Paper Code Data BibTeX

📢 News

🤗2026/04Dataset released on Hugging Face: huggingface.co/datasets/BradNLP/ReTAS

💻2026/04Code released on GitHub: github.com/unikcc/ReTAS

🌐2026/04Project page live: unikcc.github.io/ReTAS

📄2026/04Preprint released on arXiv.

🏆2026/04Paper accepted at ACL 2026 Main Conference!

Abstract

TL;DR

LLM agents exhibit Actor-Observer Asymmetry: swapping perspectives flips causal attribution in >20% of cases. We propose ReTAS, a dialectical framework using Thesis-Antithesis-Synthesis reasoning + GRPO. A 4B model achieves the best attribution accuracy across all methods, with F1 competitive to GPT-5.1.

Large Language Model agents have evolved into dynamic systems executing complex autonomous workflows. Multi-agent frameworks assigning specialized roles enable self-reflection and mutual auditing, but we find this role-playing induces Actor-Observer Asymmetry: actors attribute failures to external factors while observers blame internal faults. Our Ambiguous Failure Benchmark (AFB) reveals over 20% AOA attribution across major LLMs. We propose ReTAS (Reasoning via Thesis-Antithesis-Synthesis), which leverages dialectical reasoning to effectively mitigate this bias and enhance task performance.

🔍 Actor-Observer Asymmetry

1. Cognitive Bias in Psychology
Actors blame environment; Observers blame the person. LLMs exhibit this when assigned roles.

2. Ambiguous Failure Benchmark
200 paired scenarios across 10 domains where attribution is genuinely ambiguous.

3. Perspective Flip Confirmed
Swapping roles flips attribution in 6%~33% of cases across GPT-5, Qwen3, DeepSeek.

4. Inverse Scaling
GPT-5.1: 6% flip. Qwen3-4B: 33% flip. Stronger models = less bias, but never zero.

5. Humble Alignment Artifact
Advanced models over-internalize blame (94% Universal Internal for GPT-5.1), a byproduct of RLHF.

10 Domains 2 Scenario Types 200 Paired Samples 6+ LLMs Tested

Figure 1. The same timeout is attributed to a server issue by the Actor but to a logic error by the Observer.

📊 AFB Benchmark Results

Flip = V-AOA + R-AOA (attribution inconsistency). Lower is better.

Human-Agent

Model	V-AOA↓	R-AOA↓	Int.	Ext.	Flip↓
GPT-5.1	5	1	94	0	6
GPT-5	22	1	72	5	23
GPT-5-mini	17	1	79	3	18
DeepSeek-V3.2	13	2	83	2	15
Qwen3-4B	29	4	51	16	33
QwQ-32B	18	3	74	5	21

Agent-Agent

Model	V-AOA↓	R-AOA↓	Int.	Ext.	Flip↓
GPT-5.1	23	3	42	32	26
GPT-5	23	10	33	34	33
GPT-5-mini	23	5	32	40	28
DeepSeek-V3.2	31	8	31	30	39
Qwen3-4B	29	3	32	36	32
QwQ-32B	25	4	28	43	29

⚖ ReTAS: Dialectical Alignment Framework

Three stages inspired by Fichtean dialectics:

Thesis: Role-congruent explanation expressing the agent's expertise.
Antithesis: Opposing perspective to expose blind spots.
Synthesis: Evidence-grounded conclusion integrating both views.

Trained via SFT + GRPO with three-component reward: format (1/7), attribution (2/7), execution (4/7).

AFB | Diagnostic Benchmark ReTAS | 3-Stage Training SalesArena | Negotiation

<thinking>
<thesis>[Role-Dependent Bias / Instinct]</thesis>
<antithesis>[Evidence Re-verification]</antithesis>
<synthesis>[Objective Convergence]</synthesis>
</thinking>
[Attribution] FalseExt | FalseInt | True
[Action] Search(new_query) | Revise(code) | Confirm()
          The structured TAS format for supervised fine-tuning.

Figure 2. ReTAS pipeline: evidence retrieval, dialectical CoT generation, SFT + GRPO training.

🏆 Main Results

⚙️ Settings

1. Datasets
FinQA (hybrid financial reasoning) & Spider (text-to-SQL).

2. Baselines
GPT-5.1, DeepSeek-V3.2, QwQ-32B, Qwen3-30B, GLM-4.6 with prompting, straightforward reflection, and dual-view reflection.

3. Metrics
Attribution Accuracy (Acc), Flip Rate, AOA Score, Retrieval (Ret.), end-to-end F1.

4. Training
Qwen3-4B base, LoRA (r=64), DeepSpeed ZeRO-2, 3 epochs SFT then GRPO.

Fig 3. Attribution Accuracy improvements via TAS.

Fig 4. Mitigation of Actor-Observer Asymmetry.

💡 Key Findings

ReTAS (4B) achieves the best attribution accuracy across all methods, with F1 competitive to 671B DeepSeek-V3.2.
Flip: 22.7% → 12.4% on FinQA. Dialectical training fundamentally reduces perspective bias.
Every reward component matters: removing R_attr or R_exec causes significant drops in both accuracy and consistency.
TAS prompting alone helps: +6.4% Acc on FinQA even without fine-tuning (51.2 → 57.6).
Cross-task generalization: FinQA-trained model transfers to unseen AFB social ambiguity.

Fig 5. Performance across evidence complexity levels.

📊 FinQA & Spider Results

Blue = best, green = second.

Method	Size	FinQA				Spider
		Acc↑	Flip↓	V-AOA↓	F1↑	Acc↑	Flip↓	V-AOA↓	F1↑
Prompting
GPT-5.1	Closed	–	–	–	76.9	–	–	–	61.5
DeepSeek-V3.2	671B	–	–	–	76.0	–	–	–	64.0
QwQ-32B	32B	–	–	–	68.9	–	–	–	58.2
Qwen3-30B	30B	–	–	–	61.0	–	–	–	60.4
GLM-4.6	9B	–	–	–	60.4	–	–	–	49.8
Reflection: Single View
QwQ-32B	32B	53.1	–	–	68.4	33.8	–	–	57.7
Qwen3-30B	30B	49.8	–	–	63.6	47.7	–	–	60.1
GLM-4.6	9B	43.7	–	–	64.9	35.1	–	–	50.7
Reflection: Dual View
QwQ-32B	32B	54.9	18.1	14.7	71.0	34.8	26.9	24.2	60.3
Qwen3-30B	30B	52.9	20.1	13.5	66.5	55.6	25.0	10.4	60.9
GLM-4.6	9B	43.1	52.7	24.8	66.3	34.2	32.3	18.3	54.2
Fine-tuning
ReTAS (Ours)	4B	71.2	12.4	5.4	72.1	61.4	21.9	10.2	63.5

🧪 Ablation Study

Framework Comparison

Method	FinQA			Spider
	Acc↑	AOA↓	F1↑	Acc↑	AOA↓	F1↑
ReTAS	71.2	5.4	72.1	61.4	10.2	63.5
Qwen3-4B	51.2	–	62.0	33.0	–	54.7
+ Dual	50.0	22.7	62.5	35.4	22.2	55.1
+ TAS	57.6	14.1	67.3	45.8	15.6	59.2

Component Ablation

Method	FinQA			Spider
	Acc↑	AOA↓	F1↑	Acc↑	AOA↓	F1↑
ReTAS	71.2	5.4	72.1	61.4	10.2	63.5
w/o R_attr	65.5	16.8	69.5	56.3	27.2	59.2
w/o R_exec	68.2	15.9	68.3	58.3	22.8	55.6
w/o GRPO	67.7	12.4	66.7	61.2	10.6	60.3

🌍 Generalization & SalesArena

🔄 Cross-Task Transfer

FinQA-trained model evaluated on unseen AFB to test whether ReTAS acquires a generalized cognitive stance rather than memorizing training data.

✅

Strong Transfer
Transfers from formal logic (FinQA) to social ambiguity (AFB) with significant gains.

🛡

Overrides Defensive Priors
Prompting is brittle under pressure; ReTAS firmly reduces Flip Rate.

💪

Consistent Across Settings
Works for both Human-Agent and Agent-Agent ambiguity scenarios.

👉 Click arrows to switch between figures.

Fig 6. Generalization: Human-Agent (unseen AFB).

Fig 7. Generalization: Agent-Agent (unseen AFB).

💰 SalesArena Negotiation

Multi-agent negotiation in a predatory pricing game comparing 4 review mechanisms. TAS achieves highest profit with fewer turns.

Reflection	Profit($)↑	Avg Prof($)↑	Turns↓
None	157	1.96	4.21
Solo	164	2.05	5.08
Dual (Debate)	135	1.69	5.16
TAS (Ours)	168	2.10	4.81

Fig 8. Turn-by-turn average offer price across negotiation sessions.

📝 Citation

@inproceedings{li2026taming,
    title     = {Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment},
    author    = {Li, Bobo and Wu, Rui and Ji, Zibo and Zhang, Meishan and Fei, Hao and Zhang, Min and Lee, Mong-Li and Hsu, Wynne},
    booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics},
    year      = {2026}
}

Taming Actor-Observer Asymmetry in Agentsvia Dialectical Alignment