Current AI alignment approaches capture what humans prefer but discard why they prefer it. Iron Sun Works develops a methodology that preserves the reasoning — and uses it to build a richer signal for model calibration than preference ranking alone can provide.
A structured methodology for capturing how human cognition diverges from model reasoning — preserving both sides' logic, not just the outcome. Where RLHF captures that a human preferred output A over B, the resulting corpus captures why. That reasoning is where the information about human cognition lives. At population scale, individual corpora surface generalizable patterns in model miscalibration.
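To make the distinction concrete, here is a minimal sketch of what one corpus record might contain, assuming a schema of roughly this shape. All field and tag names are illustrative, not an actual Iron Sun Works format.

```python
from dataclasses import dataclass, field


@dataclass
class DivergenceRecord:
    """One corpus entry: a preference judgment plus the reasoning behind it.

    Illustrative schema only; field names are assumptions, not the
    actual corpus format.
    """
    prompt: str
    output_a: str
    output_b: str
    preferred: str                # "A" or "B" -- the signal RLHF would keep
    human_reasoning: str          # why the human preferred it -- the signal RLHF discards
    model_reasoning: str          # the model's own stated rationale, for comparison
    divergence_tags: list[str] = field(default_factory=list)  # e.g. ["sycophancy"]


# Hypothetical example record.
record = DivergenceRecord(
    prompt="Summarize the attached contract.",
    output_a="...",
    output_b="...",
    preferred="B",
    human_reasoning="A buries the indemnity clause, which is the only term that matters here.",
    model_reasoning="A is shorter and reads more fluently.",
    divergence_tags=["salience mismatch"],
)
```

A plain preference dataset would keep only `preferred`; the two reasoning fields are where the divergence between human and model cognition becomes legible.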
Applying process trace analysis to model generation — branching probabilities, confidence patterns, and hesitation signatures the model can't self-report. Skilled interpreters reading these traces identify sycophancy, confabulation, and over-trained caution at the process level. Complementary to mechanistic interpretability — phenomenological rather than architectural.
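A minimal sketch of what reading branching and confidence at the process level could look like, assuming per-step log-probabilities over the vocabulary are available from the generating model. The entropy and margin signals, the threshold, and the function name are illustrative choices, not the actual analysis pipeline.

```python
import numpy as np


def trace_signals(step_logprobs: np.ndarray, margin_threshold: float = 0.3):
    """Per-step confidence signals from a generation trace.

    step_logprobs: array of shape (num_steps, vocab_size) holding the
    model's log-probabilities at each decoding step. Returns per-step
    entropy (how diffuse the branching distribution is), the probability
    margin between the top two candidates, and the steps where that
    margin is small -- near-ties where the generation could easily have
    branched, one plausible "hesitation signature".
    """
    # Normalize to a proper probability distribution at each step.
    probs = np.exp(step_logprobs - step_logprobs.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # high = diffuse branching
    top2 = np.sort(probs, axis=1)[:, -2:]                    # two most likely tokens per step
    margin = top2[:, 1] - top2[:, 0]                         # small = near-tie between candidates

    hesitation_steps = np.where(margin < margin_threshold)[0]
    return entropy, margin, hesitation_steps
```

Signals like these are what an interpreter reads alongside the text itself: the model cannot report that it nearly said something else, but the trace records it.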
A self-selecting recruitment mechanism for the humans the research requires — people who can articulate their own cognition and diagnose model reasoning failures. The system identifies its own future interpreters from people already using it, inverting the typical platform/user relationship from extraction to recognition.
An AI-assisted executive function support system grounded in neuroscience. The platform infers cognitive state and provides scaffolding — task decomposition, context restoration, thread reconstruction — without requiring the user to self-report or configure anything. Cognitive modality as a first-class design constraint.