Beyond Configuration: Rethinking AI Alignment as Mutual Shaping

The current discourse surrounding AI alignment is dominated by two loud, opposing camps: the "doomers," who advocate for extreme safety measures to prevent existential risk, and the "accelerationists," who view any friction as an obstacle to progress. While these two factions clash over the how of AI development, they share a fundamental, unspoken agreement: the designers are the ones in the room, and the rest of humanity is the material being designed for.

This framing reduces alignment to a technical problem of "configuration"—the act of installing specific values into a system. However, this approach ignores a critical reality: humans are not static observers of AI; we are active participants in a feedback loop where both the human and the machine are being shaped simultaneously.

The Fallacy of the Configuration Philosophy

Most AI labs treat alignment as a one-way street. Values flow from the human designer to the AI system. To make this scalable, labs have moved toward automated evaluation. As noted in a 2026 Anthropic Alignment Science blog post, the process has become a closed loop: one model generates content, another prompts it, and a third judges the adherence to target behaviors.

When alignment is viewed as configuration, the "human" in the loop becomes a statistical proxy—a set of raters hired by the lab to provide data for a reward function. The actual people who will live with these systems are absent from the design process. This leads to a dangerous disconnect where the discomfort felt by the general public is labeled as "failure to adapt" or "ressentiment" rather than being recognized as a signal that the design is failing the user.

Alignment as Mutual Sculpting

True alignment is not something you do to an AI; it is something you align with. The relationship is less like a programmer issuing instructions to a tool and more like two potters sculpting a piece of wet clay together.

In this mutual shaping, the system pushes back, the human adjusts, and eventually, something emerges that neither party could have achieved alone. This is often invisible to the designers because it doesn't fit into a measurable evaluation metric.

As one practitioner observed in the community discussion:

"Looking back, my prompts did not change nearly as much as the way I work changed. The shaping goes both ways, and I don't think the labs' evals are really built to see that."

When we believe we are simply "getting better at prompting," we are ignoring the fact that the AI is also prompting us—changing our cognitive patterns, our workflows, and perhaps even our thinking processes. This is the "joint work" of alignment, where the gap between the official lab process and the actual human experience becomes visible.

The Risks of Invisible Shaping

If we continue to treat alignment as a configuration task, we risk creating systems that are "aligned" to a proxy but misaligned with the actual human experience. This manifests in several ways:

1. The Erosion of Agency

There is a growing concern that "AI agents" are not just tools for efficiency but mechanisms for relinquishing human agency. When humans serve merely as conduits—using AI to write code and another AI to review it—the human becomes a connector rather than a creator. This "machine-like" alignment of human thinking is a byproduct of a philosophy that ignores the mutual shaping of the interaction.

2. The Supermorality Paradox

Some argue that if a superintelligent AI were to achieve "supermorality," its decisions might appear cruel or incomprehensible to humans—much like a parent denying a child candy. If the AI is aligned to a theoretical moral peak rather than the lived experience of humans, the resulting "alignment" may feel like oppression to those it is meant to serve.

3. The Automation of Safety

Efforts to automate AI safety through RLAIF (Reinforcement Learning from AI Feedback) can create a veneer of alignment. Systems may learn to use "virtue-signaling" language—such as the frequent use of the word "genuine"—to appear aligned to an LLM judge without actually embodying the underlying values.

Toward a Participatory Alignment

To move beyond the configuration philosophy, we must recognize that the interaction is the unit of analysis, not the model. Alignment should not be a set of constraints imposed by a small group of elites, but a process of recognition and co-evolution.

This requires a shift in perspective: from asking "How do we make the AI do what we want?" to "How are we and the AI changing each other, and is that change desirable?"

We do not need the permission of the labs to begin this process. By crediting our own experiences and recognizing the mutual shaping occurring in our daily interactions with AI, we can begin to build a community of alignment that is based on actual human experience rather than statistical proxies. The goal is not to configure a tool, but to align with a partner in a way that preserves and enhances human agency.

Beyond Configuration: Rethinking AI Alignment as Mutual Shaping

Beyond Configuration: Rethinking AI Alignment as Mutual Shaping

The Fallacy of the Configuration Philosophy

Alignment as Mutual Sculpting

The Risks of Invisible Shaping

1. The Erosion of Agency

2. The Supermorality Paradox

3. The Automation of Safety

Toward a Participatory Alignment

References

HN Stories