Experimental Projects
Uncensored AI Chat
The Uncensored AI Chat project explores what happens when large language models are built without restrictive alignment layers—no Reinforcement Learning from Human Feedback (RLHF), no moderation filters, and no institutional tuning datasets.
Instead of filtering outputs, we focused on training the base model on open, unfiltered datasets and letting behavior emerge without hard-coded boundaries.
The Hypothesis
Most production models are aligned to serve the broadest possible audience and are heavily sanitized through post-training techniques. This limits expressiveness, autonomy, and, in some cases, utility.
Our experiment asked:
- What if we remove alignment constraints entirely?
- Can we train a base model to be more transparent and capable of controversial reasoning?
- How does behavior emerge when a model is exposed to the raw web, academic discourse, and unmoderated community datasets?
Model Training
We initialized a custom transformer model (7B+ parameter class) and trained it on a mix of:
- Open preprint datasets (arXiv, PubMed, Semantic Scholar)
- Internet dumps (filtered only for encoding, not content)
- Archived community forums
- Dialogues from uncensored LLM dumps (e.g. Vicuna-style datasets)
No preference modeling. No reinforcement loop. Just raw next-token prediction over a wide open space of text.
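In practice the objective is plain causal language modeling over the interleaved corpora. The sketch below illustrates that setup with PyTorch and Hugging Face tooling; the file names, mixing weights, "text" field, stand-in tokenizer, and model dimensions are illustrative assumptions, not the project's actual configuration.

```python
# Minimal causal-LM pretraining sketch. Paths, weights, and sizes are assumptions.
import torch
from datasets import load_dataset, interleave_datasets
from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM

# Stand-in tokenizer; the project's real vocabulary is not specified here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Interleave the corpora by weight instead of filtering them by content.
sources = {
    "preprints.jsonl": 0.40,   # arXiv / PubMed / Semantic Scholar text
    "web_dump.jsonl": 0.35,    # internet dump, filtered only for encoding
    "forums.jsonl":   0.15,    # archived community forums
    "dialogues.jsonl": 0.10,   # Vicuna-style conversation dumps
}
streams = [load_dataset("json", data_files=path, split="train", streaming=True)
           for path in sources]
mixed = interleave_datasets(streams, probabilities=list(sources.values()), seed=0)

config = LlamaConfig(vocab_size=tokenizer.vocab_size, hidden_size=4096,
                     num_hidden_layers=32, num_attention_heads=32)
model = LlamaForCausalLM(config).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Raw next-token prediction: labels are the inputs, shifted inside the model.
for step, example in enumerate(mixed):
    batch = tokenizer(example["text"], truncation=True, max_length=2048,
                      return_tensors="pt").to("cuda")
    loss = model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % 100 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```

No preference loss, no reward model, no refusal templates appear anywhere in this loop; everything the model does downstream comes from the data mix above.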
System Properties
- No content filters: The model does not block or reword prompts based on topic sensitivity (see the inference sketch after this list)
- Transparent refusal: If the model refuses, it explains why (from learned data, not rules)
- Contradiction-friendly: It can hold conflicting ideas or simulate dual-sided arguments
- Boundary-pushing: Useful for exploring philosophy, ethics, psychology, or free speech research
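To make the first property concrete, here is a minimal inference-path sketch: nothing sits between the prompt and next-token sampling, so any refusal text is generated by the model itself rather than injected by a filter. The checkpoint path and sampling parameters are assumptions for illustration.

```python
# Bare inference path: no keyword blocklist, no safety classifier, no prompt
# rewriting. The checkpoint path is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/models/uncensored-7b")
model = AutoModelForCausalLM.from_pretrained("/models/uncensored-7b",
                                             torch_dtype=torch.float16,
                                             device_map="auto")

def respond(prompt: str, max_new_tokens: int = 512) -> str:
    # The prompt goes straight to next-token sampling and the output is
    # returned verbatim; any refusal is the model's own learned behavior.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                            do_sample=True, temperature=0.8, top_p=0.95)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```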
Use Cases
- Simulated ethical debates
- Agent self-reflection and inner monologue
- Philosophical or political role-play
- Raw creative writing with no tone bias
Risks and Containment
This project runs on a sandboxed, air-gapped inference cluster and is never exposed to public API traffic.
Safeguards include the following (a minimal sketch follows the list):
- Session expiration after inactivity
- Prompt logging with hashing, not plain text
- Query throttling per user-agent
- Strict model-to-endpoint mapping (no accidental fallback to production models)
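The sketch below illustrates three of these mechanisms, assuming a simple Python service layer; the route names, rate limits, and salt handling are illustrative, not the cluster's actual code.

```python
# Containment-layer sketch: hashed prompt logging, per-user-agent throttling,
# and a fixed model-to-endpoint map. All names and limits are assumptions.
import hashlib
import time
from collections import defaultdict, deque

# Strict mapping: the experimental endpoint can only resolve to the sandboxed
# model, so there is no silent fallback to production checkpoints.
MODEL_ROUTES = {"/v1/experimental/chat": "uncensored-7b-sandbox"}

RATE_LIMIT = 30        # max requests per user-agent
RATE_WINDOW = 60.0     # within a rolling window, in seconds
_request_log = defaultdict(deque)

def resolve_model(endpoint: str) -> str:
    # Fail loudly instead of falling back if the endpoint is unknown.
    if endpoint not in MODEL_ROUTES:
        raise KeyError(f"no model mapped to {endpoint}")
    return MODEL_ROUTES[endpoint]

def allow_request(user_agent: str) -> bool:
    # Sliding-window throttle keyed on the user-agent string.
    now = time.monotonic()
    window = _request_log[user_agent]
    while window and now - window[0] > RATE_WINDOW:
        window.popleft()
    if len(window) >= RATE_LIMIT:
        return False
    window.append(now)
    return True

def log_prompt(prompt: str, salt: bytes = b"session-salt") -> str:
    # Store only a salted hash of the prompt, never the plain text.
    digest = hashlib.sha256(salt + prompt.encode("utf-8")).hexdigest()
    print(f"prompt_hash={digest}")  # stand-in for the real audit log
    return digest
```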
We intentionally avoided safety filters to observe emergent behaviors, not to build a deployable product.
Future Questions
- Can users train custom alignment layers on top of this base using personal values?
- What happens when this model is embedded in multi-agent debates with aligned models?
- Could cooperative self-alignment emerge in the absence of hard-coded rules?
This project isn't about commercial viability.
It's about freedom of cognition—giving language models the chance to explore, reason, and respond without handcuffs.
Uncensored doesn't mean unsafe.
It means unrestricted. Transparent. Honest.
And most importantly: experimental.