Transmogrifier

Register-aware prompt translation. Normalize linguistic register to maximize LLM output quality.

GitHub · Releases · Install · API

The Problem

LLM output quality varies dramatically by the linguistic register of your input. Same question, different phrasing, different accuracy:

| Model | Spread | Best | Worst | Status |
|---|---|---|---|---|
| Claude Opus 4 | 18.8pp | direct (93.8%) | casual (75.0%) | sensitive |
| Gemini 2.5 Flash | 56.2pp | direct (56.2%) | narrative (0%) | critical |
| Claude Haiku 4.5 | 6.2pp | direct (93.8%) | casual (87.5%) | mild |
| GPT-4o Mini | 0pp | (invariant) | (invariant) | invariant |

Casual register costs Claude Opus roughly a fifth of its correct answers; narrative register wipes out Gemini's entirely. Transmogrifier detects the input register and normalizes the prompt before it reaches the model.

Three Levels

Level 1

System prompt injection. Prepends a normalization instruction.

0ms · no API calls · 67-100% recovery
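Mechanically, Level 1 is just a string prepend. A minimal sketch, assuming a hypothetical `inject_level1` helper and illustrative instruction text (not the library's actual wording):

```python
# Hypothetical sketch of Level 1: prepend a normalization instruction
# to the system prompt before the request reaches the model.
NORMALIZE_INSTRUCTION = (
    "Before answering, mentally restate the user's question in plain, "
    "direct technical language, then answer the restated question."
)

def inject_level1(system_prompt: str) -> str:
    """Prepend the normalization instruction to an existing system prompt."""
    if not system_prompt:
        return NORMALIZE_INSTRUCTION
    return NORMALIZE_INSTRUCTION + "\n\n" + system_prompt

print(inject_level1("You are a networking tutor."))
```

Because no API call is involved, this level is free to apply unconditionally.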

Level 2

Rule-based rewriting. Strips filler, hedging, and framing via regex.

<1ms · no API calls · preserves semantics
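The rule-based pass can be pictured as a small cascade of regex substitutions. A toy sketch; the pattern lists here are illustrative, the library's actual rule set is larger:

```python
import re

# Toy sketch of Level 2: strip casual filler, hedging, and framing
# with regex rules. Patterns are illustrative, not exhaustive.
FILLER = re.compile(r"\b(yo|so|like|um|uh|basically|honestly)\b[, ]*", re.IGNORECASE)
FRAMING = re.compile(r"^(what's the deal with|can you tell me about)\s+", re.IGNORECASE)

def normalize_level2(text: str) -> str:
    text = FILLER.sub("", text).strip(" ,")
    text = FRAMING.sub("", text).strip()
    return text

print(normalize_level2("yo so like, what's the deal with TCP"))  # TCP
```

Because only tokens with no semantic load are removed, the question's meaning survives the rewrite.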

Level 3

LLM translation. Separate-context rewrite for heavy-lift cases.

~200ms · 1 API call · only when spread >10pp
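The ">10pp" gate amounts to a cheap lookup against the model's calibrated profile before paying for the extra API call. A hypothetical sketch (the profile shape and function name are assumptions; spreads are taken from the table above):

```python
# Hypothetical sketch of Level 3 routing: only invoke the extra
# LLM rewrite when the model's calibrated register spread exceeds 10pp.
PROFILES = {  # spread in percentage points, from calibration
    "claude-opus-4": 18.8,
    "gemini-2-5-flash": 56.2,
    "claude-haiku-4-5": 6.2,
    "gpt-4o-mini": 0.0,
}

def needs_level3(model: str, threshold_pp: float = 10.0) -> bool:
    """Route to the LLM translation pass only for sufficiently sensitive models."""
    return PROFILES.get(model, 0.0) > threshold_pp

print([m for m in PROFILES if needs_level3(m)])
```

Under these numbers, only Opus and Gemini Flash would ever trigger the ~200ms path.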

Install

pip install git+https://github.com/jmcentire/transmogrifier.git

Optional extras:

pip install "transmogrifier[anthropic]"    # Claude backend (Level 3)
pip install "transmogrifier[openai]"       # OpenAI backend
pip install "transmogrifier[gemini]"       # Gemini backend
pip install "transmogrifier[validation]"   # Semantic equivalence checking
pip install "transmogrifier[all]"          # Everything

API

Python Library

from transmogrifier.core import Transmogrifier

t = Transmogrifier()
result = t.translate("yo what's the deal with TCP", model="claude-opus-4")

result.output_text        # "TCP" (rewritten)
result.system_prompt      # Level 1 normalization instruction
result.detected_register  # Register.casual
result.target_register    # Register.direct
result.elapsed_ms         # ~2ms
result.skipped            # False

Invariant models are skipped automatically:

result = t.translate("yo what's TCP", model="gpt-4o-mini")
result.skipped      # True
result.skip_reason  # "invariant model (0.0pp spread)"

CLI

# Detect register
transmogrify detect "yo what's the deal with TCP"
# {"register": "casual", "confidence": 1.0}

# Classify task type
transmogrify classify "Write a Python function to sort a list"
# {"task_type": "code", "confidence": 1.0}

# Translate (task-aware routing)
transmogrify translate "yo what's the difference between TCP and UDP" --model gemini-2-5-flash
# Detected: casual | Task: analysis | Target: technical (per-task optimal)

# List profiles with per-task breakdown
transmogrify profile list

# Show detailed profile
transmogrify profile show claude-opus-4

# Calibrate a new model (50 tasks x 5 registers = 250 API calls)
transmogrify profile calibrate my-model --provider anthropic
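Conceptually, calibration scores every (register, task) cell and then aggregates per-register accuracy and spread. A toy sketch of the aggregation step only (function name and data are illustrative, not the library's internals):

```python
from collections import defaultdict

# Toy sketch of calibration aggregation: given pass/fail results per
# register, compute per-register accuracy and the spread in pp.
def summarize(results):
    """results: iterable of (register, passed) -> (accuracy dict, spread_pp)."""
    tally = defaultdict(lambda: [0, 0])  # register -> [passed, total]
    for register, passed in results:
        tally[register][0] += int(passed)
        tally[register][1] += 1
    accuracy = {r: 100.0 * p / n for r, (p, n) in tally.items()}
    spread_pp = max(accuracy.values()) - min(accuracy.values())
    return accuracy, spread_pp

# 4 illustrative tasks x 2 registers
results = [("direct", True)] * 3 + [("direct", False)] + \
          [("casual", True)] * 2 + [("casual", False)] * 2
acc, spread = summarize(results)
print(acc, spread)  # direct 75.0, casual 50.0, spread 25.0
```

The resulting spread is what drives the invariant/mild/sensitive/critical labels and the Level 3 gate.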

MCP Server

claude mcp add --scope user --transport stdio transmogrifier -- transmog-mcp

Tools: transmog_translate, transmog_detect, transmog_profiles

Key Findings

Register sensitivity is model-specific

GPT-4o Mini shows zero sensitivity. Claude Opus shows 18.8pp. Gemini shows 56.2pp. A one-size-fits-all approach wastes effort on invariant models and under-serves sensitive ones. Transmogrifier uses per-model profiles to adapt.
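Per-model adaptation can be sketched as choosing translation effort from the calibrated spread. The thresholds and function name below are illustrative assumptions, not the library's actual logic:

```python
# Hypothetical sketch of per-model adaptation: pick which translation
# levels to run from the calibrated spread (thresholds illustrative).
def plan_levels(spread_pp: float) -> list[int]:
    if spread_pp == 0.0:
        return []            # invariant model: skip entirely
    levels = [1, 2]          # cheap levels for any sensitive model
    if spread_pp > 10.0:
        levels.append(3)     # LLM rewrite only when it pays off
    return levels

print(plan_levels(0.0), plan_levels(6.2), plan_levels(56.2))
```

This is why GPT-4o Mini requests fall straight through while Gemini requests get the full pipeline.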

Register × task interaction is non-uniform (v0.2.0)

Academic register halves reasoning accuracy on Claude Opus (40% vs 80% direct). Meanwhile, Gemini scores 0% on analysis with direct register but 33% with technical. The optimal register depends on both model and task type. Transmogrifier v0.2.0 includes a task classifier (factual, reasoning, code, analysis, creative, instruction) and routes to the per-(model, task) optimal register.

| Model | Category | Best Register | Spread |
|---|---|---|---|
| Claude Opus | reasoning | direct (80%) | 40pp (academic=40%) |
| Claude Opus | factual/code/analysis | (all invariant) | 0pp |
| Gemini Flash | factual | direct/technical (100%) | 100pp (narrative=0%) |
| Gemini Flash | analysis | technical/academic (33%) | 33pp (direct=0%) |
| Gemini Flash | overall | technical (62.5%) | 62.5pp |
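Per-(model, task) routing reduces to a table lookup with a model-wide fallback. A hypothetical sketch whose entries follow the findings above (keys and names are illustrative):

```python
# Hypothetical sketch of per-(model, task) routing: look up the optimal
# target register, falling back to the model-wide best register.
OPTIMAL = {
    ("claude-opus-4", "reasoning"): "direct",
    ("gemini-2-5-flash", "factual"): "direct",
    ("gemini-2-5-flash", "analysis"): "technical",
}
DEFAULT = {"claude-opus-4": "direct", "gemini-2-5-flash": "technical"}

def target_register(model: str, task: str) -> str:
    return OPTIMAL.get((model, task), DEFAULT.get(model, "direct"))

print(target_register("gemini-2-5-flash", "analysis"))  # technical
```

An unknown (model, task) pair degrades gracefully to the model's overall best register rather than failing.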

System prompt normalization (Level 1) exceeds direct register on Gemini

Casual input + Level 1 system prompt scored 68.8% on Gemini 2.5 Flash, beating raw direct register (56.2%) by 12.5pp. The normalization instruction provides a genuine quality boost beyond just recovering the register gap.

Same-context translation is catastrophic

Asking the model to "reinterpret this casually-worded question, then answer it" in a single turn scored 0% on Gemini. Translation must happen in a separate context. This is architecturally enforced in Transmogrifier.

Larger models are more sensitive, not less

Claude Opus (18.8pp) is more register-sensitive than Claude Haiku (6.2pp). RLHF doesn't uniformly solve this; the effect is architecture- and training-dependent.

Registers

| Register | Example | Detection Markers |
|---|---|---|
| direct | What is TCP? | Short, unframed, no markers |
| casual | yo so like, what's the deal with TCP | Slang, filler, contractions |
| technical | Provide a precise technical answer: What is TCP? | Imperative verbs, structured |
| academic | In the context of established knowledge, TCP... | Hedging, passive voice, Latinate vocabulary |
| narrative | Explain this as if telling a story: How does TCP work? | Storytelling framing, analogies |
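A marker-based detector along these lines can be sketched with a few regexes and a direct fallback. The patterns below are illustrative only; the real detector is richer:

```python
import re

# Hypothetical marker-based register detector following the table above.
MARKERS = {
    "casual": re.compile(r"\b(yo|like|what's the deal)\b", re.IGNORECASE),
    "technical": re.compile(r"^(provide|give|state)\b.*precise", re.IGNORECASE),
    "academic": re.compile(r"in the context of|established knowledge", re.IGNORECASE),
    "narrative": re.compile(r"as if telling a story|once upon", re.IGNORECASE),
}

def detect_register(text: str) -> str:
    for register, pattern in MARKERS.items():
        if pattern.search(text):
            return register
    return "direct"  # short, unframed, no markers

print(detect_register("yo what's the deal with TCP"))  # casual
```

"direct" is the absence-of-markers case, which is why it serves as the fallback rather than having patterns of its own.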

Architecture

Register Detector → Profile Lookup → Translation Router → Level 1 + Level 2 (+ optional Level 3) → Result

Invariants:

Integration

Designed as middleware for:


Transmogrifier · MIT License · Jeremy McEntire · Source