Placeholder Preservation
How Bulk Translate keeps your variables and format specifiers intact across languages.
When translating strings that contain variables — UI copy, email templates, code strings, or internationalization keys — you need placeholders to survive translation completely unchanged. Bulk Translate provides two independent layers of protection to ensure this.
Supported Patterns
Three common placeholder formats are automatically detected and preserved:
| Pattern | Example | Common In |
|---|---|---|
| `{variable}` | `Hello {name}` | React, Python, C#, ICU message format |
| `{{variable}}` | `Welcome {{user}}` | Mustache, Handlebars, Jinja, Angular |
| `%s`, `%d`, `%@` | `Count: %d, Name: %s` | C printf, Objective-C, Python (legacy) |
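As a rough illustration of how the three patterns might be detected, here is a minimal Python sketch. The regular expression and the `find_placeholders` helper are assumptions made for this example; the pattern Bulk Translate actually uses is not documented here and may differ (for instance in how it treats escaped or nested braces).

```python
import re

# Hypothetical detection pattern, not Bulk Translate's actual regex.
# Double braces are tried first so {{user}} is not split into brace pairs.
PLACEHOLDER_RE = re.compile(
    r"\{\{[^{}]+\}\}"   # {{variable}}
    r"|\{[^{}]+\}"      # {variable}
    r"|%[sd@]"          # %s, %d, %@
)


def find_placeholders(text: str) -> list[str]:
    """Return every placeholder in order of appearance."""
    return PLACEHOLDER_RE.findall(text)


print(find_placeholders("Dear {customer}, your {{plan}} plan has %d days remaining"))
# ['{customer}', '{{plan}}', '%d']
```

Listing the double-brace alternative first is a deliberate choice in this sketch: regex alternation takes the first branch that matches at a given position, so the longer delimiter has to win before the single-brace branch is tried.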
How It Works
Bulk Translate uses a defense-in-depth approach with two independent mechanisms. If either one fails, the other catches it.
Layer 1 — System Prompt Instructions
The LLM receives explicit instructions in its system prompt:
"Preserve these placeholders exactly as they appear — do not translate or modify them: {placeholder}, {{double-brace}}, %s, %d, %@"
Most modern LLMs follow this instruction reliably. The prompt also tells the model to return only the translated text with no preamble or explanation, further reducing the chance of the model "helpfully" modifying placeholders.
Layer 2 — Client-Side Tokenization
This is the safety net. Each string goes through four steps around the LLM call:
- Detect all placeholders matching the three supported patterns
- Replace each placeholder with a unique, non-colliding token (e.g., `\x00PH0\x00`)
- Send the tokenized text to the LLM for translation
- Restore the original placeholders from the token map in the LLM's response
The tokens use null-byte delimiters, which do not occur in natural text, so they cannot collide with real content and give the LLM nothing to translate. As long as the model returns each token intact, the placeholder is restored exactly as it was sent.
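The tokenize-and-restore round trip can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bulk Translate's actual implementation: the regex and the `tokenize` and `restore` helpers are hypothetical, and a hard-coded string stands in for the LLM's response.

```python
import re

# Hypothetical detection pattern (an assumption for this sketch).
PLACEHOLDER_RE = re.compile(r"\{\{[^{}]+\}\}|\{[^{}]+\}|%[sd@]")


def tokenize(text: str) -> tuple[str, dict[str, str]]:
    """Swap each placeholder for a null-delimited token; return text and map."""
    token_map: dict[str, str] = {}

    def swap(match: re.Match) -> str:
        # Each occurrence gets its own numbered token, even if repeated.
        token = f"\x00PH{len(token_map)}\x00"
        token_map[token] = match.group(0)
        return token

    return PLACEHOLDER_RE.sub(swap, text), token_map


def restore(text: str, token_map: dict[str, str]) -> str:
    """Put the original placeholders back wherever their tokens survive."""
    for token, original in token_map.items():
        text = text.replace(token, original)
    return text


tokenized, token_map = tokenize("Hello {name}, your order #{orderId} is ready")
# Stand-in for the LLM's Spanish output; a real call would go here.
llm_output = "Hola \x00PH0\x00, tu pedido #\x00PH1\x00 está listo"
print(restore(llm_output, token_map))
# Hola {name}, tu pedido #{orderId} está listo
```

Because every token is closed by a null byte, `\x00PH1\x00` is never a substring of `\x00PH10\x00`, so plain string replacement is safe here even with more than ten placeholders.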
Examples
Single Curly Braces
Input:
Hello {name}, your order #{orderId} is ready
Tokenized (sent to LLM):
Hello \x00PH0\x00, your order #\x00PH1\x00 is ready
LLM output (Spanish):
Hola \x00PH0\x00, tu pedido #\x00PH1\x00 está listo
Final (after reinsertion):
Hola {name}, tu pedido #{orderId} está listo
Format Specifiers
Input:
Found %d results for "%s" in %d seconds
French output:
%d résultats trouvés pour "%s" en %d secondes
The %d and %s specifiers pass through unchanged. Only the surrounding English text is translated to French.
Mixed Placeholders
Input:
Dear {customer}, your {{plan}} plan has %d days remaining
Japanese output:
{customer}様、{{plan}}プランの残り日数は%d日です
All three placeholder types survive across languages — the system works regardless of the source or target language pair.
Edge Cases
Escaped braces: \{ and \} are treated as literal characters, not placeholder delimiters. This is handled correctly in most cases, though very unusual brace patterns may occasionally trigger false positives in the tokenizer.
Nested patterns: {{outer {inner}}} is treated as a single {{...}} placeholder with inner content. The inner curly brace is not independently tokenized.
Placeholder count mismatch: If the LLM adds or drops tokens in its response, the reinsertion step degrades gracefully rather than aborting: tokens that remain in the response are replaced, and any placeholder whose token was dropped simply does not appear in the output. This is rare with modern LLMs, but the result is an incomplete string, which is why spot-checking outputs is recommended.
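One way to catch a count mismatch after the fact is to compare the placeholders in the source string with those in the final translation. The sketch below is an assumption about how such a check could look, not a feature described on this page; `placeholder_diff` and the regex are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical detection pattern (an assumption for this sketch).
PLACEHOLDER_RE = re.compile(r"\{\{[^{}]+\}\}|\{[^{}]+\}|%[sd@]")


def placeholder_diff(source: str, translated: str) -> tuple[Counter, Counter]:
    """Return (missing, unexpected) placeholder counts for a translation."""
    before = Counter(PLACEHOLDER_RE.findall(source))
    after = Counter(PLACEHOLDER_RE.findall(translated))
    # Counter subtraction keeps only positive counts.
    return before - after, after - before


missing, unexpected = placeholder_diff(
    'Found %d results for "%s" in %d seconds',
    '%d résultats trouvés pour "%s"',  # one %d was lost
)
print(dict(missing), dict(unexpected))
# {'%d': 1} {}
```

Comparing counts rather than sets matters for strings that repeat a specifier such as `%d`: a set comparison would not notice that one of two occurrences went missing.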
Recommendations
- Keep placeholders semantic: `{userName}` is clearer than `{v1}` for both human readers and LLMs
- Test your templates: run a few sample strings through before committing to a large batch
- Review outputs: spot-check translations that contain placeholders, especially for languages with different word order (Japanese, Korean, Arabic)