The escape rules that actually bite
Most builds that fail because of strings.xml fail for the same handful of reasons,
and almost none of them are obvious from the AAPT error message. The compiler will tell you
something like Apostrophe not preceded by \ with a line number, but if your file
has been processed by a translation tool that re-encoded smart quotes, or if your string was
machine-pasted from a Google Sheet, the error points at one symptom of a deeper formatting
problem. Here's what actually causes the failures, in roughly the order I see them.
Apostrophes
The most common one. Don't tap here won't build. Android needs
Don\'t tap here, or alternatively the whole value wrapped in double quotes:
"Don't tap here". The quote-wrap form is sometimes preferable because it survives
a round-trip through a translator's CAT tool without being mangled, but it has a side effect:
it preserves leading and trailing whitespace, which the unquoted form trims. If you have a
string that's supposed to start with a space (rare but real, think prefixes like
" mins"), you want the quote wrap. Otherwise the backslash escape is fine.
Smart quotes are a separate trap. If your copy comes back from a copywriter as
Don’t tap here with a curly apostrophe (U+2019), AAPT won't complain at all,
because U+2019 isn't the ASCII apostrophe that the escape rule applies to. The build
succeeds; the string just looks slightly different on the device. That's fine for most
contexts, but a later find-and-replace for the straight apostrophe will miss the curly
variant. The tool above normalizes curly to straight on input so you don't have to think
about it.
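To make the two steps concrete (normalize curly apostrophes, then backslash-escape the straight ones), here is a minimal Kotlin sketch. The function names are hypothetical and this is not the tool's actual code; it only deals with single quotes and assumes the value is not quote-wrapped.

```kotlin
/** Replaces curly single quotes (U+2018, U+2019) with the straight ASCII apostrophe. */
fun normalizeApostrophes(text: String): String =
    text.replace('\u2018', '\'').replace('\u2019', '\'')

/** Backslash-escapes straight apostrophes, leaving existing escape pairs untouched. */
fun escapeApostrophes(text: String): String {
    val out = StringBuilder()
    var i = 0
    while (i < text.length) {
        val c = text[i]
        if (c == '\\' && i + 1 < text.length) {
            // Already an escape pair such as \' or \\ : copy both characters as they are.
            out.append(c).append(text[i + 1])
            i += 2
        } else {
            if (c == '\'') out.append('\\')
            out.append(c)
            i++
        }
    }
    return out.toString()
}

fun main() {
    val fromCopywriter = "Don\u2019t tap here"
    println(escapeApostrophes(normalizeApostrophes(fromCopywriter))) // Don\'t tap here
}
```

Because existing escape pairs are copied through, running the function on an already-escaped value leaves it unchanged.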
The @ and ? prefix trap
Android treats a value starting with @ as a resource reference, so
@user mentioned you will be parsed as a reference to a (probably non-existent)
resource named user mentioned you and fail. Same with ?, which
gets parsed as a theme attribute reference. The fix is to escape only the first character,
\@user mentioned you, not every @ in the string. The same string
with the at-sign in the middle is fine: contact us @ support needs no escape.
The tool only escapes the leading character.
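The leading-character rule is small enough to sketch in a few lines of Kotlin. The function name is hypothetical; the only behaviour assumed is what the paragraph above describes, namely that just the first character matters.

```kotlin
/** Escapes a leading @ or ? so the value is not read as a resource or theme reference. */
fun escapeLeadingReference(value: String): String =
    if (value.startsWith("@") || value.startsWith("?")) "\\" + value else value

fun main() {
    println(escapeLeadingReference("@user mentioned you"))  // \@user mentioned you
    println(escapeLeadingReference("contact us @ support")) // contact us @ support
    println(escapeLeadingReference("?"))                    // \?
}
```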
Format args, positional vs. not
If a string has exactly one format specifier, you can use the C-style shorthand:
Welcome, %s or You have %d unread messages. With more than one,
Android requires the positional form: %1$s gave you %2$d coins. The reason is
translation order. German might want the count first: %2$d Münzen kommen von
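%1$s. The positional notation lets translators reorder. With the default settings, AAPT
rejects multiple non-positional specifiers with a "multiple substitutions specified in
non-positional format" error. If you silence that with formatted="false", the string
builds, but any translation that needs a different argument order will produce wrong
output or crash at runtime.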
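A quick way to see why positional specifiers matter is to format the English and German templates with the same argument list. This is a plain-JVM sketch using String.format rather than getString; the helper name and the sample values are made up for illustration.

```kotlin
import java.util.Locale

/** Formats a template with (name, count) always passed in the same order. */
fun render(template: String, name: String, count: Int): String =
    String.format(Locale.ROOT, template, name, count)

fun main() {
    // Note the \$ escapes: in Kotlin source, a bare $s would be a string template.
    val english = "%1\$s gave you %2\$d coins"
    val german = "%2\$d Münzen kommen von %1\$s"

    println(render(english, "Ana", 5)) // Ana gave you 5 coins
    println(render(german, "Ana", 5))  // 5 Münzen kommen von Ana
}
```

The calling code never changes; only the template decides where each argument lands.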
There's a subtler version of this. Suppose your string has a literal % that isn't
meant to be a format specifier, say Battery at 80%, and the string goes through
formatting, for example getString(R.string.battery, level) with format arguments. You get
an IllegalFormatException (specifically UnknownFormatConversionException) at runtime.
The fix depends on how the string is used. If it is formatted, escape the percent as
%%. If it is only ever read with plain getString(R.string.battery), leave the % alone and
add formatted="false" to the <string> element, which tells the build tools "this is
literal text, don't treat it as a format string." The formatted="false" route is cleaner
if you're sure you'll never use this string as a format template.
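The percent-sign failure is easy to reproduce off-device, since the formatted getString overload ends up in the same java.util.Formatter machinery as String.format. The sketch below is plain JVM Kotlin with a hypothetical helper name; it is meant as a quick check, not as what Android does internally.

```kotlin
import java.util.IllegalFormatException
import java.util.Locale

/** Returns true if the template can be formatted with the given args without throwing. */
fun formatsCleanly(template: String, vararg args: Any?): Boolean =
    try {
        String.format(Locale.ROOT, template, *args)
        true
    } catch (e: IllegalFormatException) {
        false
    }

fun main() {
    println(formatsCleanly("Battery at 80%", 80))   // false: the lone % is not a valid specifier
    println(formatsCleanly("Battery at 80%%", 80))  // true: %% is a literal percent
    println(String.format(Locale.ROOT, "Battery at %1\$d%%", 80)) // Battery at 80%
}
```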
Ampersands and the XML escapes
&, <, and > are XML's own special
characters. Coffee & Tea needs to become Coffee &amp; Tea.
This one's usually caught quickly because the build error is clear, but it shows up in
pasted copy a lot. The tricky case is when your string is supposed to contain HTML markup,
like Tap <b>here</b>, because then you don't want to escape the
angle brackets. The convention is: if the bracketed tag is one of Android's supported
inline styling tags (b, i, u, br,
tt, big, small, sub, sup,
strike, font), leave it alone; for any other angle bracket,
escape. The tool defaults to this convention but offers a toggle in case you want to
escape everything aggressively.
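One way to implement the "leave supported tags alone, escape everything else" convention is to find the allowed tags with a regular expression and escape only the text between them. This Kotlin sketch uses the tag list from the paragraph above; the names are hypothetical, and it assumes the input is unescaped (an existing &amp; would be escaped again).

```kotlin
// Matches opening, closing, or self-closing forms of the allowed inline tags.
val ALLOWED_TAG =
    Regex("""</?(?:b|i|u|br|tt|big|small|sub|sup|strike|font)(?:\s[^<>]*)?/?>""")

/** Escapes XML special characters in text that contains no markup. */
fun escapePlain(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

/** Escapes &, < and > everywhere except inside the allowed styling tags. */
fun escapeXmlText(text: String): String {
    val out = StringBuilder()
    var last = 0
    for (m in ALLOWED_TAG.findAll(text)) {
        out.append(escapePlain(text.substring(last, m.range.first)))
        out.append(m.value) // keep the styling tag verbatim
        last = m.range.last + 1
    }
    out.append(escapePlain(text.substring(last)))
    return out.toString()
}

fun main() {
    println(escapeXmlText("Coffee & Tea"))     // Coffee &amp; Tea
    println(escapeXmlText("Tap <b>here</b>"))  // Tap <b>here</b>
    println(escapeXmlText("a < b, <blink>x"))  // a &lt; b, &lt;blink&gt;x
}
```

The "escape everything" toggle would amount to calling escapePlain on the whole value instead.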
Newlines and whitespace
A literal newline in your strings.xml value gets collapsed to a space by AAPT.
If you want an actual line break in the displayed string, use \n. If you have
a multi-line paragraph and want it to stay multi-line, every line break needs to be
\n. Same with tabs (\t), though tabs in user-facing strings are
rare. Trailing whitespace is the surprise here. Android trims it silently. If you want
trailing spaces for some layout reason, quote-wrap the value.
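A small Kotlin sketch of the whitespace rules: turn real newlines and tabs into their backslash escapes, and quote-wrap only when leading or trailing whitespace would otherwise be trimmed. The function name is hypothetical, and the sketch assumes the value contains no double quotes of its own, which would need escaping inside the wrap.

```kotlin
/** Encodes newlines and tabs, and quote-wraps values with leading or trailing whitespace. */
fun encodeWhitespace(value: String): String {
    val escaped = value
        .replace("\r\n", "\n")   // normalize Windows line endings first
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    // After escaping, any whitespace still at the edges is spaces that would be trimmed.
    val needsQuotes = escaped != escaped.trim()
    return if (needsQuotes) "\"$escaped\"" else escaped
}

fun main() {
    println(encodeWhitespace("Line one\nLine two")) // Line one\nLine two
    println(encodeWhitespace(" mins"))              // " mins"
    println(encodeWhitespace("Save"))               // Save
}
```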
Things this tool deliberately doesn't do
It doesn't try to validate that your XML is well-formed. If your <string>
element is missing a closing tag, AAPT will catch that, and the error there is usually
clear. It doesn't try to detect duplicate string names, which is also better caught at
build time. It doesn't reformat or pretty-print your XML; whatever indentation you had on
input is what you get on output. And it doesn't touch <plurals> or
<string-array> elements. Those have their own escaping rules (mostly
the same as <string>) but their structure is different enough that
handling them well needs a dedicated tool. If there's demand, I'll add it.
One last note. If you're integrating with a translation management system, do the escaping
after the round-trip with the translator, not before. Most TMS platforms will
escape automatically on import, and if you escape on the way out, you get double-escapes
on the way back in (\\\' in your XML, which is even more confusing than the
original problem). Keep the source strings unescaped, do escaping at build time or at the
file-emit step, and you'll spend less time debugging file-format issues.
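The double-escape problem is easy to reproduce with any escaper that is not idempotent. The function below is a deliberately naive stand-in (not any particular TMS's behaviour) that escapes backslashes and apostrophes every time it runs.

```kotlin
/** A deliberately naive escaper: escapes backslashes, then apostrophes, on every call. */
fun naiveEscape(s: String): String =
    s.replace("\\", "\\\\").replace("'", "\\'")

fun main() {
    val once = naiveEscape("Don't")  // escaped on the way out
    val twice = naiveEscape(once)    // escaped again on import
    println(once)   // Don\'t
    println(twice)  // Don\\\'t
}
```

The second pass escapes the backslash added by the first, which is where the three-backslash form comes from.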
How Android actually parses strings.xml
Knowing the parsing model helps predict edge cases that pure rule-following won't. AAPT2
processes strings.xml in three passes. The first is XML parsing, a standard
XML parse, which is where the ampersand and angle-bracket rules come from. The second is
Android-specific normalization: collapsing whitespace, processing the backslash escapes
(\', \", \\, \n, \t,
\uXXXX), and handling the leading-character escapes for @ and
?. The third is the format-args check: when AAPT sees a % in a
string, it decides whether the string is "formatted" and applies the positional / non-positional
rule from there.
The three-pass model explains a class of bugs that seem random. A string with both an
ampersand and more than one format specifier, like %s & %s are now friends, needs the
ampersand XML-escaped (pass 1) and the specifiers made positional (pass 3), giving
%1$s &amp; %2$s are now friends. If you forget both, the error message only mentions one
of them; you fix it, and the build still fails on the next pass. The tool above runs all
three checks at once so you see every problem before committing.
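To illustrate the idea of reporting every pass's problems in one go, here is a rough Kotlin sketch of a checker. It is not AAPT2's logic or the tool's code: the function name, the messages, and the regular expressions are all assumptions, and each check is a simplified approximation of the corresponding rule.

```kotlin
// An ampersand that does not start a predefined entity or a numeric character reference.
val BARE_AMPERSAND = Regex("&(?!(?:amp|lt|gt|quot|apos|#\\d+|#x[0-9a-fA-F]+);)")
// An apostrophe not directly preceded by a backslash.
val BARE_APOSTROPHE = Regex("(?<!\\\\)'")
// A percent sign that is not followed by a positional index such as 1$.
val NON_POSITIONAL = Regex("%(?!\\d+\\\$)")

/** Collects problems from all three passes instead of stopping at the first one. */
fun findProblems(value: String): List<String> {
    val problems = mutableListOf<String>()

    // Pass 1: XML-level problems.
    if (BARE_AMPERSAND.containsMatchIn(value)) problems += "unescaped &"

    // Pass 2: Android-specific escapes.
    val quoted = value.length >= 2 && value.startsWith("\"") && value.endsWith("\"")
    if (!quoted && BARE_APOSTROPHE.containsMatchIn(value)) problems += "unescaped apostrophe"
    if (value.startsWith("@") || value.startsWith("?")) problems += "leading @ or ?"

    // Pass 3: format specifiers. "%%" is a literal percent, so drop it before counting.
    val nonPositional = NON_POSITIONAL.findAll(value.replace("%%", "")).count()
    if (nonPositional > 1) problems += "multiple non-positional format specifiers"

    return problems
}

fun main() {
    println(findProblems("%s & %s aren't friends"))
    // [unescaped &, unescaped apostrophe, multiple non-positional format specifiers]
    println(findProblems("%1\$s &amp; %2\$s are friends")) // []
}
```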
Patterns the validation step doesn't catch
The escape rules guarantee your XML compiles. They don't guarantee your string actually works at runtime. A few patterns slip through every validator and cause production bugs worth knowing about.
Bidirectional text and RTL languages
Arabic and Hebrew strings can contain Unicode bidirectional control characters (U+200E LTR
mark, U+200F RTL mark, U+202A–U+202E embedding and override codes) that survive every
escape rule because they're not in the set Android treats as special. Concatenating an LTR
username with an RTL phrase using %1$s plus Arabic glue produces visually correct text
only if you've marked the direction explicitly, for example with an LTR mark next to the
placeholder: %1$s\u200E. Without it, the letters of "John" stay in order, but neutral
characters at the edges of the name (the period in "john.", the plus signs in "C++", a
leading @) take the direction of the surrounding right-to-left sentence and can jump to
the wrong side of the name. AAPT will not warn you.
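If you'd rather add the marks in code than in the resource, the wrapping itself is trivial. This is a plain Kotlin sketch with a hypothetical helper name; on Android, the framework's BidiFormatter.unicodeWrap() is the more complete option, since it also detects the direction of the text for you.

```kotlin
val LRM = '\u200E' // U+200E LEFT-TO-RIGHT MARK

/** Surrounds left-to-right text with LTR marks before it is placed in an RTL sentence. */
fun wrapLtr(text: String): String = "$LRM$text$LRM"

fun main() {
    val wrapped = wrapLtr("john.")
    // The marks are invisible, so inspect the length rather than the printed text.
    println(wrapped.length) // 7: five visible characters plus two marks
}
```

The wrapped value is then passed as the %1$s argument as usual.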
Trailing zero-width characters
When copy comes from Notion, Google Docs, or any rich-text source, you sometimes get a
U+200B zero-width space at the end that's invisible in every editor. The string compiles.
The string displays correctly. The string fails string equality checks against the same
value typed by hand, breaking analytics filters, A/B-test variant matching, and translation
glossary lookups. Defending against this means stripping zero-width characters (a character
class such as [\u200B-\u200D\uFEFF]) from incoming copy before storing it; the tool above
does this automatically on input.
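A stripping step along these lines can be a single regular expression. The character class below (U+200B through U+200D, plus U+FEFF) is my assumption about a reasonable set, not a statement of what the tool strips; note that U+200C and U+200D are meaningful in some scripts and in emoji sequences, so you may want a narrower class for those languages.

```kotlin
// Zero-width space, non-joiner, joiner, and the byte-order mark / zero-width no-break space.
val ZERO_WIDTH = Regex("[\\u200B-\\u200D\\uFEFF]")

/** Removes zero-width characters from pasted copy. */
fun stripZeroWidth(text: String): String = text.replace(ZERO_WIDTH, "")

fun main() {
    val pasted = "Save\u200B" // looks identical to "Save" in an editor
    println(pasted == "Save")                 // false
    println(stripZeroWidth(pasted) == "Save") // true
}
```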
Locale-sensitive uppercase
The Turkish dotted and dotless I (İ / i vs I / ı) are the canonical example. Under the
Turkish locale, "title".uppercase(Locale.forLanguageTag("tr")) is TİTLE with a dotted
capital İ, which will not match the "TITLE" you get from "title".uppercase(Locale.ROOT).
Going the other way, "TITLE".lowercase() with the Turkish locale gives tıtle with a
dotless ı, while "İstanbul".lowercase() with the Turkish locale gives istanbul with a
plain dotted i. Mixing these in equality checks is a common source of "this works on my
emulator and crashes for users in Istanbul" bugs. Use .uppercase(Locale.ROOT) for
identifiers, .uppercase(Locale.getDefault()) only for display.
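The difference is easy to see on a plain JVM. The sketch below uses "title" as a sample word and a hypothetical identifierKey helper to show the Locale.ROOT recommendation.

```kotlin
import java.util.Locale

/** Locale-independent key for comparisons, lookups, and analytics identifiers. */
fun identifierKey(text: String): String = text.uppercase(Locale.ROOT)

fun main() {
    val turkish = Locale.forLanguageTag("tr")

    println("title".uppercase(turkish))  // TİTLE (dotted capital İ, U+0130)
    println("TITLE".lowercase(turkish))  // tıtle (dotless ı, U+0131)
    println(identifierKey("title"))      // TITLE

    // The Turkish-locale result does not equal the locale-independent key.
    println("title".uppercase(turkish) == identifierKey("title")) // false
}
```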
Build-time vs runtime escapes
A common confusion: when do you escape, and when does Android do it for you? The rule
is that strings.xml escapes are a build-time concern. By the time
getString(R.string.foo) returns a string at runtime, Android has already
decoded everything. \' is now a real apostrophe, \n is a real
newline, and the <b> tags have become styling spans (kept by getText(), dropped by
getString(), which returns plain text).
You don't need to undo escapes in Kotlin. You also don't need to re-escape strings you
pass to setText(); it's plain text from the framework's perspective.
Where this matters is when you're building strings dynamically and writing
them to a TextView or to a sharing intent. If you concatenate two getString results, you
get the two decoded strings glued together, which is fine. If you stringify a network
response and assign it to setText, the response might contain HTML markup
that setText won't render unless you pass it through
HtmlCompat.fromHtml(rawString, FROM_HTML_MODE_LEGACY) first. That's a
different escape regime entirely (HTML, not XML), and the rules are similar but not
identical: HTML cares about &nbsp; and other named entities; the
strings.xml parse doesn't.