Why Assamese Unicode Looks Wrong on Windows: Rendering Problems Explained
A technical explanation of why Assamese Unicode text sometimes displays incorrectly on Windows — covering font fallback chains, OpenType GSUB/GPOS tables, conjunct shaping failures, vowel mark positioning, and PDF generation issues.
The Three Layers Where Assamese Rendering Can Fail
Correct Assamese Unicode display requires three systems to work correctly in sequence. A failure at any layer produces wrong output even when the other layers are correct.
Layer 1: Text encoding — The raw bytes represent correct Unicode code points for the intended Assamese characters. This layer is the one most often checked and, in modern systems, the one least often at fault.
Layer 2: OpenType shaping — The software applies glyph substitution (GSUB) and glyph positioning (GPOS) rules from the font’s OpenType tables. This is where most Assamese rendering failures occur.
Layer 3: Font glyph coverage — The font file contains the actual glyph artwork for every code point that needs to be displayed. Gaps in coverage trigger font fallback.
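A quick way to rule out Layer 1 is to confirm that the text really consists of code points from the Bengali/Assamese block rather than legacy-font text stored as Latin letters. The sketch below is only a heuristic under stated assumptions: the function name `bengali_block_ratio` is made up for illustration, and `legacy_sample` is a placeholder string, not real Geetanjali output.

```python
# Bengali/Assamese Unicode block, U+0980..U+09FF inclusive.
BENGALI_BLOCK = range(0x0980, 0x0A00)

def bengali_block_ratio(text):
    """Return the share of non-whitespace characters that lie in U+0980..U+09FF."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for c in chars if ord(c) in BENGALI_BLOCK)
    return in_block / len(chars)

# "অসমীয়া" written with explicit escapes so the code points are visible.
unicode_sample = "\u0985\u09B8\u09AE\u09C0\u09AF\u09BC\u09BE"
# Placeholder for legacy-font text: plain Latin letters that only look
# like Assamese when a legacy font is applied (hypothetical content).
legacy_sample = "Abc def ghi"

print(bengali_block_ratio(unicode_sample))  # 1.0
print(bengali_block_ratio(legacy_sample))   # 0.0
```

A ratio near 1.0 suggests genuine Unicode text, so any remaining display problem belongs to Layer 2 or Layer 3. A ratio near 0.0 for text that is supposed to be Assamese suggests legacy encoding. Mixed-language documents will naturally fall in between.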
The OpenType Shaping Dependency
Assamese conjunct rendering is entirely dependent on the GSUB (Glyph Substitution) table in the font’s OpenType data. When you type ক্ষ (ksha), the text contains three Unicode code points: U+0995 (ক) + U+09CD (্ hasanta) + U+09B7 (ষ). The raw sequence, rendered without shaping, would display three separate glyphs: ক followed by a hasanta diacritic followed by ষ.
The GSUB table contains a lookup rule: when this specific sequence appears, substitute the three glyphs with the single conjunct glyph ক্ষ. Applications that implement OpenType Indic shaping (through Uniscribe or DirectWrite on Windows, HarfBuzz on Linux and in many cross-platform applications, CoreText on macOS) apply this rule automatically. Applications that do not implement Indic shaping do not apply GSUB rules and display the decomposed sequence.
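The stored sequence behind the conjunct can be inspected with Python's standard `unicodedata` module. This is a minimal sketch that only looks at the text layer; it does not perform any shaping.

```python
import unicodedata

# ক + ্ (hasanta) + ষ, written as escapes so the three code points are explicit.
ksha = "\u0995\u09CD\u09B7"

for ch in ksha:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0995  BENGALI LETTER KA
# U+09CD  BENGALI SIGN VIRAMA
# U+09B7  BENGALI LETTER SSA

code_points = [f"U+{ord(ch):04X}" for ch in ksha]
```

Unicode names the hasanta "BENGALI SIGN VIRAMA". The string has a length of three whether or not the application displays it as one conjunct glyph, which is why correct encoding alone says nothing about correct rendering.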
Applications that typically apply correct Assamese OpenType shaping:
- Word (2013 and later, with Complex Script rendering)
- InDesign (all modern versions with Assamese/Bangla paragraph composer)
- Web browsers (Chromium and Firefox shape with HarfBuzz; Safari on macOS uses CoreText)
- LibreOffice (HarfBuzz-based shaping)
Applications that may fail Assamese shaping:
- Older versions of Office (pre-2013) on Windows XP-era Uniscribe
- Some PDF export pipelines that flatten glyphs before shaping
- Custom-built applications that use GDI text rendering rather than DirectWrite
- Some versions of Notepad and other basic text editors
For professional Assamese input that is guaranteed to produce correctly-shaping Unicode sequences, the Jahnabi Pro Keyboard uses tested Unicode output that conforms to the shaping expectations of Uniscribe and HarfBuzz. The Assamese typing guide for DTP professionals covers which keyboard modes produce Unicode vs legacy Geetanjali encoding.
The Windows Font Fallback Chain
When an application encounters a Unicode code point that the active font does not contain a glyph for, Windows substitutes a fallback font from its font fallback chain. For the Assamese/Bangla Unicode block (U+0980–U+09FF), the Windows 10/11 fallback chain typically resolves to:
- Vrinda — Windows built-in Bangla/Assamese font; covers most basic characters but has incomplete conjunct coverage and no Assamese-specific U+09F0 (ৰ) and U+09F1 (ৱ) in older versions
- Nirmala UI — Modern Windows Indic font; includes Assamese coverage in Windows 10+
- Generic fallback — Usually a Latin font with no Assamese coverage; displays as boxes
The fallback substitution is invisible in most applications — you do not see which font rendered which character. This creates a situation where a word may have some characters from your selected font and others from Vrinda or Nirmala UI, producing visual inconsistencies: slight size differences, weight variations, and conjunct glyph design mismatches within a single word.
For consistent Assamese rendering, use a font that covers every Assamese character your text uses so that fallback is never triggered. Noto Serif Bengali and Noto Sans Bengali cover the characters Assamese needs in U+0980–U+09FF, including U+09F0 and U+09F1.
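Whether a given text would trigger fallback comes down to a set comparison between the code points in the text and the code points the font maps. The sketch below assumes hypothetical helper names (`missing_code_points`, `load_coverage`) and a made-up coverage set. `load_coverage` relies on the third-party fontTools package and its `TTFont.getBestCmap()` method, and is not called in the demo.

```python
def missing_code_points(text, covered):
    """List code points in `text` that are absent from `covered`, in first-seen order."""
    missing = []
    for ch in text:
        cp = ord(ch)
        if ch.isspace() or cp in covered:
            continue
        label = f"U+{cp:04X}"
        if label not in missing:
            missing.append(label)
    return missing

def load_coverage(font_path):
    """Read the code points a font maps, using fontTools (install separately)."""
    from fontTools.ttLib import TTFont  # lazy import: the demo below runs without it
    font = TTFont(font_path)
    return set((font.getBestCmap() or {}).keys())

# Hypothetical coverage: a font that stops short of the Assamese-specific letters.
older_font_coverage = set(range(0x0980, 0x09F0))

sample = "\u09F0\u09BE\u09AE \u09F1"  # ৰাম plus a standalone ৱ
print(missing_code_points(sample, older_font_coverage))  # ['U+09F0', 'U+09F1']
```

Any code point the function reports is one that Windows would render from a fallback font, so an empty list for your real text and real font is the goal.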
The GPOS Vowel Mark Positioning Problem
Even when conjunct substitution works correctly, vowel mark (matra) positioning can fail independently. The GPOS (Glyph Positioning) table defines exact attachment coordinates — where a matra anchors relative to its host consonant.
The ো vowel (o-matra), which has one component to the left of the consonant and one to the right, requires coordination between two glyph positions. If the GPOS anchor data is missing or incorrect, the split matra components may:
- Overlap the consonant glyph
- Appear at incorrect horizontal offsets
- Be positioned at the wrong vertical alignment relative to the Shirorekha
This is visible on Windows when using fonts with incomplete GPOS tables. The text is technically correct Unicode; the encoding is correct; the GSUB substitution worked — but the final rendered output shows a matra floating at the wrong position.
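The two-part nature of ো is visible in Unicode itself: its canonical decomposition is a left-side vowel sign followed by a right-side one. The sketch below uses only the standard `unicodedata` module and shows the text-level split. It does not inspect any font's GPOS data, which is where the actual placement is decided.

```python
import unicodedata

o_matra = "\u09CB"  # ো, stored as a single code point

# NFD normalization applies the canonical decomposition.
parts = unicodedata.normalize("NFD", o_matra)

for ch in parts:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+09C7  BENGALI VOWEL SIGN E
# U+09BE  BENGALI VOWEL SIGN AA
```

The first component is drawn before the consonant and the second after it, so the renderer has two pieces to place around one base glyph, each of which can end up in the wrong position.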
DTP Software: Font Embedding in PDFs
PDF generation from Assamese documents introduces an additional failure point: font embedding. If an Assamese DTP document is exported to PDF without embedding the fonts, the PDF viewer must locate a font on the viewer’s system to render the Assamese text. If that font is not Noto Serif Bengali (or whatever was used in the original), the substituted font may have different glyph designs, different GPOS positioning, or incomplete conjunct coverage.
Correct PDF export settings for Assamese:
- Embed all fonts, in full where possible; if the exporter only offers a subsetting threshold, set it to 0% so that the whole font is embedded rather than a subset
- Use PDF/X-4 for print and PDF/A for archival
- Verify that Assamese-specific characters U+09F0 and U+09F1 are included in the embedded subset
The Assamese keyboard and input guide covers Unicode input methods. For encoding architecture differences, see the Unicode vs Geetanjali comparison. For the practical DTP workflow implications — including PageMaker’s Geetanjali pipeline vs InDesign’s Unicode pipeline — see the Assamese DTP software guide and the PageMaker to InDesign migration guide.
Practical Diagnosis Steps
If Assamese Unicode is displaying incorrectly on your system:
- Check font coverage: Open Character Map, navigate to the Bangla/Assamese block, and verify that your selected font shows actual Assamese glyphs (not boxes) at U+09F0 and U+09F1
- Check shaping support: Type ক্ষ in the application. If you see three separate characters instead of one conjunct, the application is not applying OpenType Indic shaping
- Check fallback: In Windows Settings → Time & Language → Language → Add a Language, ensure “Assamese” is installed as a preferred language, which updates the system font fallback preferences
- Check PDF embedding: Open a generated PDF in a PDF viewer that shows embedded font information (for example, the Fonts tab of Document Properties in Adobe Acrobat); confirm the Assamese font name appears in the embedded fonts list
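For the shaping check above, it can help to test every conjunct a document actually uses rather than ক্ষ alone. The following sketch extracts consonant + hasanta + consonant sequences with a regular expression. The name `find_conjuncts` and the consonant ranges are assumptions made for illustration, and the pattern deliberately ignores nukta forms and ZWJ/ZWNJ controls.

```python
import re

# Consonants of the Bengali/Assamese block, including Assamese ৰ (U+09F0) and ৱ (U+09F1).
CONSONANT = "[\u0995-\u09B9\u09DC\u09DD\u09DF\u09F0\u09F1]"
HASANTA = "\u09CD"

# One consonant followed by one or more (hasanta + consonant) pairs.
CONJUNCT_RE = re.compile(f"{CONSONANT}(?:{HASANTA}{CONSONANT})+")

def find_conjuncts(text):
    """Return the distinct conjunct sequences in `text`, in first-seen order."""
    seen = []
    for match in CONJUNCT_RE.finditer(text):
        if match.group() not in seen:
            seen.append(match.group())
    return seen

# লক্ষ্মী: ল, then the three-consonant conjunct ক + ্ + ষ + ্ + ম, then ী.
sample = "\u09B2\u0995\u09CD\u09B7\u09CD\u09AE\u09C0"
for conjunct in find_conjuncts(sample):
    print(" ".join(f"U+{ord(ch):04X}" for ch in conjunct))
# U+0995 U+09CD U+09B7 U+09CD U+09AE
```

Pasting each extracted sequence into the target application shows which specific conjuncts fail to combine. That helps separate a missing shaping engine (every conjunct fails) from gaps in one font's GSUB table (only some fail).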
For Assamese web pages, the most reliable rendering strategy is declaring the font stack explicitly with Noto Serif Bengali as the primary font and Google Fonts as the source, ensuring the shaping-capable font is loaded before the browser’s system font fallback is consulted.
For a full technical comparison of Geetanjali vs Unicode encoding — and why Geetanjali documents that look correct in PageMaker display as garbled ASCII on the web — see the Unicode vs Geetanjali architecture blog post. If your workflow involves converting legacy Geetanjali documents to Unicode for web or modern DTP use, Rupantarak handles bidirectional conversion with complete conjunct mapping.
Frequently Asked Questions
Why do some Assamese Unicode characters show as boxes or wrong glyphs on Windows?
Boxes are the "missing glyph" placeholder: they indicate that neither the font in use nor any fallback font contains a glyph for that Unicode code point. Wrong glyphs indicate font fallback: Windows has substituted a different font for missing characters, and the fallback font's Assamese glyphs may have incorrect proportions or positioning. Install Noto Serif Bengali or Noto Sans Bengali to ensure comprehensive Assamese Unicode coverage.
Why do Assamese conjuncts look wrong even when the text is correct Unicode?
Correct Unicode encoding is necessary but not sufficient for correct Assamese rendering. The software must also apply OpenType shaping rules (GSUB table for glyph substitution, GPOS table for positioning). Applications that do not use OpenType-aware text layout (older versions of Office, some DTP tools) will display the component Unicode code points without combining them into conjunct forms — showing a consonant + hasanta + consonant as three separate glyphs instead of one conjunct.
What is the correct font for Assamese Unicode on Windows?
Noto Serif Bengali (for formal/print contexts) and Noto Sans Bengali (for screen/UI) provide the most complete Assamese Unicode coverage. Windows 10 and 11 provide Nirmala UI as the modern Indic system font alongside the older Vrinda, which covers the basic range but has gaps in conjunct coverage. For professional publishing, install Noto fonts explicitly rather than relying on system fonts.
Why does Assamese PDF text look different from the screen display?
PDF rendering depends on which fonts are embedded in the PDF and how the PDF viewer applies shaping. If the PDF was generated without embedded fonts, the viewer substitutes its own fonts, which may have different glyph designs or incomplete conjunct tables. Always embed fonts when generating Assamese PDF documents. For InDesign, use the Subset fonts option with 0% threshold to embed all glyphs.