Windows compatibility issues with the font `cmap` table

cms
`cmap` is the font table responsible for mapping character codes to glyphs. It can contain multiple subtables, each with its own purpose and format definition.
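To make the subtable layout concrete, here is a minimal sketch that lists the subtables of a raw `cmap` table using only the Python standard library. It assumes the standard layout (a 4-byte header followed by 8-byte encoding records, with each subtable starting with a 16-bit format number). The function name `list_cmap_subtables` and the `demo` bytes are made up for illustration, and the demo table is truncated so that each subtable consists of nothing but its format field.

```python
import struct

def list_cmap_subtables(cmap):
    """Return (platformID, encodingID, format) for every encoding record."""
    # Header: uint16 version, uint16 numTables (big-endian).
    _version, num_tables = struct.unpack_from(">HH", cmap, 0)
    records = []
    for i in range(num_tables):
        # Encoding record: uint16 platformID, uint16 encodingID, uint32 offset.
        platform_id, encoding_id, offset = struct.unpack_from(">HHI", cmap, 4 + 8 * i)
        # Every subtable begins with its uint16 format number.
        (fmt,) = struct.unpack_from(">H", cmap, offset)
        records.append((platform_id, encoding_id, fmt))
    return records

# Hypothetical, truncated table: header, two encoding records, then two
# "subtables" reduced to their 2-byte format field (offsets 20 and 22).
demo = (
    struct.pack(">HH", 0, 2)
    + struct.pack(">HHI", 3, 1, 20)
    + struct.pack(">HHI", 3, 10, 22)
    + struct.pack(">H", 4)
    + struct.pack(">H", 12)
)

print(list_cmap_subtables(demo))  # [(3, 1, 4), (3, 10, 12)]
```

On a real font, the `cmap` table first has to be located through the font's table directory, which this sketch leaves out.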

By default, Windows uses the [ cmap_format_4 platformID="3" platEncID="1"] subtable. This subtable stores character codes as UCS-2, so it can only cover U+0000 - U+FFFF, the range that Unicode calls the BMP (Basic Multilingual Plane).

As Unicode grew to include more and more characters, the BMP was no longer enough, so an extended counterpart of format 4 was introduced: [ cmap_format_12 platformID="3" platEncID="10"]. It stores character codes as UCS-4, with an encoding range of U+00000000 - U+7FFFFFFF. For example, commonly used emoji characters sit in the U+1FXXX range, which the older format 4 cannot represent.
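The limit comes down to field width: format 4 stores character codes in 16-bit fields, while format 12 uses 32-bit fields. The sketch below illustrates this with `struct`, which refuses to pack a value above 0xFFFF into a 16-bit field. The helper name `subtable_needed` is hypothetical and only demonstrates the range check; it does not write a real subtable.

```python
import struct

def subtable_needed(code_point):
    """Return 4 if the code point fits a 16-bit field, otherwise 12."""
    try:
        struct.pack(">H", code_point)  # 16-bit field, as in format 4
        return 4
    except struct.error:
        struct.pack(">I", code_point)  # 32-bit field, as in format 12
        return 12

for ch in ("A", "\u4e2d", "\U0001F600"):
    cp = ord(ch)
    print(f"U+{cp:04X} needs format {subtable_needed(cp)}")
```

The first two characters are inside the BMP, while the emoji U+1F600 is the only one of the three that requires format 12.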

The Microsoft documentation describes the relationship between the two subtables as follows:

Fonts providing Unicode encoded UCS-4 character support for Windows 2000 and later, need to have a subtable with platform ID 3, platform specific encoding ID 1 in format 4; and in addition, need to have a subtable for platform ID 3, platform specific encoding ID 10 in format 12. Please note, that the content of format 12 subtable, needs to be a super set of the content in the format 4 subtable. The format 4 subtable needs to be in the cmap table to enable backward compatibility needs.

Apple's description is:

If a font has a 3/10 cmap (Windows, UCS-4), it should also have a 3/1 (Windows, BMP-only) cmap as well for backwards compatibility with Windows XP. The two should have identical mappings for Unicode's Basic Multilingual Plane.

In other words, format 4 records the mappings inside the BMP, while format 12 is a superset of format 4: it repeats those mappings and adds the ones beyond the BMP. Format 4 is kept alongside it for backward compatibility. Fonts that contain characters outside the BMP therefore usually carry both a format 4 and a format 12 subtable.
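The two requirements quoted above (format 12 is a superset of format 4, and both agree inside the BMP) can be expressed as a small consistency check. This is only a sketch: it assumes each subtable has already been decoded into a dictionary from code point to glyph name, and the function name `check_subtable_pair` and the sample glyph names are invented for illustration.

```python
BMP_MAX = 0xFFFF

def check_subtable_pair(format4, format12):
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    # Every format 4 mapping must appear unchanged in format 12 (superset rule).
    for cp, glyph in sorted(format4.items()):
        if format12.get(cp) != glyph:
            problems.append(
                f"U+{cp:04X}: format 4 maps to {glyph!r}, "
                f"format 12 maps to {format12.get(cp)!r}"
            )
    # Every BMP mapping in format 12 must also exist in format 4.
    for cp in sorted(format12):
        if cp <= BMP_MAX and cp not in format4:
            problems.append(f"U+{cp:04X}: BMP mapping missing from format 4")
    return problems

# Hypothetical decoded subtables.
format4 = {0x0041: "A", 0x4E2D: "uni4E2D"}
format12 = {0x0041: "A", 0x4E2D: "uni4E2D", 0x1F600: "u1F600"}

print(check_subtable_pair(format4, format12))  # []
```

Mappings beyond the BMP, such as U+1F600 here, are allowed to exist only in format 12, so they are never reported.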

The strange thing is that, although the documentation says format 4 is kept only for backward compatibility, newer Windows systems (even Windows 10) still cannot use a font that has a format 12 subtable but no format 4 subtable. Apple, Google, and other vendors evidently consider shipping both subtables redundant, so some newer fonts, such as the Hiragino Sans family bundled with OS X El Capitan or the Noto Color Emoji font bundled with Android, do not contain format 4. As a result, these fonts are unusable in Windows.
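A quick way to spot fonts that would hit this problem is to check for the 3/1 format 4 subtable. The sketch below assumes each subtable is an object exposing `platformID`, `platEncID`, and `format` attributes; the function name and the sample data are hypothetical and do not describe the actual contents of any particular font. The check encodes the behaviour observed above, not a documented Windows rule.

```python
from types import SimpleNamespace

def has_windows_bmp_subtable(subtables):
    """True if a platform 3 / encoding 1 subtable in format 4 is present."""
    return any(
        (t.platformID, t.platEncID, t.format) == (3, 1, 4) for t in subtables
    )

# Hypothetical subtable lists, for illustration only.
both_formats = [
    SimpleNamespace(platformID=3, platEncID=1, format=4),
    SimpleNamespace(platformID=3, platEncID=10, format=12),
]
format12_only = [
    SimpleNamespace(platformID=3, platEncID=10, format=12),
]

print(has_windows_bmp_subtable(both_formats))   # True
print(has_windows_bmp_subtable(format12_only))  # False
```

If the fontTools library is installed, the subtable objects in `TTFont(path)["cmap"].tables` carry these same three attributes, so they can be passed to the function directly.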

I have already reported this problem through the official Windows 10 feedback channels and hope that helps.