ios - What is the meaning of 'Swift are Unicode correct and locale insensitive' in Swift's String document? - TagMerge
What is the meaning of 'Swift are Unicode correct and locale insensitive' in Swift's String document?

Asked 11 months ago
2 answers

To expand on @matt's answer a little:

The Unicode Consortium maintains standards for the interoperation of text data, the best known of which is the Unicode Standard itself. This standard defines a huge list of characters and their properties, along with rules for how those characters interact with one another. (As Matt notes: letters, emoji, combining characters such as letters with diacritics, like é, and so on.)

Swift strings being "Unicode-correct" means that Swift strings conform to this Unicode standard, offering the same characters, rules, and interactions as any other string implementation which conforms to the same standard. Since Unicode is now the standard that most string implementations conform to, this largely means that Swift strings will "just work" the way that you expect.
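As a small illustration of what "Unicode-correct" looks like in practice, here is a minimal sketch using only the standard library. The variable names are just for illustration; the point is that two different spellings of é compare equal and each count as one character.

```swift
// Two spellings of "é": one precomposed scalar vs. "e" plus a combining accent.
let precomposed = "\u{E9}"     // U+00E9 LATIN SMALL LETTER E WITH ACUTE
let decomposed = "e\u{301}"    // "e" followed by U+0301 COMBINING ACUTE ACCENT

// String equality follows Unicode canonical equivalence, not raw scalar values.
let sameText = (precomposed == decomposed)

// Each string is a single Character (an extended grapheme cluster)...
let characterCounts = (precomposed.count, decomposed.count)

// ...even though the number of underlying Unicode scalars differs.
let scalarCounts = (precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)

print(sameText, characterCounts, scalarCounts)
// prints: true (1, 1) (1, 2)
```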

However, along with the character definitions, Unicode also defines many rules for how to perform certain common string actions, such as uppercasing and lowercasing strings, or sorting them. These rules can be very specific, and in many cases, depend entirely on context (e.g., the locale, or the language and region the text might belong to, or be displayed in). For instance:

  • Case conversion:
    • In English, the uppercase form of i ("LATIN SMALL LETTER I" in Unicode) is I ("LATIN CAPITAL LETTER I"), and vice versa
    • In Turkish, however, the uppercase form of i is actually İ ("LATIN CAPITAL LETTER I WITH DOT ABOVE"), and the lowercase form of I ("LATIN CAPITAL LETTER I") is ı ("LATIN SMALL LETTER DOTLESS I")
  • Collation (sorting):
    • In English, the letter Å ("LATIN CAPITAL LETTER A WITH RING ABOVE") is largely considered the same as the letter A ("LATIN CAPITAL LETTER A"), just with a modifier on it. Sorted in a list, words starting with Å would appear along with other A words, but before B words
    • In certain Scandinavian languages, however, Å is its own letter, distinct from A. In Danish and Norwegian, Å comes at the end of the alphabet: ... X, Y, Z, Æ, Ø, Å. In Swedish and Finnish, the alphabet ends with: ... X, Y, Z, Å, Ä, Ö. For these languages, words starting with Å would come after Z words in a list

In order to perform many string operations in a way that makes sense to users in various languages, those operations need to be performed within the context of their language and locale.

In the context of the documentation's description, "locale-insensitive" means that Swift strings do not offer locale-specific rules like these, and instead apply Unicode's default, locale-independent rules for case conversion, case folding, and comparison (for case conversion, effectively the English behavior). So, in contexts where correct handling of these is needed (e.g. you are writing a localized app), you'll want to use the Foundation extensions to String that do take a Locale for correct handling, such as uppercased(with:), lowercased(with:), and compare(_:options:range:locale:), among others.
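To make the difference concrete, here is a rough sketch contrasting the standard library's locale-insensitive methods with Foundation's locale-taking ones. The locale identifiers and sample words are my own picks for illustration, and the exact sort order you see depends on the collation data available on your platform.

```swift
import Foundation

let turkish = Locale(identifier: "tr_TR")
let swedish = Locale(identifier: "sv_SE")
let english = Locale(identifier: "en_US")

// Standard library: locale-insensitive, always the default Unicode case mapping.
let defaultUpper = "i".uppercased()               // "I"
let defaultLower = "I".lowercased()               // "i"

// Foundation: case mapping tailored to the given locale.
let turkishUpper = "i".uppercased(with: turkish)  // "İ" (U+0130)
let turkishLower = "I".lowercased(with: turkish)  // "ı" (U+0131)

// Collation: Foundation's compare accepts a Locale, so the same words
// can be ordered according to different languages' rules.
let words = ["Zebra", "Åre", "Apple"]
let swedishOrder = words.sorted { $0.compare($1, locale: swedish) == .orderedAscending }
let englishOrder = words.sorted { $0.compare($1, locale: english) == .orderedAscending }

print(defaultUpper, defaultLower, turkishUpper, turkishLower)
print(swedishOrder)
print(englishOrder)
```

For user-facing text you would typically pass Locale.current rather than a hard-coded identifier.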



It basically just means that Swift strings are Unicode strings. A Swift string "character" is a character in a Unicode sense: a letter, an emoji, a combined letter-and-diacritic, whatever. A string can also be viewed not merely as a character sequence but as a sequence of UTF-8 or UTF-16 code units, or of Unicode scalars (in effect, UTF-32). The "locale insensitive" stuff means they don't have a locale-dependent encoding, as strings did in the bad old days before Unicode.
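A quick sketch of those different views, using a sample string I made up for illustration. The same text has a different length depending on which view you count:

```swift
// "café 👍", with é written as the single precomposed scalar U+00E9.
let text = "caf\u{E9} 👍"

let characterCount = text.count                 // Characters (grapheme clusters)
let scalarCount = text.unicodeScalars.count     // Unicode scalars (UTF-32 units)
let utf16Count = text.utf16.count               // UTF-16 code units
let utf8Count = text.utf8.count                 // UTF-8 code units (bytes)

print(characterCount, scalarCount, utf16Count, utf8Count)
// prints: 6 6 7 10
```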

This is delightful but it has some downsides, most notably that strings qua character-sequence are not directly indexable by integers.
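For instance, here is a minimal sketch of what that looks like (the sample string is arbitrary). You go through String.Index instead of an Int, or copy the characters into an Array if you really need integer subscripts:

```swift
let greeting = "héllo"

// greeting[1] does not compile: String has no Int subscript.
// Instead, derive a String.Index by walking from startIndex.
let secondIndex = greeting.index(greeting.startIndex, offsetBy: 1)
let second = greeting[secondIndex]      // Character "é"

// If repeated random access is needed, copy into an Array once;
// Array<Character> is indexable by Int.
let characters = Array(greeting)
let third = characters[2]               // Character "l"

print(second, third)
// prints: é l
```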
