Template talk:FFFF

=Documentation=

Synopsis
At [[adăpost]]:


 * ===Noun===

When to use
In certain languages' writing systems, there exist additional letters inserted into an otherwise translingual alphabet, such as the Latin or Cyrillic alphabet. For example, in Spanish, the letter ñ comes between the letters n and o, such that piña comes between pinto and piojo in alphabetical order. By default, MediaWiki won't use this correct order when listing members of a category; but it does allow an entry's "sort key" to be specified, controlling the entry's placement. (If no sort key is specified, the entry's title is used.) In an entry for a Spanish word containing ñ, the sort key should be the headword, except that every instance of ñ is replaced with n followed by this template.

Note that this approach doesn't work for all languages; it depends on the language's traditional alphabetization scheme. In some languages, such as Swedish, the additional letters are added to the end of the alphabet; and in other languages, such as French, diacritics are mostly ignored in the traditional alphabetization scheme (such that des and dès are mostly equivalent: dès follows des, but comes before dessiner and detenir). Languages in which it does work include Mongolian, Romanian, Spanish, and Turkish.

How it works
This template inserts a Unicode noncharacter (U+FFFF), which MediaWiki sorts after every other character in the Basic Multilingual Plane (including all Latin and Cyrillic letters); so it's kind of like inserting a z (pinto comes before pinza comes before piojo), but a bit more robust.
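
The effect can be illustrated with a minimal Python sketch. It assumes that plain code-point comparison (what Python's `sorted` does on strings) is a fair stand-in for how MediaWiki orders sort keys; `SENTINEL` and `spanish_sort_key` are hypothetical names used only for illustration, not part of the template.

```python
# Assumption: code-point order approximates MediaWiki's ordering of sort keys.
SENTINEL = "\uffff"  # U+FFFF, a Unicode noncharacter


def spanish_sort_key(headword: str) -> str:
    """Replace each ñ with n followed by the sentinel character."""
    return headword.replace("ñ", "n" + SENTINEL).replace("Ñ", "N" + SENTINEL)


words = ["piojo", "piña", "pinto", "pinza"]

# Without a sort key, ñ (U+00F1) sorts after every unaccented letter.
by_title = sorted(words)
print(by_title)  # ['pinto', 'pinza', 'piojo', 'piña']

# With the sentinel, piña lands after pinza and before piojo.
by_key = sorted(words, key=spanish_sort_key)
print(by_key)  # ['pinto', 'pinza', 'piña', 'piojo']

# Inserting a literal z instead would make piña collide with pinza.
z_key = "piña".replace("ñ", "nz")
print(z_key == "pinza")  # True
```

The last check is one way to read "a bit more robust": a key built with z can be identical to the key of a real word, whereas U+FFFF does not occur in ordinary headwords.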

=Discussion=

~
While I definitely appreciate the clever funkiness of this solution, wouldn't it work just as well to use a tilde (which is the last ASCII glyphic character)? (For Cyrillic characters that would work less well — for example, we'd have to replace Ө with П~ instead of with О~ — but there's probably an easier character to use for that, too.) Relatedly, the name of this template describes its implementation rather than its functionality, which strikes me as sub-ideal; if we do want to go the template route, I think it would be better to have e.g., which would expand to a&#xFFFF;. —Ruakh TALK 19:29, 16 September 2008 (UTC)
 * No one said I was best at naming templates. I tend to think rather techy. Anyway, &#XFFFF; wouldn't work on its face because that's an escape code for HTML display. For the collation to actually work, it needs to be the raw character which itself collates at U+FFFF. ...or am I misunderstanding what you're talking about? And yes, tildes might work for Latin-only. In fact, U+FFFF works only for UTF-16 (where >= U+10000 is encoded using surrogates), or if the alphabet is entirely <= U+FFFD. If an alphabet with >= U+10000 is used, inevitably a different reservedly unused character would have to be used. Oh, and thank you for reconstructing the usage notes in the talk page; I very much appreciate your effort. ^_^ - Gilgamesh 19:47, 16 September 2008 (UTC)


 * Don't worry, the whole method is a crock. See RFD. The FFFx characters cannot and should not be used; the correct, and s/w engineering standard solution and best practice is to insert the desired preceding sort letter or other sub-key. No template needed or wanted. Robert Ullmann 23:48, 16 September 2008 (UTC)
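
As a rough illustration of the UTF-16 point raised in this thread, the following Python sketch compares U+FFFF with one character above U+FFFF under three orderings. The variable names are hypothetical, and which ordering a given MediaWiki installation actually applies is not something this sketch establishes.

```python
SENTINEL = "\uffff"            # U+FFFF
SUPPLEMENTARY = "\U00010400"   # a character above U+FFFF (Deseret block)

# Code-point order: U+FFFF < U+10400, so the sentinel does NOT sort last.
by_code_point = SENTINEL < SUPPLEMENTARY

# UTF-8 byte order agrees with code-point order.
by_utf8 = SENTINEL.encode("utf-8") < SUPPLEMENTARY.encode("utf-8")

# UTF-16 code-unit order: U+10400 becomes the surrogate pair D801 DC00,
# and 0xD801 < 0xFFFF, so here the sentinel DOES sort last.
by_utf16 = SENTINEL.encode("utf-16-be") < SUPPLEMENTARY.encode("utf-16-be")

print(by_code_point, by_utf8, by_utf16)  # True True False
```

So the sentinel only sorts after every character when comparison is done on UTF-16 code units, or when the alphabet in question stays at or below U+FFFD.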

Just passing by...
Out of interest, is it completely impossible to sort these by convention? I have scripts here for generating the indices that use the convention of the language to determine which order to sort letters with diacritics (writing a method to generate a sort-key from a string isn't too hard if you know what the conventions to be used are). I do know that for some languages with double-letters, like Hungarian, it is impossible for a computer to guess completely accurately how to sort the word (as the digraph cs looks like c followed by s, etc.) but I would have thought that people would be more consistent with sorting the diacritics. Conrad.Irwin 00:09, 17 September 2008 (UTC)


 * As you note, it is usually possible to sort languages automatically; and I *suspect* that given any particular language, the people who produce dictionaries and such know how to do it. (Just like the typesetting systems I used in the '70s could hyphenate any word: they had about a dozen rules, and then a list of only a handful of exceptions.) Our general problem is doing things by-language-section, which we can't at the current MW state. We can't parse things out in templates, and can't set up a DEFAULTSORT for the entry, although we might get close in a lot of cases. 'Bots could certainly gen up sort keys for specific uses. Robert Ullmann 00:34, 17 September 2008 (UTC)
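
The convention-based key generation discussed in this thread might look roughly like the Python sketch below. It is not the script mentioned above; `make_sort_key`, `SPANISH`, and `SWEDISH` are hypothetical names, and the alphabets are simplified (no accented vowels, no digraph handling, so the Hungarian cs problem is not addressed).

```python
def make_sort_key(alphabet):
    """Build a key function that ranks letters by position in `alphabet`."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    unknown = len(alphabet)  # characters outside the alphabet sort last

    def key(word):
        return [rank.get(ch, unknown) for ch in word.lower()]

    return key


# Simplified alphabets, given in each language's traditional order.
SPANISH = list("abcdefghijklmnñopqrstuvwxyz")  # ñ between n and o
SWEDISH = list("abcdefghijklmnopqrstuvwxyzåäö")  # å, ä, ö at the end

spanish_sorted = sorted(["piojo", "piña", "pinto"], key=make_sort_key(SPANISH))
swedish_sorted = sorted(["öl", "zon", "ål", "al"], key=make_sort_key(SWEDISH))

print(spanish_sorted)  # ['pinto', 'piña', 'piojo']
print(swedish_sorted)  # ['al', 'zon', 'ål', 'öl']
```

Because the order comes from a per-language table rather than from a character trick, the same function covers languages such as Swedish where the extra letters go at the end of the alphabet.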