Wiktionary:Votes/2016-10/Redirect fullwidth and halfwidth characters

Redirect fullwidth and halfwidth characters
Voting on:


 * Redirecting all fullwidth and halfwidth characters to their normal forms. This includes redirecting all fullwidth letters, numbers, symbols and punctuation, halfwidth katakana, and hangul letters.

Special considerations:
 * If the normal form is a redirect itself, the fullwidth or halfwidth entry should be a redirect to the same entry. Ex.: halfwidth ｢ should redirect to 「 」, because the normal 「 already redirects to 「 」.
 * This proposal says nothing about whether the normal entry should exist. If the normal entry "→" should not exist, then the halfwidth "￫" need not be created as a pointless redirect.
 * This proposal says nothing about words written using these characters, such as ＣＤ, ｃｏｍｐａｃｔ　ｄｉｓｋ or ｳｨｸｼｮﾅﾘｰ ("Wiktionary" in halfwidth katakana).
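
The first consideration above (pointing the fullwidth or halfwidth entry at the same final entry as the normal form) can be sketched as follows. This is only an illustration: the function name `redirect_target` and the `existing_redirects` table are hypothetical, and the table stands in for whatever redirects exist on the wiki at the time.

```python
def redirect_target(normal, redirects):
    """Follow existing redirects from the normal form to the entry holding the content."""
    seen = set()
    target = normal
    # Stop when the target is not a redirect, or when a loop is detected.
    while target in redirects and target not in seen:
        seen.add(target)
        target = redirects[target]
    return target

# Hypothetical snapshot: the normal 「 already redirects to the paired entry 「 」.
existing_redirects = {"「": "「 」"}

# The halfwidth ｢ has the normal form 「, so it should point at the same final entry.
print(redirect_target("「", existing_redirects))
```

A normal form that is not itself a redirect is returned unchanged, so the same helper covers the ordinary case (for example Ａ → A).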

Examples:

These specific examples are for illustrative purposes only. The full list of affected entries is believed to be linked at Appendix:Unicode/Halfwidth and Fullwidth Forms.


 * Ａ → A
 * ７ → 7
 * ＄ → $
 * ￢ → ¬
 * ﾗ → ラ
 * ﾢ → ㄲ
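
The mappings listed above match the `<wide>` and `<narrow>` compatibility decompositions recorded in the Unicode Character Database. A minimal sketch, assuming Python's standard `unicodedata` module; the helper name `normal_form` is hypothetical and is not part of any Wiktionary tooling:

```python
import unicodedata

def normal_form(ch):
    """Return the normal-width counterpart of a fullwidth/halfwidth character, or None."""
    parts = unicodedata.decomposition(ch).split()
    # Fullwidth forms are tagged <wide>, halfwidth forms <narrow>,
    # followed by a single hexadecimal codepoint.
    if len(parts) == 2 and parts[0] in ("<wide>", "<narrow>"):
        return chr(int(parts[1], 16))
    return None

for ch in "Ａ７＄￢ﾗﾢ":
    print(f"U+{ord(ch):04X} {ch} -> {normal_form(ch)}")
```

The sketch takes a single decomposition step rather than applying full NFKC normalization, because NFKC would decompose ㄲ further into the Hangul Jamo block instead of stopping at the compatibility jamo shown in the list.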

Rationale:
 * Fullwidth and halfwidth characters are encoded in Unicode for compatibility purposes. The fullwidth A is merely a typographical variant of a normal A. While there are some entries for typographical varieties like ᴀ (small caps A, used as a phonetic symbol) and ª (feminine ordinal indicator, which is a superscript a), we may argue that some of them have value as separate entries specifically because they represent different concepts. Having a separate entry for "fullwidth A" is arguably the same as having other separate entries for "italic A", "bold A", "underlined A", "yellow A", "blue A", "sans-serif A", "cursive A", etc. which probably don't represent separate concepts (until proven otherwise).
 * This vote does not deal with longer terms written in halfwidth and fullwidth characters such as ｃｏｍｐａｃｔ　ｄｉｓｋ because they are potentially infinite. Apparently, per Talk:CD, consensus is that these entries should not exist. If someone tries searching for ｃｏｍｐａｃｔ　ｄｉｓｋ and fails, there's a chance that they will try to search for the component characters of the term they seek.

Schedule:
 * Vote starts: 00:00, 28 October 2016 (UTC)
 * Vote ends: 23:59, 26 November 2016 (UTC)
 * Vote created: --Daniel Carrero (talk) 13:38, 21 October 2016 (UTC)

Older discussions:
 * Beer parlour/2009/August
 * Talk:CD (2013)
 * Beer parlour/2015/January

Recent discussions:
 * Beer parlour/2016/September
 * Beer parlour/2016/October
 * Wiktionary talk:Votes/2016-10/Redirect fullwidth and halfwidth characters

Support

 * 1) Support. --Daniel Carrero (talk) 00:01, 28 October 2016 (UTC)
 * 2) Support. —Μετάknowledge discuss/deeds 00:45, 28 October 2016 (UTC)
 * 3) Support redirecting existing entries, but oppose systematically creating new redirects from non-existing entries. --WikiTiki89 14:32, 28 October 2016 (UTC)
 * I am kind of hoping we could redirect all the halfwidth/fullwidth character entries whenever possible, including those pages that don't exist yet. There are 235 pages linked in Appendix:Unicode/Halfwidth and Fullwidth Forms. Of these, 187 pages already exist (defined as "fullwidth form of" or "halfwidth form of") and 48 don't exist, which is 20.4% (~= 1/5). Basically, most pages that don't exist are halfwidth hangul letters and most halfwidth hangul letters don't exist yet. "HALFWIDTH FORMS LIGHT VERTICAL" (￨) does not exist either.
 * My reason is: I'd like to have a consistent system if possible, which can be achieved in a number of ways: 1) having all [full/half]width entries as hard redirects to the normal entries (which is being voted on right now), 2) having all [full/half]width entries as normal separate entries, 3) having all [full/half]width entries as hard redirects to the Unicode appendix, 4) outright deleting all [full/half]width entries without creating any redirects. I prefer type 1 because this way readers can still search for and use these characters. I believe that if we have a reason to keep ﾢ as a redirect (a halfwidth hangul page that has existed since 2009), then we have equal reason to keep ﾧ as a redirect (another halfwidth hangul letter that has not been created yet). --Daniel Carrero (talk) 14:37, 3 November 2016 (UTC)
 * Note that this vote does not make it clear whether it is referring to existing entries or all potential entries. I think we should have each voter clarify which of those they are consenting to. --WikiTiki89 17:28, 3 November 2016 (UTC)
 * I understand. Actually, I may be mistaken in how clear the vote needed to be because I created it, but I intended "Redirecting all fullwidth and halfwidth characters to their normal forms." to mean all entries, whether they were already created or not. But even excess clarity may be a good thing, so I should probably have mentioned in the vote "all entries, whether they were already created or not".
 * I think the thing we are discussing at the moment is basically: "Should halfwidth hangul letters be redirects to their normal letters?" because these are the group mainly affected by your concern. If this vote passes, I'd say: why not? -- and I gave my reasons above as to why I think that creating these redirects is a good idea. If this vote fails (which seems unlikely at this point), we would presumably keep a lot of separate entries for fullwidth and halfwidth forms. In this case, I think that people would still be able to create new separate entries for halfwidth forms of Korean hangul letters, rendering the distinction of "created and uncreated entries" moot, until further discussion. --Daniel Carrero (talk) 17:39, 3 November 2016 (UTC)
 * 4) Support redirecting existing entries; (very) weak support for creating redirects for non-existing entries.  — Andrew Sheedy (talk) 22:56, 31 October 2016 (UTC)
 * 5) Support. -Xbony2 (talk) 19:40, 2 November 2016 (UTC)
 * 6) Support. — SMUconlaw (talk) 11:06, 24 November 2016 (UTC)
 * 7) Support, as long as character boxes are added to the entries to which the full- and half-width character entries redirect. I do not object to adding redirects from full-width or half-width character entries that do not yet exist. — Eru·tuon 18:31, 26 November 2016 (UTC)
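
The page counts quoted in the discussion above can be checked with a few lines of arithmetic. The figures (235 linked pages, 187 existing, 48 missing) are taken from the comment as written; the function name is only illustrative.

```python
def missing_share(existing, missing):
    """Return the total page count and the fraction of pages that do not exist yet."""
    total = existing + missing
    return total, missing / total

total, share = missing_share(existing=187, missing=48)
print(total)           # 235
print(f"{share:.1%}")  # 20.4%
```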

Oppose

 * Oppose. They're used as sutegana in Miyako. ｽﾞ has no sutegana codepoint, and is present in the words ぴづｽﾞ (pidzɿ, "elbow") and ｽﾞざら (ɿzara, "sickle"). Nibiko (talk) 23:53, 28 October 2016 (UTC)
 * The vote says that it will have no effect on words written with these characters. This is only concerning the characters as they are used on their own. —Μετάknowledge discuss/deeds 00:00, 29 October 2016 (UTC)
 * Sutegana characters have entries on their own that describe their distinct phonetic usage, so it is not uniform to have some characters that are used as sutegana with entries on their own, but others with no entries on their own. The sutegana codepoints are listed under sutegana, and in case of a missing codepoint, halfwidth characters are used in place. Miyako is a Ryūkyūan language and has a distinct phonology from Japanese. Anyway, I managed to combine the Ainu consonant character with the rendaku mark to create ㇲﾞ. The standards for Miyako need to be drafted, but for now I'll use the Ainu characters and strike my vote. Nibiko (talk) 01:17, 29 October 2016 (UTC)
 * Just like we have ᴀ and ª as entries for typographical variants that represent separate concepts, I'd support keeping separate entries for any halfwidth or fullwidth characters that mean something other than "fullwidth (or halfwidth) form of something". Sorry I didn't add this caveat in the vote. I think we should create ㇲﾞ (Ainu consonant + rendaku mark), as a Miyako entry explaining the character. It looks better to me than using the halfwidth ｽﾞ, but if Miyako editors prefer using the halfwidth one as a normal entry, I'd support doing it. In my defense, at the moment there is no halfwidth katakana entry defined as anything other than a halfwidth katakana, so apparently they can all be safely redirected, I believe. --Daniel Carrero (talk) 01:30, 29 October 2016 (UTC)


 * Oppose: I think these are useful as separate entries at the single-character level; ｃｏｍｐａｃｔ　ｄｉｓｋ is a different story. I think it worthwhile that someone who happens to see a character somewhere can enter it into Wiktionary and get express information on that character or code point. I do not see how a hard redirect is more useful for the reader than what we currently have. --Dan Polansky (talk) 08:57, 26 November 2016 (UTC)
 * Naturally, you and I may have different opinions. I do believe hard redirects are better, but we may want to discuss further what are the merits of each way to treat some codepoints.
 * For example, I would like the entry $ (which is the normal dollar sign) to have codepoint information for ＄, 💲 , ﹩.
 * I feel that, even if we had separate entries for each dollar sign, it's a good idea to have a complete list of codepoints in the main entry, because they're basically the same character. It follows that, if the main entry explains everything anyway, it may be a good idea to redirect the alternative codepoints to it.
 * The Wikipedia page dollar sign also has a list of multiple codepoints for that symbol.
 * Personally, I especially dislike seeing a character variation in the "Derived terms" of the normal character, because I feel that this invites people to waste time viewing multiple entries for basically the same thing. (Setting aside the trouble of citing a small character or a fullwidth character vs. citing the normal character.)
 * Similarly, considering ✝ as the main "Latin cross" entry, I already redirected these other Latin crosses to it: 🕇, 🕆 , ✟ , ✞ . The main Latin cross entry currently has information on all these codepoints. --Daniel Carrero (talk) 13:43, 26 November 2016 (UTC)
 * With hard redirect, the user may not even notice they were redirected; with soft redirect, they would at least notice since they stay on the original page until they click on the link. And these kinds of characters can be puzzling, depending on their rendering in the font. --Dan Polansky (talk) 18:10, 26 November 2016 (UTC)
 * I take your point: "the user may not even notice they were redirected". Still, I wonder if it's worth creating or otherwise keeping whole separate entries to make it clearer that the readers are dealing with separate codepoints.
 * For one, if we define "Ｄ" as "Fullwidth form of D", this would not make a lot of sense if Wiktionary were printed or if the reader does not care about separate codepoints. We might as well define " D " as "Comic Sans MS form of D".
 * There are other websites where we can search for each codepoint in Unicode. We could discuss the possibility of using Wiktionary as a huge Unicode database too, with one entry for each codepoint, even all the unattestable ones: ¤ and ⮴ failed RFV at some point and were deleted. Actually, even if we did that, personally I'd prefer redirecting fullwidth and other entries, but that's a separate discussion. As per Votes/2011-06/Redirecting combining characters, we redirect the combining acute accent to the acute accent, because the distinction between them is meaningful to computers rather than people. Currently, Wiktionary is not a comprehensive codepoint database anyway: if readers can't access separate entries for ¤, ⮴ and the combining acute accent, I'm not sure why we would keep separate entries for fullwidth Ａ and Ｂ.
 * That said, I prefer using hard redirects, but if people want to have separate entries for fullwidth characters, I suppose we could have soft redirects using . At least, I'd like to make it clear that alternative typographies of symbols are not "real" entries like the others, if it's OK with everyone. --Daniel Carrero (talk) 22:33, 26 November 2016 (UTC)
 * Votes/2011-06/Redirecting combining characters was, in my opinion, a vote held without full knowledge of its effect. Consider, for example, U+0313 COMBINING COMMA ABOVE and U+0314 COMBINING REVERSED COMMA ABOVE, which are, respectively, the combining forms of three and two different codepoints each; obviously, they can't just hard redirect to their spacing variants, hence the way they're currently defined. — I.S.M.E.T.A. 02:16, 30 November 2016 (UTC)
 * I understand. I think you're right. --Daniel Carrero (talk) 02:22, 30 November 2016 (UTC)
 * For consistency, I would prefer that all combining characters were handled like that. — I.S.M.E.T.A. 02:36, 30 November 2016 (UTC)
 * Personally, I don't like that idea very much. Still, if anything, I admire the motivation of seeking consistency in entries. We can surely discuss what you proposed.
 * Per Beer parlour/2016/October, I'd like it if basically all single-character entries that are the same thing as other codepoints (or variations in typography/appearance) were redirected, which is a different attempt at having a consistent proposal. Some people seem to be accepting it well, judging by three redirect votes that already passed with zero or one oppose votes (1, 2 and the current one).
 * In my opinion, if the same combining form has multiple spacing forms, the combining form can exist as a "disambiguation" page, but I'd prefer not having two separate entries for the "combining acute accent" and the "acute accent", which are the same thing. This is related to the fact that, per Matched-pair entries (which was created as a result of 4 votes that passed), we need “ and ” as separate entries for disambiguation purposes, but ⌊ and ⌋ should be redirects.
 * The vote "Redirecting combining characters" probably should have mentioned "when possible, redirect the entry, otherwise the vote is void for that entry in particular", though it didn't.
 * The vote did say, though, "When the title of an entry is a combining character, common tasks such as typing the title and linking to the entry are usually very problematic. These entries are relatively unreachable to users below a certain level of knowledge of Unicode.", so in my opinion it may be a good idea to limit the number of entries with these problems. --Daniel Carrero (talk) 05:38, 30 November 2016 (UTC)
 * U+0301 COMBINING ACUTE ACCENT suffers from the same problem as U+0313 COMBINING COMMA ABOVE and U+0314 COMBINING REVERSED COMMA ABOVE: It is the combining form of U+00B4 ACUTE ACCENT, U+0384 GREEK TONOS, U+1FFD GREEK OXIA, and maybe others (like U+02B9 MODIFIER LETTER PRIME and U+02CA MODIFIER LETTER ACUTE ACCENT). I note that msh210 said in Votes/2011-06/Redirecting combining characters that “[t]here are 1199  characters”, that “[c]ertainly not all of them have non-combining counterparts”, and that he'd “venture a wild guess that most of the first eighty do, and that few of the others do.” It seems to me like a bad idea to create such a vote when it can only apply to about 7% of all combining characters and when it is unworkable even in its application to many of those. — I.S.M.E.T.A. 15:11, 6 December 2016 (UTC)
 * Ok, I acknowledge these points you made: "It seems to me like a bad idea to create such a vote when it can only apply to about 7% of all combining characters and when it is unworkable even in its application to many of those." and "For consistency, I would prefer that all combining characters were handled like that.  [as separate entries]".
 * Personally, what I think is best is: actually, I'd like to keep the redirects from combining forms to spacing forms whenever possible, and create new redirects in the future if possible too, even if the number of combining form redirects is small, for consistency with all other redirects created (and proposed to be created in the future) on the basis of two separate codepoints being basically the same single character.
 * Feel free to disagree with me. If you think undoing that vote (Votes/2011-06/Redirecting combining characters), is a good idea, maybe you would like to create a new BP discussion about it. --Daniel Carrero (talk) 09:57, 10 December 2016 (UTC)
 * I forgot to ping you in my message above. --Daniel Carrero (talk) 09:59, 10 December 2016 (UTC)
 * I respectfully do disagree with you, but this is just a niggle, and not something worth worrying about too much. Right now, I don't see a big problem with the redirects, but if one crops up in the future, it may be worth reviewing the decision come that time. — I.S.M.E.T.A. 11:15, 10 December 2016 (UTC)
 * Ok, that sounds good to me. I'm fine with reviewing the redirects whenever seems appropriate. --Daniel Carrero (talk) 11:34, 10 December 2016 (UTC)


 * Late oppose per Dan, because I do not like the silent nature of redirection, and because two character boxes are just too geeky to understand. If traditional Chinese character entries are not redirects then this should not be either. --Dixtosa (talk) 08:34, 30 November 2016 (UTC)
 * Chinese redirects would conflict with Japanese. Nibiko (talk) 08:56, 30 November 2016 (UTC)
 * To be clear, I'd oppose redirecting traditional Chinese characters, per Nibiko's stated reason above. --Daniel Carrero (talk) 13:58, 30 November 2016 (UTC)
 * Re: "two character boxes are just too geeky to understand". I respect that you have an opinion that is different from mine. Still, if we mean to say "the character (...) is an alternate codepoint of the character (...)", the information presented is just as geeky if split into two separate entries, with the added drawback of having to visit both entries in order to understand the difference. If Unicode has multiple codepoints for the same (or basically the same) character, in my opinion the "main" character entry should explain all the codepoints, regardless of the existence or nonexistence of separate entries for them. I support using the usage notes section to explain codepoints when applicable, in addition to the charboxes. See the entry for the hyphen (‐), where the usage notes section currently explains the "hyphen-minus", regular "hyphen", "non-breaking hyphen" and the "soft hyphen". --Daniel Carrero (talk) 14:43, 30 November 2016 (UTC)

Discussion

 * How does this affect the provision of character boxes for these fullwidth and halfwidth forms? Pinging Daniel Carrero as vote creator. — I.S.M.E.T.A. 10:52, 21 November 2016 (UTC)
 * See !, $, +, @, 🛇, 🌎, 💾, ⌛, ✓, ✗.
 * When I feel that a given codepoint is just alternative typography, appearance or simply the same character from a different codepoint, I like to redirect it to the "main" character and add multiple charboxes in the main character corresponding to each separate codepoint. (if it's OK with everyone, of course)
 * With that in mind, I'd like to redirect Ａ (fullwidth) to A (normal). In the entry A (normal), I'd like to add a new charbox about A (fullwidth).
 * Another example: I'd like to redirect ﾋ (halfwidth) to ヒ (normal). In the entry ヒ (normal), I'd like to add a new charbox about ﾋ (halfwidth).
 * How to handle charboxes is not part of the vote. (But it could have been!) So, the vote could pass and if people prefer doing things any other way, we don't have to do it the way I described.
 * --Daniel Carrero (talk) 11:06, 21 November 2016 (UTC)
 * I'm sorry I didn't help to revise the vote text before the vote started and I'm also sorry that I didn't respond to this discussion in time to cast a vote at its conclusion. Re 🌎, I predict possible political objections to hard-redirecting 🌍 and 🌏 thereto… Re the scope of this vote, does it only govern codepoints in the range U+FF00–U+FFEF? If so, would it be possible to add code to the charbox template that causes it to display “It is a formal policy of the English Wiktionary that all codepoints for halfwidth and fullwidth forms redirect to their normal-width counterparts.” (or a similar note) whenever its parameter 1 is left unspecified or is specified as ＀–￯, 0xFF00–0xFFEF, or 65280–65519? — I.S.M.E.T.A. 02:33, 30 November 2016 (UTC)
 * It's all OK. Maybe I could leave you a message in your talk page when I have new proposals in mind, concerning character boxes. I have a .txt file on my PC with some ideas that I did not yet present to the community because I kind of like to discuss one new thing at a time.
 * What you said about the globes is a very good point. Maybe there's some merit to the idea of re-splitting the globe back into three entries, for the same reasons we have separate entries for color and colour, but I'd really prefer not, because they appear to be exactly the same thing (a globe) seen from different perspectives. I'm also considering the idea of redirecting all the three globes into 🌐, which is a generic globe ("with meridians") and has basically the same meanings currently in the separate 🌎 entry.
 * The idea of using the charbox to let people know of Wiktionary policy sounds nice, but that text is a bit too long for my taste, and could be shorter. That line of text would be repeated in all (or almost all) the 235 entries that have a fullwidth/halfwidth form, which are basic, common characters in different languages. Maybe we could create a comprehensive policy named Character boxes, and then make something like this appear in the charboxes: "The fullwidth entry redirects to the main entry. See Character boxes for more information." --Daniel Carrero (talk) 05:55, 30 November 2016 (UTC)
 * Don't get me wrong: I don't care which globe is the lemma, I just know how touchy people can be; maybe redirecting all three to 🌐 can cut that Gordian knot. I am perfectly happy with a shorter note like the one you propose; it could be even shorter, like “See Character boxes for the relevant redirection policies.”, perhaps. Yes, I'd appreciate you “leav[ing me] a message [on my] talk page when [you] have new proposals in mind, concerning character boxes”; thank you. — I.S.M.E.T.A. 15:19, 6 December 2016 (UTC)
 * All right. If it's OK with everyone, I redirected the other globes to 🌐 as discussed above.
 * I created a new think tank policy now named Character variations. I had suggested using the name Character boxes before, but then I thought: maybe the policy should not be only about character boxes, but also about character redirects that are often the cause for adding multiple charboxes in a single entry. Feel free to edit the policy. I believe the new page should be voted on at some point, after we are done discussing it and/or revising it. --Daniel Carrero (talk) 09:17, 10 December 2016 (UTC)
 * Thanks; watchlisted. — I.S.M.E.T.A. 09:31, 10 December 2016 (UTC)
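
The three ways of writing the block range mentioned in this discussion (＀–￯, 0xFF00–0xFFEF and 65280–65519) describe the same span of codepoints. The sketch below shows how a range test could select the short note proposed above. It is an illustration only, not the character box template's actual code; the names `in_halfwidth_fullwidth_block` and `charbox_note` are hypothetical.

```python
# Bounds of the Halfwidth and Fullwidth Forms block, as quoted in the discussion.
BLOCK_START = 0xFF00  # 65280, the codepoint of ＀
BLOCK_END = 0xFFEF    # 65519, the codepoint of ￯

def in_halfwidth_fullwidth_block(ch):
    """True if the single character ch lies in U+FF00..U+FFEF."""
    return BLOCK_START <= ord(ch) <= BLOCK_END

def charbox_note(ch):
    """Return the short policy note for characters in the block, else an empty string."""
    if in_halfwidth_fullwidth_block(ch):
        return "See Character boxes for the relevant redirection policies."
    return ""

print(charbox_note("Ａ"))
print(repr(charbox_note("A")))
```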

Decision
Passed: 7-1-0 (87.5%-12.5%) --Daniel Carrero (talk) 23:02, 26 November 2016 (UTC)
 * ✅. All characters from Appendix:Unicode/Halfwidth and Fullwidth Forms were redirected to their normal forms according to this vote, unless otherwise stated in this message.
 * The only exception is this entry:
 * The character ￨ ("HALFWIDTH FORMS LIGHT VERTICAL") was not created. I'm pretty sure its counterpart is supposed to be │ ("BOX DRAWINGS LIGHT VERTICAL"), but we currently don't have entries for Appendix:Unicode/Box Drawing characters. The vote specifically predicted that if a halfwidth entry does not have a counterpart, it does not need to be created as a pointless redirect. That said, ｜ ("FULLWIDTH VERTICAL LINE") was successfully redirected to the main vertical line entry.
 * These tasks were not specifically voted but were done anyway, so feel free to suggest doing something else:
 * All main entries with redirects from Appendix:Unicode/Halfwidth and Fullwidth Forms now have multiple charboxes to account for the redirected halfwidth/fullwidth forms.
 * All redirected halfwidth/fullwidth entries have and are populating Category:Character variation redirects.
 * All entries listed in Appendix:Unicode/Hangul Compatibility Jamo had redirects created from their respective counterparts in Appendix:Unicode/Hangul Jamo whenever possible. When the entry in "Hangul Compatibility Jamo" did not exist at the time, it was created with a charbox, so all entries in the "Hangul Compatibility Jamo" block now exist as blue links (as opposed to red links). All entries in "Hangul Compatibility Jamo" now have multiple charboxes when applicable.
 * Rationale for the last item above: a number of "Hangul Jamo" were already redirects to "Hangul Compatibility Jamo" before this vote started, but not all. So, I created the remaining redirects when possible. Also, almost none of the main entries in "Hangul Compatibility Jamo" had multiple charboxes, so I added the applicable charboxes in all these entries.
 * I feel the explanation above may be too compact. Anyway, feel free to ask questions. --Daniel Carrero (talk) 20:11, 15 December 2016 (UTC)
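
The counterpart of ￨ suggested in the decision above can be checked against the Unicode Character Database. A minimal sketch, assuming Python's standard `unicodedata` module; the helper name `counterpart` is hypothetical:

```python
import unicodedata

def counterpart(ch):
    """Return the character named in ch's <wide>/<narrow> decomposition, or None."""
    parts = unicodedata.decomposition(ch).split()
    if len(parts) == 2 and parts[0] in ("<wide>", "<narrow>"):
        return chr(int(parts[1], 16))
    return None

# U+FFE8 is the halfwidth vertical bar; U+FF5C is the fullwidth vertical line.
for ch in ("\uFFE8", "\uFF5C"):
    other = counterpart(ch)
    print(unicodedata.name(ch), "->", unicodedata.name(other))
```

In the character database, U+FFE8 decomposes to U+2502 BOX DRAWINGS LIGHT VERTICAL, which agrees with the counterpart suggested above, while U+FF5C decomposes to U+007C VERTICAL LINE.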