User talk:Keffy/Great Pronunciation Flood

Some ConnelThoughts:


 * 100% template: the pronunciation section can easily get long, but you do not want to make people skip over lines of obscure-looking codes. (The Pronunciation section precedes the "POS" section(s).)


 * With all due respect, perhaps you should skip words that already have human pronunciation. The last thing you need is 10,000 people making arbitrary critiques and comparisons.
 * (I'll talk about this below, since this one might turn into a long discussion. Keffy 18:50, 28 February 2006 (UTC))
 * You probably shouldn't upload more than 10,000 files per day. Fewer than that, and the normal 'bot throttling should be adequate.
 * Since I just this morning transcribed the 10,000th word, uploading more than 10,000 a night doesn't look very likely. (I was worried that the acceptable number might be closer to something like 100.)
 * Have you checked the Wikipedia pronunciation pages regarding things like the cot/caught issues?
 * Put differently: are you going to attempt to address regional pronunciations in any way? Or just punt and give the Canadian only?
 * 100% punt. All right, 99% punt.  For some words there are two current Canadian pronunciations, one US, one UK -- I'll tag those for region as I see them.  But I'm not going to pretend to transcribe a dialect I don't speak and I'm not going to crib from dictionaries that do.  The results will be quite usable for most American users.
 * Does your synthesis take any IPA input? Should we have an AU, CA, UK, US set of feeder files?
 * In theory. To synthesize anything, Festival needs a voice database (a ton of little sound clips from a live speaker) and a lexicon of transcriptions dressed up in Lisp S-expressions.  I'll be using one of the American voice databases that comes with Festival and feeding it a lexicon based on my transcriptions (and probably wasting a weekend trying to trick it into saying "about" like a real hoser :-)).  It would be feasible to do RP and General US the same way with the built-in voices.  Getting a decent Australian (or other) pronunciation wouldn't be possible without somebody putting in a lot of grunt work to prepare a new voice database. Keffy 18:50, 28 February 2006 (UTC)
 * This is cool stuff. Lots of people will probably hate it, but it is worlds better than a textual representation of IPA.

--Connel MacKenzie T C 08:48, 28 February 2006 (UTC)
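
For readers unfamiliar with the "lexicon of transcriptions dressed up in Lisp S-expressions" mentioned above, here is a minimal sketch of what generating one entry could look like. It assumes Festival's `lex.add.entry` form, in which an entry is a word, a part-of-speech tag, and a list of syllables that each carry a stress mark. The helper name `festival_entry` is hypothetical, and the phone symbols in the example are illustrative only; they are not Keffy's actual transcriptions or tooling.

```python
def festival_entry(word, syllables, pos="nil"):
    """Format one lexicon entry as a Festival (lex.add.entry ...) form.

    syllables is a list of (phones, stress) pairs, where phones is a
    list of phone-name strings and stress is 0 or 1.
    This helper is hypothetical and only illustrates the shape of an entry.
    """
    syl_text = " ".join(
        "((" + " ".join(phones) + ") " + str(stress) + ")"
        for phones, stress in syllables
    )
    return "(lex.add.entry '(\"" + word + "\" " + pos + " (" + syl_text + ")))"


# Illustrative phones only; a real entry must use the phone set of the
# voice database it is paired with.
entry = festival_entry("about", [(["ax"], 0), (["b", "aw", "t"], 1)])
print(entry)
```

A per-region set of feeder files, as asked about above, would then be one file of such entries per dialect, each paired with a voice database for that dialect.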

File naming convention
I'm a little confused. You'd have "synth" come before the word? Hmmm. Maybe. I think  en-ca--synth.ogg  might be better; at least worth some consideration, anyway. --Connel MacKenzie T C 18:23, 28 February 2006 (UTC)


 * Doesn't matter one way or the other to me. I guess I was thinking of "Synthetic Canadian English" as a very small dialect with zero living speakers. Keffy 18:50, 28 February 2006 (UTC)

Skipping articles with existing entries
I'm not sure about that. Some of the existing entries are the ones that most desperately need something consistent. You know the ones: transcribed by someone who doesn't really understand IPA, doesn't bother saying which dialect they're transcribing, maybe doesn't even speak English. Or worse, the entries that go like
 * IPA: //, SAMPA: //, AHD: /ŭōcho͝oxäəʹ/
If somebody has read the future explanation of IPA(c), understands it, and still wants to compare and critique the transcriptions, that's wonderful. As for the clueless people who will critique without bothering to read the explanation, well, that's going to happen too, and I honestly don't think that avoiding articles with existing entries will prevent it. There's no way I'll delete work that other people have already done, but I don't see their previous work as an excuse not to do it again in a way that's guaranteed to be comparable to the entries on other Wiktionary pages. Keffy 18:50, 28 February 2006 (UTC)


 * Nonono, you misunderstood my request. I mean to say, skip adding audio to entries that already have an .ogg file, nothing more.  --Connel MacKenzie T C 05:49, 22 March 2006 (UTC)


 * Oh. That makes even more sense.  But I admit that the logic of what I hallucinated you said has been growing on me the last few weeks.  (Or perhaps it's the good excuse to procrastinate on the hard cases that has been growing on me.) Keffy 23:26, 22 March 2006 (UTC)
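
The rule settled on in this thread (skip entries that already have an .ogg file, and stay under roughly 10,000 uploads per day) could be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the check simply looks for ".ogg" anywhere in an entry's wikitext, and the pages are passed in as a plain dictionary rather than fetched from the wiki.

```python
import re

# Assumption: any mention of a ".ogg" file in the wikitext counts as
# existing audio. A real bot might inspect the audio template instead.
OGG_RE = re.compile(r"\.ogg\b", re.IGNORECASE)


def has_audio(wikitext):
    """Return True if the entry text already references an .ogg file."""
    return bool(OGG_RE.search(wikitext))


def entries_to_upload(pages, daily_cap=10_000):
    """Pick entry titles that lack audio, up to daily_cap per run.

    pages maps an entry title to its wikitext (hypothetical input shape).
    """
    missing = [title for title, text in pages.items() if not has_audio(text)]
    return missing[:daily_cap]


pages = {
    "cat": "===Pronunciation===\n* {{audio|en-us-cat.ogg|Audio (US)}}",
    "dog": "===Pronunciation===\n* IPA: /dɒɡ/",
}
print(entries_to_upload(pages))
```

Note that this only skips adding audio; it says nothing about whether a transcription is added alongside existing ones, which is the separate question discussed above.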