Talk:text file

text file
The definition is unclear as to whether files in HTML, RTF, and similar formats are considered "text files" (it says only that one extreme, text without formatting, is so considered, and that the other is not), so it needs clarification.—msh210 ℠ 20:09, 19 February 2009 (UTC)


 * I'd say the term itself is unclear. Even if you stick to the fairly objective criterion of the MIME media type (see http://www.iana.org/assignments/media-types/; it's what goes in HTTP's "Content-Type" headers and so on), there's some vacillation. HTML is text/html; unspecified XML is both text/xml and application/xml; and XHTML is application/xhtml+xml. (IIRC, the relevant spec says that the difference between text/… and application/… is that if a user agent doesn't recognize a text/… type, it can treat it as text/plain, i.e., it's something expected to be vaguely useful to a human. With this in mind, you can take the change in MIME type from HTML to XHTML either as a change in expected userbase, since it was once expected that most Netizens could look at HTML and recognize angle brackets as containing "stuff I don't care about", or as a change in expected context, since it was once expected that most HTML files were mostly text documents with a bit of markup, which is certainly not the case today. I'm sure there's a document somewhere giving the rationale for this change, but I haven't looked for it.)


 * HTML got busiest when table-based layout was all the rage, but many sites' code is much simpler again, now that CSS is widely used. Anyway, HTML is text in the sense of “text-encoded data stream”, if not necessarily in the sense of “natural-language writing.” —Michael Z. 2010-05-25 13:28 z 


 * The Google hits for "text file" might interest you; you can see some of the different ways people interpret it. Some seem to take it to mean "an ASCII-coded plain-text file, with no special markup and no non-ASCII characters" (one of the extremes you mention); others seem to take it to mean "a file with a specific text-minded character encoding, that you can open in a text editor (assuming it supports the encoding) and do useful things with"; and at least one seems to take it to mean something like "a file with a .txt extension" (regardless of what's actually in the file), which is both more extreme and less extreme than the extreme you mention.
 * —Ruakh TALK 20:33, 19 February 2009 (UTC)
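The fallback mentioned above appears to be the one in RFC 2046 (section 4.1.4): an unrecognized text/ subtype should be treated as text/plain if the implementation can handle its charset, and otherwise as application/octet-stream, which is also the minimum treatment for unrecognized application/ subtypes. Below is a minimal Python sketch of just that rule; the function effective_type and the sets KNOWN_TYPES and KNOWN_CHARSETS are hypothetical stand-ins for whatever a particular user agent supports, and RFC 2046's other rules (e.g. for multipart/*) are deliberately left out.

```python
# Hypothetical capabilities of one particular user agent.
KNOWN_TYPES = {"text/plain", "text/html", "application/xml"}
KNOWN_CHARSETS = {"us-ascii", "utf-8", "iso-8859-1"}


def effective_type(media_type: str, charset: str = "us-ascii") -> str:
    """Return the media type to act on for an unrecognized subtype,
    following RFC 2046's text/ and application/ rules.
    US-ASCII is the default charset for text/ types."""
    media_type = media_type.lower()
    if media_type in KNOWN_TYPES:
        return media_type
    top_level = media_type.partition("/")[0]
    if top_level == "text" and charset.lower() in KNOWN_CHARSETS:
        # Unknown text/* subtype in a usable charset: show it as plain text.
        return "text/plain"
    # Simplified: every other unrecognized case (unknown application/*,
    # or text/* in an unknown charset) is handled as opaque bytes.
    return "application/octet-stream"


print(effective_type("text/x-markdown"))        # text/plain
print(effective_type("application/xhtml+xml"))  # application/octet-stream
print(effective_type("text/x-foo", "koi8-r"))   # application/octet-stream
```

Under these assumptions, an agent that doesn't know XHTML will treat it as opaque data rather than as readable text, which is one way to see why the text/ versus application/ choice matters.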


 * This term encompasses at least two different, but perhaps overlapping, attributes of a file: its encoding and its content.


 * Technically, the encoding of a file is, broadly, either text or binary. Text encodings themselves vary: ASCII, ISO Latin or another code page, Unicode, etc. Most text files are 8-bit bytestreams, but some Unicode text files (UTF-16, for example) are not. In this sense, all HTML and XML files, all UNIX mbox mailboxes, all tab-delimited data tables, all or most RTF formatted text documents, etc., are text files.


 * But the content of a file may be text only, or text plus markup, or text and images. In this sense, a file which contains only text is more specifically called a text-only or plain text file. —Michael Z. 2009-02-21 21:12 z 
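The encoding sense can be made concrete with a common heuristic (an assumption here, not a standard definition of "text file"): treat data as text if it contains no NUL bytes and decodes cleanly in an assumed encoding. A minimal Python sketch, using the hypothetical function name looks_like_text:

```python
def looks_like_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Heuristic: data is 'text' in the encoding sense if it has no NUL
    bytes and decodes without error in the assumed encoding."""
    if b"\x00" in data:
        # NUL bytes are typical of binary formats (and of UTF-16 text).
        return False
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_text("<p>Hello</p>\n".encode("utf-8")))        # True
print(looks_like_text(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))  # False
print(looks_like_text("Hello".encode("utf-16-le")))             # False
```

By this test an HTML file is text in the encoding sense even though it is not plain text in the content sense. The NUL check also rejects UTF-16 data, since ASCII characters encoded in UTF-16 include zero bytes, so the heuristic only suits byte-oriented encodings such as ASCII, ISO 8859-1, or UTF-8.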

By the way, isn't this just sum-of-parts: text (4) + file (2)?
 * I think so, since it has no clear definition.—msh210 ℠  22:39, 24 February 2009 (UTC)


 * Not SOP per WT:COALMINE, since textfile exists. I have tried to clear up the entry. Equinox ◑ 18:39, 24 May 2010 (UTC)


 * Well, it would meet CFI if textfile were verified, but that doesn't make it non-SOP. (textfile and textfiles appear only once each in COCA, so it seems somewhat rare. Perhaps the closed compound is mainly used attributively, and is not exactly equivalent to text file.) —Michael Z. 2010-05-25 13:28 z 


 * I'd say it's text (1/4) + file (2). Sum-of-parts, because the distinction in sense of text may be ambiguous to both speaker and reader, but includable due to WT:COALMINE. —Michael Z. 2010-05-25 14:27 z 

Good now? —msh210℠ (talk) 20:20, 14 December 2010 (UTC)

RFC discussion: August 2011
The second of the senses seems to be a bit encyclopedic, overly detailed and narrow, and, to the extent it is not, to duplicate the first sense. DCDuring TALK 18:45, 5 August 2011 (UTC)
 * See Talk:text file. The distinction between the two senses is roughly that sense 2 covers all files except binary files, and so includes HTML, CSS, JavaScript, CSV, RTF, and many other files excluded by sense 1, which covers only plain text meant to be read by humans rather than machines. (This comment is meant to address your last concern, duplication, only.) —msh210℠ (talk) 19:04, 5 August 2011 (UTC)