May 6, 2011, 9:43 p.m.
posted by stackme
Setting the Character Encoding of Incoming Data
You want to make sure that data flowing into your program has a consistent character encoding so you can handle it properly. For example, you want to treat all incoming submitted form data as UTF-8.
You can't guarantee that browsers will respect the instructions you give them with regard to character encoding, but there are two things you can do that make well-behaved browsers generally follow the rules.
First, follow the instructions in 19.11 so that your programs tell browsers that they are emitting UTF-8-encoded text. A Content-Type header with a charset is a good hint to a browser that submitted forms should be encoded using the character encoding the header specifies.
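As a rough illustration of the header described above, here is a minimal sketch in Python (the original text shows no code, so the language, the WSGI interface, and the name app are all assumptions made for this example). It only shows where the charset parameter goes in the Content-Type header.

```python
def app(environ, start_response):
    """Hypothetical minimal WSGI application that declares its output as UTF-8."""
    # Encode the page body as UTF-8 so the bytes match the declared charset.
    body = "<p>Caf\u00e9</p>".encode("utf-8")
    headers = [
        # The charset parameter is the hint the browser uses for the page
        # and, typically, for forms submitted from it.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The important part is that the declared charset and the actual byte encoding of the body agree; a header that says UTF-8 over a body in another encoding gives the browser a misleading hint.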
Second, include an accept-charset="utf-8" attribute in <form/> elements that you output. Although it's not supported by all web browsers, it instructs the browser to encode the user-entered data in the form as UTF-8 before sending it to the server.
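A form carrying this attribute might be generated along the following lines. This is a sketch in Python, not code from the original; the function name render_form, the field name, and the /submit action are hypothetical.

```python
from html import escape


def render_form(action, field_name="comment"):
    """Return an HTML form that asks the browser to submit its data as UTF-8."""
    return (
        '<form method="post" action="{0}" accept-charset="utf-8">\n'
        '  <input type="text" name="{1}">\n'
        '  <input type="submit" value="Send">\n'
        "</form>"
    ).format(
        # Escape attribute values so the generated markup stays well-formed.
        escape(action, quote=True),
        escape(field_name, quote=True),
    )


if __name__ == "__main__":
    print(render_form("/submit"))
```

Because the attribute is only a request to the browser, it is best treated as a complement to the Content-Type header rather than a replacement for it.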
In general, browsers send back form data with the same encoding that was used to generate the page containing the form. So if you standardize on UTF-8 output, you can be reasonably sure that you're always getting UTF-8 input. The accept-charset <form/> attribute is part of the HTML 4.0 specification, but is not implemented everywhere.
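Since "reasonably sure" is not a guarantee, one option is to check on the server that a submitted body really is valid UTF-8 before using it. The sketch below, in Python, assumes a form submitted as application/x-www-form-urlencoded; the helper name parse_utf8_form is hypothetical, and rejecting bad input by returning None is just one possible policy.

```python
from urllib.parse import parse_qs


def parse_utf8_form(raw_body):
    """Parse a urlencoded form body given as bytes.

    Returns a dict mapping field names to lists of values, or None if the
    body (or any percent-encoded value in it) is not valid UTF-8.
    """
    try:
        # Strictly decode the raw bytes first, then the percent-escapes.
        text = raw_body.decode("utf-8")
        return parse_qs(
            text,
            keep_blank_values=True,
            encoding="utf-8",
            errors="strict",  # raise instead of silently replacing bad bytes
        )
    except UnicodeDecodeError:
        return None
```

For example, parse_utf8_form(b"name=caf%C3%A9") yields the decoded field, while parse_utf8_form(b"name=caf%E9"), which carries a Latin-1 encoded character, returns None so the caller can decide how to handle it.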
See 19.11 for information about sending UTF-8-encoded output; the accept-charset <form/> attribute is described at http://www.w3.org/TR/REC-html40/interact/forms.html#adef-accept-charset.