Setting the Character Encoding of Incoming Data



You want to make sure that data flowing into your program has a consistent character encoding so you can handle it properly. For example, you want to treat all incoming submitted form data as UTF-8.


You can't guarantee that browsers will respect the instructions you give them with regard to character encoding, but there are a number of things you can do that make well-behaved browsers generally follow the rules.

First, follow the instructions in 19.11 so that your programs tell browsers that their output is UTF-8-encoded text. A Content-Type header with a charset is a good hint to a browser that submitted forms should be encoded using the character encoding the header specifies.
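As a rough illustration of that first step, here is a minimal sketch of a program that labels its output as UTF-8. It assumes a Python WSGI application, which the text above does not specify; the function name `app` and the sample page body are hypothetical, and this is not necessarily the approach described in 19.11.

```python
def app(environ, start_response):
    """Minimal WSGI app (hypothetical) that labels its HTML output as UTF-8."""
    # Encode the page body as UTF-8 so the bytes match the declared charset.
    body = "<p>Caf\u00e9 \u2603</p>".encode("utf-8")
    headers = [
        # The charset parameter tells the browser how the page is encoded.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The important part is that the declared charset and the actual byte encoding agree; a header that says UTF-8 over bytes in another encoding gives the browser a misleading hint.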

Second, include an accept-charset="utf-8" attribute in <form/> elements that you output. Although it's not supported by all web browsers, it instructs the browser to encode the user-entered data in the form as UTF-8 before sending it to the server.
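The second step might look like the following sketch, which builds a form carrying the accept-charset attribute. Python is an assumption here, and the helper name `render_form`, the field name `comment`, and the `/submit` action are all hypothetical.

```python
from html import escape


def render_form(action):
    """Return an HTML form (hypothetical helper) that asks for UTF-8 submissions."""
    # Escape the action URL so it is safe inside a double-quoted attribute.
    safe_action = escape(action, quote=True)
    return (
        '<form method="post" action="{}" accept-charset="utf-8">\n'
        '  <input type="text" name="comment">\n'
        '  <input type="submit" value="Send">\n'
        '</form>'
    ).format(safe_action)


if __name__ == "__main__":
    print(render_form("/submit"))
```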


In general, browsers send back form data with the same encoding that was used to generate the page containing the form. So if you standardize on UTF-8 output, you can be reasonably sure that you're always getting UTF-8 input. The accept-charset <form/> attribute is part of the HTML 4.0 specification, but is not implemented everywhere.
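Because "reasonably sure" is not a guarantee, a program may still want to check what it receives. The sketch below is one way to do that, assuming Python and a form body in application/x-www-form-urlencoded format; the function name `parse_utf8_form` is hypothetical, and rejecting bad input outright is only one possible policy.

```python
from urllib.parse import parse_qs


def parse_utf8_form(raw_body):
    """Parse a URL-encoded form body, accepting it only if it is valid UTF-8.

    Returns a dict mapping field names to lists of values, or None when the
    submitted bytes cannot be decoded as UTF-8.
    """
    try:
        # A well-formed URL-encoded body is pure ASCII; non-ASCII data
        # arrives as %XX escapes.
        text = raw_body.decode("ascii")
        # errors="strict" makes invalid UTF-8 in the escapes raise an
        # exception instead of being silently replaced.
        return parse_qs(text, keep_blank_values=True,
                        encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

With this sketch, a body such as `name=caf%C3%A9` (UTF-8) parses normally, while `name=caf%E9` (the same word in ISO-8859-1) is rejected because the byte 0xE9 alone is not valid UTF-8.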

See Also

19.11 for information about sending UTF-8-encoded output; the accept-charset <form/> attribute is described at
