Enter Unicode characters in MySQL

PHP 5 and MySQL 4.1 supports UTF-8 natively (without special compilation options) . However one must take care if the data is stored properly and can be viewed properly. In this tutorial, I am going to guide you to input unicode characters and fetch it out in MySQL 4.1+ version using PHP 5.

MySQL has support for UTF-8 but it doesn’t use it as its default character encoding. However if you have control over your server, you can configure to use UTF-8 as default. Since most of us don’t have complete control, we’ll focus on tweaks that we can do at the table level.

First of all let us create a table to hold some data. The most important think you must note here is the “CHARACTER SET utf8”. This tells MySQL that all the text in this table will be encoded in UTF-8.


CREATE TABLE table_name(  
 id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, unicodeText VARCHAR(45) NOT NULL
   )
   CHARACTER SET utf8 COLLATE utf8_general_ci;

MySQL uses term, “utf8” to mean UTF-8. The COLLATE command tells how to sort the data when using the SORT BY command. Also note that you should always use VARCHAR instead of CHAR with UTF-8 as UTF-8 uses variable sized numbers for different characters. For instance, Latin letters use 1 byte codes, while Japanesee characters are 3 bytes. Using CHAR(10) would force the database to reserve 30 bytes, because it doesn’t know ahead of time which length with be used, so it reserves the maximum.
By default MySQL assumes your input in its default encoding, you must tell MySQL that the data you are passing into is UTF-8. The command for this is:
SET NAMES ‘utf8’;

Refer to the following PHP code:


<?php 
$DB = new mysqli('localhost', 'user', 'root', 'dbname'); 
$DB->query("SET NAMES 'utf8'");
if (!empty($_POST['data'])) {
$DB->query("UPDATE table_name SET unicodeText='{$_POST['data']}' WHERE ID=1");
}
$result = $DB->query("SELECT unicodeText FROM table_nameWHERE ID=1");
$return = $result->fetch_object();
 $result->close();
?>

//don't ignore this line

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Test</title>
</head>
<body>
<p>Posted: <?php echo $_POST['data'];?></p>
 <form enctype="multipart/form-data" method="post" action="test2.php">
 <fieldset>
<textarea name="data"><?php echo $return->unicodeText;?></textarea>
<input type="submit" />
</fieldset>
</form>
</body>
</html>

The above sample script connects to the database and to expect UTF-8 data as input. Then it updatees a row in the database with the Unicode data we send through the form. (Make sure to have some sample data in row 1 of the database when you test this code) Again it pull the data back out of the database. Note the line that i’ve commented as important, XHTML defaults to using UTF-8 encoding, unless you specifically tell it otherwise. Using the correct Doctype declaration will alert your browser to use UTF-8 encoding. The rest of the code is the form and lets you compare the raw POST data with what the database returns. By taking a little care with setting up the database, and using valid XHTML, one can properly store and serve UTF-8 code.
Hope this was helpful.

You can leave a response, or trackback from your own site.

Leave a Reply