Collation
About
Collation is a set of rules that define how strings are sorted and compared. These rules are applied when performing operations like:
ORDER BY
to sort resultsWHERE name = 'John'
to compare valuesGROUP BY name
to group similar records
The rules include:
Case sensitivity (is
a
equal toA
?)Accent sensitivity (is
é
equal toe
?)Language-specific sorting (does
ä
come before or afterz
?)
Naming Convention
MySQL collation names follow this format:
character_set_name + _ + collation_name
Examples:
utf8mb4_general_ci
→ character set:utf8mb4
, collation:general_ci
latin1_swedish_ci
→ character set:latin1
, collation:swedish_ci
Common Suffixes
_ci
→ Case Insensitive (e.g.,a = A
)_cs
→ Case Sensitive_bin
→ Binary comparison (exact byte-by-byte match)
How to Check Available Collations
SHOW COLLATION;
How Collation Works Internally
MySQL converts characters into binary code points using the character set. Then, collation defines how those code points are evaluated for equality and order:
Binary collation
: Compares the raw byte values directly.General collation
: Uses simplified sorting rules.Unicode collation
: Follows the Unicode Collation Algorithm (UCA), more accurate and language-aware.
Character Set vs Collation
What it defines
Encoding and storage of text
Comparison and sorting rules
Example
utf8mb4
, latin1
, ascii
utf8mb4_unicode_ci
, latin1_swedish_ci
Storage impact
Affects byte size of characters
No impact on storage, only on string logic
Types of Collations
Suffix
Meaning
_ci
Case-insensitive: 'a' = 'A'
_cs
Case-sensitive: 'a' ≠ 'A'
_ai
Accent-insensitive: 'é' = 'e'
_as
Accent-sensitive: 'é' ≠ 'e'
_bin
Binary comparison (byte-by-byte): strictest and fastest
_0900
Unicode version 9.0 (used in MySQL 8.0 and later)
_520
Unicode version 5.2 (used for more accurate ordering)
a. Case Insensitive Collations (_ci
)
_ci
)'A' = 'a'
Common default in MySQL
Good for user-facing applications
b. Case Sensitive Collations (_cs
)
_cs
)'A' != 'a'
Used when case must be preserved (e.g., passwords)
c. Accent Sensitive Collations (_as
)
_as
)'e' != 'é'
Important for supporting multiple languages
d. Binary Collation (_bin
)
_bin
)Compares bytes directly
Strictest comparison
Fastest performance
Used for internal keys, identifiers
Common MySQL Collations and Their Meanings
Collation Name
Character Set
Meaning / Behavior
utf8mb4_general_ci
utf8mb4
Case-insensitive, fast, simplified Unicode collation. Suitable for general use.
utf8mb4_unicode_ci
utf8mb4
Case-insensitive, full Unicode-compliant. Slower but accurate multilingual sorting.
utf8mb4_unicode_520_ci
utf8mb4
Same as above, but based on Unicode 5.2 standard. Better for newer Unicode versions.
utf8mb4_bin
utf8mb4
Binary collation: case-sensitive and accent-sensitive. Fastest and strict comparison.
utf8mb4_general_cs
utf8mb4
Case-sensitive variant of general_ci
. 'a' != 'A'
.
utf8mb4_0900_ai_ci
utf8mb4
Unicode 9.0, accent-insensitive, case-insensitive. Used in MySQL 8.0+.
utf8mb4_0900_as_cs
utf8mb4
Unicode 9.0, accent-sensitive, case-sensitive.
utf8mb4_0900_bin
utf8mb4
Unicode 9.0 binary collation. Strictest match.
latin1_swedish_ci
latin1
Default for latin1
, case-insensitive.
latin1_general_ci
latin1
Case-insensitive, faster for general use in Western languages.
latin1_general_cs
latin1
Case-sensitive version of latin1_general_ci
.
ascii_general_ci
ascii
Case-insensitive ASCII collation.
ascii_bin
ascii
Binary collation for ASCII characters. Case-sensitive and fast.
utf8_general_ci
utf8
Same as utf8mb4_general_ci
but with older 3-byte character support only.
utf8_unicode_ci
utf8
Unicode-compliant collation for utf8
character set.
utf8_bin
utf8
Binary collation for 3-byte utf8
.
How to Set & Check Collation
Set Collation
1. Server Level (default)
Set in configuration file (my.cnf
or my.ini
):
[mysqld]
collation-server = utf8mb4_general_ci
2. Database Level
CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
3. Table Level
CREATE TABLE users (
name VARCHAR(100)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
4. Column Level
CREATE TABLE users (
name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
);
5. Query Level (for comparison)
SELECT * FROM users WHERE name COLLATE utf8mb4_general_ci = 'john';
6. In application.properties file (Spring)
spring.datasource.url=jdbc:mysql://localhost:3306/your_database_name?useUnicode=true&characterEncoding=UTF-8&useSSL=false
spring.datasource.username=your_username
spring.datasource.password=your_password
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
Check Current Collation
SELECT @@collation_database;
SELECT @@collation_server;
SELECT COLLATION(name) FROM users;
Changing Collation
Change collation for a column:
ALTER TABLE users MODIFY name VARCHAR(100) COLLATE utf8mb4_unicode_ci;
Practical Scenarios
a. Usernames Must Be Unique and Case-Sensitive
VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
b. Searching Names Without Case Sensitivity
VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
c. Sorting Multilingual Content Correctly
utf8mb4_unicode_ci` or `utf8mb4_unicode_520_ci
Last updated
Was this helpful?