Collation

About

Collation is a set of rules that define how strings are sorted and compared. These rules are applied when performing operations like:

ORDER BY to sort results
WHERE name = 'John' to compare values
GROUP BY name to group similar records

The rules include:

Case sensitivity (is a equal to A?)
Accent sensitivity (is é equal to e?)
Language-specific sorting (does ä come before or after z?)

Naming Convention

MySQL collation names follow this format:

character_set_name + _ + collation_name

Examples:

utf8mb4_general_ci → character set: utf8mb4, collation: general_ci
latin1_swedish_ci → character set: latin1, collation: swedish_ci

Common Suffixes

_ci → Case Insensitive (e.g., a = A)
_cs → Case Sensitive
_bin → Binary comparison (exact byte-by-byte match)

How to Check Available Collations

SHOW COLLATION;

13KB

show_collation.csv

How Collation Works Internally

MySQL converts characters into binary code points using the character set. Then, collation defines how those code points are evaluated for equality and order:

Binary collation: Compares the raw byte values directly.
General collation: Uses simplified sorting rules.
Unicode collation: Follows the Unicode Collation Algorithm (UCA), more accurate and language-aware.

Character Set vs Collation

Feature

Character Set

Collation

What it defines

Encoding and storage of text

Comparison and sorting rules

Example

utf8mb4, latin1, ascii

utf8mb4_unicode_ci, latin1_swedish_ci

Storage impact

Affects byte size of characters

No impact on storage, only on string logic

Types of Collations

Suffix

Meaning

_ci

Case-insensitive: 'a' = 'A'

_cs

Case-sensitive: 'a' ≠ 'A'

_ai

Accent-insensitive: 'é' = 'e'

_as

Accent-sensitive: 'é' ≠ 'e'

_bin

Binary comparison (byte-by-byte): strictest and fastest

_0900

Unicode version 9.0 (used in MySQL 8.0 and later)

_520

Unicode version 5.2 (used for more accurate ordering)

a. Case Insensitive Collations (`_ci`)

'A' = 'a'
Common default in MySQL
Good for user-facing applications

b. Case Sensitive Collations (`_cs`)

'A' != 'a'
Used when case must be preserved (e.g., passwords)

c. Accent Sensitive Collations (`_as`)

'e' != 'é'
Important for supporting multiple languages

d. Binary Collation (`_bin`)

Compares bytes directly
Strictest comparison
Fastest performance
Used for internal keys, identifiers

Common MySQL Collations and Their Meanings

Collation Name

Character Set

Meaning / Behavior

utf8mb4_general_ci

utf8mb4

Case-insensitive, fast, simplified Unicode collation. Suitable for general use.

utf8mb4_unicode_ci

utf8mb4

Case-insensitive, full Unicode-compliant. Slower but accurate multilingual sorting.

utf8mb4_unicode_520_ci

utf8mb4

Same as above, but based on Unicode 5.2 standard. Better for newer Unicode versions.

utf8mb4_bin

utf8mb4

Binary collation: case-sensitive and accent-sensitive. Fastest and strict comparison.

utf8mb4_general_cs

utf8mb4

Case-sensitive variant of general_ci. 'a' != 'A'.

utf8mb4_0900_ai_ci

utf8mb4

Unicode 9.0, accent-insensitive, case-insensitive. Used in MySQL 8.0+.

utf8mb4_0900_as_cs

utf8mb4

Unicode 9.0, accent-sensitive, case-sensitive.

utf8mb4_0900_bin

utf8mb4

Unicode 9.0 binary collation. Strictest match.

latin1_swedish_ci

latin1

Default for latin1, case-insensitive.

latin1_general_ci

latin1

Case-insensitive, faster for general use in Western languages.

latin1_general_cs

latin1

Case-sensitive version of latin1_general_ci.

ascii_general_ci

ascii

Case-insensitive ASCII collation.

ascii_bin

ascii

Binary collation for ASCII characters. Case-sensitive and fast.

utf8_general_ci

utf8

Same as utf8mb4_general_ci but with older 3-byte character support only.

utf8_unicode_ci

utf8

Unicode-compliant collation for utf8 character set.

utf8_bin

utf8

Binary collation for 3-byte utf8.

How to Set & Check Collation

Set Collation

1. Server Level (default) Set in configuration file (my.cnf or my.ini):

[mysqld]
collation-server = utf8mb4_general_ci

2. Database Level

CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

3. Table Level

CREATE TABLE users (
  name VARCHAR(100)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

4. Column Level

CREATE TABLE users (
  name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
);

5. Query Level (for comparison)

SELECT * FROM users WHERE name COLLATE utf8mb4_general_ci = 'john';

6. In application.properties file (Spring)

spring.datasource.url=jdbc:mysql://localhost:3306/your_database_name?useUnicode=true&characterEncoding=UTF-8&useSSL=false
spring.datasource.username=your_username
spring.datasource.password=your_password
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

Check Current Collation

SELECT @@collation_database;
SELECT @@collation_server;
SELECT COLLATION(name) FROM users;

Changing Collation

Change collation for a column:

ALTER TABLE users MODIFY name VARCHAR(100) COLLATE utf8mb4_unicode_ci;

Practical Scenarios

a. Usernames Must Be Unique and Case-Sensitive

VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin

b. Searching Names Without Case Sensitivity

VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci

c. Sorting Multilingual Content Correctly

utf8mb4_unicode_ci` or `utf8mb4_unicode_520_ci

PreviousMYSQL Specific NextCharacter Set

Last updated 7 hours ago

Was this helpful?

About

How Collation Works Internally

Character Set vs Collation

Types of Collations

a. Case Insensitive Collations (_ci)

b. Case Sensitive Collations (_cs)

c. Accent Sensitive Collations (_as)

d. Binary Collation (_bin)

Common MySQL Collations and Their Meanings

How to Set & Check Collation

Set Collation

Check Current Collation

Changing Collation

Practical Scenarios

a. Case Insensitive Collations (`_ci`)

b. Case Sensitive Collations (`_cs`)

c. Accent Sensitive Collations (`_as`)

d. Binary Collation (`_bin`)