Collation

About

Collation is a set of rules that define how strings are sorted and compared. These rules are applied when performing operations like:

  • ORDER BY to sort results

  • WHERE name = 'John' to compare values

  • GROUP BY name to group similar records

The rules include:

  • Case sensitivity (is a equal to A?)

  • Accent sensitivity (is é equal to e?)

  • Language-specific sorting (does ä come before or after z?)

Naming Convention

MySQL collation names follow this format:

character_set_name + _ + collation_name

Examples:

  • utf8mb4_general_ci → character set: utf8mb4, collation: general_ci

  • latin1_swedish_ci → character set: latin1, collation: swedish_ci

Common Suffixes

  • _ci → Case Insensitive (e.g., a = A)

  • _cs → Case Sensitive

  • _bin → Binary comparison (exact byte-by-byte match)

How to Check Available Collations

SHOW COLLATION;

How Collation Works Internally

MySQL converts characters into binary code points using the character set. Then, collation defines how those code points are evaluated for equality and order:

  • Binary collation: Compares the raw byte values directly.

  • General collation: Uses simplified sorting rules.

  • Unicode collation: Follows the Unicode Collation Algorithm (UCA), more accurate and language-aware.

Character Set vs Collation

Feature
Character Set
Collation

What it defines

Encoding and storage of text

Comparison and sorting rules

Example

utf8mb4, latin1, ascii

utf8mb4_unicode_ci, latin1_swedish_ci

Storage impact

Affects byte size of characters

No impact on storage, only on string logic

Types of Collations

Suffix

Meaning

_ci

Case-insensitive: 'a' = 'A'

_cs

Case-sensitive: 'a' ≠ 'A'

_ai

Accent-insensitive: 'é' = 'e'

_as

Accent-sensitive: 'é' ≠ 'e'

_bin

Binary comparison (byte-by-byte): strictest and fastest

_0900

Unicode version 9.0 (used in MySQL 8.0 and later)

_520

Unicode version 5.2 (used for more accurate ordering)

a. Case Insensitive Collations (_ci)

  • 'A' = 'a'

  • Common default in MySQL

  • Good for user-facing applications

b. Case Sensitive Collations (_cs)

  • 'A' != 'a'

  • Used when case must be preserved (e.g., passwords)

c. Accent Sensitive Collations (_as)

  • 'e' != 'é'

  • Important for supporting multiple languages

d. Binary Collation (_bin)

  • Compares bytes directly

  • Strictest comparison

  • Fastest performance

  • Used for internal keys, identifiers

Common MySQL Collations and Their Meanings

Collation Name

Character Set

Meaning / Behavior

utf8mb4_general_ci

utf8mb4

Case-insensitive, fast, simplified Unicode collation. Suitable for general use.

utf8mb4_unicode_ci

utf8mb4

Case-insensitive, full Unicode-compliant. Slower but accurate multilingual sorting.

utf8mb4_unicode_520_ci

utf8mb4

Same as above, but based on Unicode 5.2 standard. Better for newer Unicode versions.

utf8mb4_bin

utf8mb4

Binary collation: case-sensitive and accent-sensitive. Fastest and strict comparison.

utf8mb4_general_cs

utf8mb4

Case-sensitive variant of general_ci. 'a' != 'A'.

utf8mb4_0900_ai_ci

utf8mb4

Unicode 9.0, accent-insensitive, case-insensitive. Used in MySQL 8.0+.

utf8mb4_0900_as_cs

utf8mb4

Unicode 9.0, accent-sensitive, case-sensitive.

utf8mb4_0900_bin

utf8mb4

Unicode 9.0 binary collation. Strictest match.

latin1_swedish_ci

latin1

Default for latin1, case-insensitive.

latin1_general_ci

latin1

Case-insensitive, faster for general use in Western languages.

latin1_general_cs

latin1

Case-sensitive version of latin1_general_ci.

ascii_general_ci

ascii

Case-insensitive ASCII collation.

ascii_bin

ascii

Binary collation for ASCII characters. Case-sensitive and fast.

utf8_general_ci

utf8

Same as utf8mb4_general_ci but with older 3-byte character support only.

utf8_unicode_ci

utf8

Unicode-compliant collation for utf8 character set.

utf8_bin

utf8

Binary collation for 3-byte utf8.

How to Set & Check Collation

Set Collation

1. Server Level (default) Set in configuration file (my.cnf or my.ini):

[mysqld]
collation-server = utf8mb4_general_ci

2. Database Level

CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

3. Table Level

CREATE TABLE users (
  name VARCHAR(100)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

4. Column Level

CREATE TABLE users (
  name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
);

5. Query Level (for comparison)

SELECT * FROM users WHERE name COLLATE utf8mb4_general_ci = 'john';

6. In application.properties file (Spring)

spring.datasource.url=jdbc:mysql://localhost:3306/your_database_name?useUnicode=true&characterEncoding=UTF-8&useSSL=false
spring.datasource.username=your_username
spring.datasource.password=your_password
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

Check Current Collation

SELECT @@collation_database;
SELECT @@collation_server;
SELECT COLLATION(name) FROM users;

Changing Collation

Change collation for a column:

ALTER TABLE users MODIFY name VARCHAR(100) COLLATE utf8mb4_unicode_ci;

Practical Scenarios

a. Usernames Must Be Unique and Case-Sensitive

VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin

b. Searching Names Without Case Sensitivity

VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci

c. Sorting Multilingual Content Correctly

utf8mb4_unicode_ci` or `utf8mb4_unicode_520_ci

Last updated

Was this helpful?