Character Set
About
A character set in Oracle defines how textual data is represented and stored. It maps each character (like `A`, `ñ`, or `你`) to a unique numeric code (code point) and encodes it in one or more bytes.
In Oracle, character sets affect:
- How `CHAR`, `VARCHAR2`, and `CLOB` values are stored and interpreted
- Compatibility with client applications
- Byte size of characters (which impacts storage and length limits)
- Sorting, comparison, and indexing behavior for text
Character Sets in Oracle
Oracle supports two character sets in a database:
1. Database Character Set
- Applies to: `CHAR`, `VARCHAR2`, `LONG`, and `CLOB`
- Chosen at the time of database creation
- Cannot be changed easily after creation
- Defines how most string data is stored
2. National Character Set
- Applies to: `NCHAR`, `NVARCHAR2`, and `NCLOB`
- Independent of the database character set
- Used to support Unicode multilingual data
- Chosen from a limited set: typically `AL16UTF16` or `UTF8`
Common Character Sets
It’s recommended to use AL32UTF8 in modern systems because it:
- Supports all known human languages
- Is required for modern applications (APIs, web interfaces, mobile)
- Prevents data loss during multilingual input
| Character Set | Type | Description | Bytes per Character |
| --- | --- | --- | --- |
| WE8MSWIN1252 | Single-byte | Western European (Windows Latin 1) | 1 |
| AL32UTF8 | Multibyte (UTF-8) | Modern Unicode encoding; recommended for new systems | 1 to 4 |
| UTF8 | Obsolete UTF-8 (CESU-8) | Old Oracle UTF-8 encoding; deprecated | 1 to 3 (supplementary characters take 6) |
| AL16UTF16 | UTF-16 | Used for national character set only | 2 or 4 |
| US7ASCII | Single-byte | Basic ASCII, 7-bit | 1 |
Example with AL32UTF8
| Character | Bytes in AL32UTF8 |
| --- | --- |
| A | 1 byte |
| é | 2 bytes |
| क (Devanagari) | 3 bytes |
| 😊 (Emoji) | 4 bytes |
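You can confirm these sizes in an AL32UTF8 database with `LENGTH` (counts characters) and `LENGTHB` (counts bytes):

```sql
-- In an AL32UTF8 database, LENGTH counts characters and LENGTHB counts bytes
SELECT LENGTH('é')  AS char_count,   -- 1 character
       LENGTHB('é') AS byte_count    -- 2 bytes in UTF-8
  FROM dual;
```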
Byte Semantics vs Character Semantics
Oracle allows specifying string lengths in:
- Byte semantics (default): Limits are measured in bytes. Risky with multibyte character sets.
- Character semantics: Limits are measured in characters, regardless of how many bytes each character consumes.
We can configure this via:
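The `NLS_LENGTH_SEMANTICS` parameter controls the default (it is `BYTE` unless changed); a sketch:

```sql
-- Switch the default length semantics for the current session
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;

-- Or instance-wide, for sessions started after the change
ALTER SYSTEM SET NLS_LENGTH_SEMANTICS = CHAR SCOPE = BOTH;
```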
or at table column level:
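An explicit `BYTE` or `CHAR` qualifier on a column overrides the session default (table and column names here are illustrative):

```sql
CREATE TABLE t (
  code VARCHAR2(10 BYTE),  -- at most 10 bytes
  name VARCHAR2(10 CHAR)   -- at most 10 characters, however many bytes each needs
);
```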
How to Check Character Set
To check the character sets of your database:
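Both settings are recorded in the `NLS_DATABASE_PARAMETERS` view:

```sql
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
```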
Collation and Sorting
Character sets influence collation, i.e., how Oracle sorts and compares strings. For example:
- `A < Z` in ASCII
- Accented characters may be ignored or prioritized depending on collation
Oracle supports linguistic collation and case-insensitive comparisons via the `NLS_SORT` and `NLS_COMP` parameters.
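For example, case-insensitive comparison can be enabled per session (`BINARY_CI` is one of several case-insensitive sort definitions):

```sql
ALTER SESSION SET NLS_COMP = LINGUISTIC;
ALTER SESSION SET NLS_SORT = BINARY_CI;

-- With these settings, the comparison below matches despite the case difference
SELECT 1 FROM dual WHERE 'Müller' = 'müller';
```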
Best Practices
1. Use Unicode Character Set (AL32UTF8) for New Databases

Why:
- `AL32UTF8` (Oracle’s implementation of UTF-8) supports all known characters from every language.
- It is fully compatible with modern applications (web, APIs, mobile).
- Required for features like XML, JSON, and RESTful services in Oracle.
How:
Set `AL32UTF8` at the time of database creation using:
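A trimmed sketch of the relevant clauses (a real `CREATE DATABASE` statement needs many more options, and tools like DBCA can set this for you; the database name is illustrative):

```sql
CREATE DATABASE orcl
  CHARACTER SET AL32UTF8
  NATIONAL CHARACTER SET AL16UTF16;
```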
Avoid the older `UTF8` character set (deprecated).
2. Use CHAR Semantics When Working with Multibyte Characters

Problem:
By default, Oracle uses byte semantics (e.g., `VARCHAR2(10)` = 10 bytes). If using a multibyte character set (like `AL32UTF8`), this can lead to truncation errors if characters consume more than one byte.
Solution:
Use character semantics to allocate space by character count, not byte size.
Ways to enable:
At the session level:
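```sql
-- Make new length declarations default to character semantics
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;
```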
At the column level:
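The `CHAR` qualifier makes the limit a character count rather than a byte count (table and column names are illustrative):

```sql
CREATE TABLE customers (
  full_name VARCHAR2(50 CHAR)  -- 50 characters, not 50 bytes
);
```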
3. Ensure Client-Server Character Set Compatibility
Problem:
When client and server use different character sets, Oracle performs implicit character set conversion, which can:
- Corrupt data if unsupported characters are involved
- Add CPU overhead
- Cause query failures or display issues
Solution:
- Set the client’s `NLS_LANG` parameter correctly to match the database character set or desired behavior.
- Use `AL32UTF8` on both client and server when possible.
Example for Windows client:
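`NLS_LANG` takes the form `LANGUAGE_TERRITORY.CHARSET`; on Windows it can be set in the environment (the language and territory values below are illustrative):

```bat
set NLS_LANG=AMERICAN_AMERICA.AL32UTF8
```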
4. Validate Character Set Compatibility Before Migration
When upgrading Oracle versions or migrating databases:
- Use tools like CSSCAN, CSMIG, or DMU (Database Migration Assistant for Unicode) to detect and resolve incompatible data.
- With the legacy exp/imp utilities, set the client `NLS_LANG` correctly to avoid corrupt exports/imports; Data Pump converts between source and target character sets automatically.
Example export command:
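A minimal Data Pump export; the connection string, directory object, and schema name are placeholders:

```shell
expdp system@orcl schemas=hr directory=DATA_PUMP_DIR dumpfile=hr.dmp logfile=hr_exp.log
```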
Example import with charset conversion:
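Data Pump converts data to the target database character set automatically; with the legacy `imp` utility, the client-side `NLS_LANG` drives the conversion (all connection values are placeholders):

```shell
set NLS_LANG=AMERICAN_AMERICA.AL32UTF8
imp system@orcl file=hr.dmp fromuser=hr touser=hr
```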
5. Avoid Mixing Character Sets in Distributed Systems
Problem:
If different databases in a distributed setup use different character sets, it can cause:
- Conversion errors in DB links
- Query failures
- Inconsistent results
Recommendation:
- Standardize all related databases to use `AL32UTF8`.
- Always test distributed queries across DB links for compatibility.
6. Use Unicode-Aware Data Types for Multilingual Data
For storing multilingual text (Chinese, Arabic, Japanese, etc.), prefer:
- `NCHAR`, `NVARCHAR2`, and `NCLOB` (use the national character set)
- `VARCHAR2` with a Unicode database character set (`AL32UTF8`)
Avoid legacy single-byte character sets unless absolutely necessary for legacy integration.
7. Always Document Character Set Configurations
For each database and environment:
- Record the values of `NLS_CHARACTERSET`, `NLS_NCHAR_CHARACTERSET`, and `NLS_LANG` (client-side)
- Include this in infrastructure documentation
- Helps during troubleshooting, migrations, and audits
8. Test Multilingual Applications Thoroughly
Multilingual data involves:
- Data entry
- Storage
- Retrieval
- Sorting
- Collation
- Display on UI
Make sure all components (DB, UI, middleware, APIs) are Unicode-compatible and tested for multilingual behavior.
9. Plan for Future Growth and Internationalization
Even if your current application is monolingual:
- Choose `AL32UTF8` early to avoid future rework
- Supports globalization, mobile interfaces, and external integrations
Scenario
If we have a table with one `CHAR` column and one `NCHAR` column, each column uses a different character set, because in Oracle:
- `CHAR`, `VARCHAR2`, and `CLOB` → use the Database Character Set
- `NCHAR`, `NVARCHAR2`, and `NCLOB` → use the National Character Set
So both `AL32UTF8` and `AL16UTF16` can be involved in the same table, with each applying to its respective column type.
Which Character Set Is Used?
| Column Type | Character Set Used | Example |
| --- | --- | --- |
| `CHAR` | Database Character Set | `AL32UTF8`, `WE8MSWIN1252`, etc. |
| `NCHAR` | National Character Set | Usually `AL16UTF16` or `UTF8` |
You can check your settings with:
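Both settings appear in the `NLS_DATABASE_PARAMETERS` view:

```sql
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter LIKE '%CHARACTERSET%';
```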
Should we use AL16UTF16 or AL32UTF8?
Here’s what Oracle recommends:
| Parameter | Recommended Value | Purpose |
| --- | --- | --- |
| `NLS_CHARACTERSET` | `AL32UTF8` | Unicode for database-wide multilingual support |
| `NLS_NCHAR_CHARACTERSET` | `AL16UTF16` | Default for national character set; optimized for `NCHAR` |
So:
- We should use `AL32UTF8` for the database character set (applies to `CHAR`, `VARCHAR2`)
- We should use `AL16UTF16` for the national character set (applies to `NCHAR`, `NVARCHAR2`)
These two are designed to work together efficiently and safely.
Let’s say:
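We have a table like the following (table and column names are illustrative):

```sql
CREATE TABLE products (
  name        VARCHAR2(100 CHAR),  -- stored in the database character set (AL32UTF8)
  description NVARCHAR2(100)       -- stored in the national character set (AL16UTF16)
);
```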
- The `name` column uses `AL32UTF8` encoding (1–4 bytes per character)
- The `description` column uses `AL16UTF16` encoding (2 or 4 bytes per character)
Oracle will store and manage both encodings internally, and automatically handle them based on the column type.