The Programmer's Guide

Regex (Pattern and Matcher)

About Regex

Regex (Regular Expression) is a sequence of characters that forms a search pattern. It is widely used for:

  • Validating inputs (e.g., email, phone numbers).

  • Searching and extracting text from larger strings.

  • Replacing patterns in text.

  • Splitting strings.
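The four uses listed above can be sketched in a few lines of Java. This is a minimal illustration, not a reference implementation: the class name, method names, and the simplified phone format (ddd-ddd-dddd) are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexUseCases {
    // Hypothetical, deliberately simplified phone format: ddd-ddd-dddd.
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Validation: the whole input must match.
    static boolean isPhone(String input) {
        return PHONE.matcher(input).matches();
    }

    // Searching and extracting: return the first run of digits, or null.
    static String firstNumber(String input) {
        Matcher matcher = NUMBER.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    // Replacing: collapse every run of whitespace into one space.
    static String collapseSpaces(String input) {
        return input.replaceAll("\\s+", " ");
    }

    // Splitting: split on commas, ignoring spaces around them.
    static String[] splitCsv(String input) {
        return input.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isPhone("555-123-4567"));              // true
        System.out.println(firstNumber("order 42 shipped"));      // 42
        System.out.println(collapseSpaces("a   b \t c"));         // a b c
        System.out.println(Arrays.toString(splitCsv("x , y,z"))); // [x, y, z]
    }
}
```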

Terminology

1. Literals

Literals in regex are characters that match themselves exactly. They are the simplest building blocks of a regex pattern.

  • Example:

    • Pattern: abc

    • Matches: The string "abc" exactly, no variations.

    • Does not match: "ab" or "abcd".

  • Use Case: Used when you want to match static text exactly as it appears.

2. Meta-characters

Meta-characters are special characters in regex that have a unique meaning or functionality. They are used to define patterns beyond literal characters.

| Meta-character | Meaning | Example |
|---|---|---|
| . | Matches any single character (except newline). | Pattern: a.c → Matches: "abc", "a3c". |
| ^ | Matches the beginning of a string. | Pattern: ^abc → Matches: "abc" at the start of the string. |
| $ | Matches the end of a string. | Pattern: abc$ → Matches: "abc" at the end of the string. |
| [] | Denotes a character set. | Pattern: [a-z] → Matches any lowercase letter. |
| \ | Escapes meta-characters to treat them as literals. | Pattern: \. → Matches a literal dot ("."). |
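As a rough check of the meta-characters described here, the sketch below runs each example pattern against a sample input. The helper names fullMatch and found are hypothetical. Note that a regex backslash is written as \\ inside a Java string literal.

```java
import java.util.regex.Pattern;

class MetaCharacterDemo {
    // True when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a.c", "a3c"));   // true: . stands for any one character
        System.out.println(fullMatch("a.c", "ac"));    // false: . must consume a character
        System.out.println(found("^abc", "xabc"));     // false: "abc" is not at the start
        System.out.println(found("abc$", "xabc"));     // true: "abc" is at the end
        System.out.println(fullMatch("[a-z]", "Q"));   // false: uppercase is outside the set
        System.out.println(fullMatch("\\.", "."));     // true: escaped dot is a literal dot
        System.out.println(fullMatch("\\.", "a"));     // false
    }
}
```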

3. Quantifiers

Quantifiers define the number of occurrences of a character or group that must match for a pattern to be valid.

| Quantifier | Meaning | Example |
|---|---|---|
| * | Matches 0 or more occurrences. | Pattern: ab* → Matches: "a", "ab", "abb", "abbb". |
| + | Matches 1 or more occurrences. | Pattern: ab+ → Matches: "ab", "abb", "abbb". |
| ? | Matches 0 or 1 occurrence. | Pattern: ab? → Matches: "a", "ab". |
| {n} | Matches exactly n occurrences. | Pattern: a{2} → Matches: "aa". |
| {n,} | Matches at least n occurrences. | Pattern: a{2,} → Matches: "aa", "aaa", "aaaa". |
| {n,m} | Matches between n and m occurrences. | Pattern: a{2,4} → Matches: "aa", "aaa", "aaaa". |
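The quantifier rules can be verified with whole-input matching, which makes the lower and upper bounds visible. This is a small sketch; the class and the fullMatch helper are illustrative names only.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Whole-input match, so extra or missing characters cause a failure.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab*", "a"));        // true: zero b's are allowed
        System.out.println(fullMatch("ab+", "a"));        // false: at least one b is required
        System.out.println(fullMatch("ab?", "abb"));      // false: at most one b is allowed
        System.out.println(fullMatch("a{2}", "aaa"));     // false: exactly two a's
        System.out.println(fullMatch("a{2,}", "aaaaa"));  // true: two or more a's
        System.out.println(fullMatch("a{2,4}", "aaaaa")); // false: more than four a's
    }
}
```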

4. Groups

Groups are portions of a regex enclosed in parentheses () that allow:

  • Capturing and extracting parts of a match.

  • Applying quantifiers to an entire group.

Types of Groups:

  1. Capturing Groups:

    • Regular parentheses ( ) are used to capture matched sub-patterns.

    • Example:

      • Pattern: (a|b)c

      • Matches: "ac" or "bc"

      • Captures: "a" or "b".

  2. Non-Capturing Groups:

    • (?: ) are used for grouping without capturing.

    • Example:

      • Pattern: (?:a|b)c

      • Matches: "ac" or "bc"

      • Captures: None.
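The difference between the two group types shows up in Matcher.groupCount() and Matcher.group(int). The sketch below is illustrative; the class and helper names are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Number of capturing groups declared in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text captured by group 1 on a whole-input match, or null if there is
    // no match or the regex declares no capturing group.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches() || matcher.groupCount() < 1) {
            return null;
        }
        return matcher.group(1);
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(a|b)c"));         // 1
        System.out.println(firstGroup("(a|b)c", "bc"));   // b
        System.out.println(groupCount("(?:a|b)c"));       // 0
        System.out.println(firstGroup("(?:a|b)c", "ac")); // null: nothing is captured
        // A quantifier applied to a whole group:
        System.out.println(Pattern.matches("(?:ab)+", "ababab")); // true
    }
}
```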

5. Flags

Flags are optional modifiers that change the behavior of a regex. They are typically passed as the second argument to Pattern.compile() in Java.

| Flag | Description | Code |
|---|---|---|
| CASE_INSENSITIVE | Makes the pattern case-insensitive. | Pattern.CASE_INSENSITIVE |
| MULTILINE | Makes ^ and $ match the start/end of each line. | Pattern.MULTILINE |
| DOTALL | Makes . match newlines as well. | Pattern.DOTALL |
| UNICODE_CASE | Enables Unicode-aware case folding when combined with CASE_INSENSITIVE. | Pattern.UNICODE_CASE |
| UNIX_LINES | Recognizes only \n as a line terminator for ., ^, and $. | Pattern.UNIX_LINES |

Example:

Pattern pattern = Pattern.compile("abc", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("ABC");
boolean matched = matcher.matches();  // true: "ABC" matches "abc" because of CASE_INSENSITIVE.
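Flags are int bit masks, so several can be combined with the bitwise OR operator. The following sketch shows MULTILINE, DOTALL, and a combination with CASE_INSENSITIVE; the class name and the countMatches helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String twoLines = "ABC\nabc";

        Pattern plain = Pattern.compile("^abc$");
        Pattern multi = Pattern.compile("^abc$", Pattern.MULTILINE);
        Pattern both = Pattern.compile("^abc$", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

        System.out.println(countMatches(plain, twoLines)); // 0: ^ and $ only see the whole input
        System.out.println(countMatches(multi, twoLines)); // 1: only the second line is lowercase
        System.out.println(countMatches(both, twoLines));  // 2: both lines match

        // DOTALL lets . match the line terminator too.
        System.out.println(Pattern.compile("a.c").matcher("a\nc").matches());                 // false
        System.out.println(Pattern.compile("a.c", Pattern.DOTALL).matcher("a\nc").matches()); // true
    }
}
```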

6. Anchors

Anchors are zero-width assertions that specify positions in the string (not actual characters).

| Anchor | Meaning | Example |
|---|---|---|
| ^ | Matches the start of a string. | Pattern: ^abc → Matches: "abc" at the start. |
| $ | Matches the end of a string. | Pattern: abc$ → Matches: "abc" at the end. |
| \b | Matches a word boundary. | Pattern: \bword\b → Matches "word" as a whole word. |
| \B | Matches non-word boundaries. | Pattern: \Bword\B → Matches "word" inside another word. |
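Because anchors match positions rather than characters, their effect is easiest to see by searching inside longer inputs. In this sketch the sample sentences and the found helper are made up for illustration.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // True when the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("\\bword\\b", "a word here")); // true: "word" stands alone
        System.out.println(found("\\bword\\b", "swordsman"));   // false: letters on both sides
        System.out.println(found("\\Bword\\B", "swordsman"));   // true: "word" is inside a word
        System.out.println(found("\\Bword\\B", "a word here")); // false: boundaries on both sides
        System.out.println(found("^abc", "abcdef"));            // true
        System.out.println(found("abc$", "abcdef"));            // false
    }
}
```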

7. Escaping

Since some characters (meta-characters) have special meanings in regex, they must be escaped with a backslash (\) to be treated literally.

| Meta-character | Escaped Form | Description |
|---|---|---|
| . | \. | Matches a literal dot. |
| * | \* | Matches a literal asterisk. |
| (, ) | \(, \) | Matches literal parentheses. |

Example:

  • Pattern: 3\.14

    • Matches: "3.14".

    • Does not match: "314".
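In Java source, the regex backslash must itself be escaped inside a string literal, so the pattern 3\.14 is written "3\\.14". When an entire piece of text should be treated literally, the standard Pattern.quote method (not covered in the tables above) can wrap it. A minimal sketch with illustrative inputs:

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // Whole-input match against a regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Whole-input match where the first argument is treated as literal text.
    static boolean literalMatch(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("3\\.14", "3.14"));     // true
        System.out.println(fullMatch("3\\.14", "3x14"));     // false: the dot is literal
        System.out.println(fullMatch("3.14", "3x14"));       // true: unescaped dot matches any character
        System.out.println(fullMatch("1+1=2", "1+1=2"));     // false: + is a quantifier here
        System.out.println(literalMatch("1+1=2", "1+1=2"));  // true: quoted text is matched literally
    }
}
```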

8. Assertions

Assertions are zero-width patterns that check for specific conditions without consuming any characters.

| Assertion | Meaning | Example |
|---|---|---|
| Lookahead | Matches if a pattern exists ahead. | Pattern: foo(?=bar) → Matches: "foo" if "bar" follows. |
| Negative Lookahead | Matches if a pattern does NOT exist ahead. | Pattern: foo(?!bar) → Matches: "foo" if "bar" does NOT follow. |
| Lookbehind | Matches if a pattern exists behind. | Pattern: (?<=bar)foo → Matches: "foo" if "bar" precedes. |
| Negative Lookbehind | Matches if a pattern does NOT exist behind. | Pattern: (?<!bar)foo → Matches: "foo" if "bar" does NOT precede. |
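The zero-width nature of assertions can be observed through the match boundaries: the asserted text is checked but never included in the match. In the sketch below, the firstMatch helper and its "text@start-end" output format are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns "text@start-end" for the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return matcher.group() + "@" + matcher.start() + "-" + matcher.end();
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("foo(?=bar)", "foobar"));  // foo@0-3: "bar" is checked, not consumed
        System.out.println(firstMatch("foo(?!bar)", "foobar"));  // null
        System.out.println(firstMatch("foo(?!bar)", "foobaz"));  // foo@0-3
        System.out.println(firstMatch("(?<=bar)foo", "barfoo")); // foo@3-6
        System.out.println(firstMatch("(?<!bar)foo", "barfoo")); // null
        System.out.println(firstMatch("(?<!bar)foo", "bazfoo")); // foo@3-6
    }
}
```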

9. Greedy, Reluctant, and Possessive Quantifiers

Quantifiers in regex can control how much text they try to match:

| Type | Symbol | Behavior |
|---|---|---|
| Greedy | *, +, ?, {} | Matches as much as possible (default). |
| Reluctant | *?, +?, ?? | Matches as little as possible. |
| Possessive | *+, ++, ?+ | Matches as much as possible without backtracking. |

Example (input: "a123b456b"):

  • Pattern: a.*b (Greedy)

    • Matches: "a123b456b" (the entire string, up to the last "b").

  • Pattern: a.*?b (Reluctant)

    • Matches: "a123b" (stops at the first "b").
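The three quantifier modes can be compared on the same input. The sketch below also shows the possessive case, where .*+ consumes the rest of the input and refuses to give any of it back, so the trailing b can never match. The class and helper names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModeDemo {
    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "a123b456b";
        System.out.println(firstMatch("a.*b", input));        // a123b456b (greedy)
        System.out.println(firstMatch("a.*?b", input));       // a123b (reluctant)
        System.out.println(firstMatch("a.*+b", input));       // null (possessive, no backtracking)
        System.out.println(firstMatch("a\\d++b", "a123b"));   // a123b (possessive can still succeed)
    }
}
```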

Pattern

About Pattern

The Pattern class represents a compiled regex. It is immutable and thread-safe, meaning a single Pattern instance can be shared across threads.

Advantages:

  • Pre-compiling a regex with Pattern.compile() improves performance for repeated use.

  • Pattern provides advanced regex features like flags and Unicode support.
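A common way to use these properties is to compile the pattern once into a static final field and create a fresh Matcher for each input. The sketch below does this from a parallel stream; the class name, field name, and sample data are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SharedPatternDemo {
    // Compiled once; safe to share across threads because Pattern is immutable.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Keeps only the inputs that consist entirely of digits.
    static List<String> onlyNumeric(List<String> inputs) {
        return inputs.parallelStream()
                // matcher() creates a new Matcher per element, so no matcher state is shared.
                .filter(s -> DIGITS.matcher(s).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(onlyNumeric(Arrays.asList("12", "ab", "345", "6x"))); // [12, 345]
    }
}
```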

Features

| Feature | Description |
|---|---|
| Pre-compilation | Compiles a regex once to avoid re-compilation in repeated use. |
| Flags | Enable special behavior like case-insensitivity or dotall mode. |
| Group Extraction | Supports capturing groups using parentheses for extracting matched sub-patterns. |
| Unicode Support | Supports Unicode-aware character classes and case folding. |
| Advanced Assertions | Provides zero-width assertions like lookaheads and lookbehinds. |
| Performance Optimization | Supports possessive quantifiers and atomic groups to reduce backtracking. |
| Escaping Characters | Allows matching meta-characters as literals (e.g., \\. in a Java string literal to match a dot). |

Supported Methods in Pattern

| Feature Group | Method | Description |
|---|---|---|
| Compilation | static Pattern compile(String regex) | Compiles a regex into a pattern. |
| | static Pattern compile(String regex, int flags) | Compiles a regex with specific flags. |
| Flags | int flags() | Returns the flags used when compiling the pattern. |
| Matching | static boolean matches(String regex, CharSequence input) | Compiles the regex and checks whether the entire input matches it. |
| Pattern Retrieval | String pattern() | Returns the regex pattern as a string. |
| Splitting Strings | String[] split(CharSequence input) | Splits the input string around matches of the pattern. |
| | String[] split(CharSequence input, int limit) | Splits the input string around matches, with a limit on the number of resulting pieces. |
| Unicode Support | static final int UNICODE_CASE | Flag constant (not a method) that enables Unicode-aware case folding. |
| | static final int UNICODE_CHARACTER_CLASS | Flag constant (not a method) that enables Unicode-aware character classes. |
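A short sketch of the methods listed above, focusing on how the limit argument changes split(). The comma-separated sample data and the helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    // split(input): trailing empty strings are dropped.
    static String[] splitAll(String input) {
        return COMMA.split(input);
    }

    // split(input, limit): a positive limit caps the number of pieces;
    // a negative limit keeps trailing empty strings.
    static String[] splitWithLimit(String input, int limit) {
        return COMMA.split(input, limit);
    }

    // flags(): checks whether a compiled pattern carries a given flag.
    static boolean hasFlag(Pattern pattern, int flag) {
        return (pattern.flags() & flag) != 0;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAll("a,b,c,,")));         // [a, b, c]
        System.out.println(Arrays.toString(splitWithLimit("a,b,c", 2)));  // [a, b,c]
        System.out.println(splitWithLimit("a,b,c,,", -1).length);         // 5
        System.out.println(COMMA.pattern());                              // ,
        Pattern ci = Pattern.compile("x", Pattern.CASE_INSENSITIVE);
        System.out.println(hasFlag(ci, Pattern.CASE_INSENSITIVE));        // true
        System.out.println(Pattern.matches("\\d+", "123a"));              // false: whole input must match
    }
}
```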

Some Regex Symbols

| Symbol | Description |
| --- | --- |
| . | Matches any single character except a line terminator (unless the DOTALL flag is set). |
| \d | Matches a digit (equivalent to [0-9]). |
| \D | Matches a non-digit (equivalent to [^0-9]). |
| \w | Matches a word character (alphanumeric or _). |
| \W | Matches a non-word character (opposite of \w). |
| \s | Matches a whitespace character (spaces, tabs, newlines). |
| \S | Matches a non-whitespace character. |
| ^ | Matches the beginning of the input; with the MULTILINE flag, also the beginning of each line. |
| $ | Matches the end of the input (or just before a final line terminator); with the MULTILINE flag, also the end of each line. |
| \b | Matches a word boundary. |
| \B | Matches a position that is not a word boundary. |
| [...] | Matches any character inside the brackets (e.g., [abc] matches "a", "b", or "c"). |
| [^...] | Matches any character NOT inside the brackets (e.g., [^abc] matches anything except "a", "b", or "c"). |
| ? | Matches 0 or 1 occurrence of the preceding element. |
| * | Matches 0 or more occurrences of the preceding element (greedy). |
| + | Matches 1 or more occurrences of the preceding element (greedy). |
| {n} | Matches exactly n occurrences of the preceding element. |
| {n,} | Matches at least n occurrences of the preceding element. |
| {n,m} | Matches between n and m occurrences of the preceding element. |
| (?=...) | Positive lookahead: Ensures that a certain pattern follows. |
| (?!...) | Negative lookahead: Ensures that a certain pattern does NOT follow. |
| (?<=...) | Positive lookbehind: Ensures that a certain pattern precedes. |
| (?<!...) | Negative lookbehind: Ensures that a certain pattern does NOT precede. |
| \ | Escapes special characters (e.g., \\. in a Java string literal matches a literal dot). |
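A few of the symbols above (word boundaries, lookbehind, lookahead) are easier to see on a concrete input. The sketch below uses a hypothetical helper, findAll, and a made-up sample sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; shows word boundaries and lookarounds on sample text.
public class RegexSymbolsSketch {
    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "cat concat cats $15 and 20 USD";
        // \b: only the standalone word "cat" qualifies.
        System.out.println(findAll("\\bcat\\b", text));    // [cat]
        // Lookbehind: digits preceded by '$'; the '$' is not part of the match.
        System.out.println(findAll("(?<=\\$)\\d+", text)); // [15]
        // Lookahead: digits followed by " USD"; the suffix is not consumed.
        System.out.println(findAll("\\d+(?= USD)", text)); // [20]
    }
}
```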

Matcher

About Matcher

The Matcher class in Java represents an engine that performs match operations on a character sequence using a Pattern. It works as a stateful iterator, allowing for complex matching, group extraction, and replacement operations. A Matcher is not thread-safe, so each thread must use its own instance if concurrency is required.

Features

| Feature | Description |
| --- | --- |
| Stateful Matching | Allows iteration through matches in a target string using find(). |
| Group Extraction | Extracts specific parts of the matched text using capturing groups ( ). |
| Position Tracking | Tracks the start and end positions of matches within the input string. |
| Regex Replacement | Performs targeted replacement using regex patterns with replaceAll() and replaceFirst(). |
| Anchored Matching | matches() requires the entire input to match; lookingAt() requires a match that starts at the beginning but need not reach the end. |
| Region Matching | Limits matching to a specific substring of the input. |
| Reset Functionality | Allows resetting the Matcher, optionally with a new input (reset()); usePattern() switches to a different pattern. |

Supported Methods in Matcher

| Feature Group | Method | Description |
| --- | --- | --- |
| Matching | boolean matches() | Attempts to match the entire input sequence against the pattern. |
| Matching | boolean lookingAt() | Attempts to match the input sequence from the beginning, without requiring the whole input to match. |
| Matching | boolean find() | Finds the next subsequence that matches the pattern. |
| Matching | boolean find(int start) | Resets the matcher, then finds the next match starting at the specified index. |
| Group Extraction | String group() | Returns the matched subsequence from the last match. |
| Group Extraction | String group(int group) | Returns the specified capturing group's matched subsequence. |
| Group Extraction | int groupCount() | Returns the number of capturing groups in the pattern. |
| Group Extraction | int start() | Returns the start index of the last match. |
| Group Extraction | int start(int group) | Returns the start index of the specified group in the last match. |
| Group Extraction | int end() | Returns the end index (exclusive) of the last match. |
| Group Extraction | int end(int group) | Returns the end index (exclusive) of the specified group in the last match. |
| Replacement | String replaceAll(String replacement) | Replaces every subsequence that matches the pattern with the replacement string. |
| Replacement | String replaceFirst(String replacement) | Replaces the first subsequence that matches the pattern with the replacement string. |
| Replacement | Matcher appendReplacement(StringBuffer sb, String replacement) | Appends the input between the previous match and the current one, followed by the replacement, to the StringBuffer. |
| Replacement | StringBuffer appendTail(StringBuffer sb) | Appends the remaining input after the last match to the StringBuffer. |
| Position Tracking | int start() | Returns the starting position of the last match. |
| Position Tracking | int end() | Returns the ending position (exclusive) of the last match. |
| Region Matching | Matcher region(int start, int end) | Sets the bounds of the region within which matches are searched (end is exclusive). |
| Region Matching | boolean hasTransparentBounds() | Checks if the matcher uses transparent bounds. |
| Region Matching | Matcher useTransparentBounds(boolean b) | Sets whether the matcher uses transparent bounds. |
| Region Matching | boolean hasAnchoringBounds() | Checks if the matcher uses anchoring bounds. |
| Region Matching | Matcher useAnchoringBounds(boolean b) | Sets whether the matcher uses anchoring bounds. |
| Reset | Matcher reset() | Resets the matcher, clearing any previous match state. |
| Reset | Matcher reset(CharSequence input) | Resets the matcher with a new input sequence. |
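To illustrate how the matching, group, and position methods fit together, here is a minimal sketch that iterates with find() and reports each match's groups and indices. The KEY_VALUE pattern and the describe helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; iterates over matches and reads groups and positions.
public class MatcherIterationSketch {
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    // Produces "key@start-end -> value" for every key=value pair found.
    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = KEY_VALUE.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "@" + m.start() + "-" + m.end() + " -> " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "a=1, bb=22";
        System.out.println(describe(input)); // [a@0-3 -> 1, bb@5-10 -> 22]

        Matcher m = KEY_VALUE.matcher(input);
        System.out.println(m.groupCount());  // 2
        System.out.println(m.lookingAt());   // true: the prefix "a=1" matches
        System.out.println(m.matches());     // false: the whole input does not
    }
}
```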

Named Capturing Groups

Named Capturing Groups allow us to assign names to specific groups in a regex pattern. This makes it easier to extract data without relying on the group index.

Syntax

  • Use the format (?<name>...) to define a named group.

  • Use Matcher.group("name") to retrieve the content of the named group.

Example

Pattern pattern = Pattern.compile("(?<day>\\d{2})-(?<month>\\d{2})-(?<year>\\d{4})");
Matcher matcher = pattern.matcher("15-08-2023");
if (matcher.matches()) {
    System.out.println("Day: " + matcher.group("day"));    // Output: 15
    System.out.println("Month: " + matcher.group("month")); // Output: 08
    System.out.println("Year: " + matcher.group("year"));   // Output: 2023
}

Advantages:

  • Improves code readability.

  • Reduces errors caused by incorrect group indices.
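Named groups can also be referenced from a replacement string with ${name}. The following sketch, with a hypothetical toIso helper, reorders the date format from the example above.

```java
import java.util.regex.Pattern;

// Hypothetical class name; uses named groups inside a replacement string.
public class NamedGroupReplaceSketch {
    private static final Pattern DMY =
            Pattern.compile("(?<day>\\d{2})-(?<month>\\d{2})-(?<year>\\d{4})");

    // Rewrites every dd-MM-yyyy date in the text as yyyy-MM-dd.
    static String toIso(String text) {
        return DMY.matcher(text).replaceAll("${year}-${month}-${day}");
    }

    public static void main(String[] args) {
        System.out.println(toIso("Due 15-08-2023 and 01-01-2024"));
        // Due 2023-08-15 and 2024-01-01
    }
}
```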

Atomic Groups

Atomic Groups are used to prevent backtracking within a group. Once a group is matched, the regex engine will not revisit it, even if the match fails later.

Syntax

  • Use the format (?>...) to define an atomic group.

Example

Pattern pattern = Pattern.compile("(?>a|aa)b");
Matcher matcher = pattern.matcher("aab");
System.out.println(matcher.matches()); // Output: false

Explanation:

  • (?>a|aa) matches "a" first (atomic group), but when it fails to match "b" after it, the regex engine does not backtrack to try "aa".

Use Cases:

  • Performance Optimization: Reduces backtracking for large or complex patterns.

  • Matching Efficiency: Ensures certain patterns are matched only once.

When to Use:

  • When matching rules within a group are strict and should not allow any backtracking.

  • When the regex is suffering from performance issues due to excessive backtracking.
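The effect of an atomic group is easiest to see next to an ordinary non-capturing group. The sketch below (hypothetical class and helper names) also shows that ordering the alternatives longest-first restores the match.

```java
import java.util.regex.Pattern;

// Hypothetical class name; contrasts atomic and ordinary groups.
public class AtomicGroupSketch {
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Ordinary group: after "a" fails to be followed by "b", the engine retries with "aa".
        System.out.println(fullMatch("(?:a|aa)b", "aab")); // true
        // Atomic group: the first successful alternative "a" is kept, so the match fails.
        System.out.println(fullMatch("(?>a|aa)b", "aab")); // false
        // Longest alternative first: the atomic group takes "aa" and "b" follows.
        System.out.println(fullMatch("(?>aa|a)b", "aab")); // true
    }
}
```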

How Pattern and Matcher Work Together?

The Pattern and Matcher classes in Java's java.util.regex package work together to provide a mechanism for regular expression processing.

Relationship Between Pattern and Matcher

  • Pattern: Represents the compiled version of a regular expression. It is immutable and thread-safe. You create a Pattern once and reuse it across multiple matching operations.

  • Matcher: Represents the engine that performs match operations against a specific input string using the Pattern. It is stateful and not thread-safe.

Workflow

  1. Compile the Regex: A Pattern object is created using Pattern.compile(String regex). This compiles the regex for better performance.

  2. Create a Matcher: A Matcher object is created from the Pattern using Pattern.matcher(CharSequence input).

  3. Perform Matching Operations: The Matcher is used to perform operations like find(), matches(), or replaceAll() on the input string.

import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        // Step 1: Compile the regex
        Pattern pattern = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");
        
        // Step 2: Create a matcher for the input string
        Matcher matcher = pattern.matcher("123-45-6789");
        
        // Step 3: Perform matching operations
        if (matcher.matches()) {
            System.out.println("The input matches the pattern."); // Output: The input matches the pattern.
        } else {
            System.out.println("The input does not match the pattern.");
        }
    }
}
  • The regex \\d{3}-\\d{2}-\\d{4} is compiled into a Pattern.

  • The Pattern is used to create a Matcher for the input string "123-45-6789".

  • The matches() method checks if the entire input matches the regex.

  • Reuse of Pattern: The Pattern can be reused to create multiple Matcher instances for different input strings.

  • Statefulness of Matcher: The Matcher retains state during operations (e.g., the position of the last match).

  • Thread-Safety:

    • Pattern: Thread-safe and reusable.

    • Matcher: Not thread-safe; each thread should use its own Matcher instance.
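One way to combine these points is to share a single Pattern as a constant and keep each Matcher local to the method (and therefore to the calling thread), reusing it with reset(CharSequence). This is a sketch under those assumptions; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; shared Pattern, method-local Matcher.
public class MatcherReuseSketch {
    // Immutable and thread-safe, so it can be shared freely.
    private static final Pattern SSN_LIKE = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    static int countValid(List<String> inputs) {
        // The Matcher is stateful, so it stays local to this call.
        Matcher m = SSN_LIKE.matcher("");
        int count = 0;
        for (String s : inputs) {
            // reset(input) discards the previous state and points the matcher at new input.
            if (m.reset(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("123-45-6789", "12-345-6789", "987-65-4321");
        System.out.println(countValid(inputs)); // 2
    }
}
```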

Performance Optimization Techniques

Regex operations can sometimes be computationally expensive. Below are techniques to optimize the performance of Pattern and Matcher:

1. Compile the Pattern Once

  • Problem: Re-compiling the regex repeatedly can be expensive.

  • Solution: Compile the regex once using Pattern.compile() and reuse the Pattern object across multiple matching operations.

// Compile once
Pattern pattern = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

// Reuse Pattern for multiple inputs
Matcher matcher1 = pattern.matcher("123-45-6789");
Matcher matcher2 = pattern.matcher("987-65-4321");
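A common place where hidden recompilation occurs is String.matches(), which compiles its regex argument on every call. The sketch below contrasts it with a precompiled constant; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class name; contrasts per-call compilation with a shared Pattern.
public class CompileOnceSketch {
    private static final Pattern ID = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    // Convenient, but the regex is compiled again on each invocation.
    static boolean isIdRecompiling(String s) {
        return s.matches("\\d{3}-\\d{2}-\\d{4}");
    }

    // Same result, reusing the precompiled Pattern.
    static boolean isId(String s) {
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIdRecompiling("123-45-6789")); // true
        System.out.println(isId("123-45-6789"));            // true
        System.out.println(isId("123-456-789"));            // false
    }
}
```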

2. Use Lazy Quantifiers When Appropriate

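  • Problem: A greedy quantifier (*, +, ?) first consumes as much input as it can and then backtracks until the rest of the pattern matches. On long inputs this can match more than intended and waste work.

  • Solution: Use lazy quantifiers (*?, +?, ??) when the shortest match is wanted; they expand one step at a time instead. Lazy quantifiers change which text is matched and do not eliminate backtracking, so a more specific character class (for example [^b]*b) is often the faster option.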

// Greedy
Pattern greedyPattern = Pattern.compile(".*b");

// Lazy
Pattern lazyPattern = Pattern.compile(".*?b");
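The difference between greedy and lazy quantifiers shows up in which text is matched. The sketch below uses a hypothetical firstMatch helper and a made-up tag-like input, and adds a negated character class as a third option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; compares greedy, lazy, and negated-class matching.
public class GreedyVsLazySketch {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b>";
        // Greedy: .+ runs to the end, then backs up to the last '>'.
        System.out.println(firstMatch("<.+>", input));    // <b>bold</b>
        // Lazy: .+? stops at the first '>' it can.
        System.out.println(firstMatch("<.+?>", input));   // <b>
        // Negated class: same result as lazy, with nothing to backtrack over.
        System.out.println(firstMatch("<[^>]+>", input)); // <b>
    }
}
```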

3. Avoid Catastrophic Backtracking

  • Problem: Nested quantifiers can lead to exponential backtracking, causing performance issues.

  • Solution:

    • Use atomic groups ((?>...)) to prevent backtracking.

    • Simplify regex patterns to reduce complexity.

// Problematic regex
Pattern pattern = Pattern.compile("(a+)+b");

// Optimized with atomic groups
Pattern atomicPattern = Pattern.compile("(?>(a+))+b");
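Since (a+)+b accepts the same strings as a+b, the nesting can often be removed entirely, and a possessive quantifier (a++) is a shorter way to get the atomic behaviour. The sketch below is an illustration under that assumption, with hypothetical names; it deliberately does not run the problematic pattern on the long input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical class name; backtracking-safe rewrites of (a+)+b.
public class BacktrackingSafeSketch {
    static final Pattern ATOMIC = Pattern.compile("(?>a+)b");
    static final Pattern POSSESSIVE = Pattern.compile("a++b");

    // Builds a string of n 'a' characters.
    static String repeatA(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        return new String(chars);
    }

    public static void main(String[] args) {
        String noB = repeatA(1000); // no trailing 'b', the worst case for (a+)+b
        // Both variants fail quickly: once the a's are consumed they are never given back.
        System.out.println(ATOMIC.matcher(noB).matches());           // false
        System.out.println(POSSESSIVE.matcher(noB).matches());       // false
        System.out.println(ATOMIC.matcher(noB + "b").matches());     // true
        System.out.println(POSSESSIVE.matcher(noB + "b").matches()); // true
    }
}
```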

4. Use Predefined Character Classes

  • Problem: Spelling out custom character classes like [a-zA-Z0-9_] makes a regex verbose and easier to get wrong.

  • Solution: Use predefined character classes like \\w (word character), \\d (digit), or \\s (whitespace).

// Custom character class
Pattern custom = Pattern.compile("[a-zA-Z0-9_]");

// Predefined character class
Pattern predefined = Pattern.compile("\\w");
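By default \w covers the same ASCII characters as the custom class above; with the UNICODE_CHARACTER_CLASS flag it also accepts non-ASCII letters. The sketch below illustrates this with hypothetical constant names and sample strings.

```java
import java.util.regex.Pattern;

// Hypothetical class name; compares \w with an explicit class and the Unicode flag.
public class PredefinedClassSketch {
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");
    static final Pattern CUSTOM_WORD = Pattern.compile("[a-zA-Z0-9_]+");
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        String ascii = "user_42";
        String accented = "caf\u00e9"; // "cafe" with an acute accent on the final e

        System.out.println(ASCII_WORD.matcher(ascii).matches());      // true
        System.out.println(CUSTOM_WORD.matcher(ascii).matches());     // true
        // Default \w is ASCII-only, so the accented letter is rejected.
        System.out.println(ASCII_WORD.matcher(accented).matches());   // false
        System.out.println(UNICODE_WORD.matcher(accented).matches()); // true
    }
}
```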

5. Limit the Region for Matching

  • Problem: Searching the entire string when only a portion is relevant can waste time.

  • Solution: Use Matcher.region(int start, int end) to limit matching to a specific substring.

Matcher matcher = pattern.matcher("123-45-6789");
matcher.region(4, 11); // Search only within "45-6789" (the end index is exclusive)
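A minimal region sketch, assuming a simple \d+ pattern: the end index passed to region() is exclusive, reported positions stay relative to the whole input, and anchoring bounds (on by default) let ^ match at the region start. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; restricts matching to part of the input.
public class RegionSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final Pattern STARTS_WITH_45 = Pattern.compile("^45");

    // Returns "match@startIndex" for each digit run inside [start, end).
    static List<String> digitsInRegion(String input, int start, int end) {
        List<String> out = new ArrayList<>();
        Matcher m = DIGITS.matcher(input).region(start, end);
        while (m.find()) {
            out.add(m.group() + "@" + m.start());
        }
        return out;
    }

    // With anchoring bounds, '^' matches at the region start instead of index 0.
    static boolean regionStartsWith45(String input, boolean anchoring) {
        Matcher m = STARTS_WITH_45.matcher(input);
        m.region(4, input.length()).useAnchoringBounds(anchoring);
        return m.find();
    }

    public static void main(String[] args) {
        String input = "123-45-6789";
        System.out.println(digitsInRegion(input, 4, 11));     // [45@4, 6789@7]
        System.out.println(regionStartsWith45(input, true));  // true
        System.out.println(regionStartsWith45(input, false)); // false
    }
}
```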

6. Use Anchors for Efficiency

  • Problem: Matching without specifying start (^) or end ($) anchors can lead to unnecessary scanning.

  • Solution: Use anchors to match at specific positions in the input.

// Match only if the entire input is a number
Pattern pattern = Pattern.compile("^\\d+$");
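The sketch below shows the anchored pattern with find(), and how the MULTILINE flag changes ^ and $ from input boundaries to line boundaries. The class, constants, and helpers are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; anchors with and without MULTILINE.
public class AnchorSketch {
    private static final Pattern WHOLE_NUMBER = Pattern.compile("^\\d+$");
    private static final Pattern NUMERIC_LINE = Pattern.compile("^\\d+$", Pattern.MULTILINE);

    // Without MULTILINE, ^ and $ refer to the start and end of the input.
    static boolean isNumber(String s) {
        return WHOLE_NUMBER.matcher(s).find();
    }

    // With MULTILINE, ^ and $ also match at line boundaries.
    static String firstNumericLine(String text) {
        Matcher m = NUMERIC_LINE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isNumber("12345"));                    // true
        System.out.println(isNumber("12a45"));                    // false
        System.out.println(isNumber("abc\n42\nxyz"));             // false
        System.out.println(firstNumericLine("abc\n42\nxyz"));     // 42
    }
}
```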

7. Optimize Replacement Operations

  • Problem: String.replaceAll() recompiles its regex on every call, and a fixed replacement string cannot compute a different value for each match.

  • Solution:

    • Use Matcher.appendReplacement() and Matcher.appendTail() for fine-grained control.

    • Precompile the Pattern for repeated replacements.
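As a sketch of the appendReplacement()/appendTail() loop, the example below computes a replacement per match (doubling each number). The class name, the NUMBER pattern, and the doubling rule are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; per-match computed replacement.
public class AppendReplacementSketch {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static String doubleNumbers(String input) {
        Matcher m = NUMBER.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // quoteReplacement() guards against '$' or '\' in computed text.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(doubled)));
        }
        // Copy whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```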

8. Profile and Benchmark Regex

  • Use tools like JMH (Java Microbenchmark Harness) to benchmark regex operations.

  • Analyze the runtime behavior of regex patterns and optimize accordingly.

9. Avoid Using Regex When Simpler Solutions Exist

  • Regex is powerful but can be overkill for simple operations. For example:

    • Use String.contains() for simple substring checks.

    • Use String.split() with a simple literal delimiter for basic splitting. Its argument is still treated as a regex, so metacharacters such as . or | must be escaped.
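A short sketch of the plain-String alternatives, including the escaping pitfall with String.split(); the class and helper names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical class name; plain String methods instead of Pattern/Matcher.
public class PlainStringSketch {
    // Splits on a literal dot; the dot must be escaped because split() takes a regex.
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    // String.replace() treats both arguments as literal text, not as a regex.
    static String dotsToSlashes(String s) {
        return s.replace(".", "/");
    }

    public static void main(String[] args) {
        System.out.println("hello world".contains("lo w"));    // true
        System.out.println(Arrays.toString(splitOnDot("a.b.c"))); // [a, b, c]
        System.out.println(dotsToSlashes("a.b.c"));             // a/b/c
        // Unescaped '.' matches every character, leaving only empty strings,
        // which split() drops from the end of the result.
        System.out.println("a.b.c".split(".").length);          // 0
    }
}
```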

