正则表达式

Youky ... 2021-10-11

About 3 min

# 正则表达式

# 定义

用描述性的语言定义一个规则，并检验字符串是否符合规则。

创建方式有两种：

字面量创建

const reg1 = /\d/g;

构造函数创建。此时对于特殊字符\需要进行转义

const reg2 = new RegExp('\\d', 'g')
console.log(reg2) // /\d/g

1
2

# 匹配规则

# 特殊标记

\：用于特殊字符的转义
.：匹配任何字符（除了换行符）
^：匹配的开头
$：匹配的结尾

# 转义

\n：换行
\s：空格符（空格、回车、制表符等）
\d：数字
\D：非数字
\w：单字字符（数字、字母、下划线）

# 匹配次数

*：任意次数
+：至少一次，等价于{1,}
?：0或1次，等价于{0,1}
{n}：n次
{n,m}：n到m次（）包括n和m。若省略m则表示至少n次

# 匹配若干个元素

x|y：匹配x或y
[abc]：匹配a、b、c中的任意一个
[^abc]：匹配不属于该集合的字符
[0-9a-zA-Z_$]：合法的变量名可用的字符

# 正则表达式的标志

标志	含义
`g`	全局匹配
`i`	不区分大小写
`m`	多行搜索
`s`	允许`.`匹配换行符
`u`	使用Unicode模式进行匹配
`y`	使用粘性搜索（每次从字符串的当前位置开始）

全局匹配：当一次匹配结束时，下一次匹配从上一次的结束位置开始 eg:

const reg = /\d+/g
const str = '12a3'
console.log(reg.exec(str)) // [ '12', index: 0, input: '12a3', groups: undefined ]
console.log(reg.exec(str)) // [ '3', index: 3, input: '12a3', groups: undefined ]

1
2
3
4

而默认情况下是每次匹配都从头开始：

const reg = /\d+/
const str = '12a3'
console.log(reg.exec(str)) // [ '12', index: 0, input: '12a3', groups: undefined ]
console.log(reg.exec(str)) // [ '12', index: 0, input: '12a3', groups: undefined ]

1
2
3
4

粘性搜索：下次匹配必须从上一次终止的位置开始

const reg = /\d+/y
const str = '12a3'
console.log(reg.exec(str)) // [ '12', index: 0, input: '12a3', groups: undefined ]
console.log(reg.exec(str)) // null，因为下一次开始的index为2，对应的字符为a

1
2
3
4

# 匹配方法

# 判断字符串是否符合规范：reg.test

const str = '123@qq.com';
const reg = /^\d+@\w+\.com$/
console.log(reg.test(str)) // true

1
2
3

# 提取字符串内容并分组：reg.exec

reg.exec方法返回的是一个数组，首项是字符串匹配成功了的部分，数组的剩下几项依次是匹配到的分组。

注意，这个数组上还添加了几个属性：

index：开始匹配的下标（没有开启全局匹配时默认是0）
input：完整的字符串
groups：具名组匹配时的各个分组

const str = '2021-10-11a';
const reg = /(\d{4})-(\d{2})-(\d{2})/;
const res = reg.exec(str);
console.log(Array.isArray(res)); // true
console.log(res);
// ['2021-10-11', '2021', '10', '11', index: 0, input: '2021-10-11', groups: undefined];

1
2
3
4
5
6

# 贪婪匹配

默认情况下，会尽量匹配多的字符，称为贪婪匹配

const str = '123456789'
const reg1 = /(\d{1,3})(\d{1,3})/g 
console.log(reg1.exec(str)) // [ '123456', '123', '456', index: 0, input: '123456789', groups: undefined]

1
2
3

在匹配数量后添加?，可采用惰性匹配模式，匹配尽量少的字符

const reg2 = /(\d{1,3}?)(\d{1,3}?)/g 
console.log(reg2.exec(str)) // [ '12', '1', '2', index: 0, input: '123456789', groups: undefined]

1
2

# 断言匹配

某一部分是否匹配成功取决于前后元素的情况，属于上下文有关的匹配

x(?=y)：只有x之后是y才匹配
(?<=y)x：只有x之前是y才匹配
x(?!y)：只有x之后不是y才匹配
(?<!y)x：只有x之前不是y才匹配

# 分组匹配

用括号()将内容括起来，可以将正则表达式分为若干个部分，分组的内容会反应在exec方法返回的数组中

同时，每个分组可以进行命名，称为具名组匹配

const str = '2021-10-10asd';
const reg = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
console.log(reg.exec(str).groups) // { year: '2021', month: '10', day: '10' }

1
2
3

正则表达式

# 正则表达式

# 定义

# 匹配规则

# 特殊标记

# 转义

# 匹配次数

# 匹配若干个元素

# 正则表达式的标志

# 匹配方法

# 判断字符串是否符合规范：reg.test

# 提取字符串内容并分组：reg.exec

# 贪婪匹配

# 断言匹配

# 分组匹配

Description