Automated Data Scraping Techniques (IV): Regular Expressions (Regex)

Tutorials and Documentation

Regex Cheat Sheet: see the rexegg.com tutorial (https://www.rexegg.com/regex-quickstart.html#lookarounds).

regular-expressions.info: see the tutorials on that site.

Common Scenarios

Matching any Chinese character

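The Unicode script property \p{Han} matches any Han (Chinese) character. With base R functions such as grep(), it is used together with perl = TRUE, as here: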

v <- c("a", "b", "c", "中", "e", "文")
# keep only the elements that contain a Han character
grep("\\p{Han}", v, value = TRUE, perl = TRUE)
## [1] "中" "文"
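
The same property can also pull Chinese text out of mixed strings, which comes up often when cleaning scraped pages. The following is a minimal sketch, assuming UTF-8 input; the vector x is made-up sample data, not taken from the original post.

```r
# Hypothetical sample data: mixed Latin and Chinese text
x <- c("abc中文def", "no han here", "数据data抓取")

# gregexpr() locates every run of one or more Han characters;
# regmatches() then extracts the matched substrings (one list element per input)
han_runs <- regmatches(x, gregexpr("\\p{Han}+", x, perl = TRUE))
han_runs

# The negated property \P{Han} matches everything that is NOT a Han character,
# so replacing it with "" keeps only the Chinese text
han_only <- gsub("\\P{Han}+", "", x, perl = TRUE)
han_only
```

An element with no Chinese characters yields an empty result in both cases, so it is worth checking lengths before indexing into han_runs.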

Specifying the number of occurrences

Quantifiers control how many times the preceding element may repeat:

  • ?, ?? : 0 or 1 occurrences (?? is lazy, ? is greedy)

  • *, *? : zero or more occurrences (*? is lazy)

  • +, +? : at least one occurrence (+? is lazy)

  • {n} : exactly n occurrences

  • {n,m} : n to m occurrences, inclusive

  • {n,m}? : n to m occurrences, lazy

  • {n,}, {n,}? : at least n occurrences ({n,}? is lazy)
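
The difference between greedy and lazy quantifiers is easiest to see on a string with several candidate end points. The following is a small sketch with made-up input strings; perl = TRUE is used so that the behavior follows the PCRE rules described in the references above.

```r
s <- "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as much as possible, so the match runs
# from the first "<" to the last ">" in the string
greedy <- regmatches(s, regexpr("<.+>", s, perl = TRUE))

# Lazy: .+? takes as little as possible, so the match stops
# at the first ">" after the opening "<"
lazy <- regmatches(s, regexpr("<.+?>", s, perl = TRUE))

# {n,m}: runs of 2 to 3 digits; "1" is too short, and "4444"
# yields "444" with the leftover "4" unmatched
nums <- "1 22 333 4444"
digits <- regmatches(nums, gregexpr("\\d{2,3}", nums, perl = TRUE))[[1]]

greedy
lazy
digits
```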

Example:

To match "exactly n or m" occurrences, you need to write the quantified regex twice (for example, as an alternation), unless m and n are related in a special way:

X{n,m} if m = n+1
(?:X{n}){1,2} if m = 2n
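
Both the general case and the m = 2n shortcut can be checked with grepl(). This is a sketch under assumptions: X is taken to be a digit (\d), the values n = 2, m = 5 and n = 2, m = 4 are arbitrary, and the anchors ^ and $ are added so that the whole string must match.

```r
# Hypothetical test strings of 2, 3, 4 and 5 digits
ids <- c("12", "123", "1234", "12345")

# General case (n = 2, m = 5): the quantified pattern is written twice,
# joined by an alternation inside a non-capturing group
two_or_five <- grepl("^(?:\\d{2}|\\d{5})$", ids, perl = TRUE)

# Special case m = 2n (n = 2, m = 4): repeat the n-fold group once or twice
two_or_four <- grepl("^(?:\\d{2}){1,2}$", ids, perl = TRUE)

two_or_five
two_or_four
```

Without the anchors, grepl() would report a match for any string that merely contains two digits somewhere, which defeats the "exactly" requirement.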

Miscellaneous

The Chinese half dash

Find and replace with a regex:

(\d{4})-(\d{4})  replace with  \1—\2
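
In R, the same find-and-replace can be sketched with gsub(). The sample sentence is invented for illustration, and the dash is written as the escape \u2014 so that the code does not depend on the file encoding.

```r
# Hypothetical input containing year ranges joined by an ASCII hyphen
txt <- "The survey covers 1998-2005 and 2010-2020."

# Capture the two four-digit groups and rejoin them with U+2014;
# \\1 and \\2 refer back to the first and second captured group
fixed <- gsub("(\\d{4})-(\\d{4})", "\\1\u2014\\2", txt, perl = TRUE)
fixed
```

Note that the pattern matches any two four-digit groups joined by a hyphen, so it would also rewrite strings such as phone-number fragments; adding word boundaries (\\b) around the pattern is one way to narrow it if that matters for your data.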
Hu Huaping