INT201 W3
It’s the note of week3. (INT201 W3)
Lecture
Regular expressions 正则表达式
A regular expression (r.e.) is a way of describing a language. It can consist of a finite set of words {cat, dog, mouse, ε}
which represents the 4 words that are listed. 正则表达式是描述语言的一种方式,它可以由一组有限的单词组成。
Then, there are 3 operators that can appear in r.e’s: they are 有3个操作符可以出现在re中
- Union
- Concatenation
- Closure
these operators allow you to glue together subexpressions to form larger expression. 这些操作符允许您将子表达式粘在一起,形成更大的表达式。
Union and Concatenation
Union: Any r.e. represnets a set of words (its language) and we may use ∪
to connect 2 subexpressions into a larger regular expression {cat, dog, mouse, ε} ∪ {cat, cats}
which represents the 5 words ε, cat, cats, dog, mouse
. 就是一般的并集。
Concatenation: By joining two r.e’s together, we denote the set of words you can make by taking a word from the first r.e. and concatenating it to some word from the second r.e.,{over, under}{cooked, state, rate}
denotes the words overcooked, undercooked, overstate, understate, overrate, underrate
. 就是把两边的单词合并的集合。有先后次序的区别,**+就是union,乘号就是concatenation**。
Closure: If we take a regular expression and add the superscript *, we get a new r.e. that represents the set of all words you can make by taking any sequence of words from the original r.e. and concatenating them together. ({cat })∗
denotes the words ε, cat, catcat, catcatcat, catcatcatcat, ...
Notice that using closure, you can define an infinite language! 注意,使用闭包,您可以定义无限语言!
Common extensions to the notation 符号的通用扩展
Let E denote a regular expression.
- (E)n denotes concatenations of n words generated by E. (Could be written EEE…E(n times)) 多次重复E
- (E)+ denotes concatenations of at least one word generated by E (Could be written EE∗ or E∗E). 由E生成的至少一个单词的连接
The above give no extra expressive power in terms of what languages can be described.
Lexical tokens 词法记号
We can use regular expressions for short, precise definitions of lexical tokens:
“variable: string of letters/digits starting with a letter” r.e.: {a,b,...,z,A,B,...,Z} {a,b,...,z,A,B,...,Z, 0,1,2,...9}∗
number in exponential/scientific notation 指数/科学记数法 (“1.16121122E-03” denotes 1.16121122 × 10−3, i.e. 0.00116121122): {1, 2, 3, ..., 9}.{0, 1, 2, ..., 9}8E{00,{ε, −}{{1, 2, ..., 9}{0, 1, 2, ..., 9}, {0, 1, 2, ..., 9}{1, 2, ..., 9}}}
A language that is easier to describe using a regular expression.
A language that is easier to describe using a DFA.
Equivalences amongst regular expressions
Reference
- XJTLU slides (INT201 W3)