Syntax And Semantics In Programming Languages¶
Definition¶
Syntax: The syntax of a programming language is the set of rules that defines the combinations of symbols that are considered to be correctly structured programs in that language.
Semantics: The semantics of a programming language is the set of rules that defines the meaning of the programs in that language.
Formal Language Theory¶
Formal language theory is a branch of mathematics and computer science that deals with the theoretical properties of formal languages.
Components of a language (including programming languages)¶
Alphabet: A finite set of symbols.
String: A sequence of symbols from the alphabet.
Language: A set of strings.
Grammar: A set of rules that generate strings in a language.
Chomsky Hierarchy¶
Type |
Grammar |
Automaton |
Language Example |
---|---|---|---|
Type 0 |
Unrestricted |
Turing Machine (TM) |
Any language |
Type 1 |
Context-sensitive |
Linear-bounded automaton (LBA) |
Most languages |
Type 2 |
Context-free |
Pushdown automaton (PDA) |
Expression grammar |
Type 3 |
Regular |
Finite automaton (FA) |
Regular expressions, Lexemes |
Notation Systems¶
Chomsky notation: describes any formal grammar
Backus-Naur Form (BNF): describes context-free grammars
Extended Backus-Naur Form (EBNF): equivalent to BNF but more readable
Formal Language Theory In Programming Languages¶
Scope¶
Formal language theory fully handles the syntax of programming languages.
Semantics requires more advanced mathematical tools.
Syntax¶
Lexeme/Token level: the smallest unit of syntax.
Usually as regular languages.
Alphabet: ASCII character
String: Lexeme
Grammar: Rule to construct lexemes
Tokens: Category of lexemes
Syntax level: the structure of the program.
Usually as context-free languages.
Alphabet: Tokens
String: Program
Grammar: Rule to construct programs
Compiler/Interpreter Implementation¶
Lexical analysis
Lexer/tokenizer/scanner
Goal:
Breaks the program into lexemes
Assigns tokens to lexemes
Syntax analysis
Parser
Goal:
Checks the structure of the program (syntax) for correctness
Generates a parse tree
Limitations:
Most parsers can only handle a subset of context-free languages
Limits the expressiveness of the language
Types:
Top-down vs bottom-up
LL vs LR
Semantic analysis
Extensions to the parser to handle context-sensitive portion of the language.
Examples:
Static semantics uses type systems and attribute grammars
Dynamic semantics uses operational, denotational, or axiomatic approaches
Goes beyond what formal language theory can describe