Syntax And Semantics In Programming Languages

Definition

  • Syntax: The syntax of a programming language is the set of rules that defines the combinations of symbols that are considered to be correctly structured programs in that language.

  • Semantics: The semantics of a programming language is the set of rules that defines the meaning of the programs in that language.

Formal Language Theory

Formal language theory is a branch of mathematics and computer science that deals with the theoretical properties of formal languages.

Components of a language (including programming languages)

  • Alphabet: A finite set of symbols.

  • String: A sequence of symbols from the alphabet.

  • Language: A set of strings.

  • Grammar: A set of rules that generate strings in a language.

Chomsky Hierarchy

Type

Grammar

Automaton

Language Example

Type 0

Unrestricted

Turing Machine (TM)

Any language

Type 1

Context-sensitive

Linear-bounded automaton (LBA)

Most languages

Type 2

Context-free

Pushdown automaton (PDA)

Expression grammar

Type 3

Regular

Finite automaton (FA)

Regular expressions, Lexemes

Notation Systems

  • Chomsky notation: describes any formal grammar

  • Backus-Naur Form (BNF): describes context-free grammars

  • Extended Backus-Naur Form (EBNF): equivalent to BNF but more readable

Formal Language Theory In Programming Languages

Scope

  • Formal language theory fully handles the syntax of programming languages.

  • Semantics requires more advanced mathematical tools.

Syntax

  • Lexeme/Token level: the smallest unit of syntax.

    • Usually as regular languages.

    • Alphabet: ASCII character

    • String: Lexeme

    • Grammar: Rule to construct lexemes

    • Tokens: Category of lexemes

  • Syntax level: the structure of the program.

    • Usually as context-free languages.

    • Alphabet: Tokens

    • String: Program

    • Grammar: Rule to construct programs

Compiler/Interpreter Implementation

  • Lexical analysis

    • Lexer/tokenizer/scanner

    • Goal:

      • Breaks the program into lexemes

      • Assigns tokens to lexemes

  • Syntax analysis

    • Parser

    • Goal:

      • Checks the structure of the program (syntax) for correctness

      • Generates a parse tree

    • Limitations:

      • Most parsers can only handle a subset of context-free languages

      • Limits the expressiveness of the language

    • Types:

      • Top-down vs bottom-up

      • LL vs LR

  • Semantic analysis

    • Extensions to the parser to handle context-sensitive portion of the language.

    • Examples:

      • Static semantics uses type systems and attribute grammars

      • Dynamic semantics uses operational, denotational, or axiomatic approaches

      • Goes beyond what formal language theory can describe