********************************************* Syntax And Semantics In Programming Languages ********************************************* Definition ========== - Syntax: The syntax of a programming language is the set of rules that defines the combinations of symbols that are considered to be correctly structured programs in that language. - Semantics: The semantics of a programming language is the set of rules that defines the meaning of the programs in that language. Formal Language Theory ====================== Formal language theory is a branch of mathematics and computer science that deals with the theoretical properties of formal languages. Components of a language (including programming languages) ---------------------------------------------------------- - Alphabet: A finite set of symbols. - String: A sequence of symbols from the alphabet. - Language: A set of strings. - Grammar: A set of rules that generate strings in a language. Related Concepts ---------------- - Automata theory: Study of abstract machines and their computational capabilities. Automata are used to recognize patterns in strings for a specific language. - Chomsky hierarchy: A classification of formal languages based on their generative power. Chomsky Hierarchy ----------------- .. csv-table:: :header: "Type", "Grammar", "Automaton", "Language Example" "Type 0", "Unrestricted", "Turing Machine (TM)", "Any language" "Type 1", "Context-sensitive", "Linear-bounded automaton (LBA)", "Most languages" "Type 2", "Context-free", "Pushdown automaton (PDA)", "Expression grammar" "Type 3", "Regular", "Finite automaton (FA)", "Regular expressions, Lexemes" Notation Systems ---------------- + Chomsky notation: describes any formal grammar + Backus-Naur Form (BNF): describes context-free grammars + Extended Backus-Naur Form (EBNF): equivalent to BNF but more readable Formal Language Theory In Programming Languages =============================================== Scope ----- - Formal language theory fully handles the syntax of programming languages. - Semantics requires more advanced mathematical tools. Syntax ------ - Lexeme/Token level: the smallest unit of syntax. - Usually as regular languages. - **Alphabet**: ASCII character - **String**: Lexeme - **Grammar**: Rule to construct lexemes - Tokens: Category of lexemes - Syntax level: the structure of the program. - Usually as context-free languages. - **Alphabet**: Tokens - **String**: Program - **Grammar**: Rule to construct programs Compiler/Interpreter Implementation =================================== - Lexical analysis * Lexer/tokenizer/scanner * Goal: + Breaks the program into lexemes + Assigns tokens to lexemes - Syntax analysis * Parser * Goal: + Checks the structure of the program (syntax) for correctness + Generates a parse tree * Limitations: + Most parsers can only handle a subset of context-free languages + Limits the expressiveness of the language * Types: + Top-down vs bottom-up + LL vs LR - Semantic analysis * Extensions to the parser to handle context-sensitive portion of the language. * Examples: - Static semantics uses type systems and attribute grammars - Dynamic semantics uses operational, denotational, or axiomatic approaches - Goes beyond what formal language theory can describe