Skip to main content

Structure of Compilers



Mapping of a source program into a semantically equivalent target program is divided into two parts: analysis and synthesis.

The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them. It then uses this structure to create an intermediate representation of the source program. The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table.

The analysis part is often called the front end of the compiler; the synthesis part is the back end.

Compiler operates as a sequence of phases, each of which transforms one representation of the source program to another. These phases are:

Lexical Analysis

The lexical analyzer is the interface between the source program and the compiler. It reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens. In other words, the main function of the lexical analyzer is to determine the tokens, that may be identifiers, keywords, constants, operations, and punctuation symbols such as commas and parentheses.

For example, suppose a source program contains the following statement:

IF ( 5 eq MAX ) GOTO 100

The characters in this statement are mapped into the following eight tokens passed on to the syntax analyzer:

“IF”, “(”, “5”, “eq”, “MAX”, “)”, “GOTO”, and “100”

Blanks separating the lexemes would be discarded by the lexical analyzer.

Syntax Analysis

The second phase of the compiler is syntax analysis or parsing. A parser has two functions as:

(i)   To check whether the tokens occurring in the input are permitted by the specification of the source language, and
(ii)  To give the sequence of tokens, a tree like structure also called as parse tree.

The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream. A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation.

For example, consider an operation
                                                  A / B * C
Parser will generate two parse trees for the statement as:

PARSER checks whether the output of lexical analyzer satisfies the context free grammar (CFG).

Semantic Analysis

The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition. It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation. An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.

Intermediate Code Generation

In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms. Syntax trees are a form of intermediate representation; they are commonly used during syntax and semantic analysis.

For example, the parse tree generated for the statement: A/B*C

  T1 = B * C                                                 T1 = A/B
  T2 = T1/A                                                T2 = T1*C

Code Optimization

The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result. Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power.

Code optimization can be done in two ways:

Local Optimization: There are local transformations that can be applied to a program to attempt an improvement.

For example, consider the following statements:

                              if A > B GOTO L2
                              GOTO L3
          L2:
                              ----------
                              ----------
          L3:
                              ----------
                              ----------
This sequence could be replaced by the single statement

                              if A <=B GOTO L3
          L3:
                              ----------
                              ----------
Loop Optimization: A typical improvement is to move a computation that produces the same result, each time around the loop to a point in the program just before the loop is entered.

Code Generation

The code generator takes as input an intermediate representation of the source program and maps it into the target language. If the target language is machine code, registers or memory locations are selected for each of the variables used by the program. Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task. A crucial aspect of code generation is the judicious assignment of registers to hold variables.

For example, A statement A = B + C is mapped into the target language code sequence as:
                              LOAD B
                              ADD C
                              STORE A

Symbol Table Management (BOOK KEEPING)

A compiler need to collect information about all the data objects that appear in the source program. The information is collected by early phases of the compiler – lexical and syntactic analysis – and entered into the symbol table.
The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name. The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.

Comments