Mapping of a source program into a semantically equivalent target program is divided into two parts: analysis and synthesis.
The
analysis part breaks up the source program into constituent pieces and imposes
a grammatical structure on them. It then uses this structure to create an
intermediate representation of the source program. The analysis part also
collects information about the source program and stores it in a data structure
called a symbol table, which is passed along with the intermediate
representation to the synthesis part.
The
synthesis part constructs the desired target program from the intermediate
representation and the information in the symbol table.
The
analysis part is often called the front end of the compiler; the synthesis part
is the back end.
Compiler
operates as a sequence of phases, each of which transforms one representation
of the source program to another. These phases are:
Lexical
Analysis
The
lexical analyzer is the interface between the source program and the compiler.
It reads the source program one character at a time, carving the source program
into a sequence of atomic units called tokens. In other words, the main
function of the lexical analyzer is to determine the tokens, that may be
identifiers, keywords, constants, operations, and punctuation symbols such as
commas and parentheses.
For
example, suppose a source program contains the following statement:
IF ( 5 eq MAX ) GOTO 100
The
characters in this statement are mapped into the following eight tokens passed
on to the syntax analyzer:
“IF”,
“(”, “5”, “eq”, “MAX”, “)”, “GOTO”, and “100”
Blanks separating the
lexemes would be discarded by the lexical analyzer.
Syntax
Analysis
The
second phase of the compiler is syntax analysis or parsing. A parser has two
functions as:
(i)
To
check whether the tokens occurring in the input are permitted by the
specification of the source language, and
(ii) To give the sequence
of tokens, a tree like structure also called as parse tree.
The
parser uses the first components of the tokens produced by the lexical analyzer
to create a tree-like intermediate representation that depicts the grammatical
structure of the token stream. A typical representation is a syntax tree in
which each interior node represents an operation and the children of the node
represent the arguments of the operation.
For
example, consider an operation
A / B * C
Parser
will generate two parse trees for the statement as:
PARSER
checks whether the output of lexical analyzer satisfies the context free
grammar (CFG).
Semantic
Analysis
The
semantic analyzer uses the syntax tree and the information in the symbol table
to check the source program for semantic consistency with the language
definition. It also gathers type information and saves it in either the syntax
tree or the symbol table, for subsequent use during intermediate-code
generation. An important part of semantic analysis is type checking, where the
compiler checks that each operator has matching operands. For example, many
programming language definitions require an array index to be an integer; the
compiler must report an error if a floating-point number is used to index an
array.
Intermediate
Code Generation
In
the process of translating a source program into target code, a compiler may
construct one or more intermediate representations, which can have a variety of
forms. Syntax trees are a form of intermediate representation; they are
commonly used during syntax and semantic analysis.
For
example, the parse tree generated for the statement: A/B*C
T1 =
B * C T1 = A/B
T2 = T1/A T2 = T1*C
Code
Optimization
The
machine-independent code-optimization phase attempts to improve the
intermediate code so that better target code will result. Usually better means
faster, but other objectives may be desired, such as shorter code, or target
code that consumes less power.
Code
optimization can be done in two ways:
Local Optimization: There are local
transformations that can be applied to a program to attempt an improvement.
For
example, consider the following statements:
if A > B GOTO
L2
GOTO L3
L2:
----------
----------
L3:
----------
----------
This
sequence could be replaced by the single statement
if A <=B GOTO
L3
L3:
----------
----------
Loop Optimization: A typical improvement
is to move a computation that produces the same result, each time around the
loop to a point in the program just before the loop is entered.
Code
Generation
The
code generator takes as input an intermediate representation of the source
program and maps it into the target language. If the target language is machine
code, registers or memory locations are selected for each of the variables used
by the program. Then, the intermediate instructions are translated into
sequences of machine instructions that perform the same task. A crucial aspect
of code generation is the judicious assignment of registers to hold variables.
For
example, A statement A = B + C is mapped into the target language code sequence
as:
LOAD B
ADD C
STORE A
Symbol Table Management (BOOK KEEPING)
A
compiler need to collect information about all the data objects that appear in
the source program. The information is collected by early phases of the
compiler – lexical and syntactic analysis – and entered into the symbol table.
The
symbol table is a data structure containing a record for each variable name,
with fields for the attributes of the name. The data structure should be
designed to allow the compiler to find the record for each name quickly and to
store or retrieve data from that record quickly.
Comments
Post a Comment