Symbol Table in Compiler Design




Symbol Table

  • Symbol tables are data structures that are used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of a compiler and used by the synthesis phases to generate the target code.
  • Entries in the symbol table contain information about an identifier such as its character string (or lexeme) , its type, its position in storage, and any other relevant information.
 Symbol Table

Symbol Table

  • The symbol table, which stores information about the entire source program, is used by all phases of the compiler.
  • An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name.
  • These attributes may provide information about the storage allocated for a name, its type, its scope.
  • A symbol table can be implemented in one of the following ways:
  • Among the above all, symbol tables are mostly implemented as hash tables, where the source code symbol itself is treated as a key for the hash function and the return value is the information about the symbol.
  • A symbol table may serve the following purposes depending upon the language in hand:
    • To store the names of all entities in a structured form at one place.
    • To verify if a variable has been declared.
    • To implement type checking, by verifying assignments and expressions.
    • To determine the scope of a name (scope resolution).

Symbol-Table Entries

  • A compiler uses a symbol table to keep track of scope and binding information about names. The symbol table is searched every time a name is encountered in the source text.
  • Changes to the table occur if a new name or new information about an existing name is discovered. A linear list is the simplest to implement, but its performance is poor. Hashing schemes provide better performance.
  • The symbol table grows dynamically even though fixed at compile time.
  • Each entry in the symbol table is for the declaration of a name.
  • The format of entries does not uniform.
  • The following information about identifiers are stored in symbol table.
    • The name.
    • The data type.
    • The block level.
    • Its scope (local, global).
    • Pointer / address
    • Its offset from base pointer
    • Function name, parameter and variable.
 Symbol Table Entries

Symbol Table Entries

Characters in a Name

  • There is a distinction between the token id for an identifier or name.
  • The lexeme consisting of the character string forming the name, and the attributes of the name.
  • Strings of characters may be unwieldy to work with, so compilers often use some fixed-length representation of the name rather than the lexeme.
  • The lexeme is needed when a symbol-table entry is set up for the first time, and when we look up a lexeme found in the input to determine whether it is a name that has already appeared.
  • A common representation of a name is a pointer to a symbol-table entry for it.

If there is a modest upper bound on the length of a name, then the characters in the name can be stored in symbol-table entry

 Characters in a Name

Characters in a Name

If there is no limit on the length of a name, or if the limit is rarely reached, the indirect scheme can be used.

 Name in Seperate Array

Symbol table Names in a Seperate Array

Storage Allocation Information

  • Information about the storage locations that will be bund to names at run time is kept in the symbol table.
  • Static and dynamic allocation can be done.
  • Storage is allocated for code, data, stack, and heap.
  • COMMON blocks in Fortran are loaded separately.

The List Data Structure for Symbol Tables

  • The compiler plans out the activation record for each procedure.
  • The simplest and easiest to implement data structure for a symbol table is a linear list of records.
  • We use a single array, or equivalently several arrays. To store names and their associated information.
  • If the symbol table contains n names, To find the data about a name, on the average, we search n/2 names, so the cost of an inquiry is also proportional to n.
 List of Data Structure

List of Data Structure

Hash Tables for Symbol Tables

  • Variations of the searching technique known as hashing have been implemented in many compilers.
  • Open hashing is a simplest variant of searching technique.
  • Even this scheme gives us the capability of performing e inquiries on n names in time proportional to n ( n+e) / m, for any constant m of our choosing.
  • This method is generally more efficient than linear lists.
  • The basic hashing scheme is illustrated. There are two parts to the data structure:
    • A hash table consisting of a fixed array of m pointers to table entries.
    • Table entries organized into m separate linked lists, called buckers (some buckets may be empty). Each record in the symbol table appears on exactly one of these lists.
 Hash Table

Hash Table of Size 210

Representing Scope Information

  • A simple approach is to maintain a separate symbol table for each scope. In effect, the symbol table for a procedure or scope is the compile time equivalent of an activation record.
  • Linked list is best to represent the Scope Information.
 Representing of Symbol Table

Representing of Symbol Table



Related Searches to Symbol Table in Compiler Design