Symbol Table in Compiler Design
- Symbol tables are data structures that are used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of a compiler and used by the synthesis phases to generate the target code.
- Entries in the symbol table contain information about an identifier such as its character string (or lexeme) , its type, its position in storage, and any other relevant information.
- The symbol table, which stores information about the entire source program, is used by all phases of the compiler.
- An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name.
- These attributes may provide information about the storage allocated for a name, its type, its scope.
- A symbol table can be implemented in one of the following ways:
- Among the above all, symbol tables are mostly implemented as hash tables, where the source code symbol itself is treated as a key for the hash function and the return value is the information about the symbol.
- A symbol table may serve the following purposes depending upon the language in hand:
- To store the names of all entities in a structured form at one place.
- To verify if a variable has been declared.
- To implement type checking, by verifying assignments and expressions.
- To determine the scope of a name (scope resolution).
- A compiler uses a symbol table to keep track of scope and binding information about names. The symbol table is searched every time a name is encountered in the source text.
- Changes to the table occur if a new name or new information about an existing name is discovered. A linear list is the simplest to implement, but its performance is poor. Hashing schemes provide better performance.
- The symbol table grows dynamically even though fixed at compile time.
- Each entry in the symbol table is for the declaration of a name.
- The format of entries does not uniform.
- The following information about identifiers are stored in symbol table.
Symbol Table Entries
Characters in a Name
- There is a distinction between the token id for an identifier or name.
- The lexeme consisting of the character string forming the name, and the attributes of the name.
- Strings of characters may be unwieldy to work with, so compilers often use some fixed-length representation of the name rather than the lexeme.
- The lexeme is needed when a symbol-table entry is set up for the first time, and when we look up a lexeme found in the input to determine whether it is a name that has already appeared.
- A common representation of a name is a pointer to a symbol-table entry for it.
If there is a modest upper bound on the length of a name, then the characters in the name can be stored in symbol-table entry
Characters in a Name
If there is no limit on the length of a name, or if the limit is rarely reached, the indirect scheme can be used.
Symbol table Names in a Seperate Array
Read AlsoC Dynamic Memory Allocation
Storage Allocation Information
- Information about the storage locations that will be bund to names at run time is kept in the symbol table.
- Static and dynamic allocation can be done.
- Storage is allocated for code, data, stack, and heap.
- COMMON blocks in Fortran are loaded separately.
The List Data Structure for Symbol Tables
- The compiler plans out the activation record for each procedure.
- The simplest and easiest to implement data structure for a symbol table is a linear list of records.
- We use a single array, or equivalently several arrays. To store names and their associated information.
- If the symbol table contains n names, To find the data about a name, on the average, we search n/2 names, so the cost of an inquiry is also proportional to n.
List of Data Structure
Read AlsoPython Hash Tables Hashing Dictionary.
Hash Tables for Symbol Tables
- Variations of the searching technique known as hashing have been implemented in many compilers.
- Open hashing is a simplest variant of searching technique.
- Even this scheme gives us the capability of performing e inquiries on n names in time proportional to n ( n+e) / m, for any constant m of our choosing.
- This method is generally more efficient than linear lists.
- The basic hashing scheme is illustrated. There are two parts to the data structure:
Hash Table of Size 210
Representing Scope Information
- A simple approach is to maintain a separate symbol table for each scope. In effect, the symbol table for a procedure or scope is the compile time equivalent of an activation record.
- Linked list is best to represent the Scope Information.
Representing of Symbol Table