10.4. Data Frames and Storage Allocation.   

Variables (and structures) denote run-time storage. The main task of the descriptor phase of compilation is to insert a run-time address into the "address" field of each descriptor.


Once this address is known, the translation phase (= code generation)
can generate instructions to reference the relevant memory cell.
e.g.    Y := 4
        LIT 0 4
        STO 0 3

In this case the 3 is the offset of Y from the stack base address for level 0.
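As a rough illustration of the step above, the sketch below shows a descriptor whose address field has already been filled in, and a translation routine that uses it to emit the two PL0 instructions. The names Descriptor and gen_assign_literal, and the idea of emitting the instructions as text, are assumptions made for this example and are not taken from any real PL0 compiler.

```c
#include <stdio.h>

/* Hypothetical descriptor: the descriptor phase fills in level and addr. */
typedef struct {
    const char *name;  /* identifier as written in the source        */
    int level;         /* static nesting level of the declaration    */
    int addr;          /* run-time offset from the base of its frame */
} Descriptor;

/* Emit PL0-style code for "<var> := <literal>" into out.
   cur_level is the level of the code being translated, so
   cur_level - d->level is the level difference used by STO.
   Returns the number of characters written (as snprintf does). */
int gen_assign_literal(char *out, size_t n, const Descriptor *d,
                       int cur_level, int literal)
{
    return snprintf(out, n, "LIT 0 %d\nSTO %d %d\n",
                    literal, cur_level - d->level, d->addr);
}
```

Called with a descriptor { "Y", 0, 3 }, a current level of 0 and the literal 4, this produces the LIT 0 4 / STO 0 3 pair shown above.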

The basic compile-time storage allocation mechanism involves a DATA FRAME. Each subdivision of a program - procedure or subroutine, block - can have its own particular DATA FRAME. Where does Java allow separate DATA FRAMES?


MAINLINE             = global variables - permanent, static
procedure & function = local variables  - temporary on stack, dynamic

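Note that PL0 keeps all of its data, even global variables, on the stack. Java does not: only local variables live on the stack, while static fields (its nearest equivalent of global variables) and objects are stored outside it.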

Languages such as FORTRAN-66 give every variable a fixed memory location, i.e. subroutines have their own local variables which are permanent - their locations do not change from one invocation to the next. Therefore no recursion.


Programming Language 'C':


(1) Global variables as in Java = permanent.
(2) Local variables (even in mainline) on the stack.
Therefore recursion allowed.
(3) Static variables in procedures as well as (2) above
(similar to FORTRAN-66), i.e. variables which are visible only
in this procedure but which retain their values from
invocation to invocation.
(4) Register variables: Since 'C' is a system programming
language the programmer can request that a variable be kept
in fast memory (= registers). The compiler may treat this
only as a hint, but it can speed up the program considerably.
(5) 'C' (and Java) also allow variable declarations within a block, e.g.
    if (expression)
    {
        int a, b, c;
        double x, y, z;
        /* CODE for the block, which can now use
           these TEMPORARY variables a,b,c,x,y,z */
    } /* variables were on the stack and now disappear */

Java allows this since the temporary variables must still be defined before they are used. It makes the compiler a bit more complex and therefore slower.
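The storage classes (1) to (5) listed above for 'C' can be seen side by side in a small sketch. The function and variable names below are invented for illustration only.

```c
int calls_total = 0;            /* (1) global: permanent storage */

/* (3) static local: visible only here, keeps its value between calls */
int next_ticket(void)
{
    static int ticket = 0;
    calls_total++;
    return ++ticket;
}

/* (2) automatic locals live on the stack, so each recursive
   activation gets its own copy of n and rest */
int sum_to(int n)
{
    int rest;
    if (n <= 0)
        return 0;
    rest = sum_to(n - 1);
    return n + rest;
}

/* (4) register is only a request; (5) sq is a block-level temporary */
int sum_squares(const int *v, int len)
{
    register int i;
    int total = 0;
    for (i = 0; i < len; i++) {
        int sq = v[i] * v[i];   /* exists only inside this block */
        total += sq;
    }
    return total;
}
```

Two successive calls of next_ticket() return 1 and then 2, because ticket is not re-created on each call, whereas every activation of sum_to has its own n and rest.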

To facilitate addressing, the compiler allocates consecutive storage locations to the data objects within the data frame. It may not know the absolute address of the objects - perhaps this is left up to the loader - but it does know the relative positions with respect to the start of the data frame.

Note that this applies to the general case and not just to block-structured languages; for a variable on the stack we likewise have an offset, relative to its stack frame.


e.g.    MOVE 0, Y(17)      DEC-10        Y is relative to stack base
        STO 0 3            PL0 machine   3 is relative to base of frame
        MOV R1, 12(SP)     PDP-11        12 is relative to stack pointer
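The allocation of consecutive locations within a data frame, described above, amounts to keeping one counter per frame. The sketch below is a minimal version under assumed names (Frame, frame_init, frame_alloc). The number of reserved bookkeeping words is a parameter; using 3 matches the offset 3 given to Y in the earlier PL0 example.

```c
/* Hypothetical compile-time view of one data frame. */
typedef struct {
    int next_offset;   /* next free location, relative to the frame start */
} Frame;

/* Start a frame, reserving `reserved` locations at its start
   for bookkeeping (links, return address and the like). */
void frame_init(Frame *f, int reserved)
{
    f->next_offset = reserved;
}

/* Allocate `size` consecutive locations and return the offset of the first. */
int frame_alloc(Frame *f, int size)
{
    int offset = f->next_offset;
    f->next_offset += size;
    return offset;
}
```

The value returned by frame_alloc is what goes into the "address" field of the object's descriptor; the compiler never needs the absolute address.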

For statically allocated data objects (FORTRAN-66, COBOL), i.e. those not stored on the stack, this is all that needs to be done. All addresses are calculated from the base address of the data frame for local variables, or from the base address of the data frame for a named COMMON area.

Even for dynamically allocated storage, i.e. the stack, the addressing is simple. When the program is executing, there will be pointers into the stack which indicate the base of each particular data frame. The compiler generates code to locate the base of the relevant data frame and then code which accesses the data object using an offset (relative to the base of that data frame), which it can get from the object's descriptor in the symbol table.
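The run-time side of this can be sketched as follows. The frame layout is an assumption made for the example: each frame begins with a single link word holding the base of the enclosing block's frame, and the names frame_base, load and store are hypothetical.

```c
/* Hypothetical run-time stack: each frame starts with a link word
   holding the base of the frame of the enclosing block. */
enum { STACK_SIZE = 64 };
int stack[STACK_SIZE];

/* Follow `levels` links from frame base `b` to find the base of
   the data frame that is `levels` nesting levels further out. */
int frame_base(int b, int levels)
{
    while (levels > 0) {
        b = stack[b];
        levels--;
    }
    return b;
}

/* Load the object at `offset` in the frame `levels` levels out. */
int load(int b, int levels, int offset)
{
    return stack[frame_base(b, levels) + offset];
}

/* Store `value` into the object at `offset` in that frame. */
void store(int b, int levels, int offset, int value)
{
    stack[frame_base(b, levels) + offset] = value;
}
```

With an outer frame at base 0 and an inner frame at base 10 whose link word is 0, store(10, 1, 3, 4) writes to stack[3] while store(10, 0, 3, 7) writes to stack[13]: same offset, different data frame.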

It is quite possible with many languages for a single identifier to reference different objects. Why is this?


Different procedures can contain the same identifier, and
a local identifier then takes precedence over a non-local one of the same name.
In 'C' this means that identifiers can also be
re-used within our "block".

Therefore the symbol table must contain an entry for each identifier, some of which may have the same name but a different "level". The translator must discover which of the identifiers is relevant in different parts of the program. How is this done in PL0?

Since all items are put into table [Tx], what happens if we have
    two variables of the same name but at different levels?
    two variables with the same name at the same level?

e.g.    var X,X,X;
or      procedure FISH;
          var F;            why is F not found
        begin end;          if it is in the symbol table?
        begin F := 0
        end.

In real languages with complex descriptors, the solution to multiple names is to maintain a list of descriptors in each symbol table entry and to manipulate the list during translation so that the correct descriptor is at the front of the list. Collect all descriptors for relevant variables declared in the block (or whatever) and insert these into the front of the list of descriptors just before translation or code generation. After translation of the code in this block each descriptor can be removed from the list.
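One possible shape for such a list of descriptors is a linked list hanging off each symbol table entry. The sketch below is only an illustration; the types Descriptor and Entry and the routines push_descriptor and pop_descriptor are assumed names, and a real descriptor would carry much more (type, size and so on).

```c
#include <stddef.h>

/* Hypothetical descriptor; real ones carry type, size, etc. */
typedef struct Descriptor {
    int level;                  /* nesting level of the declaration */
    int addr;                   /* offset within its data frame     */
    struct Descriptor *outer;   /* descriptor this one hides        */
} Descriptor;

/* One symbol table entry per distinct name. */
typedef struct {
    const char *name;
    Descriptor *current;        /* front of the list = visible declaration */
} Entry;

/* On entering a block: put the block's descriptor at the front. */
void push_descriptor(Entry *e, Descriptor *d)
{
    d->outer = e->current;
    e->current = d;
}

/* On leaving the block: remove it, exposing the outer declaration. */
void pop_descriptor(Entry *e)
{
    if (e->current != NULL)
        e->current = e->current->outer;
}
```

During translation of a block the code generator only ever looks at e->current, so it always sees the innermost declaration of the name.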

In PL0, how is a local variable kept distinct from a non-local variable of the same name? Both are entered into the symbol table by ENTER. Why is local given preference?


POSITION scans backwards from TX to 0 and returns the first identifier that matches.
Since identifiers are entered at the end of the table/list as procedures are processed
(i.e. block within block), the local variables come after the global ones
and hence are found first in a backwards scan.

Surely though, when back in mainline, the local variable of a procedure will be found first again and not the correct global variable! Why not?

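Due to TX being a value parameter of the routine that compiles a block. Entries made while compiling a procedure advance only that block's own copy of TX; on return the caller's TX is unchanged, so the backwards scan in the mainline starts below the procedure's local entries and never sees them.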
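The effect of the backwards scan combined with a value-parameter TX can be tried out in a small C model. This is a sketch written for these notes rather than the PL0 compiler's own code: the table layout, the sentinel in entry 0 and the routine procedure_block are assumptions, with only the names enter and position borrowed from the text.

```c
#include <string.h>

enum { TABLE_MAX = 100 };

typedef struct {
    const char *name;
    int level;
} Symbol;

Symbol table[TABLE_MAX + 1];    /* entries 1..tx are in use; 0 is a sentinel */

/* Append a symbol; tx is passed by pointer so the caller's index advances. */
void enter(const char *name, int level, int *tx)
{
    if (*tx >= TABLE_MAX)
        return;                 /* table full: ignored in this sketch */
    (*tx)++;
    table[*tx].name = name;
    table[*tx].level = level;
}

/* Scan backwards from tx; return the index of the first match, or 0. */
int position(const char *name, int tx)
{
    int i = tx;
    table[0].name = name;       /* sentinel guarantees the loop stops */
    while (strcmp(table[i].name, name) != 0)
        i--;
    return i;
}

/* Translate a procedure body: tx is a value parameter, so entries
   made here are invisible to the caller once we return. Returns the
   index that a reference to `name` resolves to inside the body. */
int procedure_block(int tx, int level, const char *local, const char *name)
{
    enter(local, level, &tx);   /* changes only this block's copy of tx */
    return position(name, tx);
}
```

If the mainline enters a global X (index 1) and then calls procedure_block with a local X, the lookup inside the procedure yields index 2, while a later position("X", tx) in the mainline yields index 1 again. The same mechanism explains why F, declared inside FISH, is not found from the mainline.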

Note that it may be necessary to preserve all symbols at each level for a reason: DEBUGGING.

When debugging a program it is necessary for the debug routines
to be able to reference local variables correctly.
Therefore the information may be saved to be included as
symbols in the .REL file. The linker program can then remove the symbols
if not wanted (since they take up space and the program
code certainly doesn't need them since it uses relative
addressing) OR it can leave it for use by the DEBUG program.

Using the primitive identifier look-up scheme in PL0, can you see how this reflects the notion of scope of variables and procedures? Is anything not catered for?