Diffstat (limited to 'contrib/gcc/cp/gxxint.texi')
-rw-r--r-- | contrib/gcc/cp/gxxint.texi | 2075 |
1 files changed, 0 insertions, 2075 deletions
diff --git a/contrib/gcc/cp/gxxint.texi b/contrib/gcc/cp/gxxint.texi deleted file mode 100644 index 81bcab864250..000000000000 --- a/contrib/gcc/cp/gxxint.texi +++ /dev/null @@ -1,2075 +0,0 @@ -\input texinfo @c -*-texinfo-*- -@c %**start of header -@setfilename g++int.info -@settitle G++ internals -@setchapternewpage odd -@c %**end of header - -@node Top, Limitations of g++, (dir), (dir) -@chapter Internal Architecture of the Compiler - -This is meant to describe the C++ front-end for gcc in detail. -Questions and comments to Benjamin Kosnik @code{<bkoz@@cygnus.com>}. - -@menu -* Limitations of g++:: -* Routines:: -* Implementation Specifics:: -* Glossary:: -* Macros:: -* Typical Behavior:: -* Coding Conventions:: -* Templates:: -* Access Control:: -* Error Reporting:: -* Parser:: -* Exception Handling:: -* Free Store:: -* Mangling:: Function name mangling for C++ and Java -* Vtables:: Two ways to do virtual functions -* Concept Index:: -@end menu - -@node Limitations of g++, Routines, Top, Top -@section Limitations of g++ - -@itemize @bullet -@item -Limitations on input source code: 240 nesting levels with the parser -stacksize (YYSTACKSIZE) set to 500 (the default), and requires around -16.4k swap space per nesting level. The parser needs about 2.09 * -number of nesting levels worth of stackspace. - -@cindex pushdecl_class_level -@item -I suspect there are other uses of pushdecl_class_level that do not call -set_identifier_type_value in tandem with the call to -pushdecl_class_level. It would seem to be an omission. - -@cindex access checking -@item -Access checking is unimplemented for nested types. - -@cindex @code{volatile} -@item -@code{volatile} is not implemented in general. - -@end itemize - -@node Routines, Implementation Specifics, Limitations of g++, Top -@section Routines - -This section describes some of the routines used in the C++ front-end. 
- -@code{build_vtable} and @code{prepare_fresh_vtable} are used only within -the @file{cp-class.c} file, and only in @code{finish_struct} and -@code{modify_vtable_entries}. - -@code{build_vtable}, @code{prepare_fresh_vtable}, and -@code{finish_struct} are the only routines that set @code{DECL_VPARENT}. - -@code{finish_struct} can steal the virtual function table from parents; -this prohibits related_vslot from working. When finish_struct steals, -we know that - -@example -get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0) -@end example - -@noindent -will get the related binfo. - -@code{layout_basetypes} does something with the VIRTUALS. - -Supposedly (according to Tiemann) most of the breadth first searching -done, like in @code{get_base_distance} and in @code{get_binfo} was not -because of any design decision. I have since found out that at least one -part of the compiler needs the notion of depth first binfo searching, so I -am going to try and convert the whole thing; it should just work. The -term left-most refers to the depth first left-most node. It uses -@code{MAIN_VARIANT == type} as the condition to get left-most, because -the things that have @code{BINFO_OFFSET}s of zero are shared and will -have themselves as their own @code{MAIN_VARIANT}s. The non-shared right -ones are copies of the left-most one; hence if it is its own -@code{MAIN_VARIANT}, we know it IS a left-most one, and if it is not, it is -a non-left-most one. - -@code{get_base_distance}'s path and distance matter in its use in: - -@itemize @bullet -@item -@code{prepare_fresh_vtable} (the code is probably wrong) -@item -@code{init_vfields} Depends upon distance probably in a safe way, -build_offset_ref might use partial paths to do further lookups, -hack_identifier is probably not properly checking access. - -@item -@code{get_first_matching_virtual} probably should check for -@code{get_base_distance} returning -2. 
- -@item -@code{resolve_offset_ref} should be called in a more deterministic -manner. Right now, it is called in some random contexts, like for -arguments at @code{build_method_call} time, @code{default_conversion} -time, @code{convert_arguments} time, @code{build_unary_op} time, -@code{build_c_cast} time, @code{build_modify_expr} time, -@code{convert_for_assignment} time, and -@code{convert_for_initialization} time. - -But, there are still more contexts it needs to be called in, one was the -ever simple: - -@example -if (obj.*pmi != 7) - @dots{} -@end example - -Seems that the problems were due to the fact that @code{TREE_TYPE} of -the @code{OFFSET_REF} was not a @code{OFFSET_TYPE}, but rather the type -of the referent (like @code{INTEGER_TYPE}). This problem was fixed by -changing @code{default_conversion} to check @code{TREE_CODE (x)}, -instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it -was @code{OFFSET_TYPE}. - -@end itemize - -@node Implementation Specifics, Glossary, Routines, Top -@section Implementation Specifics - -@itemize @bullet -@item Explicit Initialization - -The global list @code{current_member_init_list} contains the list of -mem-initializers specified in a constructor declaration. For example: - -@example -foo::foo() : a(1), b(2) @{@} -@end example - -@noindent -will initialize @samp{a} with 1 and @samp{b} with 2. -@code{expand_member_init} places each initialization (a with 1) on the -global list. Then, when the fndecl is being processed, -@code{emit_base_init} runs down the list, initializing them. It used to -be the case that g++ first ran down @code{current_member_init_list}, -then ran down the list of members initializing the ones that weren't -explicitly initialized. Things were rewritten to perform the -initializations in order of declaration in the class. 
So, for the above -example, @samp{a} and @samp{b} will be initialized in the order that -they were declared: - -@example -class foo @{ public: int b; int a; foo (); @}; -@end example - -@noindent -Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be -initialized with 1, regardless of how they're listed in the mem-initializer. - -@item The Explicit Keyword - -The use of @code{explicit} on a constructor is used by @code{grokdeclarator} -to set the field @code{DECL_NONCONVERTING_P}. That value is used by -@code{build_method_call} and @code{build_user_type_conversion_1} to decide -if a particular constructor should be used as a candidate for conversions. - -@end itemize - -@node Glossary, Macros, Implementation Specifics, Top -@section Glossary - -@table @r -@item binfo -The main data structure in the compiler used to represent the -inheritance relationships between classes. The data in the binfo can be -accessed by the BINFO_ accessor macros. - -@item vtable -@itemx virtual function table - -The virtual function table holds information used in virtual function -dispatching. In the compiler, they are usually referred to as vtables, -or vtbls. The first index is not used in the normal way, I believe it -is probably used for the virtual destructor. There are two forms of -virtual tables, one that has offsets in addition to pointers, and one -using thunks. @xref{Vtables}. - -@item vfield - -vfields can be thought of as the base information needed to build -vtables. For every vtable that exists for a class, there is a vfield. -See also vtable and virtual function table pointer. When a type is used -as a base class to another type, the virtual function table for the -derived class can be based upon the vtable for the base class, just -extended to include the additional virtual methods declared in the -derived class. The virtual function table from a virtual base class is -never reused in a derived class. @code{is_normal} depends upon this. 
- -@item virtual function table pointer - -These are @code{FIELD_DECL}s that are pointer types that point to -vtables. See also vtable and vfield. -@end table - -@node Macros, Typical Behavior, Glossary, Top -@section Macros - -This section describes some of the macros used on trees. The list -should be alphabetical. Eventually all macros should be documented -here. - -@table @code -@item BINFO_BASETYPES -A vector of additional binfos for the types inherited by this basetype. -The binfos are fully unshared (except for virtual bases, in which -case the binfo structure is shared). - - If this basetype describes type D as inherited in C, - and if the basetypes of D are E and F, - then this vector contains binfos for inheritance of E and F by C. - -Has values of: - - TREE_VECs - - -@item BINFO_INHERITANCE_CHAIN -Temporarily used to represent specific inheritances. It usually points -to the binfo associated with the lesser derived type, but it can be -reversed by reverse_path. For example: - -@example - Z ZbY least derived - | - Y YbX - | - X Xb most derived - -TYPE_BINFO (X) == Xb -BINFO_INHERITANCE_CHAIN (Xb) == YbX -BINFO_INHERITANCE_CHAIN (Yb) == ZbY -BINFO_INHERITANCE_CHAIN (Zb) == 0 -@end example - -Not sure if the above is really true; get_base_distance has it point -towards the most derived type, opposite from above. - -Set by build_vbase_path, recursive_bounded_basetype_p, -get_base_distance, lookup_field, lookup_fnfields, and reverse_path. - -What things can this be used on: - - TREE_VECs that are binfos - - -@item BINFO_OFFSET -The offset where this basetype appears in its containing type. -BINFO_OFFSET slot holds the offset (in bytes) from the base of the -complete object to the base of the part of the object that is allocated -on behalf of this `type'. This is always 0 except when there is -multiple inheritance. - -Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example. 
- - -@item BINFO_VIRTUALS -A unique list of functions for the virtual function table. See also -TYPE_BINFO_VIRTUALS. - -What things can this be used on: - - TREE_VECs that are binfos - - -@item BINFO_VTABLE -Used to find the VAR_DECL that is the virtual function table associated -with this binfo. See also TYPE_BINFO_VTABLE. To get the virtual -function table pointer, see CLASSTYPE_VFIELD. - -What things can this be used on: - - TREE_VECs that are binfos - -Has values of: - - VAR_DECLs that are virtual function tables - - -@item BLOCK_SUPERCONTEXT -In the outermost scope of each function, it points to the FUNCTION_DECL -node. It aids in better DWARF support of inline functions. - - -@item CLASSTYPE_TAGS -CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a -class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans -these and calls pushtag on them.) - -finish_struct scans these to produce TYPE_DECLs to add to the -TYPE_FIELDS of the type. - -It is expected that name found in the TREE_PURPOSE slot is unique, -resolve_scope_to_name is one such place that depends upon this -uniqueness. - - -@item CLASSTYPE_METHOD_VEC -The following is true after finish_struct has been called (on the -class?) but not before. Before finish_struct is called, things are -different to some extent. Contains a TREE_VEC of methods of the class. -The TREE_VEC_LENGTH is the number of differently named methods plus one -for the 0th entry. The 0th entry is always allocated, and reserved for -ctors and dtors. If there are none, TREE_VEC_ELT(N,0) == NULL_TREE. -Each entry of the TREE_VEC is a FUNCTION_DECL. For each FUNCTION_DECL, -there is a DECL_CHAIN slot. If the FUNCTION_DECL is the last one with a -given name, the DECL_CHAIN slot is NULL_TREE. Otherwise it is the next -method that has the same name (but a different signature). 
It would -seem that it is not true that because the DECL_CHAIN slot is used in -this way, we cannot call pushdecl to put the method in the global scope -(because that would overwrite the TREE_CHAIN slot), because they use -different _CHAINs. finish_struct_methods sets up one version of the -TREE_CHAIN slots on the FUNCTION_DECLs. - -friends are kept in TREE_LISTs, so that there's no need to use their -TREE_CHAIN slot for anything. - -Has values of: - - TREE_VECs - - -@item CLASSTYPE_VFIELD -Seems to be in the process of being renamed TYPE_VFIELD. Use on types -to get the main virtual function table pointer. To get the virtual -function table use BINFO_VTABLE (TYPE_BINFO ()). - -Has values of: - - FIELD_DECLs that are virtual function table pointers - -What things can this be used on: - - RECORD_TYPEs - - -@item DECL_CLASS_CONTEXT -Identifies the context that the _DECL was found in. For virtual function -tables, it points to the type associated with the virtual function -table. See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT. - -The difference between this and DECL_CONTEXT, is that for virtual -functions like: - -@example -struct A -@{ - virtual int f (); -@}; - -struct B : A -@{ - int f (); -@}; - -DECL_CONTEXT (A::f) == A -DECL_CLASS_CONTEXT (A::f) == A - -DECL_CONTEXT (B::f) == A -DECL_CLASS_CONTEXT (B::f) == B -@end example - -Has values of: - - RECORD_TYPEs, or UNION_TYPEs - -What things can this be used on: - - TYPE_DECLs, _DECLs - - -@item DECL_CONTEXT -Identifies the context that the _DECL was found in. Can be used on -virtual function tables to find the type associated with the virtual -function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a -better access method. Internally the same as DECL_FIELD_CONTEXT, so -don't use both. See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and -DECL_CLASS_CONTEXT. 
- -Has values of: - - RECORD_TYPEs - - -What things can this be used on: - -@display -VAR_DECLs that are virtual function tables -_DECLs -@end display - - -@item DECL_FIELD_CONTEXT -Identifies the context that the FIELD_DECL was found in. Internally the -same as DECL_CONTEXT, so don't use both. See also DECL_CONTEXT, -DECL_FCONTEXT and DECL_CLASS_CONTEXT. - -Has values of: - - RECORD_TYPEs - -What things can this be used on: - -@display -FIELD_DECLs that are virtual function pointers -FIELD_DECLs -@end display - - -@item DECL_NAME - -Has values of: - -@display -0 for things that don't have names -IDENTIFIER_NODEs for TYPE_DECLs -@end display - -@item DECL_IGNORED_P -A bit that can be set to inform the debug information output routines in -the back-end that a certain _DECL node should be totally ignored. - -Used in cases where it is known that the debugging information will be -output in another file, or where a sub-type is known not to be needed -because the enclosing type is not needed. - -A compiler constructed virtual destructor in derived classes that do not -define an explicit destructor that was defined explicitly in a base class -has this bit set as well. Also used on __FUNCTION__ and -__PRETTY_FUNCTION__ to mark that they are ``compiler generated.'' c-decl and -c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,'' -and ``user-invisible variable.'' - -Functions built by the C++ front-end such as default destructors, -virtual destructors and default constructors want to be marked that -they are compiler generated, but unsure why. - -Currently, it is used in an absolute way in the C++ front-end, as an -optimization, to tell the debug information output routines to not -generate debugging information that will be output by another separately -compiled file. - - -@item DECL_VIRTUAL_P -A flag used on FIELD_DECLs and VAR_DECLs. (Documentation in tree.h is -wrong.) Used in VAR_DECLs to indicate that the variable is a vtable. 
-It is also used in FIELD_DECLs for vtable pointers. - -What things can this be used on: - - FIELD_DECLs and VAR_DECLs - - -@item DECL_VPARENT -Used to point to the parent type of the vtable if there is one, else it -is just the type associated with the vtable. Because of the sharing of -virtual function tables that goes on, this slot is not very useful, and -is in fact, not used in the compiler at all. It can be removed. - -What things can this be used on: - - VAR_DECLs that are virtual function tables - -Has values of: - - RECORD_TYPEs maybe UNION_TYPEs - - -@item DECL_FCONTEXT -Used to find the first baseclass in which this FIELD_DECL is defined. -See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT. - -How it is used: - - Used when writing out debugging information about vfield and - vbase decls. - -What things can this be used on: - - FIELD_DECLs that are virtual function pointers - FIELD_DECLs - - -@item DECL_REFERENCE_SLOT -Used to hold the initializer for the reference. - -What things can this be used on: - - PARM_DECLs and VAR_DECLs that have a reference type - - -@item DECL_VINDEX -Used for FUNCTION_DECLs in two different ways. Before the structure -containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a -FUNCTION_DECL in a base class which is the FUNCTION_DECL which this -FUNCTION_DECL will replace as a virtual function. When the class is -laid out, this pointer is changed to an INTEGER_CST node which is -suitable to find an index into the virtual function table. See -get_vtable_entry as to how one can find the right index into the virtual -function table. The first index, 0, of a virtual function table is not -used in the normal way, so the first real index is 1. - -DECL_VINDEX may be a TREE_LIST, that would seem to be a list of -overridden FUNCTION_DECLs. 
add_virtual_function has code to deal with -this when it uses the variable base_fndecl_list, but it would seem that -somehow, it is possible for the TREE_LIST to persist until method_call, -and it should not. - - -What things can this be used on: - - FUNCTION_DECLs - - -@item DECL_SOURCE_FILE -Identifies what source file a particular declaration was found in. - -Has values of: - - "<built-in>" on TYPE_DECLs to mean the typedef is built in - - -@item DECL_SOURCE_LINE -Identifies what source line number in the source file the declaration -was found at. - -Has values of: - -@display -0 for an undefined label - -0 for TYPE_DECLs that are internally generated - -0 for FUNCTION_DECLs for functions generated by the compiler - (not yet, but should be) - -0 for ``magic'' arguments to functions, that the user has no - control over -@end display - - -@item TREE_USED - -Has values of: - - 0 for unused labels - - -@item TREE_ADDRESSABLE -A flag that is set for any type that has a constructor. - - -@item TREE_COMPLEXITY -They seem a kludgy way to track recursion, popping, and pushing. They only -appear in cp-decl.c and cp-decl2.c, so they are a good candidate for -proper fixing, and removal. - - -@item TREE_HAS_CONSTRUCTOR -A flag to indicate when a CALL_EXPR represents a call to a constructor. -If set, we know that the type of the object, is the complete type of the -object, and that the value returned is nonnull. When used in this -fashion, it is an optimization. Can also be used on SAVE_EXPRs to -indicate when they are of fixed type and nonnull. Can also be used on -INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor. - - -@item TREE_PRIVATE -Set for FIELD_DECLs by finish_struct. But not uniformly set. 
- -The following routines do something with PRIVATE access: -build_method_call, alter_access, finish_struct_methods, -finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType, -CWriteUseObject, compute_access, lookup_field, dfs_pushdecl, -GNU_xref_member, dbxout_type_fields, dbxout_type_method_1 - - -@item TREE_PROTECTED -The following routines do something with PROTECTED access: -build_method_call, alter_access, finish_struct, convert_to_aggr, -CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject, -compute_access, lookup_field, GNU_xref_member, dbxout_type_fields, -dbxout_type_method_1 - - -@item TYPE_BINFO -Used to get the binfo for the type. - -Has values of: - - TREE_VECs that are binfos - -What things can this be used on: - - RECORD_TYPEs - - -@item TYPE_BINFO_BASETYPES -See also BINFO_BASETYPES. - -@item TYPE_BINFO_VIRTUALS -A unique list of functions for the virtual function table. See also -BINFO_VIRTUALS. - -What things can this be used on: - - RECORD_TYPEs - - -@item TYPE_BINFO_VTABLE -Points to the virtual function table associated with the given type. -See also BINFO_VTABLE. - -What things can this be used on: - - RECORD_TYPEs - -Has values of: - - VAR_DECLs that are virtual function tables - - -@item TYPE_NAME -Names the type. - -Has values of: - -@display -0 for things that don't have names. -should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and - ENUM_TYPEs. -TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but - shouldn't be. -TYPE_DECL for typedefs, unsure why. -@end display - -What things can one use this on: - -@display -TYPE_DECLs -RECORD_TYPEs -UNION_TYPEs -ENUM_TYPEs -@end display - -History: - - It currently points to the TYPE_DECL for RECORD_TYPEs, - UNION_TYPEs and ENUM_TYPEs, but it should be history soon. - - -@item TYPE_METHODS -Synonym for @code{CLASSTYPE_METHOD_VEC}. Chained together with -@code{TREE_CHAIN}. @file{dbxout.c} uses this to get at the methods of a -class. 
- - -@item TYPE_DECL -Used to represent typedefs, and used to represent bindings layers. - -Components: - - DECL_NAME is the name of the typedef. For example, foo would - be found in the DECL_NAME slot when @code{typedef int foo;} is - seen. - - DECL_SOURCE_LINE identifies what source line number in the - source file the declaration was found at. A value of 0 - indicates that this TYPE_DECL is just an internal binding layer - marker, and does not correspond to a user supplied typedef. - - DECL_SOURCE_FILE - -@item TYPE_FIELDS -A linked list (via @code{TREE_CHAIN}) of member types of a class. The -list can contain @code{TYPE_DECL}s, but there can also be other things -in the list apparently. See also @code{CLASSTYPE_TAGS}. - - -@item TYPE_VIRTUAL_P -A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is -a virtual function table or a pointer to one. When used on a -@code{FUNCTION_DECL}, indicates that it is a virtual function. When -used on an @code{IDENTIFIER_NODE}, indicates that a function with this -same name exists and has been declared virtual. - -When used on types, it indicates that the type has virtual functions, or -is derived from one that does. - -Not sure if the above about virtual function tables is still true. See -also info on @code{DECL_VIRTUAL_P}. - -What things can this be used on: - - FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs - - -@item VF_BASETYPE_VALUE -Get the associated type from the binfo that caused the given vfield to -exist. This is the least derived class (the most parent class) that -needed a virtual function table. It is probably the case that all uses -of this field are misguided, but they need to be examined on a -case-by-case basis. See history for more information on why the -previous statement was made. - -Set at @code{finish_base_struct} time. 
- -What things can this be used on: - - TREE_LISTs that are vfields - -History: - - This field was used to determine if a virtual function table's - slot should be filled in with a certain virtual function, by - checking to see if the type returned by VF_BASETYPE_VALUE was a - parent of the context in which the old virtual function existed. - This incorrectly assumes that a given type _could_ not appear as - a parent twice in a given inheritance lattice. For single - inheritance, this would in fact work, because a type could not - possibly appear more than once in an inheritance lattice, but - with multiple inheritance, a type can appear more than once. - - -@item VF_BINFO_VALUE -Identifies the binfo that caused this vfield to exist. If this vfield -is from the first direct base class that has a virtual function table, -then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the -direct base where the vfield came from. Can use @code{TREE_VIA_VIRTUAL} -on result to find out if it is a virtual base class. Related to the -binfo found by - -@example -get_binfo (VF_BASETYPE_VALUE (vfield), t, 0) -@end example - -@noindent -where @samp{t} is the type that has the given vfield. - -@example -get_binfo (VF_BASETYPE_VALUE (vfield), t, 0) -@end example - -@noindent -will return the binfo for the given vfield. - -May or may not be set at @code{modify_vtable_entries} time. Set at -@code{finish_base_struct} time. - -What things can this be used on: - - TREE_LISTs that are vfields - - -@item VF_DERIVED_VALUE -Identifies the type of the most derived class of the vfield, excluding -the class this vfield is for. - -Set at @code{finish_base_struct} time. - -What things can this be used on: - - TREE_LISTs that are vfields - - -@item VF_NORMAL_VALUE -Identifies the type of the most derived class of the vfield, including -the class this vfield is for. - -Set at @code{finish_base_struct} time. 
- -What things can this be used on: - - TREE_LISTs that are vfields - - -@item WRITABLE_VTABLES -This is an option that can be defined when building the compiler, that -will cause the compiler to output vtables into the data segment so that -the vtables may be written. This is undefined by default, because -normally the vtables should be unwritable. People that implement object -I/O facilities, or people that want to change the dynamic type of -objects, may want to have the vtables writable. Another way of achieving -this would be to make a copy of the vtable into writable memory, but the -drawback there is that that method only changes the type for one object. - -@end table - -@node Typical Behavior, Coding Conventions, Macros, Top -@section Typical Behavior - -@cindex parse errors - -Whenever seemingly normal code fails with errors like -@code{syntax error at `\@{'}, it's highly likely that grokdeclarator is -returning a NULL_TREE for whatever reason. - -@node Coding Conventions, Templates, Typical Behavior, Top -@section Coding Conventions - -It should never be the case that trees are modified in-place by the -back-end, @emph{unless} it is guaranteed that the semantics are the same -no matter how shared the tree structure is. @file{fold-const.c} still -has some cases where this is not true, but rms hypothesizes that this -will never be a problem. - -@node Templates, Access Control, Coding Conventions, Top -@section Templates - -A template is represented by a @code{TEMPLATE_DECL}. The specific -fields used are: - -@table @code -@item DECL_TEMPLATE_RESULT -The generic decl on which instantiations are based. This looks just -like any other decl. - -@item DECL_TEMPLATE_PARMS -The parameters to this template. -@end table - -The generic decl is parsed as much like any other decl as possible, -given the parameterization. The template decl is not built up until the -generic decl has been completed. 
For template classes, a template decl -is generated for each member function and static data member, as well. - -Template members of template classes are represented by a TEMPLATE_DECL -for the class' parameters around another TEMPLATE_DECL for the member's -parameters. - -All declarations that are instantiations or specializations of templates -refer to their template and parameters through DECL_TEMPLATE_INFO. - -How should I handle parsing member functions with the proper param -decls? Set them up again or try to use the same ones? Currently we do -the former. We can probably do this without any extra machinery in -store_pending_inline, by deducing the parameters from the decl in -do_pending_inlines. PRE_PARSED_TEMPLATE_DECL? - -If a base is a parm, we can't check anything about it. If a base is not -a parm, we need to check it for name binding. Do finish_base_struct if -no bases are parameterized (only if none, including indirect, are -parms). Nah, don't bother trying to do any of this until instantiation --- we only need to do name binding in advance. - -Always set up method vec and fields, inc. synthesized methods. Really? -We can't know the types of the copy folks, or whether we need a -destructor, or can have a default ctor, until we know our bases and -fields. Otherwise, we can assume and fix ourselves later. Hopefully. - -@node Access Control, Error Reporting, Templates, Top -@section Access Control -The function compute_access returns one of three values: - -@table @code -@item access_public -means that the field can be accessed by the current lexical scope. - -@item access_protected -means that the field cannot be accessed by the current lexical scope -because it is protected. - -@item access_private -means that the field cannot be accessed by the current lexical scope -because it is private. -@end table - -DECL_ACCESS is used for access declarations; alter_access creates a list -of types and accesses for a given decl. 
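For readers unfamiliar with the term, an access declaration is the source-level construct that re-exposes an inherited member at a different access level. The sketch below only illustrates that construct and is not taken from the compiler sources: the names `Base`, `Derived`, and `check_access_example` are hypothetical, and it uses the `using` spelling because the older bare form (`Base::visible;`) has long been deprecated.

```cpp
#include <cassert>

// Hypothetical classes, for illustration only.
class Base {
public:
    int visible() const { return 1; }
    int hidden() const { return 2; }
};

// Private inheritance makes every Base member private in Derived...
class Derived : private Base {
public:
    // ...except the members re-exposed here, which become public again.
    using Base::visible;
};

// Exercises the public path; calling d.hidden() here would not compile,
// because Base::hidden is private when reached through Derived.
inline int check_access_example() {
    Derived d;
    assert(d.visible() == 1);
    return d.visible();
}
```

In the terms used above, `visible` is the kind of decl for which a per-type access record would be needed, since its access differs between `Base` and `Derived`.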
- -Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return -codes of compute_access and were used as a cache for compute_access. -Now they are not used at all. - -TREE_PROTECTED and TREE_PRIVATE are used to record the access levels -granted by the containing class. BEWARE: TREE_PUBLIC means something -completely unrelated to access control! - -@node Error Reporting, Parser, Access Control, Top -@section Error Reporting - -The C++ front-end uses a call-back mechanism to allow functions to print -out reasonable strings for types and functions without putting extra -logic in the functions where errors are found. The interface is through -the @code{cp_error} function (or @code{cp_warning}, etc.). The -syntax is exactly like that of @code{error}, except that a few more -conversions are supported: - -@itemize @bullet -@item -%C indicates a value of `enum tree_code'. -@item -%D indicates a *_DECL node. -@item -%E indicates a *_EXPR node. -@item -%L indicates a value of `enum languages'. -@item -%P indicates the name of a parameter (i.e. "this", "1", "2", ...) -@item -%T indicates a *_TYPE node. -@item -%O indicates the name of an operator (MODIFY_EXPR -> "operator ="). - -@end itemize - -There is some overlap between these; for instance, any of the node -options can be used for printing an identifier (though only @code{%D} -tries to decipher function names). - -For a more verbose message (@code{class foo} as opposed to just @code{foo}, -including the return type for functions), use @code{%#c}. -To have the line number on the error message indicate the line of the -DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want, -use @code{%+D}, or it will default to the first. 
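The call-back idea can be sketched independently of the compiler. What follows is a toy model and not the `cp_error` implementation: the `Node` struct, `format_message`, and `default_printers` are hypothetical names, and only the `%D` and `%T` letters are borrowed from the list above. It assumes each conversion consumes one argument in order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree node; the real front-end passes `tree`.
struct Node {
    std::string kind;  // e.g. "decl" or "type"
    std::string name;
};

using Printer = std::function<std::string(const Node&)>;

// Expand each %<letter> by calling the printer registered for that letter,
// consuming one Node argument per conversion. "%%" yields a literal '%'.
std::string format_message(const std::string& fmt,
                           const std::vector<Node>& args,
                           const std::map<char, Printer>& printers) {
    std::string out;
    std::size_t next_arg = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '%' || i + 1 == fmt.size()) {
            out += fmt[i];
            continue;
        }
        char code = fmt[++i];
        if (code == '%') {
            out += '%';
            continue;
        }
        auto it = printers.find(code);
        if (it == printers.end() || next_arg >= args.size()) {
            // Unknown conversion or missing argument: echo it unchanged.
            out += '%';
            out += code;
            continue;
        }
        out += it->second(args[next_arg++]);
    }
    return out;
}

// A small printer table modelled loosely on %D and %T.
std::map<char, Printer> default_printers() {
    return {
        {'D', [](const Node& n) { return "`" + n.name + "'"; }},
        {'T', [](const Node& n) { return "type `" + n.name + "'"; }},
    };
}
```

The point of the design is that the code reporting the error only names the conversion; how a decl or type is rendered lives in one table, so call sites need no formatting logic of their own.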
- -@node Parser, Exception Handling, Error Reporting, Top -@section Parser - -Some comments on the parser: - -The @code{after_type_declarator} / @code{notype_declarator} hack is -necessary in order to allow redeclarations of @code{TYPENAME}s, for -instance - -@example -typedef int foo; -class A @{ - char *foo; -@}; -@end example - -In the above, the first @code{foo} is parsed as a @code{notype_declarator}, -and the second as a @code{after_type_declarator}. - -Ambiguities: - -There are currently four reduce/reduce ambiguities in the parser. They are: - -1) Between @code{template_parm} and -@code{named_class_head_sans_basetype}, for the tokens @code{aggr -identifier}. This situation occurs in code looking like - -@example -template <class T> class A @{ @}; -@end example - -It is ambiguous whether @code{class T} should be parsed as the -declaration of a template type parameter named @code{T} or an unnamed -constant parameter of type @code{class T}. Section 14.6, paragraph 3 of -the January '94 working paper states that the first interpretation is -the correct one. This ambiguity results in two reduce/reduce conflicts. - -2) Between @code{primary} and @code{type_id} for code like @samp{int()} -in places where both can be accepted, such as the argument to -@code{sizeof}. Section 8.1 of the pre-San Diego working paper specifies -that these ambiguous constructs will be interpreted as @code{typename}s. -This ambiguity results in six reduce/reduce conflicts between -@samp{absdcl} and @samp{functional_cast}. - -3) Between @code{functional_cast} and -@code{complex_direct_notype_declarator}, for various token strings. -This situation occurs in code looking like - -@example -int (*a); -@end example - -This code is ambiguous; it could be a declaration of the variable -@samp{a} as a pointer to @samp{int}, or it could be a functional cast of -@samp{*a} to @samp{int}. Section 6.8 specifies that the former -interpretation is correct. 
This ambiguity results in 7 reduce/reduce -conflicts. Another aspect of this ambiguity is code like 'int (x[2]);', -which is resolved at the '[' and accounts for 6 reduce/reduce conflicts -between @samp{direct_notype_declarator} and -@samp{primary}/@samp{overqualified_id}. Finally, there are 4 r/r -conflicts between @samp{expr_or_declarator} and @samp{primary} over code -like 'int (a);', which could probably be resolved but would also -probably be more trouble than it's worth. In all, this situation -accounts for 17 conflicts. Ack! - -The second case above is responsible for the failure to parse 'LinppFile -ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave -Math.h++) as an object declaration, and must be fixed so that it does -not resolve until later. - -4) Indirectly between @code{after_type_declarator} and @code{parm}, for -type names. This occurs in (as one example) code like - -@example -typedef int foo, bar; -class A @{ - foo (bar); -@}; -@end example - -What is @code{bar} inside the class definition? We currently interpret -it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an -@code{after_type_declarator}. I believe that xlC is correct, in light -of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that -could possibly be a type name is taken as the @i{decl-specifier-seq} of -a @i{declaration}." However, it seems clear that this rule must be -violated in the case of constructors. This ambiguity accounts for 8 -conflicts. - -Unlike the others, this ambiguity is not recognized by the Working Paper. - -@node Exception Handling, Free Store, Parser, Top -@section Exception Handling - -Note, exception handling in g++ is still under development. - -This section describes the mapping of C++ exceptions in the C++ -front-end, into the back-end exception handling framework. - -The basic mechanism of exception handling in the back-end is -unwind-protect a la elisp. 
This is a general, robust, and language -independent representation for exceptions. - -The C++ front-end exceptions are mapped into the unwind-protect -semantics by the C++ front-end. The mapping is described below. - -When -frtti is used, rtti is used to do exception object type checking; -when it isn't used, the encoded name for the type of the object being -thrown is used instead. All code that originates exceptions, even code -that throws exceptions as a side effect, like dynamic casting, and all -code that catches exceptions must be compiled with either -frtti, or --fno-rtti. It is not possible to mix rtti-based exception handling -objects with code that doesn't use rtti. The exceptions to this are -code that doesn't catch or throw exceptions, catch (...), and code that -just rethrows an exception. - -Currently we use the normal mangling used in building function names -(int's are "i", const char * is PCc) to build the non-rtti-based type -descriptors for exception handling. These descriptors are just plain -NULL terminated strings, and internally they are passed around as char -*. - -In C++, all cleanups should be protected by exception regions. The -region starts just after the reason why the cleanup is created has -ended. For example, with an automatic variable that has a constructor, -it would be right after the constructor is run. The region ends just -before the finalization is expanded. Since the backend may expand the -cleanup multiple times along different paths, once for normal end of the -region, once for non-local gotos, once for returns, etc., the backend -must take special care to protect the finalization expansion, if the -expansion is for any other reason than normal region end, and it is -`inline' (it is inside the exception region). 
The backend can either -choose to move them out of line, or it can create an exception region -over the finalization to protect it, and in the handler associated with -it, it would not run the finalization as it otherwise would have, but -rather just rethrow to the outer handler, careful to skip the normal -handler for the original region. - -In Ada, they will use the more runtime intensive approach of having -fewer regions, but at the cost of additional work at run time, to keep a -list of things that need cleanups. When a variable has finished -construction, they add the cleanup to the list; when they come to the end -of the lifetime of the variable, they run the list down. If they take a -hit before the section finishes normally, they examine the list for -actions to perform. I hope they add this logic into the back-end, as it -would be nice to get that alternative approach in C++. - -On an rs6000, xlC stores exception objects on the stack, under the try -block. When it unwinds down into a handler, the frame pointer is -adjusted back to the normal value for the frame in which the handler -resides, and the stack pointer is left unchanged from the time at which -the object was thrown. This is so that there is always someplace for -the exception object, and nothing can overwrite it, once we start -throwing. The only bad part is that the stack remains large. - -The following are some things that work in g++'s exception handling. - -All completely constructed temps and local variables are cleaned up in -all unwound scopes. Completely constructed parts of partially -constructed objects are cleaned up. This includes partially built -arrays. Exception specifications are now handled. Thrown objects are -now cleaned up all the time. We can now tell if we have an active -exception being thrown or not (__eh_type != 0). We use this to call -terminate if someone does a throw; without there being an active -exception object. uncaught_exception () works. 
Exception handling -should work right if you optimize. Exception handling should work with --fpic or -fPIC. - -The below points out some flaws in g++'s exception handling, as it now -stands. - -Only exact type matching or reference matching of throw types works when --fno-rtti is used. Only works on a SPARC (like Suns) (both -mflat and --mno-flat models work), SPARClite, Hitachi SH, i386, arm, rs6000, -PowerPC, Alpha, mips, VAX, m68k and z8k machines. SPARC v9 may not -work. HPPA is mostly done, but throwing between a shared library and -user code doesn't yet work. Some targets have support for data-driven -unwinding. Partial support is in for all other machines, but a stack -unwinder called __unwind_function has to be written, and added to -libgcc2 for them. The new EH code doesn't rely upon the -__unwind_function for C++ code, instead it creates per function -unwinders right inside the function, unfortunately, on many platforms -the definition of RETURN_ADDR_RTX in the tm.h file for the machine port -is wrong. See below for details on __unwind_function. RTL_EXPRs for EH -cond variables for && and || exprs should probably be wrapped in -UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved. - -We only do pointer conversions on exception matching a la 15.3 p2 case -3: `A handler with type T, const T, T&, or const T& is a match for a -throw-expression with an object of type E if [3]T is a pointer type and -E is a pointer type that can be converted to T by a standard pointer -conversion (_conv.ptr_) not involving conversions to pointers to private -or protected base classes.' when -frtti is given. - -We don't call delete on new expressions that die because the ctor threw -an exception. See except/18 for a test case. - -15.2 para 13: The exception being handled should be rethrown if control -reaches the end of a handler of the function-try-block of a constructor -or destructor, right now, it is not. 
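The rule cited in the last paragraph above (an exception is rethrown when control reaches the end of a handler of a constructor's function-try-block) can be observed with a short program. This is a sketch of the behavior the language rule requires, not of g++ internals; `Member`, `Holder`, and `rethrown_after_handler` are names made up for the illustration.

```cpp
#include <stdexcept>

// A member whose constructor always fails.
struct Member {
    Member() { throw std::runtime_error("member failed"); }
};

struct Holder {
    Member m;
    // Function-try-block: the handler also covers the member initializers.
    explicit Holder(bool* handler_ran)
    try : m() {
    } catch (const std::runtime_error&) {
        // Only the constructor parameter is touched here; the object's
        // members must not be accessed in this handler.
        *handler_ran = true;
        // Falling off the end of this handler rethrows the exception.
    }
};

// Returns true if the handler ran AND the exception still escaped
// the constructor afterwards.
bool rethrown_after_handler() {
    bool handler_ran = false;
    try {
        Holder h(&handler_ran);
    } catch (const std::runtime_error&) {
        return handler_ran;
    }
    return false;  // reached only if nothing was rethrown
}
```

A conforming compiler makes `rethrown_after_handler()` return `true`; the text above records that g++ did not yet do so at the time of writing.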
- -15.2 para 12: If a return statement appears in a handler of -function-try-block of a constructor, the program is ill-formed, but this -isn't diagnosed. - -15.2 para 11: If the handlers of a function-try-block contain a jump -into the body of a constructor or destructor, the program is ill-formed, -but this isn't diagnosed. - -15.2 para 9: Check that the fully constructed base classes and members -of an object are destroyed before entering the handler of a -function-try-block of a constructor or destructor for that object. - -build_exception_variant should sort the incoming list, so that it -implements set compares, not exact list equality. Type smashing should -smash exception specifications using set union. - -Thrown objects are usually allocated on the heap, in the usual way. If -one runs out of heap space, throwing an object will probably never work. -This could be relaxed some by passing an __in_chrg parameter to track -who has control over the exception object. Thrown objects are not -allocated on the heap when they are pointer to object types. We should -extend it so that all small (<4*sizeof(void*)) objects are stored -directly, instead of allocated on the heap. - -When the backend returns a value, it can create new exception regions -that need protecting. The new region should rethrow the object in -context of the last associated cleanup that ran to completion. - -The structure of the code that is generated for C++ exception handling -code is shown below: - -@example -Ln: throw value; - copy value onto heap - jump throw (Ln, id, address of copy of value on heap) - - try @{ -+Lstart: the start of the main EH region -|... ... -+Lend: the end of the main EH region - @} catch (T o) @{ - ...1 - @} -Lresume: - nop used to make sure there is something before - the next region ends, if there is one -... ... 
- - jump Ldone -[ -Lmainhandler: handler for the region Lstart-Lend - cleanup -] zero or more, depending upon automatic vars with dtors -+Lpartial: -| jump Lover -+Lhere: - rethrow (Lhere, same id, same obj); -Lterm: handler for the region Lpartial-Lhere - call terminate -Lover: -[ - [ - call throw_type_match - if (eq) @{ - ] these lines disappear when there is no catch condition -+Lsregion2: -| ...1 -| jump Lresume -|Lhandler: handler for the region Lsregion2-Leregion2 -| rethrow (Lresume, same id, same obj); -+Leregion2 - @} -] there are zero or more of these sections, depending upon how many - catch clauses there are ------------------------------ expand_end_all_catch -------------------------- - here we have fallen off the end of all catch - clauses, so we rethrow to outer - rethrow (Lresume, same id, same obj); ------------------------------ expand_end_all_catch -------------------------- -[ -L1: maybe throw routine -] depending upon if we have expanded it or not -Ldone: - ret - -start_all_catch emits labels: Lresume, - -@end example - -The __unwind_function takes a pointer to the throw handler, and is -expected to pop the stack frame that was built to call it, as well as -the frame underneath and then jump to the throw handler. It must -restore all registers to their proper values as well as all other -machine state as determined by the context in which we are unwinding -into. The way I normally start is to compile: - - void *g; - foo(void* a) @{ g = a; @} - -with -S, and change the thing that alters the PC (return, or ret -usually) to not alter the PC, making sure to leave all other semantics -(like adjusting the stack pointer, or frame pointers) in. After that, -replicate the prologue once more at the end, again, changing the PC -altering instructions, and finally, at the very end, jump to `g'. 
- -It takes about a week to write this routine, if someone wants to -volunteer to write this routine for any architecture, exception support -for that architecture will be added to g++. Please send in those code -donations. One other thing that needs to be done, is to double check -that __builtin_return_address (0) works. - -@subsection Specific Targets - -For the alpha, the __unwind_function will be something resembling: - -@example -void -__unwind_function(void *ptr) -@{ - /* First frame */ - asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */ - asm ("bis $15, $15, $30"); /* reload sp with the fp we found */ - - /* Second frame */ - asm ("ldq $15, 8($30)"); /* fp */ - asm ("bis $15, $15, $30"); /* reload sp with the fp we found */ - - /* Return */ - asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */ -@} -@end example - -@noindent -However, there are a few problems preventing it from working. First of -all, the gcc-internal function @code{__builtin_return_address} needs to -work given an argument of 0 for the alpha. As it stands as of August -30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c} -will definitely not work on the alpha. Instead, we need to define -the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe), -@code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new -definition for @code{RETURN_ADDR_RTX}. - -In addition (and more importantly), we need a way to reliably find the -frame pointer on the alpha. The use of the value 8 above to restore the -frame pointer (register 15) is incorrect. On many systems, the frame -pointer is consistently offset to a specific point on the stack. On the -alpha, however, the frame pointer is pushed last. First the return -address is stored, then any other registers are saved (e.g., @code{s0}), -and finally the frame pointer is put in place. So @code{fp} could have -an offset of 8, but if the calling function saved any registers at all, -they add to the offset. 
- -The only places the frame size is noted are with the @samp{.frame} -directive, for use by the debugger and the OSF exception handling model -(useless to us), and in the initial computation of the new value for -@code{sp}, the stack pointer. For example, the function may start with: - -@example -lda $30,-32($30) -.frame $15,32,$26,0 -@end example - -@noindent -The 32 above is exactly the value we need. With this, we can be sure -that the frame pointer is stored 8 bytes less---in this case, at 24(sp)). -The drawback is that there is no way that I (Brendan) have found to let -us discover the size of a previous frame @emph{inside} the definition -of @code{__unwind_function}. - -So to accomplish exception handling support on the alpha, we need two -things: first, a way to figure out where the frame pointer was stored, -and second, a functional @code{__builtin_return_address} implementation -for except.c to be able to use it. - -Or just support DWARF 2 unwind info. - -@subsection New Backend Exception Support - -This subsection discusses various aspects of the design of the -data-driven model being implemented for the exception handling backend. - -The goal is to generate enough data during the compilation of user code, -such that we can dynamically unwind through functions at run time with a -single routine (@code{__throw}) that lives in libgcc.a, built by the -compiler, and dispatch into associated exception handlers. - -This information is generated by the DWARF 2 debugging backend, and -includes all of the information __throw needs to unwind an arbitrary -frame. It specifies where all of the saved registers and the return -address can be found at any point in the function. - -Major disadvantages when enabling exceptions are: - -@itemize @bullet -@item -Code that uses caller saved registers, can't, when flow can be -transferred into that code from an exception handler. In high performance -code this should not usually be true, so the effects should be minimal. 
- -@end itemize - -@subsection Backend Exception Support - -The backend must be extended to fully support exceptions. Right now -there are a few hooks into the alpha exception handling backend that -resides in the C++ frontend from that backend that allows exception -handling to work in g++. An exception region is a segment of generated -code that has a handler associated with it. The exception regions are -denoted in the generated code as address ranges denoted by a starting PC -value and an ending PC value of the region. Some of the limitations -with this scheme are: - -@itemize @bullet -@item -The backend replicates insns for such things as loop unrolling and -function inlining. Right now, there are no hooks into the frontend's -exception handling backend to handle the replication of insns. When -replication happens, a new exception region descriptor needs to be -generated for the new region. - -@item -The backend expects to be able to rearrange code, for things like jump -optimization. Any rearranging of the code needs have exception region -descriptors updated appropriately. - -@item -The backend can eliminate dead code. Any associated exception region -descriptor that refers to fully contained code that has been eliminated -should also be removed, although not doing this is harmless in terms of -semantics. - -@end itemize - -The above is not meant to be exhaustive, but does include all things I -have thought of so far. I am sure other limitations exist. - -Below are some notes on the migration of the exception handling code -backend from the C++ frontend to the backend. - -NOTEs are to be used to denote the start of an exception region, and the -end of the region. I presume that the interface used to generate these -notes in the backend would be two functions, start_exception_region and -end_exception_region (or something like that). The frontends are -required to call them in pairs. 
When marking the end of a region, an -argument can be passed to indicate the handler for the marked region. -This can be passed in many ways; currently a tree is used. Another -possibility would be insns for the handler, or a label that denotes a -handler. I have a feeling insns might be the best way to pass it. -Semantics are, if an exception is thrown inside the region, control is -transferred unconditionally to the handler. If control passes through -the handler, then the backend is to rethrow the exception, in the -context of the end of the original region. The handler is protected by -the conventional mechanisms; it is the frontend's responsibility to -protect the handler, if special semantics are required. - -This is a very low level view, and it would be nice if the backend -supported a somewhat higher level view in addition to this view. This -higher level could include source line number, name of the source file, -name of the language that threw the exception and possibly the name of -the exception. Kenner may want to rope you into doing more than just -the basics required by C++. You will have to resolve this. He may want -you to support non-local gotos: first scan for an exception handler and, -if none is found, allow the debugger to be entered, without any cleanups -being done. To do this, the backend would have to know the difference -between a cleanup-rethrower and a real handler; it would also have to -have a way to know if a handler `matches' a thrown exception, and this -is frontend specific. - -The stack unwinder is one of the hardest parts to do. It is highly -machine dependent. The form that Kenner seems to like was a couple of -macros that would do the machine dependent grunt work. One preexisting -function that might be of some use is __builtin_return_address (). 
One -macro he seemed to want was __builtin_return_address, and the other -would do the hard work of fixing up the registers, adjusting the stack -pointer, frame pointer, arg pointer and so on. - - -@node Free Store, Mangling, Exception Handling, Top -@section Free Store - -@code{operator new []} adds a magic cookie to the beginning of arrays -for which the number of elements will be needed by @code{operator delete -[]}. These are arrays of objects with destructors and arrays of objects -that define @code{operator delete []} with the optional size_t argument. -This cookie can be examined from a program as follows: - -@example -typedef unsigned long size_t; -extern "C" int printf (const char *, ...); - -size_t nelts (void *p) -@{ - struct cookie @{ - size_t nelts __attribute__ ((aligned (sizeof (double)))); - @}; - - cookie *cp = (cookie *)p; - --cp; - - return cp->nelts; -@} - -struct A @{ - ~A() @{ @} -@}; - -int main() -@{ - A *ap = new A[3]; - printf ("%lu\n", nelts (ap)); -@} -@end example - -@section Linkage -The linkage code in g++ is horribly twisted in order to meet two design goals: - -1) Avoid unnecessary emission of inlines and vtables. - -2) Support pedantic assemblers like the one in AIX. - -To meet the first goal, we defer emission of inlines and vtables until -the end of the translation unit, where we can decide whether or not they -are needed, and how to emit them if they are. - -@node Mangling, Vtables, Free Store, Top -@section Function name mangling for C++ and Java - -Both C++ and Java provide overloaded functions and methods, -which are methods with the same name but different parameter lists. -Selecting the correct version is done at compile time. -Though the overloaded functions have the same name in the source code, -they need to be translated into different assembler-level names, -since typical assemblers and linkers cannot handle overloading. 
-This process of encoding the parameter types with the method name -into a unique name is called @dfn{name mangling}. The inverse -process is called @dfn{demangling}. - -It is convenient that C++ and Java use compatible mangling schemes, -since this makes life easier for tools such as gdb, and it eases -integration between C++ and Java. - -Note there is also a standard "Java Native Interface" (JNI) which -implements a different calling convention, and uses a different -mangling scheme. The JNI is a rather abstract ABI so Java can call methods -written in C or C++; -we are concerned here about a lower-level interface primarily -intended for methods written in Java, but that can also be used for C++ -(and less easily C). - -Note that on systems that follow BSD tradition, a C identifier @code{var} -would get "mangled" into the assembler name @samp{_var}. On such -systems, all other mangled names are also prefixed by a @samp{_} -which is not shown in the following examples. - -@subsection Method name mangling - -C++ mangles a method by emitting the function name, followed by @code{__}, -followed by encodings of any method qualifiers (such as @code{const}), -followed by the mangling of the method's class, -followed by the mangling of the parameters, in order. - -For example, @code{Foo::bar(int, long) const} is mangled -as @samp{bar__C3Fooil}. - -For a constructor, the method name is left out. -That is, @code{Foo::Foo(int, long) const} is mangled -as @samp{__C3Fooil}. - -GNU Java does the same. - -@subsection Primitive types - -The C++ types @code{int}, @code{long}, @code{short}, @code{char}, -and @code{long long} are mangled as @samp{i}, @samp{l}, -@samp{s}, @samp{c}, and @samp{x}, respectively. -The corresponding unsigned types have @samp{U} prefixed -to the mangling. The type @code{signed char} is mangled @samp{Sc}. - -The C++ and Java floating-point types @code{float} and @code{double} -are mangled as @samp{f} and @samp{d} respectively. 
- -The C++ @code{bool} type and the Java @code{boolean} type are -mangled as @samp{b}. - -The C++ @code{wchar_t} and the Java @code{char} types are -mangled as @samp{w}. - -The Java integral types @code{byte}, @code{short}, @code{int} -and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i}, -and @samp{x}, respectively. - -C++ code that has included @code{javatypes.h} will mangle -the typedefs @code{jbyte}, @code{jshort}, @code{jint} -and @code{jlong} as @samp{c}, @samp{s}, @samp{i}, -and @samp{x}, respectively. (This has not been implemented yet.) - -@subsection Mangling of simple names - -A simple class, package, template, or namespace name is -encoded as the number of characters in the name, followed by -the actual characters. Thus the class @code{Foo} -is encoded as @samp{3Foo}. - -If any of the characters in the name are not alphanumeric -(i.e., not one of the standard ASCII letters, digits, or '_'), -or the initial character is a digit, then the name is -mangled as a sequence of encoded Unicode letters. -A Unicode encoding starts with a @samp{U} to indicate -that Unicode escapes are used, followed by the number of -bytes used by the Unicode encoding, followed by the bytes -representing the encoding. ASCII letters and -non-initial digits are encoded without change. However, all -other characters (including underscore and initial digits) are -translated into a sequence starting with an underscore, -followed by the big-endian 4-hex-digit lower-case encoding of the character. - -If a method name contains Unicode-escaped characters, the -entire mangled method name is followed by a @samp{U}. - -For example, the method @code{X\u0319::M\u002B(int)} is encoded as -@samp{M_002b__U6X_0319iU}. - - -@subsection Pointer and reference types - -A C++ pointer type is mangled as @samp{P} followed by the -mangling of the type pointed to. - -A C++ reference type is mangled as @samp{R} followed by the -mangling of the type referenced. 
- -A Java object reference type is equivalent -to a C++ pointer parameter, so we mangle such a parameter type -as @samp{P} followed by the mangling of the class name. - -@subsection Squangled type compression - -Squangling (enabled with the @samp{-fsquangle} option) uses the -@samp{B} code to indicate reuse of a previously seen type within an -identifier. Types are recognized in a left to right manner and given -increasing values, which are appended to the code in the standard -manner. That is, multiple digit numbers are delimited by @samp{_} -characters. A type is considered to be any non-primitive type, -regardless of whether it's a parameter, template parameter, or entire -template. Certain codes are considered modifiers of a type, and are not -included as part of the type. These are the @samp{C}, @samp{V}, -@samp{P}, @samp{A}, @samp{R}, @samp{U} and @samp{u} codes, denoting -constant, volatile, pointer, array, reference, unsigned, and restrict. -These codes may precede a @samp{B} type in order to make the required -modifications to the type. - -For example: -@example -template <class T> class class1 @{ @}; - -template <class T> class class2 @{ @}; - -class class3 @{ @}; - -int f(class2<class1<class3> > a, int b, const class1<class3>& c, class3 *d) @{ @} - - B0 -> class2<class1<class3> > - B1 -> class1<class3> - B2 -> class3 -@end example -Produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}. -The int parameter is a basic type, and does not receive a B encoding... - -@subsection Qualified names - -Both C++ and Java allow a class to be lexically nested inside another -class. C++ also supports namespaces (not yet implemented by G++). -Java also supports packages. - -These are all mangled the same way: First the letter @samp{Q} -indicates that we are emitting a qualified name. -That is followed by the number of parts in the qualified name. -If that number is 9 or less, it is emitted with no delimiters. 
-Otherwise, an underscore is written before and after the count. -Then follows each part of the qualified name, as described above. - -For example, @code{Foo::\u0319::Bar} is encoded as -@samp{Q33FooU5_03193Bar}. - -Squangling uses the letter @samp{K} to indicate a -remembered portion of a qualified name. As qualified names are processed -for an identifier, the names are numbered and remembered in a -manner similar to the @samp{B} type compression code. -Names are recognized left to right, and given increasing values, which are -appended to the code in the standard manner. That is, multiple digit numbers -are delimited by @samp{_} characters. - -For example: -@example -class Andrew -@{ - class WasHere - @{ - class AndHereToo - @{ - @}; - @}; -@}; - -f(Andrew& r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @} - - K0 -> Andrew - K1 -> Andrew::WasHere - K2 -> Andrew::WasHere::AndHereToo -@end example -Function @samp{f()} would be mangled as: -@samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo} - -There are some occasions when either a @samp{B} or @samp{K} code could -be chosen; preference is always given to the @samp{B} code. For instance, the example -in the section on @samp{B} mangling could have used a @samp{K} code -instead of @samp{B2}. - -@subsection Templates - -A class template instantiation is encoded as the letter @samp{t}, -followed by the encoding of the template name, followed by -the number of template parameters, followed by the encoding of the template -parameters. If a template parameter is a type, it is written -as a @samp{Z} followed by the encoding of the type. - -A function template specialization (either an instantiation or an -explicit specialization) is encoded by an @samp{H} followed by the -encoding of the template parameters, as described above, followed by an -@samp{_}, the encoding of the argument types to the template function -(not the specialization), another @samp{_}, and the return type. 
(Like -the argument types, the return type is the return type of the function -template, not the specialization.) Template parameters in the argument -and return types are encoded by an @samp{X} for type parameters, or a -@samp{Y} for constant parameters, an index indicating their position -in the template parameter list declaration, and their template depth. - -@subsection Arrays - -C++ array types are mangled by emitting @samp{A}, followed by -the length of the array, followed by an @samp{_}, followed by -the mangling of the element type. Of course, normally -array parameter types decay into pointer types, so you -don't see this. - -Java arrays are objects. A Java type @code{T[]} is mangled -as if it were the C++ type @code{JArray<T>}. -For example, @code{java.lang.String[]} is encoded as -@samp{Pt6JArray1ZPQ34java4lang6String}. - -@subsection Static fields - -Both C++ and Java classes can have static fields. -These are allocated statically, and are shared among all instances. - -The mangling starts with a prefix (@samp{_} in most systems), which is -followed by the mangling -of the class name, followed by the "joiner" and finally the field name. -The joiner (see @code{JOINER} in @code{cp-tree.h}) is a special -separator character. For historical reasons (and idiosyncrasies -of assembler syntax) it can be @samp{$} or @samp{.} (or even -@samp{_} on a few systems). If the joiner is @samp{_} then the prefix -is @samp{__static_} instead of just @samp{_}. - -For example, @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax) -would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var} -(or rarely @samp{__static_Q23Foo3Bar_var}). - -If the name of a static variable needs Unicode escapes, -the Unicode indicator @samp{U} comes before the "joiner". -Thus @code{\u1234Foo::var\u3445} becomes @code{_U8_1234FooU.var_3445}. 
- -@subsection Table of demangling code characters - -The following special characters are used in mangling: - -@table @samp -@item A -Indicates a C++ array type. - -@item b -Encodes the C++ @code{bool} type, -and the Java @code{boolean} type. - -@item B -Used for squangling. Similar in concept to the 'T' non-squangled code. - -@item c -Encodes the C++ @code{char} type, and the Java @code{byte} type. - -@item C -A modifier to indicate a @code{const} type. -Also used to indicate a @code{const} member function -(in which case it precedes the encoding of the method's class). - -@item d -Encodes the C++ and Java @code{double} types. - -@item e -Indicates extra unknown arguments @code{...}. - -@item E -Indicates the opening parenthesis of an expression. - -@item f -Encodes the C++ and Java @code{float} types. - -@item F -Used to indicate a function type. - -@item H -Used to indicate a template function. - -@item i -Encodes the C++ and Java @code{int} types. - -@item I -Encodes typedef names of the form @code{int@var{n}_t}, where @var{n} is a -positive decimal number. The @samp{I} is followed by either two -hexadecimal digits, which encode the value of @var{n}, or by an -arbitrary number of hexadecimal digits between underscores. For -example, @samp{I40} encodes the type @code{int64_t}, and @samp{I_200_} -encodes the type @code{int512_t}. - -@item J -Indicates a complex type. - -@item K -Used by squangling to compress qualified names. - -@item l -Encodes the C++ @code{long} type. - -@item n -Immediate repeated type. Followed by the repeat count. - -@item N -Repeated type. Followed by the repeat count of the repeated type, -followed by the type index of the repeated type. Due to a bug in -g++ 2.7.2, this is only generated if the index is 0. Superseded by -@samp{n} when squangling. - -@item P -Indicates a pointer type. Followed by the type pointed to. - -@item Q -Used to mangle qualified names, which arise from nested classes. -Also used for namespaces. 
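The method and primitive-type rules given so far are mechanical enough to sketch in a few lines. The following is an illustration of the scheme as described in this section, not the code g++ uses; `mangle_primitive` and `mangle_method` are hypothetical helper names, and only plain ASCII class names and the primitive types listed above are handled.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Maps a primitive type name to its one- or two-letter code.
std::string mangle_primitive(std::string type) {
    if (type == "signed char") return "Sc";
    if (type == "float") return "f";
    if (type == "double") return "d";

    // Unsigned integer types get a 'U' in front of the signed code.
    std::string prefix;
    const std::string uns = "unsigned ";
    if (type.compare(0, uns.size(), uns) == 0) {
        prefix = "U";
        type.erase(0, uns.size());
    }

    static const std::map<std::string, std::string> codes = {
        {"int", "i"}, {"long", "l"}, {"short", "s"},
        {"char", "c"}, {"long long", "x"},
    };
    std::map<std::string, std::string>::const_iterator it = codes.find(type);
    if (it == codes.end())
        throw std::invalid_argument("unsupported type: " + type);
    return prefix + it->second;
}

// name "__" [C if const] <len><class> <parameter codes>
// Pass an empty name for a constructor.
std::string mangle_method(const std::string& name,
                          const std::string& class_name,
                          bool is_const,
                          const std::vector<std::string>& params) {
    std::string out = name + "__";
    if (is_const) out += "C";
    out += std::to_string(class_name.size()) + class_name;
    for (std::size_t i = 0; i < params.size(); ++i)
        out += mangle_primitive(params[i]);
    return out;
}
```

With these helpers, `mangle_method("bar", "Foo", true, {"int", "long"})` reproduces the `bar__C3Fooil` example from the text.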
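The simple-name rule, including the Unicode escape form, can be sketched as follows. This is an illustration written from the description above, not taken from the g++ sources; `encode_simple_name` is a hypothetical name, the input is assumed to be a sequence of code points that each fit in 16 bits, and the trailing `U` that marks a mangled method name is not modeled.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

static bool is_ascii_letter(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}

static bool is_ascii_digit(char32_t c) {
    return c >= U'0' && c <= U'9';
}

// Encodes one class, package, template, or namespace name.
std::string encode_simple_name(const std::u32string& name) {
    // The plain form applies when every character is an ASCII letter,
    // digit, or '_' and the first character is not a digit.
    bool plain = !name.empty() && !is_ascii_digit(name[0]);
    for (std::size_t i = 0; i < name.size(); ++i) {
        char32_t c = name[i];
        if (!(is_ascii_letter(c) || is_ascii_digit(c) || c == U'_'))
            plain = false;
    }

    std::string body;
    for (std::size_t i = 0; i < name.size(); ++i) {
        char32_t c = name[i];
        bool keep = plain || is_ascii_letter(c) || (is_ascii_digit(c) && i != 0);
        if (keep) {
            body += static_cast<char>(c);
        } else {
            // Escape as '_' plus four lower-case hex digits
            // (assumes the code point fits in 16 bits).
            char buf[16];
            std::snprintf(buf, sizeof buf, "_%04x", static_cast<unsigned>(c));
            body += buf;
        }
    }

    // Escaped names carry a leading 'U'; the count is the length of the
    // encoded body in both forms.
    return (plain ? "" : "U") + std::to_string(body.size()) + body;
}
```

For instance, the class name in the `X\u0319::M\u002B(int)` example above comes out as `U6X_0319`, matching the middle of the mangled name shown there.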
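The qualified-name and static-field rules combine as in the sketch below. It is an illustration based on the description in this section rather than on the g++ sources: `encode_part`, `encode_qualified`, and `mangle_static_field` are hypothetical names, only plain ASCII parts are handled, and emitting no `Q` for a single-part name is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Encodes one plain ASCII name as <length><name>.
std::string encode_part(const std::string& part) {
    return std::to_string(part.size()) + part;
}

// Encodes a nested or package-qualified name. The 'Q' form is used for two
// or more parts; a count above 9 is wrapped in underscores.
std::string encode_qualified(const std::vector<std::string>& parts) {
    std::string out;
    if (parts.size() > 1) {
        const std::string count = std::to_string(parts.size());
        out = "Q" + (parts.size() <= 9 ? count : "_" + count + "_");
    }
    for (std::size_t i = 0; i < parts.size(); ++i)
        out += encode_part(parts[i]);
    return out;
}

// prefix + class + joiner + field; the joiner is '$', '.' or '_',
// and the '_' joiner switches the prefix to "__static_".
std::string mangle_static_field(const std::vector<std::string>& class_parts,
                                const std::string& field,
                                char joiner = '$') {
    const std::string cls = encode_qualified(class_parts);
    if (joiner == '_')
        return "__static_" + cls + "_" + field;
    return "_" + cls + joiner + field;
}
```

The same `encode_qualified` helper also yields the `Q34java4lang6String` tail of the `java.lang.String[]` example in the Arrays subsection.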
-In Java used to mangle package-qualified names, and inner classes. - -@item r -Encodes the GNU C++ @code{long double} type. - -@item R -Indicates a reference type. Followed by the referenced type. - -@item s -Encodes the C++ and java @code{short} types. - -@item S -A modifier that indicates that the following integer type is signed. -Only used with @code{char}. - -Also used as a modifier to indicate a static member function. - -@item t -Indicates a template instantiation. - -@item T -A back reference to a previously seen type. - -@item U -A modifier that indicates that the following integer type is unsigned. -Also used to indicate that the following class or namespace name -is encoded using Unicode-mangling. - -@item u -The @code{restrict} type qualifier. - -@item v -Encodes the C++ and Java @code{void} types. - -@item V -A modifier for a @code{volatile} type or method. - -@item w -Encodes the C++ @code{wchar_t} type, and the Java @code{char} types. - -@item W -Indicates the closing parenthesis of an expression. - -@item x -Encodes the GNU C++ @code{long long} type, and the Java @code{long} type. - -@item X -Encodes a template type parameter, when part of a function type. - -@item Y -Encodes a template constant parameter, when part of a function type. - -@item Z -Used for template type parameters. - -@end table - -The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p} -also seem to be used for obscure purposes ... - -@node Vtables, Concept Index, Mangling, Top -@section Virtual Tables - -In order to invoke virtual functions, GNU C++ uses virtual tables. Each -virtual function gets an index, and the table entry points to the -overridden function to call. 
Sometimes, an adjustment to the this -pointer has to be made before calling a virtual function: - -@example -struct A@{ - int i; - virtual void foo(); -@}; - -struct B@{ - int j; - virtual void bar(); -@}; - -struct C:A,B@{ - virtual void bar(); -@}; - -void C::bar() -@{ - i++; -@} - -int main() -@{ - C *c = new C; - B *b = c; - c->bar(); -@} -@end example - -Here, casting from @samp{c} to @samp{b} adds an offset. When @samp{bar} -is called through @samp{b}, this offset needs to be subtracted, so that @samp{C::bar} can -properly access @samp{i}. One approach to achieving this is to use -@emph{thunks}, which are small half-functions put into the virtual -table. They modify the first argument (the @samp{this} pointer), and then -jump into the real function. - -The other (traditional) approach is to have an additional integer in the -virtual table which is added to this. This is an additional overhead -both at the function call, and in the size of virtual tables: In the -case of single inheritance (or for the first base class), these integers -will always be zero. - -@subsection Virtual Base Classes with Virtual Tables - -In case of virtual bases, the code is even more -complicated. Constructors and destructors need to know whether they are -"in charge" of the virtual bases, and receive an implicit integer -argument @samp{__in_chrg} for that purpose. - -@example -struct A@{ - int i; - virtual void bar(); - void call_bar()@{bar();@} -@}; - -struct B:virtual A@{ - B(); - int j; - virtual void bar(); -@}; - -B::B()@{ - call_bar(); -@} - -struct C@{ - int k; -@}; - -struct D:C,B@{ - int l; - virtual void bar(); -@}; - -@end example - -When constructing an instance of B, it will have the following layout: -@samp{vbase pointer to A}, @samp{j}, @samp{A virtual table}, @samp{i}. -On a 32-bit machine, downcasting from @samp{A*} to @samp{B*} would need -to subtract 8, which would be the thunk executed when calling -@samp{B::bar} inside @samp{call_bar}. 
- -When constructing an instance of D, it will have a different layout: -@samp{k}, @samp{vbase pointer to A}, @samp{j}, @samp{l}, @samp{A virtual -table}, @samp{i}. So, when downcasting from @samp{A*} to @samp{B*} in a -@samp{D} object, the offset would be @samp{12}. - -This means that during construction of the @samp{B} base of a @samp{D} -object, a virtual table is needed which has a @samp{-12} thunk to -@samp{B::bar}. This is @emph{only} needed during construction and -destruction, as the full object will use a @samp{-16} thunk to -@samp{D::bar}. - -In order to implement this, the compiler generates an implicit argument -(in addition to @code{__in_chrg}): the virtual list argument -@code{__vlist}. This is a list of virtual tables needed during -construction and destruction. The virtual pointers are ordered in the -way they are used during construction; the destructors will process the -array in reverse order. The ordering is as follows: -@itemize @bullet -@item -If the class is in charge, the vlist starts with virtual table pointers -for the virtual bases that have virtual bases themselves. Here, only -@emph{polymorphic} virtual bases (pvbases) are interesting: if a vbase -has no virtual functions, it doesn't have a virtual table. - -@item -Next, the vlist has virtual tables for the initialization of the -non-virtual bases. These bases are not in charge, so the layout is -recursive, but ignores virtual bases during recursion. - -@item -Next, there is a number of virtual tables for each virtual base. These -are sorted in the order in which virtual bases are constructed. Each -virtual base may have more than one @code{vfield}, and therefore require -more than one @code{vtable}. The order of vtables is the same as used -when initializing vfields of non-virtual bases in a constructor. -@end itemize - -The compiler emits a virtual table list in a variable mangled as -@code{__vl.classname}. 
- -Classes with virtual bases, but without pvbases, only have the -@code{__in_chrg} argument to their ctors and dtors: they don't have any -vfields in the vbases to initialize. - -A further problem arises with virtual destructors: A destructor -typically has only the @code{__in_chrg} argument, which also indicates -whether the destructor should call @code{operator delete}. A dtor of a -class with pvbases has an additional argument. Unfortunately, a caller -of a virtual dtor might not know whether to pass that argument or not. -Therefore, the dtor processes the @code{__vlist} argument in an -automatic variable, which is initialized from the class' vlist if the -__in_chrg flag has a zero value in bit 2 (bit mask 4), or from the -argument @code{__vlist1} if bit 2 of the __in_chrg parameter is set to -one. - -@subsection Specification of non-thunked vtables - -In the traditional implementation of vtables, each slot contains three -fields: The offset to be added to the this pointer before invoking a -virtual function, an unused field that is always zero, and the pointer -to the virtual function. The first two fields are typically 16 bits -wide. The unused field is called `index'; it may be non-zero in -pointer-to-member-functions, which use the same layout. - -The virtual table then is an array of vtable slots. The first slot is -always the virtual type info function; the other slots are in the order -in which the virtual functions appear in the class declaration. - -If a class has base classes, it may inherit other bases' vfields. Each -class may have a primary vfield; the primary vfield of the derived class -is the primary vfield of the left-most non-virtual base class. If a -class inherits a primary vfield, any new virtual functions in the -derived class are appended to the virtual table of the primary -vfield. If there are new virtual functions in the derived class, and no -primary vfield is inherited, a new vfield is introduced which becomes -primary. 
The redefined virtual functions fill the vtable slots inherited -from the base; new virtual functions are put into the primary vtable in -the order of declaration. If no new virtual functions are introduced, no -primary vfield is allocated. - -In a base class that has pvbases, virtual tables are needed which are -used only in the constructor (see example above). At run-time, the -virtual tables of the base class are adjusted, to reflect the new offset -of the pvbase. The compiler knows statically what offset the pvbase has -for a complete object. At run-time, the offset of the pvbase can be -extracted from the vbase pointer, which is set in the constructor of the -complete object. These two offsets result in a delta, which is used to -adjust the deltas in the vtable (the adjustment might be different for -different vtable slots). To adjust the vtables, the compiler emits code -that creates a vtable on the stack. This vtable is initialized with the -vtable for the complete base type, and then adjusted. - -In order to call a virtual function, the compiler gets the offset field -from the vtable entry, and adds it to the this pointer. It then -indirectly calls the virtual function pointer, passing the adjusted this -pointer, and any arguments the virtual function may have. - -To implement dynamic casting, the dynamic_cast function needs typeinfos -for the complete type, and the pointer to the complete type. The -typeinfo pointer is obtained by calling the virtual typeinfo function -(which doesn't take a this parameter). The pointer to the complete -object is obtained by adding the offset of the virtual typeinfo vtable -slot, since this virtual function is always implemented in the complete -object. - -@subsection Specification of thunked vtables - -For vtable thunks, each slot only consists of a pointer to the virtual -function, which might be a thunk function. 
The first slot in the vtable -is an offset of the this pointer to the complete object, which is needed -as a parameter to __dynamic_cast. The second slot is the virtual -typeinfo function. All other slots are allocated with the same procedure -as in the non-thunked case. Allocation of vfields also uses the same -procedure as described above. - -If the virtual function needs an adjusted this pointer, a thunk function -is emitted. If supported by the target architecture, this is only a -half-function. Such a thunk has no stack frame; it merely adjusts the -first argument of the function, and then directly branches into the -implementation of the virtual function. If the architecture does not -support half-functions (i.e. if ASM_OUTPUT_MI_THUNK is not defined), the -compiler emits a wrapper function, which copies all arguments, adjusts -the this pointer, and then calls the original function. Since objects of -non-aggregate type are passed by invisible reference, this copies only -POD arguments. The approach fails for virtual functions with a variable -number of arguments. - -In order to support the vtables needed in base constructors with -pvbases, the compiler passes an implicit __vlist argument as described -above, if the version 2 thunks are used. For version 1 thunks, the base -class constructor will fill in the vtables for the complete base class, -which will incorrectly adjust the this pointer, leading to a dynamic -error. - -@node Concept Index, , Vtables, Top - -@section Concept Index - -@printindex cp - -@bye
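The two slot layouts specified in the Vtables section can be modelled by hand in ordinary C++. The following is a rough sketch under stated assumptions, not g++'s actual data structures: plain structs stand in for `struct C : A, B` so that the offsets are explicit, every name (`Slot`, `call_slot`, `C_bar`, `C_bar_thunk`, `kBToC`) is made up for illustration, and the virtual typeinfo slot is left empty.

```cpp
#include <cstddef>
#include <cstdint>

// Plain structs standing in for `struct C : A, B`.
struct A { int i; };
struct B { int j; };
struct C { A a; B b; };

// Stand-in for C::bar; `self` must point at the complete C object.
void C_bar(void* self) { static_cast<C*>(self)->a.i++; }

// Non-thunked slot: offset, unused index, function pointer.
struct Slot {
    std::int16_t delta;    // added to `this` before the call
    std::int16_t index;    // unused in vtables, always zero
    void (*pfn)(void*);    // the virtual function
};

// Distance from the B part back to the start of C, as a negative delta.
const std::int16_t kBToC =
    static_cast<std::int16_t>(-static_cast<int>(offsetof(C, b)));

// Vtable as seen through the B part of a C object.
const Slot vtable_B_in_C[] = {
    {0, 0, nullptr},       // slot 0: virtual typeinfo function, omitted here
    {kBToC, 0, &C_bar},    // slot 1: bar, overridden by C
};

// Traditional call: read the offset, adjust `this`, call indirectly.
void call_slot(const Slot* vtable, std::size_t n, void* self) {
    const Slot& slot = vtable[n];
    slot.pfn(static_cast<char*>(self) + slot.delta);
}

// Thunked alternative: the slot would hold only this function's address.
// The thunk adjusts the first argument and hands over to the real function.
void C_bar_thunk(void* self) {
    C_bar(static_cast<char*>(self) + kBToC);
}
```

Calling `call_slot(vtable_B_in_C, 1, &c.b)` or `C_bar_thunk(&c.b)` on some `C c` increments `c.a.i` in both cases. The difference is where the adjustment lives: in a per-slot integer that every call pays for, or in a small function emitted only for the slots that need it.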