1 files changed, 225 insertions, 0 deletions
diff --git a/share/doc/psd/18.gprof/gathering.me b/share/doc/psd/18.gprof/gathering.me
new file mode 100644
index 000000000000..9b13b8c4d79d
--- /dev/null
+++ b/share/doc/psd/18.gprof/gathering.me
@@ -0,0 +1,225 @@
+.\" Copyright (c) 1982, 1993
+.\"	The Regents of the University of California.  All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\" 3. Neither the name of the University nor the names of its contributors
+.\"    may be used to endorse or promote products derived from this software
+.\"    without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.sh 1 "Gathering Profile Data"
+.pp
+Routine calls or statement executions can be measured by having a
+compiler augment the code at strategic points.
+The additions can be inline increments to counters [Knuth71]
+[Satterthwaite72] [Joy79] or calls to
+monitoring routines [Unix].
+The counter increment overhead is low, and is suitable for
+profiling statements.
+A call of the monitoring routine has an overhead comparable with a
+call of a regular routine, and is therefore only suited
+to profiling on a routine by routine basis.
+However, the monitoring routine solution has certain advantages.
+Whatever counters are needed by the monitoring routine can be
+managed by the monitoring routine itself, rather than being
+distributed around the code.
+In particular, a monitoring routine can easily be called from separately
+compiled programs.
+In addition, different monitoring routines can be linked into the
+program
+being measured
+to assemble different profiling data without having to
+change the compiler or recompile the program.
+We have exploited this approach;
+our compilers for C, Fortran77, and Pascal can insert calls to a
+monitoring routine in the prologue for each routine.
+Use of the monitoring routine requires no planning on part of a
+programmer other than to request that augmented routine
+prologues be produced during compilation.
+.pp
+We are interested in gathering three pieces of information during
+program execution: call counts and execution times for 
+each profiled routine, and the arcs of the dynamic call graph
+traversed by this execution of the program.
+By post-processing of this data we can build the dynamic call
+graph for this execution of the program and propagate times along
+the edges of this graph to attribute times for routines to the
+routines that invoke them.
+.pp
+Gathering of the profiling information should not greatly
+interfere with the running of the program.
+Thus, the monitoring routine must not produce trace output each
+time it is invoked.
+The volume of data thus produced would be unmanageably large,
+and the time required to record it would overwhelm the running
+time of most programs.
+Similarly, the monitoring routine can not do the analysis of
+the profiling data (e.g. assembling the call graph, propagating
+times around it, discovering cycles, etc.) during program
+execution.
+Our solution is to gather profiling data in memory during program
+execution and to condense it to a file as the profiled
+program exits.
+This file is then processed by a separate program to produce the
+listing of the profile data.
+An advantage of this approach is that the profile data for
+several executions of a program can be combined by the
+post-processing to provide a profile of many
+executions.
+.pp
+The execution time monitoring consists of three parts.
+The first part allocates and initializes the runtime monitoring data
+structures before the program begins execution.
+The second part is the monitoring routine invoked from the
+prologue of each profiled routine.
+The third part condenses the data structures and writes them
+to a file as the program terminates.
+The monitoring routine is discussed in detail in the following sections.
+.sh 2 "Execution Counts"
+.pp
+The \fBgprof\fP monitoring routine counts the number of times
+each profiled routine is called.
+The monitoring routine also records the arc in the call graph
+that activated the profiled routine.
+The count is associated with the arc in the call graph
+rather than with the routine.
+Call counts for routines can then be determined by summing the counts
+on arcs directed into that routine.
+In a machine-dependent fashion, the monitoring routine notes its
+own return address.
+This address is in the prologue of some profiled routine that is
+the destination of an arc in the dynamic call graph.
+The monitoring routine also discovers the return address for that
+routine, thus identifying the call site, or source of the arc.
+The source of the arc is in the \fIcaller\fP, and the destination is in
+the \fIcallee\fP.
+For example, if a routine A calls a routine B, A is the caller, 
+and B is the callee.
+The prologue of B will include a call to the monitoring routine
+that will note the arc from A to B and either initialize or
+increment a counter for that arc.
+.pp
+One can not afford to have the monitoring routine output tracing
+information as each arc is identified.
+Therefore, the monitoring routine maintains a table of all the
+arcs discovered, with counts of the numbers of times each is
+traversed during execution.
+This table is accessed once per routine call.
+Access to it
+must be as fast as possible so as not to overwhelm the time
+required to execute the program.
+.pp
+Our solution is to access the table through a hash table.
+We use the call site as the primary key with the callee
+address being the secondary key.
+Since each call site typically calls only one callee, we can
+reduce (usually to one) the number of minor lookups based on the callee.
+Another alternative would use the callee as the primary key and the
+call site as the secondary key.
+Such an organization has the advantage of associating callers with
+callees, at the expense of longer lookups in the monitoring
+routine.
+We are fortunate to be running in a virtual memory environment,
+and (for the sake of speed) were able to allocate enough space
+for the primary hash table to allow a one-to-one mapping from
+call site addresses to the primary hash table.
+Thus our hash function is trivial to calculate and collisions
+occur only for call sites that call multiple
+destinations (e.g. functional parameters and functional variables).
+A one level hash function using both call site and callee would
+result in an unreasonably large hash table.
+Further, the number of dynamic call sites and callees is not known during
+execution of the profiled program.
+.pp
+Not all callers and callees can be identified by the monitoring
+routine.
+Routines that were compiled without the profiling augmentations
+will not call the monitoring routine as part of their prologue,
+and thus no arcs will be recorded whose destinations are in these
+routines.
+One need not profile all the routines in a program.
+Routines that are not profiled run at full speed.
+Certain routines, notably exception handlers, are invoked by
+non-standard calling sequences.
+Thus the monitoring routine may know the destination of an arc
+(the callee),
+but find it difficult or
+impossible to determine the source of the arc (the caller).
+Often in these cases the apparent source of the arc is not a call
+site at all.
+Such anomalous invocations are declared ``spontaneous''.
+.sh 2 "Execution Times"
+.pp
+The execution times for routines can be gathered in at least two
+ways.
+One method measures the execution time of a routine by measuring
+the elapsed time from routine entry to routine exit.
+Unfortunately, time measurement is complicated on time-sharing
+systems by the time-slicing of the program.
+A second method samples the value of the program counter at some
+interval, and infers execution time from the distribution of the
+samples within the program.
+This technique is particularly suited to time-sharing systems,
+where the time-slicing can serve as the basis for sampling
+the program counter.
+Notice that, whereas the first method could provide exact timings,
+the second is inherently a statistical approximation.
+.pp
+The sampling method need not require support from the operating
+system:  all that is needed is the ability to set and respond to
+``alarm clock'' interrupts that run relative to program time.
+It is imperative that the intervals be uniform since the
+sampling of the program counter rather than the duration of the
+interval is the basis of the distribution.
+If sampling is done too often, the interruptions to sample the
+program counter will overwhelm the running of the profiled program.
+On the other hand, the program must run for enough sampled
+intervals that the distribution of the samples accurately
+represents the distribution of time for the execution of the
+program.
+As with routine call tracing, the monitoring routine can not
+afford to output information for each program counter
+sample.
+In our computing environment, the operating system can provide a
+histogram of the location of the program counter at the end of
+each clock tick (1/60th of a second) in which a program runs.
+The histogram is assembled in memory as the program runs.
+This facility is enabled by our monitoring routine.
+We have adjusted the granularity of the histogram so that
+program counter values map one-to-one onto the histogram.
+We make the simplifying assumption that all calls to a specific
+routine require the same amount of time to execute.
+This assumption may disguise that some calls
+(or worse, some call sites) always invoke a routine
+such that its execution is faster (or slower)
+than the average time for that routine.
+.pp
+When the profiled program terminates, 
+the arc table and the histogram of
+program counter samples is written to a file.
+The arc table is condensed to consist of the source and destination
+addresses of the arc and the count of the number of times the arc
+was traversed by this execution of the program.
+The recorded histogram consists of counters of the number of
+times the program counter was found to be in each of the ranges covered
+by the histogram.
+The ranges themselves are summarized as a
+lower and upper bound and a step size.