aboutsummaryrefslogtreecommitdiff
path: root/share/doc/psd/15.yacc/ss4
diff options
context:
space:
mode:
Diffstat (limited to 'share/doc/psd/15.yacc/ss4')
-rw-r--r--share/doc/psd/15.yacc/ss4367
1 files changed, 367 insertions, 0 deletions
diff --git a/share/doc/psd/15.yacc/ss4 b/share/doc/psd/15.yacc/ss4
new file mode 100644
index 000000000000..e548d53cfc7d
--- /dev/null
+++ b/share/doc/psd/15.yacc/ss4
@@ -0,0 +1,367 @@
+.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions are
+.\" met:
+.\"
+.\" Redistributions of source code and documentation must retain the above
+.\" copyright notice, this list of conditions and the following
+.\" disclaimer.
+.\"
+.\" Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" All advertising materials mentioning features or use of this software
+.\" must display the following acknowledgement:
+.\"
+.\" This product includes software developed or owned by Caldera
+.\" International, Inc. Neither the name of Caldera International, Inc.
+.\" nor the names of other contributors may be used to endorse or promote
+.\" products derived from this software without specific prior written
+.\" permission.
+.\"
+.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA
+.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR
+.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR
+.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+.\"
+.\" @(#)ss4 8.1 (Berkeley) 6/8/93
+.\"
+.\" $FreeBSD$
+.SH
+4: How the Parser Works
+.PP
+Yacc turns the specification file into a C program, which
+parses the input according to the specification given.
+The algorithm used to go from the
+specification to the parser is complex, and will not be discussed
+here (see
+the references for more information).
+The parser itself, however, is relatively simple,
+and understanding how it works, while
+not strictly necessary, will nevertheless make
+treatment of error recovery and ambiguities much more
+comprehensible.
+.PP
+The parser produced by Yacc consists
+of a finite state machine with a stack.
+The parser is also capable of reading and remembering the next
+input token (called the
+.I lookahead
+token).
+The
+.I "current state"
+is always the one on the top of the stack.
+The states of the finite state machine are given
+small integer labels; initially, the machine is in state 0,
+the stack contains only state 0, and no lookahead token has been read.
+.PP
+The machine has only four actions available to it, called
+.I shift ,
+.I reduce ,
+.I accept ,
+and
+.I error .
+A move of the parser is done as follows:
+.IP 1.
+Based on its current state, the parser decides
+whether it needs a lookahead token to decide
+what action should be done; if it needs one, and does
+not have one, it calls
+.I yylex
+to obtain the next token.
+.IP 2.
+Using the current state, and the lookahead token
+if needed, the parser decides on its next action, and
+carries it out.
+This may result in states being pushed onto the stack, or popped off of
+the stack, and in the lookahead token being processed
+or left alone.
+.PP
+The
+.I shift
+action is the most common action the parser takes.
+Whenever a shift action is taken, there is always
+a lookahead token.
+For example, in state 56 there may be
+an action:
+.DS
+ IF shift 34
+.DE
+which says, in state 56, if the lookahead token is IF,
+the current state (56) is pushed down on the stack,
+and state 34 becomes the current state (on the
+top of the stack).
+The lookahead token is cleared.
+.PP
+The
+.I reduce
+action keeps the stack from growing without
+bounds.
+Reduce actions are appropriate when the parser has seen
+the right hand side of a grammar rule,
+and is prepared to announce that it has seen
+an instance of the rule, replacing the right hand side
+by the left hand side.
+It may be necessary to consult the lookahead token
+to decide whether to reduce, but
+usually it is not; in fact, the default
+action (represented by a ``.'') is often a reduce action.
+.PP
+Reduce actions are associated with individual grammar rules.
+Grammar rules are also given small integer
+numbers, leading to some confusion.
+The action
+.DS
+ \fB.\fR reduce 18
+.DE
+refers to
+.I "grammar rule"
+18, while the action
+.DS
+ IF shift 34
+.DE
+refers to
+.I state
+34.
+.PP
+Suppose the rule being reduced is
+.DS
+A \fB:\fR x y z ;
+.DE
+The reduce action depends on the
+left hand symbol (A in this case), and the number of
+symbols on the right hand side (three in this case).
+To reduce, first pop off the top three states
+from the stack
+(In general, the number of states popped equals the number of symbols on the
+right side of the rule).
+In effect, these states were the ones
+put on the stack while recognizing
+.I x ,
+.I y ,
+and
+.I z ,
+and no longer serve any useful purpose.
+After popping these states, a state is uncovered
+which was the state the parser was in before beginning to
+process the rule.
+Using this uncovered state, and the symbol
+on the left side of the rule, perform what is in
+effect a shift of A.
+A new state is obtained, pushed onto the stack, and parsing continues.
+There are significant differences between the processing of
+the left hand symbol and an ordinary shift of a token,
+however, so this action is called a
+.I goto
+action.
+In particular, the lookahead token is cleared by a shift, and
+is not affected by a goto.
+In any case, the uncovered state contains an entry such as:
+.DS
+ A goto 20
+.DE
+causing state 20 to be pushed
+onto the stack, and become the current state.
+.PP
+In effect, the reduce action ``turns back the clock'' in the parse,
+popping the states off the stack to go back to the
+state where the right hand side of the rule was first seen.
+The parser then behaves as if it had seen the left side at that time.
+If the right hand side of the rule is empty,
+no states are popped off of the stack: the uncovered state
+is in fact the current state.
+.PP
+The reduce action is also important in the treatment of user-supplied
+actions and values.
+When a rule is reduced, the code supplied with the rule is executed
+before the stack is adjusted.
+In addition to the stack holding the states, another stack,
+running in parallel with it, holds the values returned
+from the lexical analyzer and the actions.
+When a shift takes place, the external variable
+.I yylval
+is copied onto the value stack.
+After the return from the user code, the reduction is carried out.
+When the
+.I goto
+action is done, the external variable
+.I yyval
+is copied onto the value stack.
+The pseudo-variables $1, $2, etc., refer to the value stack.
+.PP
+The other two parser actions are conceptually much simpler.
+The
+.I accept
+action indicates that the entire input has been seen and
+that it matches the specification.
+This action appears only when the lookahead token is
+the endmarker, and indicates that the parser has successfully
+done its job.
+The
+.I error
+action, on the other hand, represents a place where the parser
+can no longer continue parsing according to the specification.
+The input tokens it has seen, together with the lookahead token,
+cannot be followed by anything that would result
+in a legal input.
+The parser reports an error, and attempts to recover the situation and
+resume parsing: the error recovery (as opposed to the detection of error)
+will be covered in Section 7.
+.PP
+It is time for an example!
+Consider the specification
+.DS
+%token DING DONG DELL
+%%
+rhyme : sound place
+ ;
+sound : DING DONG
+ ;
+place : DELL
+ ;
+.DE
+.PP
+When Yacc is invoked with the
+.B \-v
+option, a file called
+.I y.output
+is produced, with a human-readable description of the parser.
+The
+.I y.output
+file corresponding to the above grammar (with some statistics
+stripped off the end) is:
+.DS
+state 0
+ $accept : \_rhyme $end
+
+ DING shift 3
+ . error
+
+ rhyme goto 1
+ sound goto 2
+
+state 1
+ $accept : rhyme\_$end
+
+ $end accept
+ . error
+
+state 2
+ rhyme : sound\_place
+
+ DELL shift 5
+ . error
+
+ place goto 4
+
+state 3
+ sound : DING\_DONG
+
+ DONG shift 6
+ . error
+
+state 4
+ rhyme : sound place\_ (1)
+
+ . reduce 1
+
+state 5
+ place : DELL\_ (3)
+
+ . reduce 3
+
+state 6
+ sound : DING DONG\_ (2)
+
+ . reduce 2
+.DE
+Notice that, in addition to the actions for each state, there is a
+description of the parsing rules being processed in each
+state. The \_ character is used to indicate
+what has been seen, and what is yet to come, in each rule.
+Suppose the input is
+.DS
+DING DONG DELL
+.DE
+It is instructive to follow the steps of the parser while
+processing this input.
+.PP
+Initially, the current state is state 0.
+The parser needs to refer to the input in order to decide
+between the actions available in state 0, so
+the first token,
+.I DING ,
+is read, becoming the lookahead token.
+The action in state 0 on
+.I DING
+is
+is ``shift 3'', so state 3 is pushed onto the stack,
+and the lookahead token is cleared.
+State 3 becomes the current state.
+The next token,
+.I DONG ,
+is read, becoming the lookahead token.
+The action in state 3 on the token
+.I DONG
+is ``shift 6'',
+so state 6 is pushed onto the stack, and the lookahead is cleared.
+The stack now contains 0, 3, and 6.
+In state 6, without even consulting the lookahead,
+the parser reduces by rule 2.
+.DS
+ sound : DING DONG
+.DE
+This rule has two symbols on the right hand side, so
+two states, 6 and 3, are popped off of the stack, uncovering state 0.
+Consulting the description of state 0, looking for a goto on
+.I sound ,
+.DS
+ sound goto 2
+.DE
+is obtained; thus state 2 is pushed onto the stack,
+becoming the current state.
+.PP
+In state 2, the next token,
+.I DELL ,
+must be read.
+The action is ``shift 5'', so state 5 is pushed onto the stack, which now has
+0, 2, and 5 on it, and the lookahead token is cleared.
+In state 5, the only action is to reduce by rule 3.
+This has one symbol on the right hand side, so one state, 5,
+is popped off, and state 2 is uncovered.
+The goto in state 2 on
+.I place ,
+the left side of rule 3,
+is state 4.
+Now, the stack contains 0, 2, and 4.
+In state 4, the only action is to reduce by rule 1.
+There are two symbols on the right, so the top two states are popped off,
+uncovering state 0 again.
+In state 0, there is a goto on
+.I rhyme
+causing the parser to enter state 1.
+In state 1, the input is read; the endmarker is obtained,
+indicated by ``$end'' in the
+.I y.output
+file.
+The action in state 1 when the endmarker is seen is to accept,
+successfully ending the parse.
+.PP
+The reader is urged to consider how the parser works
+when confronted with such incorrect strings as
+.I "DING DONG DONG" ,
+.I "DING DONG" ,
+.I "DING DONG DELL DELL" ,
+etc.
+A few minutes spend with this and other simple examples will
+probably be repaid when problems arise in more complicated contexts.