The lex utility shall generate C programs to be used in lexical processing of character input, and that can be used as an interface to yacc. The C programs shall be generated from lex source code and conform to the ISO C standard. Usually, the lex utility shall write the program it generates to the file lex.yy.c; the state of this file is unspecified if lex exits with a non-zero exit status.
The general format of lex source shall be:
Definitions
%%
Rules
%%
UserSubroutines
The first "%%" is required to mark the beginning of the rules (regular expressions and actions); the second "%%" is required only if user subroutines follow.
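As a rough illustration of the three-part layout, here is a minimal sketch of a complete lex source file. The name DIGIT, the output format, and the file name example.l are assumptions made for this example only. The %{ ... %} block is the usual lex mechanism for copying C code into the generated file, and the example supplies its own yywrap() and main() so that it does not depend on the lex library.

```lex
%{
/* Definitions section: C code between %{ and %} is copied into lex.yy.c. */
#include <stdio.h>
%}
DIGIT   [0-9]
%%
{DIGIT}+    { printf("<number:%s>", yytext); }
%%
/* User subroutines section: ordinary C, copied to the end of lex.yy.c. */
int yywrap(void) { return 1; }
int main(void) { return yylex(); }
```

Assuming the file is saved as example.l, something like "lex example.l" followed by "cc lex.yy.c -o example" should build it. Runs of digits are wrapped in <number:...>, and everything else is copied through unchanged.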
Actions in lex
The action to be taken when an ERE is matched can be a C program fragment or the special actions described below; the program fragment can contain one or more C statements, and can also include special actions. The empty C statement ';' shall be a valid action; any string in the lex.yy.c input that matches the pattern portion of such a rule is effectively ignored or skipped. However, the absence of an action shall not be valid, and the action lex takes in such a condition is undefined.
The specification for an action, including C statements and special actions, can extend across several lines if enclosed in braces:
ERE <one or more blanks> { program statement
program statement }
The default action when a string in the input to a lex.yy.c program is not matched by any expression shall be to copy the string to the output. Because the default behavior of a program generated by lex is to read the input and copy it to the output, a minimal lex source program that has just "%%" shall generate a C program that simply copies the input to the output unchanged.
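The rules about actions can be sketched in a small filter. This is only an illustration: the counter line_count and the choice of patterns are assumptions, not part of the specification. The first rule uses the empty statement ';' to discard blanks and tabs, the second uses a brace-enclosed action that spans two lines, and any text that matches neither rule falls through to the default action and is copied to the output.

```lex
%{
#include <stdio.h>
int line_count = 0;   /* hypothetical counter, for illustration only */
%}
%%
[ \t]+      ;
\n          { line_count++;
              putchar('\n'); }
%%
int yywrap(void) { return 1; }
int main(void)
{
    yylex();
    fprintf(stderr, "lines: %d\n", line_count);
    return 0;
}
```

Deleting both rules and leaving only "%%" would give the minimal program described above, which copies its input to the output unchanged.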
Four special actions shall be available:
| ECHO; REJECT; BEGIN
|
The action '|' means that the action for the next rule is the action for this rule. Unlike the other three actions, '|' cannot be enclosed in braces or be semicolon-terminated; the application shall ensure that it is specified alone, with no other actions.
ECHO;
Write the contents of the string yytext on the output.
REJECT;
Usually only a single expression is matched by a given string in the input. REJECT means "continue to the next expression that matches the current input", and shall cause whatever rule was the second choice after the current rule to be executed for the same input. Thus, multiple rules can be matched and executed for one input string or overlapping input strings. For example, given the regular expressions "xyz" and "xy" and the input "xyz", usually only the regular expression "xyz" would match. The next attempted match would start after z. If the last action in the "xyz" rule is REJECT, both this rule and the "xy" rule would be executed. The REJECT action may be implemented in such a fashion that flow of control does not continue after it, as if it were equivalent to a goto to another part of yylex(). The use of REJECT may result in somewhat larger and slower scanners.
BEGIN
The action:
BEGIN newstate;
switches the state (start condition) to newstate. If the string newstate has not been declared previously as a start condition in the Definitions section, the results are unspecified. The initial state is indicated by the digit '0' or the token INITIAL.
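The four special actions can be seen together in one short sketch. The start condition name COMMENT, the two counters, and the comment markers "#" and "//" are all assumptions chosen for illustration. The '|' action makes "#" share the action of the "//" rule, BEGIN switches between the INITIAL and COMMENT states, ECHO writes yytext, and REJECT in the "xyz" rule hands the same input to the "xy" rule, following the example in the text.

```lex
%{
#include <stdio.h>
int xyz_seen = 0;   /* hypothetical counters, for illustration only */
int xy_seen = 0;
%}
%s COMMENT
%%
<INITIAL>"#"    |
<INITIAL>"//"   { BEGIN COMMENT; }
<COMMENT>\n     { ECHO; BEGIN INITIAL; }
<COMMENT>.      ;
<INITIAL>xyz    { xyz_seen++; REJECT; }
<INITIAL>xy     { xy_seen++; ECHO; }
%%
int yywrap(void) { return 1; }
int main(void)
{
    yylex();
    fprintf(stderr, "xyz: %d, xy: %d\n", xyz_seen, xy_seen);
    return 0;
}
```

Every rule carries an explicit start condition so that the counting rules stay inactive while a comment is being skipped. For the input "xyz", both counters are incremented once: the "xyz" rule runs first, and REJECT then causes the "xy" rule to run on the same input, leaving "z" to be copied by the default action.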
The functions or macros described below are accessible to user code included in the lex input. It is unspecified whether they appear in the C code output of lex, or are accessible only through the -l l operand to c99 (the lex library).
int yylex(void)
Performs lexical analysis on the input; this is the primary function generated by the lex utility. The function shall return zero when the end of input is reached; otherwise, it shall return non-zero values (tokens) determined by the actions that are selected.
int yymore(void)
When called, indicates that when the next input string is recognized, it is to be appended to the current value of yytext rather than replacing it; the value in yyleng shall be adjusted accordingly.
int yyless(int n)
Retains n initial characters in yytext, NUL-terminated, and treats the remaining characters as if they had not been read; the value in yyleng shall be adjusted accordingly.
int input(void)
Returns the next character from the input, or zero on end-of-file. It shall obtain input from the stream pointer yyin, although possibly via an intermediate buffer. Thus, once scanning has begun, the effect of altering the value of yyin is undefined. The character read shall be removed from the input stream of the scanner without any processing by the scanner.
int unput(int c)
Returns the character 'c' to the input; yytext and yyleng are undefined until the next expression is matched. The result of using unput() for more characters than have been input is unspecified.
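A small sketch may help show how these functions behave inside actions. The patterns and output formats below are assumptions made for illustration. The input() check tests for both zero and EOF because some implementations (older flex releases, for instance) return EOF rather than zero at end-of-file.

```lex
%{
#include <stdio.h>
%}
%%
hyper-      { /* Keep this text; the next match is appended to yytext. */
              yymore(); }
[0-9]+"%"   { /* Give the trailing '%' back to the input, keep the digits. */
              yyless(yyleng - 1);
              printf("[pct:%s]", yytext); }
[a-z]+      { printf("[%s:%d]", yytext, (int)yyleng); }
"="         { /* Peek at the next character without scanning it. */
              int c = input();
              if (c == '>') {
                  fputs("[arrow]", stdout);
              } else {
                  /* Not an arrow: push the character back for rescanning.
                     yytext is undefined after unput(), so print '=' directly. */
                  if (c != 0 && c != EOF)
                      unput(c);
                  putchar('=');
              }
            }
%%
int yywrap(void) { return 1; }
int main(void) { return yylex(); }
```

With this sketch, the input "hyper-text" is reported as a single token of length 10, because yymore() causes "text" to be appended to "hyper-" instead of replacing it. For "50%", the digits are reported and the '%' is then rescanned and copied by the default action.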
The following functions shall appear only in the lex library accessible through the -l l operand; they can therefore be redefined by a conforming application:
int yywrap(void)
Called by yylex() at end-of-file; the default yywrap() shall always return 1. If the application requires yylex() to continue processing with another source of input, then the application can include a function yywrap(), which associates another file with the external variable FILE * yyin and shall return a value of zero.
int main(int argc, char *argv[])
Calls yylex() to perform lexical analysis, then exits. The user code can contain main() to perform application-specific operations, calling yylex() as applicable.
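The following sketch shows one way an application might replace both library functions so that yylex() reads a series of files named on the command line. The variable next_file and the overall structure are assumptions for illustration, not something the specification prescribes. The rules section is left empty, so the program simply copies each file to the output.

```lex
%{
#include <stdio.h>
static char **next_file;    /* hypothetical cursor over remaining file names */
%}
%%
%%
/* Called by yylex() at end-of-file: open the next readable file, if any. */
int yywrap(void)
{
    while (*next_file != NULL) {
        FILE *f = fopen(*next_file++, "r");
        if (f != NULL) {
            if (yyin != NULL && yyin != stdin)
                fclose(yyin);   /* done with the previous file */
            yyin = f;
            return 0;           /* tell yylex() to continue scanning */
        }
    }
    return 1;                   /* no more input */
}

int main(int argc, char *argv[])
{
    next_file = argv + 1;       /* argv[argc] is NULL, which ends the list */
    if (argc > 1 && yywrap() != 0)
        return 1;               /* none of the named files could be opened */
    yylex();
    return 0;
}
```

Calling yywrap() once from main() before scanning starts is a convenience of this sketch: it reuses the same code to open the first file. With no arguments, the program reads standard input and the first end-of-file call to yywrap() returns 1.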
Except for input(), unput(), and main(), all external and static names generated by lex shall begin with the prefix yy or YY.
You will need flex to generate the lex.yy.c file; on Ubuntu, you can install it easily through the Synaptic package manager.