Stack-Lexer is a simple static library for performing lexical analysis on strings. It is designed to store tokens and values on the stack, allowing you to "pre-compile" a lexical codex for static token identification.
#include "stack_lexer.h"
To link the library, you will need to pass a few flags to the compiler:
# if the library folder is in the root of the project:
-I./Stack-Lexer/include -L./Stack-Lexer/lib -lstack_lexer
# if the library folder is in some 'lib' folder, inside the project:
-I./lib/Stack-Lexer/include -L./lib/Stack-Lexer/lib -lstack_lexer
Unpacks and reads the token based on the value stored in the generic_stack global variable.
- Parameters:
  - token: The token.
- Returns: The unpacked value from the token.
Converts the token into a string based on the value stored in the generic_stack global variable.
- Parameters:
  - token: The token.
- Returns: A pointer to the string unpacked from the token.
3. void stack_lexer_32_attach(char* word, uint8_t result, uint32_t codex[256], uint8_t parent, uint8_t* stack_top)
Adds a word to the vocabulary (codex), allowing it to be identified during lexical analysis.
- Parameters:
  - word: The word to be added to the vocabulary.
  - result: The identifier code for this word (ranging from 1 to 254).
  - codex: The vocabulary (an array of 256 entries) where the word will be stored.
  - parent: The parent word (set to 0xff if it’s a new word).
  - stack_top: The top of the stack, which is automatically updated.
Sorts the items in the vocabulary (codex), making it ready for use in lexical analysis.
- Parameters:
  - codex: The vocabulary to be sorted.
Builds a new vocabulary from a set of words.
- Parameters:
  - codex: A pre-allocated codex to be populated.
  - words: A list of words to be identified (separated by commas).
Performs lexical analysis on a text, scanning for tokens, and stores the found tokens in the global tokens_stack. Literals of both values and strings are stored in the generic_stack.
- Parameters:
  - text: The text to be scanned for tokens.
  - codex: The vocabulary to be used for identifying tokens.
- tokens_stack[STACK_LEXER_TOKENS_STACK_SIZE]: Stack for storing tokens during lexical analysis.
- generic_stack[STACK_LEXER_GENERIC_STACK_SIZE]: Stack for storing literals (strings, numbers, etc.).
- PUSH_TOKEN(TOKEN): Pushes a token onto the tokens_stack.
- PUSH_GENERIC(GENERIC): Pushes a value onto the generic_stack.
- STRING_TOKEN(LOCATION): Token for strings.
- UNSIGNED_INT__8_TOKEN(LOCATION): Token for 8-bit unsigned integers.
- UNSIGNED_INT_16_TOKEN(LOCATION): Token for 16-bit unsigned integers.
- UNSIGNED_INT_32_TOKEN(LOCATION): Token for 32-bit unsigned integers.
- UNSIGNED_INT_64_TOKEN(LOCATION): Token for 64-bit unsigned integers.
There are several macros in the stack_lexer.h header that can be customized to fit the needs of your specific use case. These macros are defined with #ifndef to allow overriding during compilation or in your project’s code.
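The `#ifndef` guard means that whichever definition the preprocessor sees first wins. The sketch below reproduces only the guard pattern described here, with default values taken from this README. The two `demo_` helpers are hypothetical, exist only to expose the resulting values, and are not part of the library.

```c
#include <stddef.h>

/* Project-side override: it must appear before the default is seen.
   Passing -DSTACK_LEXER_GENERIC_STACK_SIZE=2048 to the compiler
   has the same effect. */
#define STACK_LEXER_GENERIC_STACK_SIZE 2048

/* Default as it would appear in the header. It is skipped here
   because the macro is already defined above. */
#ifndef STACK_LEXER_GENERIC_STACK_SIZE
#define STACK_LEXER_GENERIC_STACK_SIZE 1024
#endif

/* This macro was not overridden, so the default applies. */
#ifndef STACK_LEXER_TOKENS_STACK_SIZE
#define STACK_LEXER_TOKENS_STACK_SIZE 256
#endif

/* Hypothetical helpers, for illustration only. */
size_t demo_generic_capacity(void) { return STACK_LEXER_GENERIC_STACK_SIZE; }
size_t demo_tokens_capacity(void)  { return STACK_LEXER_TOKENS_STACK_SIZE; }
```

Since Stack-Lexer is linked as a static library, the compiled archive presumably has its own copy of these values baked in, so it is likely that the library must be rebuilt with the same definitions for both sides to agree.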
Defines the separator used when building a list of words in the vocabulary. By default, it is a comma (,).
#ifndef STACK_LEXER_BUILDER_SEPARATOR
#define STACK_LEXER_BUILDER_SEPARATOR ','
#endif
- Customizable: You can modify this to use any other separator, such as a space (' ') or a semicolon (';').
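As a small illustration of what the separator does, the sketch below counts the entries in a separator-delimited word list, which can serve as a sanity check before building a codex. `demo_count_words` is a hypothetical helper written for this example, not a library function, and how the library itself treats empty entries is not covered here.

```c
#include <stddef.h>

#ifndef STACK_LEXER_BUILDER_SEPARATOR
#define STACK_LEXER_BUILDER_SEPARATOR ','
#endif

/* Hypothetical helper (not part of the library): counts the entries in a
   word list such as "if,else,while". An empty string has no entries;
   otherwise there is one more entry than there are separators. */
size_t demo_count_words(const char *words)
{
    size_t count = 1;

    if (*words == '\0')
        return 0;

    for (; *words != '\0'; words++) {
        if (*words == STACK_LEXER_BUILDER_SEPARATOR)
            count++;
    }
    return count;
}
```

With the default separator, `demo_count_words("if,else,while")` yields 3. If the separator is redefined to ';', the same list must be written as "if;else;while".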
Defines the starting symbol for identifying number tokens. By default, it is set to '['.
#ifndef STACK_LEXER_SYMBOL_NUMBER_START
#define STACK_LEXER_SYMBOL_NUMBER_START '['
#endif
- Customizable: Change this symbol to any other character to suit your lexical analysis needs.
Defines the ending symbol for number tokens. By default, it is set to ']'.
#ifndef STACK_LEXER_SYMBOL_NUMBER_END
#define STACK_LEXER_SYMBOL_NUMBER_END ']'
#endif
- Customizable: Adjust it as needed to match your format.
Defines the symbol that separates multiple numbers. By default, it is a comma (,).
#ifndef STACK_LEXER_SYMBOL_NUMBER_NEXT
#define STACK_LEXER_SYMBOL_NUMBER_NEXT ','
#endif
- Customizable: Change it to another symbol if required, such as a semicolon (';') or a space (' ').
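Taken together, the three number symbols describe literals of the form `[10,20,30]`. The following is a minimal sketch of reading such a literal, assuming unsigned decimal numbers with no whitespace. `demo_read_numbers` is a hypothetical helper and does not reflect how the library actually scans numbers or stores them in the generic_stack.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#ifndef STACK_LEXER_SYMBOL_NUMBER_START
#define STACK_LEXER_SYMBOL_NUMBER_START '['
#endif
#ifndef STACK_LEXER_SYMBOL_NUMBER_END
#define STACK_LEXER_SYMBOL_NUMBER_END ']'
#endif
#ifndef STACK_LEXER_SYMBOL_NUMBER_NEXT
#define STACK_LEXER_SYMBOL_NUMBER_NEXT ','
#endif

/* Hypothetical helper (not a library function): reads a literal such as
   "[10,20,30]" into out[] and returns how many numbers were read, or -1
   if the text is malformed or out[] is too small. Overflow of very large
   numbers is not checked in this sketch. */
int demo_read_numbers(const char *text, uint32_t *out, size_t capacity)
{
    size_t count = 0;

    if (*text != STACK_LEXER_SYMBOL_NUMBER_START)
        return -1;
    text++;

    for (;;) {
        uint32_t value = 0;

        if (!isdigit((unsigned char)*text))
            return -1;                  /* every entry needs a digit */
        while (isdigit((unsigned char)*text)) {
            value = value * 10u + (uint32_t)(*text - '0');
            text++;
        }

        if (count == capacity)
            return -1;                  /* no room left in out[] */
        out[count++] = value;

        if (*text == STACK_LEXER_SYMBOL_NUMBER_NEXT)
            text++;                     /* another number follows */
        else if (*text == STACK_LEXER_SYMBOL_NUMBER_END)
            return (int)count;          /* literal closed properly */
        else
            return -1;                  /* unexpected character or end */
    }
}
```

Redefining any of the three symbols changes the accepted syntax accordingly. For example, with ';' as STACK_LEXER_SYMBOL_NUMBER_NEXT the literal would be written `[10;20;30]`.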
Defines the symbol used for capturing string tokens. By default, it is the double quote (").
#ifndef STACK_LEXER_SYMBOL_STRING_CAPTURING
#define STACK_LEXER_SYMBOL_STRING_CAPTURING '"'
#endif
- Customizable: Modify it to any other character used for string capturing.
Defines the size of the token stack. The default size is 256 tokens.
#ifndef STACK_LEXER_TOKENS_STACK_SIZE
#define STACK_LEXER_TOKENS_STACK_SIZE 256
#endif
- Customizable: Change this value to increase or decrease the size of the token stack.
Defines the size of the generic stack. The default size is 1024 values.
#ifndef STACK_LEXER_GENERIC_STACK_SIZE
#define STACK_LEXER_GENERIC_STACK_SIZE 1024
#endif
- Customizable: Adjust the size of the generic stack as necessary.
If you want to change the word-list separator from a comma to a semicolon and increase the size of the generic_stack to 2048, you can define the macros before including stack_lexer.h, like this:
#define STACK_LEXER_BUILDER_SEPARATOR ';'
#define STACK_LEXER_GENERIC_STACK_SIZE 2048
This library is licensed under the MIT License. You can read more about it at https://opensource.org/licenses/MIT.