This lesson introduces the fundamentals of parsing SML documents using the SML library. You'll learn how to load documents from files, navigate the DOM tree, and extract information.
By the end of this lesson, you will be able to:
- Parse SML documents from files using Parse_File
- Handle parse results by checking for success or errors
- Navigate the DOM tree using First_Child, Next_Sibling, and Root
- Identify node types using the Kind function
- Extract text content from elements
- Handle parse errors with detailed error messages
Prerequisites:

- Ada 2022 compiler (GNAT)
- Alire package manager
- SML library (automatically fetched by Alire)
# Build the lesson
cd lesson-1-basic-parsing
alr build
# Run the lesson
./bin/lesson_1_basic_parsing

Expected output:

============================
Lesson 1: Basic SML Parsing
============================
This lesson demonstrates:
1. Parsing SML documents from files
2. Navigating the DOM tree
3. Extracting text content from elements
4. Handling parse errors gracefully
=============================
Part 1: Parsing a Valid Document
=============================
Loading fixtures/tasks_simple_sml.sml...
[OK] Document parsed successfully!
Task List:
----------
1. [Priority 5] Implement login feature
Status: todo
2. [Priority 3] Write unit tests
Status: in_progress
3. [Priority 1] Update documentation
Status: done
Task Summary by Status:
Todo: 1
In Progress: 1
Done: 1
-----------
Total Tasks: 3
==========================
Testing Error Handling
==========================
Attempting to parse an invalid document...
[EXPECTED] Parse error detected:
Error: Expected closing tag for 'task'
Location: Line 8, Column 5
============================
Lesson 1 Complete!
============================
You've learned how to:
- Use Parse_File to load SML documents
- Check Parse_Result.Success for parse errors
- Navigate the DOM using First_Child and Next_Sibling
- Extract text content from elements
- Handle parse errors with detailed error messages
Next: Lesson 2 will teach you schema validation!
The Parse_Result type contains either a successfully parsed document or an error:
declare
   Parse_Res : constant Parse_Result := Parse_File ("document.sml");
begin
   if Parse_Res.Success then
      null;  -- Work with Parse_Res.Doc
   else
      null;  -- Handle Parse_Res.Error
   end if;
end;

Structure:
type Parse_Result is record
   Success : Boolean;
   Doc     : Document;     -- Valid only if Success = True
   Error   : Parse_Error;  -- Valid only if Success = False
end record;

When parsing fails, the error contains diagnostic information:
type Parse_Error is record
   Message    : String (1 .. 256);  -- Error description
   Msg_Length : Natural;            -- Actual message length
   Line       : Natural;            -- Line number where error occurred
   Column     : Natural;            -- Column number where error occurred
end record;

Accessing error information:
if not Parse_Res.Success then
   Put_Line (Parse_Res.Error.Message (1 .. Parse_Res.Error.Msg_Length));
   Put_Line ("Line:" & Natural'Image (Parse_Res.Error.Line));
   Put_Line ("Column:" & Natural'Image (Parse_Res.Error.Column));
end if;

Documents are trees of nodes. Navigate using:

- Root(Doc) - Get the root element
- First_Child(Doc, Node) - Get the first child of a node
- Next_Sibling(Doc, Node) - Get the next sibling
- Kind(Doc, Node) - Get the node type (Element, Text, etc.)
- Name(Doc, Node) - Get the element name
- Text_Value(Doc, Node) - Get the text content
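The following self-contained sketch shows how the first-child/next-sibling model works in practice, and it does not need the SML library. It models nodes as a fixed array linked by child and sibling indices, with 0 standing in for Null_Node. The names (Sibling_Walk_Demo, Node_Rec, Child_Count) are hypothetical and say nothing about how SML actually stores its tree.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Sibling_Walk_Demo is
   --  Hypothetical stand-in for the SML DOM, NOT the library's code: nodes
   --  live in a fixed array, each storing the index of its first child and
   --  of its next sibling. Index 0 plays the role of Null_Node.
   subtype Node_Id is Natural range 0 .. 4;
   Null_Node : constant Node_Id := 0;
   subtype Valid_Node is Node_Id range 1 .. Node_Id'Last;

   type Node_Rec is record
      Name         : Unbounded_String;
      First_Child  : Node_Id;
      Next_Sibling : Node_Id;
   end record;

   --  A <task> element whose children are <title>, <priority> and <status>
   Nodes : constant array (Valid_Node) of Node_Rec :=
     (1 => (Name => To_Unbounded_String ("task"),
            First_Child => 2, Next_Sibling => Null_Node),
      2 => (Name => To_Unbounded_String ("title"),
            First_Child => Null_Node, Next_Sibling => 3),
      3 => (Name => To_Unbounded_String ("priority"),
            First_Child => Null_Node, Next_Sibling => 4),
      4 => (Name => To_Unbounded_String ("status"),
            First_Child => Null_Node, Next_Sibling => Null_Node));

   --  Same roles as Root / First_Child / Next_Sibling / Name in the lesson,
   --  minus the Doc parameter because there is only one toy document.
   function Root return Valid_Node is (1);
   function First_Child (N : Valid_Node) return Node_Id is (Nodes (N).First_Child);
   function Next_Sibling (N : Valid_Node) return Node_Id is (Nodes (N).Next_Sibling);
   function Name (N : Valid_Node) return String is (To_String (Nodes (N).Name));

   --  Walk the sibling chain that starts at Parent's first child
   function Child_Count (Parent : Valid_Node) return Natural is
      Count : Natural := 0;
      Child : Node_Id := First_Child (Parent);
   begin
      while Child /= Null_Node loop
         Count := Count + 1;
         Child := Next_Sibling (Child);
      end loop;
      return Count;
   end Child_Count;

   Child : Node_Id := First_Child (Root);
begin
   while Child /= Null_Node loop
      Put_Line (Name (Child));
      Child := Next_Sibling (Child);
   end loop;
   Put_Line ("Children of root:" & Natural'Image (Child_Count (Root)));
end Sibling_Walk_Demo;
```

Running it should print the three child names in document order, followed by the child count. The loop in the main body has the same shape as the DOM traversal loops used throughout this lesson.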
The Node_Kind enumeration defines node types:
type Node_Kind is (Element, Text, Comment, Processing_Instruction);

Most common types:

- Element - Tags like <task>, <title>, etc.
- Text - Text content between tags
- Comment - XML comments <!-- ... -->
Node_Id - Reference to a node in the document tree:

type Node_Id is private;
Null_Node : constant Node_Id;  -- Represents no node

Always check before using:

if Node /= Null_Node then
   null;  -- Safe to use Node
end if;

Important: Parsed documents are immutable (read-only). They use Ada's limited types:
- Cannot be assigned or copied
- Can only be used within their declaration scope
- This ensures memory safety and prevents corruption
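The compiler enforces these rules for every limited type, not only SML's. This standalone sketch uses a hypothetical Doc_Stub record in place of Document to show what is and is not allowed. The rejected lines are left commented out so the program still builds.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Limited_Demo is
   --  Hypothetical stand-in for a parsed document, not the SML type:
   --  a limited record can be initialized once but never assigned or copied.
   type Doc_Stub is limited record
      Node_Count : Natural := 0;
   end record;

   --  A function may still return a limited object (Ada 2005 and later);
   --  the result is built in place in the object being declared.
   function Load (Count : Natural) return Doc_Stub is
   begin
      return (Node_Count => Count);
   end Load;

   Doc : constant Doc_Stub := Load (3);  --  OK: initialized by a function call

   --  Rejected by the compiler if uncommented:
   --  Copy : Doc_Stub := Doc;   --  would copy an existing limited object
   --  ...and in the body:  Doc := Load (4);  --  no assignment for limited types
begin
   Put_Line ("Node count:" & Natural'Image (Doc.Node_Count));
end Limited_Demo;
```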
-- ✓ Correct - use constant with initialization
declare
   Parse_Res : constant Parse_Result := Parse_File ("doc.sml");
begin
   if Parse_Res.Success then
      null;  -- Work with Parse_Res.Doc here
   end if;
end;

-- ✗ Wrong - cannot assign limited types
Doc := Parse_Res.Doc;  -- Won't compile!

The lesson program includes three main procedures:
- Display_Task_List - Shows basic task information (title, status, priority)
- Count_Tasks_By_Status - Aggregates tasks by their status field
- Test_Parse_Error - Demonstrates error handling
These procedures demonstrate common DOM traversal patterns you'll use throughout the tutorial.
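The lesson's Count_Tasks_By_Status source is not reproduced here, so the following is only a sketch of one way to do the aggregation step in plain Ada. It assumes each <status> value has already been extracted as a string (for example with Get_Element_Text). It then maps that string onto a hypothetical Task_Status enumeration with 'Value, which matches identifiers case-insensitively.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Status_Count_Demo is
   --  Hypothetical status set; the fixture uses todo, in_progress and done,
   --  and Blocked is included for the "blocked tasks" exercise.
   type Task_Status is (Todo, In_Progress, Done, Blocked);
   type Status_Counts is array (Task_Status) of Natural;

   Counts : Status_Counts := (others => 0);

   --  'Value maps "in_progress" to In_Progress (case-insensitive) and raises
   --  Constraint_Error for text that is not a Task_Status literal.
   procedure Record_Status (Status_Text : String) is
      S : Task_Status;
   begin
      S := Task_Status'Value (Status_Text);
      Counts (S) := Counts (S) + 1;
   exception
      when Constraint_Error =>
         Put_Line ("Unknown status: " & Status_Text);
   end Record_Status;
begin
   --  In the real program these strings would come from each <status> element
   Record_Status ("done");
   Record_Status ("in_progress");
   Record_Status ("todo");

   for S in Task_Status loop
      Put_Line (Task_Status'Image (S) & ":" & Natural'Image (Counts (S)));
   end loop;
end Status_Count_Demo;
```

Unknown status strings are reported rather than counted, because 'Value raises Constraint_Error when the text is not one of the enumeration literals.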
A simple, well-formed task database with 3 tasks:
<task_database>
  <tasks>
    <task>
      <title>Create wireframes</title>
      <priority>1</priority>
      <status>done</status>
    </task>
    <task>
      <title>Implement responsive navigation</title>
      <priority>2</priority>
      <status>in_progress</status>
    </task>
    <task>
      <title>Design color scheme</title>
      <priority>3</priority>
      <status>todo</status>
    </task>
  </tasks>
</task_database>

Purpose: Demonstrates successful parsing and DOM navigation
An intentionally malformed document for testing error handling:
<task_database>
  <tasks>
    <task id="invalid" <!-- Missing closing >
      <title>Broken task</title>
    </task>
  </tasks>
</task_database>

Purpose: Demonstrates parse error detection and reporting
Finding a child element by name:

function Find_Child_Element (Doc          : Document;
                             Parent       : Node_Id;
                             Element_Name : String) return Node_Id is
   Child : Node_Id := First_Child (Doc, Parent);
begin
   while Child /= Null_Node loop
      if Kind (Doc, Child) = Element and then
         Name (Doc, Child) = Element_Name
      then
         return Child;
      end if;
      Child := Next_Sibling (Doc, Child);
   end loop;
   return Null_Node;
end Find_Child_Element;

Extracting the text of an element:

function Get_Element_Text (Doc : Document; Element : Node_Id) return String is
   Text_Node : constant Node_Id := First_Child (Doc, Element);
begin
   if Text_Node /= Null_Node and then Kind (Doc, Text_Node) = Text then
      return Text_Value (Doc, Text_Node);
   end if;
   return "";
end Get_Element_Text;

Iterating over child elements:

declare
   Child : Node_Id := First_Child (Doc, Parent);
begin
   while Child /= Null_Node loop
      if Kind (Doc, Child) = Element then
         null;  -- Process element
      end if;
      Child := Next_Sibling (Doc, Child);
   end loop;
end;

Always check for Null_Node and verify node kind. Declarations need their own declare block inside an if statement:

if Node /= Null_Node and then Kind (Doc, Node) = Element then
   declare
      Name_Str : constant String := Name (Doc, Node);
   begin
      null;  -- Safe to use Node (and Name_Str) as an element
   end;
end if;

Parse_File Function:
function Parse_File(Path : String) return Parse_Result;

Parses an SML document from a file.
Returns: Parse_Result containing:
- Success : Boolean - Whether parsing succeeded
- Doc : Document - The parsed document (if successful)
- Error : Parse_Error - Error details (if failed)
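Parse_Error.Message is a fixed 256-character buffer, so only Message (1 .. Msg_Length) is meaningful. The sketch below declares a local copy of the Parse_Error record from earlier so it compiles on its own. It adds a hypothetical Describe helper that folds the message and location into one line, and fills the record with the error shown in the lesson's expected output.

```ada
with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;

procedure Error_Format_Demo is
   --  Local copy of the Parse_Error layout shown earlier, so this sketch
   --  compiles without the SML library. Only Message (1 .. Msg_Length)
   --  holds meaningful text; the rest of the buffer is padding.
   type Parse_Error is record
      Message    : String (1 .. 256) := (others => ' ');
      Msg_Length : Natural := 0;
      Line       : Natural := 0;
      Column     : Natural := 0;
   end record;

   --  Hypothetical helper: "message (line L, column C)" on one line.
   --  Trim drops the leading space that Natural'Image adds.
   function Describe (E : Parse_Error) return String is
     (E.Message (1 .. E.Msg_Length)
      & " (line " & Trim (Natural'Image (E.Line), Left)
      & ", column " & Trim (Natural'Image (E.Column), Left) & ")");

   Msg : constant String := "Expected closing tag for 'task'";
   Err : Parse_Error;
begin
   Err.Message (1 .. Msg'Length) := Msg;
   Err.Msg_Length := Msg'Length;
   Err.Line       := 8;
   Err.Column     := 5;
   Put_Line (Describe (Err));
end Error_Format_Demo;
```

Trim removes the leading space that Natural'Image puts in front of non-negative numbers, so the program prints "Expected closing tag for 'task' (line 8, column 5)".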
Root Function:
function Root(Doc : Document) return Node_Id;

Returns the root element of the document.
First_Child Function:
function First_Child(Doc : Document; Node : Node_Id) return Node_Id;

Returns the first child of a node, or Null_Node if the node has no children.
Next_Sibling Function:
function Next_Sibling(Doc : Document; Node : Node_Id) return Node_Id;

Returns the next sibling of a node, or Null_Node if there are no more siblings.
Kind Function:
function Kind(Doc : Document; Node : Node_Id) return Node_Kind;

Returns the type of node (Element, Text, Comment, etc.).
Name Function:
function Name(Doc : Document; Node : Node_Id) return String;

Returns the element name (tag name). Only valid for Element nodes.
Text_Value Function:
function Text_Value(Doc : Document; Node : Node_Id) return String;

Returns the text content. Only valid for Text nodes.
Parse_File Function:
function Parse_File(Path : String) return Parse_Result;

Convenience function that reads a file and parses it. Same as SML.DOM.Parser.Parse_File.
Problem: Parse_File fails with file not found error
Solution:
- Run the program from the lesson-1-basic-parsing directory
- Or use absolute paths to the fixtures
- Check that the fixtures/ directory exists

cd lesson-1-basic-parsing
./bin/lesson_1_basic_parsing  # Correct

Problem: Document fails to parse with unexpected token error
Solution: Check that your SML document is well-formed:
- All tags are properly closed
- Tags are properly nested
- Special characters are properly escaped
- No attributes are used (SML does not currently support attributes)
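The first two rules (every tag is closed, and tags are properly nested) amount to a stack check. Push each opening tag, and require every closing tag to match the most recently opened one. The sketch below is a hypothetical illustration of that idea, not the SML parser's algorithm. It skips <!-- ... --> comments only when they contain no '>' and performs no other validation.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Containers.Indefinite_Vectors;

procedure Tag_Balance_Demo is
   --  Stack of currently open tag names
   package Name_Stacks is new Ada.Containers.Indefinite_Vectors
     (Index_Type => Positive, Element_Type => String);
   use Name_Stacks;

   --  Hypothetical pre-check: True when every tag is closed in the
   --  reverse order it was opened.
   function Is_Balanced (Doc : String) return Boolean is
      Open_Tags : Vector;
      I         : Natural := Doc'First;
   begin
      while I <= Doc'Last loop
         if Doc (I) = '<' then
            declare
               Close : Natural := I;
            begin
               --  Find the '>' that ends this tag
               while Close <= Doc'Last and then Doc (Close) /= '>' loop
                  Close := Close + 1;
               end loop;
               if Close > Doc'Last then
                  return False;  --  '<' without a matching '>'
               end if;
               declare
                  Tag : constant String := Doc (I + 1 .. Close - 1);
               begin
                  if Tag'Length > 0 and then Tag (Tag'First) = '!' then
                     null;  --  comment: ignored
                  elsif Tag'Length > 0 and then Tag (Tag'First) = '/' then
                     if Open_Tags.Is_Empty
                       or else Open_Tags.Last_Element /= Tag (Tag'First + 1 .. Tag'Last)
                     then
                        return False;  --  stray or mismatched closing tag
                     end if;
                     Open_Tags.Delete_Last;
                  else
                     Open_Tags.Append (Tag);  --  opening tag
                  end if;
               end;
               I := Close + 1;
            end;
         else
            I := I + 1;  --  ordinary text
         end if;
      end loop;
      return Open_Tags.Is_Empty;  --  anything left open was never closed
   end Is_Balanced;
begin
   Put_Line (Boolean'Image (Is_Balanced ("<task><title>Broken</title></task>")));
   Put_Line (Boolean'Image (Is_Balanced ("<task><title>Broken</title>")));
end Tag_Balance_Demo;
```

The first call should print TRUE and the second FALSE, because <task> is still open at the end of the second input.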
Example of invalid SML:
<task>
  <title>Broken</title>
  <!-- Missing closing tag -->

Problem: Program crashes with segmentation fault or access violation
Solution: Ensure you're checking for Null_Node before accessing nodes:
-- ✓ Correct
if Node /= Null_Node then
   Put_Line (Name (Doc, Node));
end if;

-- ✗ Wrong - may crash if Node is Null_Node
Put_Line (Name (Doc, Node));  -- Dangerous!

Problem: Text_Value or Name returns empty string
Solution:
- For Name: Ensure the node is an Element, not Text
- For Text_Value: Ensure the node is Text, not Element
- Check that the element actually has text content
-- Common mistake
if Kind (Doc, Node) = Element then
   Put_Line (Text_Value (Doc, Node));  -- Wrong! Element has no text value
end if;

-- Correct: read the Text node that is the element's first child
if Kind (Doc, Node) = Element then
   declare
      Text_Node : constant Node_Id := First_Child (Doc, Node);
   begin
      if Text_Node /= Null_Node and then Kind (Doc, Text_Node) = Text then
         Put_Line (Text_Value (Doc, Text_Node));  -- Correct!
      end if;
   end;
end if;

Documents use Ada's limited types, which means:
- No Assignment: Cannot copy or assign documents
- Scope-Bound: Documents are only valid within their declaration scope
- Constant Only: Must use constant when declaring with initialization
-- ✓ Correct
declare
   Parse_Res : constant Parse_Result := Parse_File ("doc.sml");
begin
   if Parse_Res.Success then
      -- Use Parse_Res.Doc here
      Process_Document (Parse_Res.Doc);
   end if;
end;  -- Document is destroyed here

-- ✗ Wrong - cannot pass across scopes
Global_Doc : Document;  -- Not allowed

procedure Load_Doc is
   Parse_Res : constant Parse_Result := Parse_File ("doc.sml");
begin
   Global_Doc := Parse_Res.Doc;  -- Won't compile!
end Load_Doc;

SML uses bounded memory allocation:
- Max Nodes: Default 2,000 nodes per document
- Max String Storage: Default 200KB for all text content
- Predictable: No dynamic allocation, all memory pre-allocated
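With pre-allocation, a document's node capacity is fixed before parsing starts. The following minimal sketch illustrates the idea and assumes nothing about SML's internals. A hypothetical Node_Pool holds a fixed-size array sized by a constant that mirrors the 2,000-node default. Allocation just advances a counter and reports failure (Id = 0) once the pool is full.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Node_Pool_Demo is
   --  Hypothetical capacity constant mirroring the documented default
   Max_Nodes : constant := 2_000;

   type Node_Index is range 0 .. Max_Nodes;  --  0 means "no node"

   type Node_Rec is record
      First_Child  : Node_Index := 0;
      Next_Sibling : Node_Index := 0;
   end record;

   type Node_Array is array (Node_Index range 1 .. Node_Index'Last) of Node_Rec;

   --  All storage exists up front; Used counts the slots handed out so far
   type Node_Pool is record
      Nodes : Node_Array;
      Used  : Node_Index := 0;
   end record;

   --  Returns the next free slot, or 0 when the pool is full. The array is
   --  never grown: capacity is fixed at compile time.
   procedure Allocate (Pool : in out Node_Pool; Id : out Node_Index) is
   begin
      if Pool.Used = Node_Index'Last then
         Id := 0;
      else
         Pool.Used := Pool.Used + 1;
         Pool.Nodes (Pool.Used) := (others => 0);
         Id := Pool.Used;
      end if;
   end Allocate;

   Pool      : Node_Pool;
   Id        : Node_Index;
   Allocated : Natural := 0;
begin
   --  Ask for a few more nodes than the pool can hold
   for Attempt in 1 .. Max_Nodes + 5 loop
      Allocate (Pool, Id);
      exit when Id = 0;
      Allocated := Allocated + 1;
   end loop;
   Put_Line ("Allocated" & Natural'Image (Allocated)
             & " of" & Natural'Image (Max_Nodes) & " nodes");
end Node_Pool_Demo;
```

The demo stops at the pool's capacity instead of allocating more memory. This is the trade-off the bullets above describe: memory use is predictable, but documents larger than the configured limits are rejected.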
For larger documents, adjust in SML library configuration:
Max_Document_Nodes : constant := 10_000;
Max_String_Storage : constant := 1_000_000;

Performance:

- Speed: Typically 100KB/sec on modern hardware
- Linear: O(n) in document size
- Memory: Fixed allocation, no garbage collection
- Suitable For: Documents up to several MB, provided the node and string storage limits above are raised to match
Try modifying the program to:
- Count tasks by priority level (1-5)
- Find and display all high-priority (4-5) tasks
- List tasks that are blocked
- Calculate the percentage of completed tasks
- Find tasks with specific keywords in titles
- Build a helper to count all Element nodes
- Display the document structure as a tree
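The following hypothetical sketch can serve as a starting point for the first few exercises. It works on task data that has already been pulled out of the document. The records hold the three tasks from the sample fixture; in the real program, each field would come from Get_Element_Text on the matching child element. The loop counts tasks per priority, finds titles containing a keyword, and computes the completion percentage.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Exercise_Sketch is
   subtype Priority_Level is Integer range 1 .. 5;

   --  Hypothetical record for one extracted <task>
   type Task_Info is record
      Title    : Unbounded_String;
      Priority : Priority_Level;
      Status   : Unbounded_String;
   end record;

   function "+" (S : String) return Unbounded_String renames To_Unbounded_String;

   --  Data copied from the sample fixture
   Tasks : constant array (1 .. 3) of Task_Info :=
     ((+"Create wireframes",               1, +"done"),
      (+"Implement responsive navigation", 2, +"in_progress"),
      (+"Design color scheme",             3, +"todo"));

   Per_Priority : array (Priority_Level) of Natural := (others => 0);
   Done_Count   : Natural := 0;
begin
   for T of Tasks loop
      Per_Priority (T.Priority) := Per_Priority (T.Priority) + 1;
      if T.Status = "done" then
         Done_Count := Done_Count + 1;
      end if;
      --  Keyword search: Index returns 0 when the pattern is absent
      if Index (T.Title, "navigation") > 0 then
         Put_Line ("Keyword match: " & To_String (T.Title));
      end if;
   end loop;

   for P in Priority_Level loop
      Put_Line ("Priority" & Integer'Image (P) & ":" & Natural'Image (Per_Priority (P)));
   end loop;

   --  Integer percentage, truncated toward zero
   Put_Line ("Completed:" & Natural'Image (Done_Count * 100 / Tasks'Length) & "%");
end Exercise_Sketch;
```

With the fixture data, it reports one task at each of priorities 1 to 3 and 33% completed, since integer division truncates 100 / 3.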
Proceed to Lesson 2: Schema Validation to learn how to validate documents against schemas with custom types and constraints.
Lesson 1 of 5 | Next: Lesson 2 →