Advanced Digital Editing April 2022
Have you taken an introduction to the Text Encoding Initiative (TEI), but are unsure of what to do next? Have you started a digital edition, but want to learn more about customisation, querying, and transforming your TEI into publishing formats? Taught by two scholars with years of practical editing experience, this advanced digital scholarly editing module can help take your digital editing further, showing you how to implement advanced computational methods through a mix of asynchronous materials and hands-on online workshops. After a brief refresher on TEI encoding, students will focus on XPath, XSLT, and publishing tools, the building blocks of querying and transforming TEI data.
Please note: Students in this module must have a basic understanding of TEI-XML (for example, previous TEI and digital editing short courses at SAS, EpiDoc workshops, and other DH workshops).
- Please make sure you have downloaded the oXygen XML editor. You can sign up for a 30-day free trial at https://www.oxygenxml.com/xml_editor/register.html.
- If you need to, please create an account on Github so that you can use our Issuetracker forum for questions and discussion between the live sessions.
This short course combines asynchronous and synchronous (live) sessions, all hosted virtually. You must watch the relevant video tutorials and practise the exercises before each live session, at which we will review the material, answer questions, and discuss any other issues that arise. The Zoom sessions will be workshops, not lectures.
If you need to refresh or revise your TEI / EpiDoc XML knowledge you can find tutorials and other training materials at:
The following tutorials on TEI schema and ODD customisation are optional, but we include them because customisation is an important aspect of advanced editing, and it is a good way to get you thinking about TEI data models:
15:30–17:00 (UK time): live Zoom session: introductions, discussion, TEI refresher exercise.
To watch and practise before session 2:
- What’s the difference between CSS, XPath, XSLT and XQuery? (slides) (Gabriel Bodard and Christopher Ohge) (20 min)
- Introduction to XPath I: What is XPath and what is it good for? (slides) (Christopher Ohge) (15 min)
- Introduction to XPath II: Concepts and Syntax (slides) (Christopher Ohge) (20 min)
- Introduction to XPath III: Axes review and quiz (slides) (Christopher Ohge) (23 min)
- For analytic encoding, see Chapter 17 of the TEI Guidelines.
- Regular expressions: see the Wikipedia entry.
- NLP = Natural Language Processing. Melanie Walsh has a good tutorial if you’re interested in NLP at https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/13-POS-Keywords.html?highlight=nlp
15:30–17:00 (UK time): live Zoom session: exercises and questions on XPath
Aims of XPath Exercise
- Gain familiarity with traversing and searching your XML tree
- Understand the basic syntax for XPath functions
- Understand how to generate statistics about your document using XPath
(Use the bad-hamlet.xml file)
- Path expressions
- Write an absolute path that finds all speech elements
- Write an absolute path that finds all role attribute values in the cast list
- Write a relative path that finds all speaker elements
- Write a relative path that finds all who attributes
- Axis expressions
- Write an axis expression that finds all sibling elements of second-level divs
- Write an axis expression that finds all parent nodes of stage elements
- Write an axis expression that finds all speeches that come before or after a Hamlet speech.
- Predicates
- Find all speech elements for all speakers except Hamlet
- Find all speech elements by Hamlet and Ophelia
- Find the last line of each speech by Ophelia
- Functions
- Write a function to count speakers
- Write a function to list only the distinct speakers (i.e. each speaker name once)
- Write a function to return all lines in speeches that contain the string ‘Hamlet’ (except speaker elements)
- Find the string length of each of Hamlet’s speeches.
- Write a function to return all first lines of speeches greater than 100 characters
- Calculate the average character count of Hamlet’s speeches.
- List the distinct values of each speaker (i.e. list of characters) in Act 1
- Use the previous expression to build a list of distinct values of each speaker separated by commas
- Write an XPath expression to generate an alphabetised list of words spoken by Ophelia that came after ‘I’.
Find the answers to the XPath exercises here.
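For readers who want to experiment outside oXygen, the path and predicate ideas practised above can be tried in a few lines of Python. This is only a minimal sketch: the sample document is a hypothetical miniature play, not the course's bad-hamlet.xml, and its structure (sp, speaker, l, and the who values) is assumed for illustration. Python's standard-library ElementTree supports only a subset of XPath, so functions such as count() and distinct-values() are emulated here with ordinary Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature play for illustration; NOT the course's bad-hamlet.xml.
SAMPLE = """
<text>
  <body>
    <div type="act" n="1">
      <sp who="#ham"><speaker>Hamlet</speaker><l>To be, or not to be</l><l>That is the question</l></sp>
      <sp who="#oph"><speaker>Ophelia</speaker><l>Good my lord</l><l>How does your honour</l></sp>
      <sp who="#ham"><speaker>Hamlet</speaker><l>I humbly thank you</l></sp>
    </div>
  </body>
</text>
"""

root = ET.fromstring(SAMPLE)

# Path expression: every <sp> element at any depth below the root.
speeches = root.findall(".//sp")

# Attribute predicate: only the speeches whose @who equals '#oph'.
ophelia = root.findall(".//sp[@who='#oph']")

# Positional predicate: the last <l> child of each Ophelia speech.
last_lines = [line.text for line in root.findall(".//sp[@who='#oph']/l[last()]")]

# ElementTree has no count() or distinct-values(); emulate them in Python.
speaker_count = len(root.findall(".//speaker"))
distinct_speakers = sorted({s.text for s in root.findall(".//speaker")})

print(len(speeches), len(ophelia), last_lines, speaker_count, distinct_speakers)
```

In a full XPath processor such as the one built into oXygen, the last two steps would normally be written with XPath functions directly rather than with Python's len() and set.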
- How to run XSLT on XML in Oxygen (Christopher Ohge and Gabriel Bodard) (15 min)
- Rendering an edition in HTML: primer (slides) (Gabriel Bodard) (20 min)
- XSLT 1: Basics: push processing (slides) (Gabriel Bodard) (35 min)
- XSLT 2: Advanced features: pull processing (slides) (Gabriel Bodard) (30 min)
- ZIP package for XSLT exercises (download and extract to new folder)
15:30–17:00 (UK time): live Zoom session: exercises and questions on XSLT; feedback
- Take the file Dawn-1-1-1.xml from the /xml directory
- Take the stylesheet transformer.xsl from the /xslt directory
- Create a new Oxygen transformation scenario to apply the stylesheet to the xml file
- What do you see? Why?
- What would you like to see?
- What templates do you need to add to get there?
- Take the xml file bad-hamlet.xml from the /xml directory
- Starting from the transformer.xsl and cruncher.xsl stylesheets, which we used on the Dawn file, create a new stylesheet to look for-each <castItem>
- Create a list of all the lines spoken by each cast member
- Can you find unique lines?
- Can you find unique lines?
- Can you sort lines alphabetically?
- Can you think of anything else useful to do with them?
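The logic behind this exercise (visit each castItem, collect the lines that point back to it, remove duplicates, sort) can be prototyped before writing any XSLT. The sketch below does so in Python as a way of checking what output to expect; it is not the course's stylesheet. The sample document is hypothetical, and the assumption that each sp carries a who attribute pointing to the xml:id of a role is made purely for illustration, since the actual structure of bad-hamlet.xml may differ.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample; the real bad-hamlet.xml may be structured differently.
SAMPLE = """
<text>
  <castList>
    <castItem><role xml:id="ham">Hamlet</role></castItem>
    <castItem><role xml:id="oph">Ophelia</role></castItem>
  </castList>
  <body>
    <sp who="#ham"><l>Words, words, words</l><l>Get thee to a nunnery</l></sp>
    <sp who="#oph"><l>My lord</l></sp>
    <sp who="#ham"><l>Words, words, words</l></sp>
  </body>
</text>
"""

root = ET.fromstring(SAMPLE)

lines_by_role = {}
# Visit every <castItem>, much as an XSLT for-each would.
for item in root.findall(".//castItem"):
    role = item.find("role")
    pointer = "#" + role.get(XML_ID)  # e.g. "#ham", matching sp/@who
    # Collect every <l> inside speeches attributed to this role.
    lines = [line.text for line in root.findall(f".//sp[@who='{pointer}']/l")]
    # set() keeps unique lines only; sorted() orders them alphabetically.
    lines_by_role[role.text] = sorted(set(lines))

for name, lines in lines_by_role.items():
    print(name, lines)
```

In XSLT the same steps would typically map onto xsl:for-each for the loop and xsl:sort for the ordering, with the deduplication handled by whichever grouping technique your XSLT version supports.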