Advanced Digital Editing April 2022

Advanced Digital Editing: a 3-day short course

Instructors: Dr Gabriel Bodard and Dr Christopher Ohge

26, 27 and 29 April 2022, 15:30–17:00 (UK time)

Have you taken an introduction to the Text Encoding Initiative (TEI), but are unsure of what to do next? Have you started a digital edition, but want to learn more about customisation, querying, and transforming your TEI into publishing formats? Taught by two scholars with years of practical editing experience, this advanced digital scholarly editing module can help you take your digital editing further. Through a mix of asynchronous materials and hands-on online workshops, it shows how to implement advanced computational methods. After a brief refresher on TEI encoding, students will focus on XPath, XSLT, and publishing tools, the building blocks of querying and transforming TEI data.

Please note: Students in this module must have a basic understanding of TEI-XML (gained, for example, from previous TEI and digital editing short courses at SAS, EpiDoc workshops, or other DH workshops).

Software

Please make sure you have downloaded the oXygen XML editor. You can sign up for a 30-day free trial at https://www.oxygenxml.com/xml_editor/register.html.
If you need to, please create an account on GitHub so that you can use our issue tracker as a forum for questions and discussion between the live sessions.

Files for practice (download and extract to a new folder)

Course format

This short course combines asynchronous and synchronous (live) sessions, all hosted virtually. You must watch the relevant video tutorials and practise the exercises before each live session, at which we will review the material, answer questions, and discuss any other issues that arise. The Zoom sessions will be workshops, not lectures.

Schedule

Before the workshop:

If you need to refresh or revise your TEI / EpiDoc XML knowledge you can find tutorials and other training materials at:

The following tutorials on TEI schema and ODD customisation are optional, but we include them because customisation is an important aspect of advanced editing, and it is a good way to get you thinking about TEI data models:

Day 1: Tuesday April 26, 2022

15:30–17:00 (UK time): live Zoom session: introductions, discussion, TEI refresher exercise.

To watch and practise before session 2:

What’s the difference between CSS, XPath, XSLT and XQuery? (slides) (Gabriel Bodard and Christopher Ohge) (20 min)
Introduction to XPath I: What is XPath and what is it good for? (slides) (Christopher Ohge) (15 min)
Introduction to XPath II: Concepts and Syntax (slides) (Christopher Ohge) (20 min)
Introduction to XPath III: Axes review and quiz (slides) (Christopher Ohge) (23 min)

Notes from the first session

For analytic encoding, see Chapter 17 of the TEI Guidelines.
Regular expressions: see the Wikipedia entry.
NLP = Natural Language Processing. Melanie Walsh has a good tutorial if you’re interested in NLP at https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/13-POS-Keywords.html?highlight=nlp

Day 2: Wednesday April 27, 2022

15:30–17:00 (UK time): live Zoom session: exercises and questions on XPath

Exercises Part I:

Aims of XPath Exercise

Gain familiarity with traversing and searching your XML tree
Understand the basic syntax for XPath functions
Understand how to generate statistics about your document using XPath

(Use the bad-hamlet.xml file)

Path expressions

Write an absolute path that finds all speech elements
Write an absolute path that finds all role attribute values in the cast list
Write a relative path that finds all speaker elements
Write a relative path that finds all who attributes
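
As a rough illustration of the difference between spelling out every step from the top of the tree and searching from wherever you are, here is a minimal sketch in Python. It rests on several assumptions: the sample document is a hypothetical miniature, not bad-hamlet.xml; and Python's standard-library ElementTree is used as a stand-in for oXygen's XPath tools. ElementTree supports only a limited subset of XPath, so true absolute paths beginning with / and paths ending in an attribute step are approximated.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature play, NOT the course's bad-hamlet.xml.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="act" n="1">
        <div type="scene" n="1">
          <sp who="#hamlet"><speaker>Hamlet</speaker><l>To be, or not to be</l></sp>
          <sp who="#ophelia"><speaker>Ophelia</speaker><l>Good my lord</l></sp>
        </div>
      </div>
    </body>
  </text>
</TEI>
"""

# TEI elements live in this namespace, so every step needs a prefix.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(SAMPLE)  # root is the <TEI> element

# Every level named explicitly, starting from the root element: the
# closest ElementTree gets to an absolute path in full XPath.
full_path = root.findall("tei:text/tei:body/tei:div/tei:div/tei:sp", NS)

# Descendant shortcut: any <sp> at any depth below the current node.
anywhere = root.findall(".//tei:sp", NS)

# ElementTree returns elements, not attribute nodes, so select the
# elements that carry @who and read the value afterwards.
who_values = [sp.get("who") for sp in root.findall(".//tei:sp[@who]", NS)]

print(len(full_path), len(anywhere), who_values)
```

Both searches find the same two speeches here, but the first breaks as soon as the nesting changes, while the second does not care how deep the speeches sit.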

Axis expressions

Write an axis expression that finds all sibling elements of second-level divs
Write an axis expression that finds all parent nodes of stage elements
Write an axis expression that finds all speeches that come before or after a Hamlet speech.
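
Axes move through the tree relative to a node: up to its parent, or sideways to its siblings. The sketch below imitates that movement in Python on a hypothetical scene (not bad-hamlet.xml). It is an emulation rather than real XPath: the standard-library ElementTree has no parent:: or sibling axes, so a child-to-parent lookup table stands in for them. The helper names local and sibling_speeches are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical scene, not taken from bad-hamlet.xml.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="scene">
  <stage>Enter Hamlet</stage>
  <sp who="#ophelia"><speaker>Ophelia</speaker><l>Good my lord</l></sp>
  <sp who="#hamlet"><speaker>Hamlet</speaker><l>I humbly thank you <stage>Aside</stage></l></sp>
  <sp who="#polonius"><speaker>Polonius</speaker><l>My lord</l></sp>
</div>
"""
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(SAMPLE)


def local(elem):
    """Element name without its {namespace} part."""
    return elem.tag.split("}", 1)[-1]


# Elements do not know their parent, so record child -> parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Stand-in for the parent:: axis: the parent of every <stage>.
stage_parents = [local(parent_of[s]) for s in root.findall(".//tei:stage", NS)]


def sibling_speeches(sp):
    """Stand-in for preceding-sibling::sp and following-sibling::sp."""
    speeches = parent_of[sp].findall("tei:sp", NS)
    i = speeches.index(sp)
    return speeches[:i], speeches[i + 1:]


hamlet = root.find("tei:sp[@who='#hamlet']", NS)
before, after = sibling_speeches(hamlet)

# Only the speeches immediately before and after Hamlet's.
neighbours = [s.get("who") for s in before[-1:] + after[:1]]

print(stage_parents, neighbours)
```

The two stage directions have different parents (the scene div and a verse line), which is the kind of thing a parent-axis query brings to the surface.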

Predicates

Find all speech elements for all speakers except Hamlet
Find all speech elements by Hamlet and Ophelia
Find the last line of each speech by Ophelia
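
Predicates are the filters in square brackets that narrow a step down to the nodes you want. A minimal sketch follows, assuming a hypothetical scene in which each speech carries a who attribute (bad-hamlet.xml may be encoded differently). It uses Python's standard-library ElementTree, which understands attribute, child-text and last() predicates but not or / not(), so those two cases are filtered in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical scene, not taken from bad-hamlet.xml.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="scene">
  <sp who="#hamlet"><speaker>Hamlet</speaker><l>Get thee to a nunnery.</l><l>Farewell.</l></sp>
  <sp who="#ophelia"><speaker>Ophelia</speaker><l>O, help him,</l><l>you sweet heavens!</l></sp>
  <sp who="#polonius"><speaker>Polonius</speaker><l>My lord.</l></sp>
</div>
"""
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(SAMPLE)

# Attribute predicate: speeches whose @who is exactly '#hamlet'.
by_hamlet = root.findall("tei:sp[@who='#hamlet']", NS)

# Child-text predicate: speeches whose <speaker> reads 'Ophelia'.
by_ophelia = root.findall("tei:sp[tei:speaker='Ophelia']", NS)

# Positional predicate: the last <l> inside each Ophelia speech.
last_lines = [
    line.text
    for line in root.findall("tei:sp[@who='#ophelia']/tei:l[last()]", NS)
]

# No 'or' / 'not()' in ElementTree predicates, so filter in Python.
speeches = root.findall("tei:sp", NS)
not_hamlet = [sp.get("who") for sp in speeches if sp.get("who") != "#hamlet"]
hamlet_or_ophelia = [
    sp.get("who") for sp in speeches if sp.get("who") in {"#hamlet", "#ophelia"}
]

print(last_lines, not_hamlet, hamlet_or_ophelia)
```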

Exercises Part II:

Functions

Write a function to count speakers
Write a function to list only the distinct speakers (i.e. a list in which each speaker appears once)
Write a function to return all lines in speeches that contain the string ‘Hamlet’ (except speaker elements)
Find the string length of each of Hamlet’s speeches.
Write a function to return all first lines of speeches greater than 100 characters
Calculate the average character count of Hamlet’s speeches.
List the distinct values of each speaker (i.e. list of characters) in Act 1
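
To make the statistics idea concrete, the sketch below computes rough Python analogues of several XPath functions: count(), contains() and string-length() from XPath 1.0, and distinct-values(), avg() and string-join() from XPath 2.0. The sample is hypothetical, the variable names are invented, and the Python operations only approximate the XPath ones. It is meant as a way to check your intuition, not as the exercise answers.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical scene, not taken from bad-hamlet.xml.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="scene">
  <sp who="#hamlet"><speaker>Hamlet</speaker><l>Words, words, words.</l></sp>
  <sp who="#ophelia"><speaker>Ophelia</speaker><l>My lord, I do not know.</l></sp>
  <sp who="#hamlet"><speaker>Hamlet</speaker><l>Farewell.</l></sp>
</div>
"""
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(SAMPLE)

speakers = [s.text for s in root.findall(".//tei:speaker", NS)]

# count(): how many <speaker> elements there are.
speaker_count = len(speakers)

# distinct-values(): each name once, keeping first-seen order.
distinct_speakers = list(dict.fromkeys(speakers))

# contains(): lines whose text includes a given substring.
lines_with_lord = [
    line.text for line in root.findall(".//tei:l", NS) if "lord" in line.text
]

# string-length(): character count of each line in Hamlet's speeches.
hamlet_lengths = [
    len(line.text)
    for line in root.findall(".//tei:sp[@who='#hamlet']/tei:l", NS)
]

# avg(): mean of those lengths.
hamlet_average = mean(hamlet_lengths)

# string-join(): one comma-separated string, alphabetised first.
speaker_list = ", ".join(sorted(distinct_speakers))

print(speaker_count, distinct_speakers, hamlet_lengths, hamlet_average, speaker_list)
```

Counting all speaker elements gives 3, while the distinct list has only 2 entries; that gap is the difference between the first two exercises in this part.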

Bonus exercises: XPath Builder

Use the previous expression to build a list of distinct values of each speaker separated by commas
Write an XPath expression to generate an alphabetised list of words spoken by Ophelia that came after ‘I’.

Exercise answers

Find the answers to the XPath exercises here.

To watch and practise before session 3:

How to run XSLT on XML in Oxygen (Christopher Ohge and Gabriel Bodard) (15 min)
Rendering an edition in HTML: primer (slides) (Gabriel Bodard) (20 min)
XSLT 1: Basics: push processing (slides) (Gabriel Bodard) (35 min)
XSLT 2: Advanced features: pull processing (slides) (Gabriel Bodard) (30 min)
ZIP package for XSLT exercises (download and extract to a new folder)

Day 3: Friday April 29, 2022

15:30–17:00 (UK time): live Zoom session: exercises and questions on XSLT; feedback

XSLT exercise 1: “Push!”

Take the file Dawn-1-1-1.xml from the /xml directory
Take the stylesheet transformer.xsl from the /xslt directory
Create a new Oxygen transformation scenario to apply the stylesheet to the xml file
- What do you see? Why?
- What would you like to see?
- What templates do you need to add to get there?

XSLT exercise 2: “Pull!”

Take the xml file bad-hamlet.xml from the /xml directory
Starting from the transformer.xsl and cruncher.xsl stylesheets, which we used on the Dawn file, create a new stylesheet that uses for-each to loop over each <castItem>
Create a list of all the lines spoken by each cast member
- Can you find unique lines?
- Can you sort lines alphabetically?
- Can you think of anything else useful to do with them?
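
Pull processing turns the logic around: instead of letting the document drive the output, you loop over the items you care about and fetch what belongs to each. Here is a minimal Python sketch of that loop-and-fetch shape. It is not XSLT, and it assumes a hypothetical, cut-down document (not valid against the TEI schema) in which each speech's who attribute points at the xml:id of a role in the cast list. bad-hamlet.xml may link speeches to cast members differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down document; not bad-hamlet.xml and not schema-valid TEI.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <castList>
    <castItem><role xml:id="hamlet">Hamlet</role></castItem>
    <castItem><role xml:id="ophelia">Ophelia</role></castItem>
  </castList>
  <body>
    <sp who="#hamlet"><speaker>Hamlet</speaker><l>Words, words, words.</l></sp>
    <sp who="#ophelia"><speaker>Ophelia</speaker><l>My lord, I do not know.</l></sp>
    <sp who="#hamlet"><speaker>Hamlet</speaker><l>Farewell.</l><l>Farewell.</l></sp>
  </body>
</TEI>
"""
NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id
root = ET.fromstring(SAMPLE)

lines_by_role = {}
for item in root.findall(".//tei:castItem", NS):   # the "for-each" loop
    role = item.find("tei:role", NS)
    pointer = "#" + role.get(XML_ID)               # e.g. "#hamlet"
    # Pull in every line from every speech that points at this role.
    spoken = [
        line.text
        for sp in root.findall(f".//tei:sp[@who='{pointer}']", NS)
        for line in sp.findall("tei:l", NS)
    ]
    # set() keeps unique lines; sorted() puts them in alphabetical order.
    lines_by_role[role.text] = sorted(set(spoken))

for name, lines in lines_by_role.items():
    print(name, lines)
```

The repeated "Farewell." appears once in the result, which corresponds to the unique-lines question above.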

Provide feedback
