Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
175 changes: 175 additions & 0 deletions src/content/docs/guides/harvesting.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,188 @@ title: Harvesting
---

import { Aside } from "@astrojs/starlight/components";
import { Steps } from "@astrojs/starlight/components";
import { Badge } from '@astrojs/starlight/components';
import { Card } from '@astrojs/starlight/components';


<Aside title="Glossary">

- **Harvesting Sources** - External systems that Meru is harvesting content from. For example, the URL of an OAIPMH feed. The harvesting source record would contain any protocol details or authentication.

- **Harvesting Sets** - Collections of content within a harvest source. In OAIPMH that would be a set that was exposed by the feed. They could represent a journal, departments, or grouped content in general.

- **Harvest Mappings** - Define the relationship between the content and the external sources and locations in Meru. They have templating logic that explains how we transform a piece of content or data in the harvested data to what we store in Meru.

- **Harvesting Attempts** - An individual event where Meru tries to retrieve and process content from a configured harvest source, based on the mappings and settings you've provided.

- **Harvest Records** - Singular flat collection of metadata that could be transformed into one or more entities in Meru. An OAIPMH feed typically doesn’t expose the full hierarchical structure of journal content. It doesn’t tell us the Journal → the volume → the issue → and the article. We just get the article and have to infer the hierarchy. A single harvest record can lead to multiple entities getting created in Meru.
</Aside>

## Access Harvesting Dashboard

<Steps>

1. Log in to the Meru Admin.
2. To get to the harvesting dashboard, go to **Manage** in the main navigation and click on **Harvesting**.

<Badge text="Note" variant="note" /> _The harvesting UI is only available to users with `global_admin` privileges._

3. The main page will show a list of your existing Harvest Sources. You’ll see each source’s name, number of harvest sets, harvest records, metadata formats, and the source URL.
</Steps>

## Create a New Harvest Source

<Aside title="Required source configuration details">
- Source endpoint (the URL)
- Harvest protocol (OAI-PMH)
- Metadata format (Esploro, Jats, Mets, Mods, or OAIDC)
</Aside>

<Steps>
1. Click on the **Harvest source +** button at the top right under the main navigation.
2. Create a **Name** for your harvest source.
3. Add an **Identifier** (all lowercase). This cannot be changed after the source is created and is used to help reference the source.
4. Add the source URL to the **URL** field.
5. Select your **Harvest protocol** and **Metadata formats**.
<Badge text="Note" variant="note" /> _These cannot be edited once the harvest source has been saved._
6. Configure your **Extraction mapping template**

Choose one of the default presets available in the **Examples** dropdown.

_The extraction mapping template defines how to transform source metadata—such as JATS—into a structured entity hierarchy suitable for harvesting. It acts as a bridge between the raw data and the system's expected format._

_The code editor uses a combination of **XML** and **Liquid** <a href="https://shopify.dev/docs/api/liquid" target="blank" rel="noopener noreferrer"/>. In most cases, the default mapping templates will work without modification. You may only need to customize the template if the source data contains legacy structures or unexpected formatting._

<Aside title="Reasons the default template may need to be updated">

- Adding custom fields (e.g., cover images, extra metadata)
- Addressing incomplete or non-standard source metadata
- Adjusting how Meru interprets or organizes incoming data (e.g., refining the hierarchy or entity relationships)

</Aside>


7. **Mapping options** should be left alone in most cases, unless cic suggests otherwise for a specific source.

_**Use metadata mappings** allow you to connect specific data patterns to different points in the hierarchy. By default, harvesting sends all data to a single target entity, but data isn’t always organized in the most ideal way. Mappings let you define how certain types of data should be routed, assigning them to the appropriate parent in the tree structure._

_Options that are selected will take precedence over what is set elsewhere._

8. If you would like the default records that can be included per harvest to be different than 80k, set it in the **Max records** field.
9. Click **Save**.
</Steps>

<Badge text="Note" variant="note" /> _Upon harvest source creation, the endpoint is queried to figure out what the sets are, and this continues at regular intervals to make sure the shape of the sets are up to date. These act as a useful reference to use when adding sets to [mappings](#create-a-harvest-mapping) and [attempts](#create-a-harvest-attempt)._

## Create a Harvest Mapping

_There may be cases where you want more control over harvesting, such as targeting a specific set and directing it to a different entity within Meru, running a harvest on a custom schedule, or using a single source to populate multiple entities. This is especially useful when you don’t want to harvest the entire source or need greater flexibility in how and when content is imported. In these situations, creating a dedicated harvest mapping is the recommended approach._

<Badge text="Note" variant="note" /> _Harvesting is limited to the set level, you can’t target an individual item within a set._

### Create a New Target Entity

To start a new harvest mapping within a harvest source, you need to manually add an empty parent entity (e.g. journal or series) and new community if necessary.

<Steps>
1. Go to your **Collections** tab in the main navigation.
2. Click on the **Add collection +** button under the main navigation.
3. Add all of the desired details to your new collection (**Community**, **Title**, **Visibility**, **Correct schema**, etc.)
4. Once it has been created and saved, go back to **Manage** → **Harvest** → **[Current harvest source]**
</Steps>
<Badge text="Note" variant="note" /> _Once an entity is created, the harvest process will recognize it and overwrite the existing record during future harvests. However, if the entity is deleted, it will be recreated in the next harvest._

### Configure Your Harvest Mapping
Now that you have a target entity, you are ready to create your Harvest Mapping.
<Steps>
1. Click on **Harvest mapping +.**
2. In the **Target entity** field, add your newly created entity.
3. If you want to target a specific set, add it to the **Harvest set** field. (The sets tab on the source has a list of all available sets within this source)
4. Select the correct **Metadata format** and use the **Mapping template** to customize your metadata if needed.
5. **Mapping options** that are selected will take precedence over what is set elsewhere (e.g. on the harvest source level)
6. If you would like the default records that can be included per harvest to be different than 80k, set it in the **Max records** field.
7. Customize the harvest **Schedule**. This field accepts a cron expression or human-readable expression. Leave blank if attempts should only be triggered manually.
8. Click **Save**.
</Steps>

## Create a Harvest Attempt

_Each attempt may be triggered manually (by an admin) or automatically (on a schedule, if one is defined). During an attempt, Meru queries the source endpoint, fetches available records, applies any extraction and metadata mappings, and creates or updates content in Meru accordingly._

_All harvest attempts are logged and viewable under the **Attempts** tab._

To create an attempt, start within a harvest source and click either the **Harvest attempt +** or **Harvest mapping +** button at the top right under the navigation.

### Harvest Attempt
<Steps>
1. The harvest source and metadata formats are unchangeable.
2. If you'd like to harvest a specific set, select it from the **Harvest set** dropdown.
3. In the **Target entity** field, add your newly created journal.
4. You can customize your Extraction mapping template if desired.
5. If you want to add any additional notes for this attempt, there is a field at the bottom of the page.
6. Click **Save**.
</Steps>

_After saving the attempt, associated records will begin appearing immediately in the **Records** tab. You’ll also see a **Messages** tab, which captures logs, status updates, and any errors or warnings generated during the harvest process. This provides real-time visibility into how your harvest is progressing and helps identify any issues that need attention._

### Harvest Mapping
The harvest is triggered automatically based on the mapping's schedule. In this case, the **harvest source**, **metadata formats**, **set**, and **target entity** are all preset. (See [Create a Harvest Mapping](#create-a-harvest-mapping))


## Navigate the Harvest Records Tab

_After entities have been successfully harvested and created, their status in the **Records** tab will change from `Pending` to `Active`._

<Steps>
1. Click on any **Identifier** to view the full details of that individual record.
</Steps>


<Badge text="Note" variant="note" /> *Identifiers are external to Meru—they are provided by the source and not generated within the system. You can use them to cross-reference records in your own systems.*

<Card title="Details">

This section displays the full raw metadata as received from the source.

<Badge text="Note" variant="note" /> _If an entity is missing expected properties, check the latest harvest attempt’s metadata here to verify whether the data was present in the original source._

</Card>
<Card title="Entities">

This section shows which entities were created in Meru based on the incoming metadata. You can drill down into each level of the hierarchy. Selecting an entity will take you to its **Manage** section, where you can work with that specific item. Admins will also see a list of harvest records associated with the entity within that section.
</Card>

## Navigate the Messages Tab

_You’ll find a **Messages** tab available at multiple levels—including the overall harvest source and within each individual harvest run—each displaying logs and feedback relevant to that specific context._

_While Meru allows you to make manual changes, we recommend making corrections and updates in the source system whenever possible. This ensures consistency and avoids discrepancies between harvests. Use the overwrite protection in Meru only when necessary—for example, to preserve manual edits that can’t be reflected in the source._

<Steps>
1. Use the **Debug** toggle in the top-right corner to control the level of detail displayed.
</Steps>

<Card title="Debug on">
When **Debug** is turned **on**, you’ll see a full stream of processing logs, including detailed updates and any error messages.
</Card>

<Card title="Debug off">
When **Debug** is **off**, the view is filtered to show only key issues—such as errors, fatal errors, and warnings—making it easier to focus on problems that require attention.
</Card>

## Make Changes to Harvested Entities in Meru

<Steps>
1. Make any desired changes to the entity.
2. Once an entity is active, navigate to it and click **Manage** → **Details**.
3. Scroll down to the **Harvesting** section.
</Steps>
*By default, any changes made through the admin interface will be preserved during future harvests. They will not be overwritten.*
<Steps>
4. If you want future harvests to overwrite your manual changes, check the box labeled **Allow overwriting?**

*This checkbox is not persistent — it must be checked each time you want to allow harvesting to make changes.*
*This ensures that manual edits are protected by default, and you are explicitly opting in to allow them to be replaced.*
<Badge text="Note" variant="note" /> _Note that the overwrite setting applies only to the specific entity. If a higher-level entity (like a volume or issue) is locked, harvesting can still proceed and update lower-level entities (like articles) within that hierarchy._
</Steps>
8 changes: 3 additions & 5 deletions src/styles/_reset.css
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/*
/*
Designed for cascade layers while still using :where to keep a low specificity for unlayered setups.
Resources:
https://www.joshwcomeau.com/css/custom-css-reset/
https://www.joshwcomeau.com/css/custom-css-reset/
https://github.com/mayank99/reset.css?tab=readme-ov-file
*/

Expand Down Expand Up @@ -90,10 +90,8 @@ main[tabindex="-1"] {
text-wrap: balance;
}

/* Reset list styles without losing accessibility */
:where(ol, ul) {
padding: 0;
list-style-type: "";
padding-inline-start: 1.2em;
}

/* Set default underline thickness */
Expand Down