@Querela Querela commented Jan 5, 2026

This PR adds job deletion to the REST API and exposes it as a menu item in the WebUI.

What is included:

  • Add an API handler to JobResource for action=delete that deletes the job via getEngine().deleteJob(cj).
    • It first checks whether the job has an application context and aborts if one exists.
    • Otherwise, after deletion, the job configs are rescanned so that the deleted job is no longer listed.
  • Add a "Delete Job" menu item to the job web page.
    • It is only enabled when the job has no application context (new/unbuilt/torn-down jobs).
    • It opens a dialog (similar to the "Copy Job" menu item) that warns of the consequences and has a trigger button.
  • Add a "Delete Job" section to the REST API documentation (readthedocs).
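
The guard described above (refuse while an application context exists, otherwise delete and refresh the job list) can be sketched in isolation. This is a minimal, self-contained model and not the code from this PR: `SketchJob`, `SketchEngine`, and `handleDelete` are hypothetical stand-ins, and removing the map entry merely stands in for the `getEngine().deleteJob(cj)` call and the subsequent rescan of job configs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a crawl job; holds only what the delete guard needs.
class SketchJob {
    final String shortName;
    final boolean hasApplicationContext; // true once the job has been built or launched

    SketchJob(String shortName, boolean hasApplicationContext) {
        this.shortName = shortName;
        this.hasApplicationContext = hasApplicationContext;
    }
}

// Hypothetical stand-in for the engine that owns the list of job configurations.
class SketchEngine {
    private final Map<String, SketchJob> jobs = new LinkedHashMap<>();

    void add(SketchJob job) {
        jobs.put(job.shortName, job);
    }

    boolean lists(String shortName) {
        return jobs.containsKey(shortName);
    }

    // Returns true if the job was deleted, false if the request was refused.
    boolean handleDelete(String shortName) {
        SketchJob job = jobs.get(shortName);
        if (job == null) {
            return false; // unknown job, nothing to delete
        }
        if (job.hasApplicationContext) {
            return false; // refuse: the job must be torn down explicitly first
        }
        // Assumption: removing the entry models both the deletion and the rescan.
        jobs.remove(shortName);
        return true;
    }
}

class DeleteJobSketch {
    public static void main(String[] args) {
        SketchEngine engine = new SketchEngine();
        engine.add(new SketchJob("built-job", true));
        engine.add(new SketchJob("fresh-job", false));

        System.out.println("built-job deleted: " + engine.handleDelete("built-job"));
        System.out.println("fresh-job deleted: " + engine.handleDelete("fresh-job"));
    }
}
```

Keeping the refusal as an early return makes the "no automatic teardown" decision explicit: the caller has to change the job state before the delete can succeed.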

Reasons to include this PR:

  • Almost the full job lifecycle is available via the REST API / WebUI, from job creation through the various job actions, but job cleanup was missing.
  • Job deletion is already possible by using any other active job to run a custom script that deletes the target job folder. That even works for the active job itself (i.e. a job with an application context) but leaves Heritrix in a broken state.
  • A well-defined API endpoint for cleanup (job deletion) avoids pitfalls such as using scripts or direct filesystem access (on the server) without knowing the job status, which could leave Heritrix in an invalid state.

Discussion points:

  • I used action=delete with the JobResource.
    • The "delete" action is NOT explicitly listed in the job's list of actions, similar to the copyTo action. In my opinion, deletion should not be an action that is immediately visible like the normal lifecycle actions such as build, launch, ...
    • I considered adding the "delete" action to the EngineResource, like add/create, with a new deletepath parameter, but this seemed like a detour. (It also could not easily be offered in the WebUI, or would not make much sense to users if it were.)
  • I did not add an automatic "teardown" before the "delete" action as this seemed a bit dangerous. Users have to explicitly set the job into an unbuilt/torn-down state to be able to delete the job. This should avoid some accidental deletions.

Questions:

  • getResponse().redirectSeeOther("/engine") (at the end of JobResource) seemed to be the only way to redirect to the engine web page. Is there a better way? EngineApplication hardcodes those /engine* paths, so it might be OK, but a way to dynamically get those URIs would be nice. Using the EngineResource somehow?

ato commented Jan 6, 2026

a way to dynamically get those URIs would be nice. Using the EngineResource somehow?

I don't think there's a straightforward way in the Restlet framework to get a Reference from Resource because the same Resource can be mounted at multiple locations.

I guess we could rely on the fact that the base ref redirects to /engine and use redirectSeeOther(getRequest().getResourceRef().getBaseRef()) but that adds an extra redirect step and to me seems less readable than "/engine".

I've used this pattern before in other applications:

class Link {
    static String toEngine()              { return base() + "/engine"; }
    static String toJob(String shortName) { return toEngine() + "/jobs/" + escape(shortName); }

    static String to(CrawlJob job)        { return toJob(job.getShortName()); }

    private static String base() {
        // get context path from app config
        // or from a thread local request context if it's dynamic
    }
}

getResponse().redirectSeeOther(Link.toEngine());

<a href="${Link.to(crawlJob)}">${crawlJob.name}</a>
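
A compilable variant of this pattern might look like the following. It is only a sketch under stated assumptions: the context path is held in a static field set once at startup (the `setContextPath` method is hypothetical), the `/job/` segment follows the `/engine/job/abc` path mentioned elsewhere in this thread, and `escape` is implemented here with the JDK's `URLEncoder`, which is a choice of this sketch rather than anything the thread prescribes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class Link {
    // Assumption: fixed at startup; an empty string means the app is mounted at the root.
    private static volatile String contextPath = "";

    private Link() {
    }

    // Hypothetical hook for the application to announce where it is mounted.
    static void setContextPath(String path) {
        contextPath = path;
    }

    static String toEngine() {
        return contextPath + "/engine";
    }

    static String toJob(String shortName) {
        return toEngine() + "/job/" + escape(shortName);
    }

    private static String escape(String segment) {
        // URLEncoder produces form encoding, so turn "+" back into "%20" for use in a path.
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }
}

class LinkDemo {
    public static void main(String[] args) {
        System.out.println(Link.toEngine());
        System.out.println(Link.toJob("my job"));
    }
}
```

Centralising the paths this way means a change of mount point touches one field instead of every hardcoded "/engine" string.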

Querela commented Jan 6, 2026

OK. Then let's keep the redirect as it is for now?

I was also a bit confused by the "copyTo" result link generation. If I understand it correctly, it uses the new job name as a relative path and so relies on it replacing the last path segment: /engine/job/abc + def → /engine/job/def. It seems to work as intended, though. Maybe something to look at later.

getResponse().redirectSeeOther(copyTo);
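
The segment replacement is what standard relative reference resolution does, which can be checked with the JDK's `java.net.URI`. This is only an illustration of the general rule; that the redirect target here is resolved the same way by Restlet or by the client is an assumption, and `RelativeRedirectSketch` is a hypothetical name.

```java
import java.net.URI;

class RelativeRedirectSketch {
    // Resolves a relative reference against the current request path.
    static String resolve(String currentPath, String relative) {
        return URI.create(currentPath).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Without a trailing slash, the last segment of the base is replaced.
        System.out.println(resolve("/engine/job/abc", "def"));
        // With a trailing slash, the relative name is appended instead.
        System.out.println(resolve("/engine/job/abc/", "def"));
    }
}
```

The second call shows why the approach is fragile: if the job page were ever served with a trailing slash, the same relative redirect would point below the old job instead of at the new one.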

Looking a bit further, BaseResource also hard-codes the /engine/ path.

return rootRef + "/engine/static/" + resource;

I would like a Link class for better and safer link generation, but most paths are so hardwired in the restlet package that it does not seem necessary.

@ato ato merged commit 1ff35f6 into internetarchive:master Jan 6, 2026
3 checks passed