Description
When NameRes is run with LOAD_DATA=yes, we create three pods:
- A web pod, which acts independently of the others.
- A restore job, which completes its work and goes to "Succeeded".
- A Solr pod, which has:
  - an init container that downloads the Solr database from RENCI, and then
  - a Solr database that the restore job talks to in order to start the restoration process.
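The pod layout above, together with the two-pod layout reported below for LOAD_DATA=no, can be summarized as a small model. This is a hypothetical Python sketch, not code from NameRes or its Helm chart: the component names "web", "solr" and "restore" are illustrative labels, and it assumes that LOAD_DATA=no produces only the web and Solr pods.

```python
from __future__ import annotations

from typing import FrozenSet, Optional

# Hypothetical labels for the components described in this issue; these are
# NOT the real Kubernetes resource names used by the NameRes Helm chart.
WEB = "web"
SOLR = "solr"
RESTORE = "restore"


def expected_components(load_data: bool) -> FrozenSet[str]:
    """Return the pods we expect to exist for a given LOAD_DATA setting.

    Assumption: LOAD_DATA=no yields only the web and Solr pods, while
    LOAD_DATA=yes additionally runs the restore job (whose pod ends up
    in the "Succeeded" state once the restore finishes).
    """
    components = {WEB, SOLR}
    if load_data:
        components.add(RESTORE)
    return frozenset(components)


def infer_load_data(observed: set[str]) -> Optional[bool]:
    """Guess which LOAD_DATA mode a set of observed pods corresponds to.

    Returns True for LOAD_DATA=yes, False for LOAD_DATA=no, or None if the
    observed pods match neither expected layout.
    """
    for mode in (True, False):
        if frozenset(observed) == expected_components(mode):
            return mode
    return None


if __name__ == "__main__":
    # The May 3 observation: only the Solr and web pods were present.
    print(infer_load_data({SOLR, WEB}))  # False, i.e. looks like LOAD_DATA=no
```

The sketch only models which pods exist; it says nothing about the contents of the Solr PVC, which is where the open question in this issue lies.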
Somehow, on ITRB CI on May 2-3, 2024, we ran into the following situation:
- On May 2, 2024, NameRes was restarted with LOAD_DATA=yes.
- Some updates may have caused it to restart. It's unclear whether it restarted in LOAD_DATA=yes or LOAD_DATA=no mode, but let's assume LOAD_DATA=no, as that's the default for
- On May 3, 2024, @pabbathreddya2 and I found only two pods (solr and web), with around 158G of data in the Solr pod. Updating the release with LOAD_DATA=no did not change the pods, so @pabbathreddya2 ran helm uninstall and then reinstalled it with LOAD_DATA=yes, which resulted in the Solr pod becoming essentially empty of data. My theory is that the 158G of data was the download itself, which is deleted at the start of a new download (I think #842, "Rewrite NameRes script to delete the database later in the download process", is fixed now?). That would mean the database had in fact already been wiped earlier, but how? We would see this if the PVC had been wiped, but it's unclear how that would happen.
Essentially, this boils down to: how can the pods be in the LOAD_DATA=no state (two pods instead of three), yet restarting the Solr job causes it to start the download as if it were in the LOAD_DATA=yes state?