ils: library KPIs #104
base: master
868574c to 9471bc3
| 2. Outsider/Stakeholder Dashboard
| The audience of this dashboard is not the librarians, but rather the patrons and management.
| It displays simpler KPIs that show whether the library is working well, and is less detailed and technical than the internal dashboard.
I am not sure which KPIs would belong in the internal dashboard and which in the external one. Was this voiced as a requirement?
No, but since this RFC only covers the API endpoints, this does not affect its implementation.
| - `after_record_insert`
| - `after_record_update`
| - `after_record_delete`
| - aggregate:
Could you clarify this part? For loans, we already have all loans indexed with their creation date, so why do we need to generate state events? Couldn't we just query the loan search and get the answer? Is it because we need it as a number per day?
Because:
- With the way this will be implemented, we can easily track the creation, update and deletion of all record types in ILS. While we could also search for the creation date of loans, this is not as easy for updates and deletions. If we use this stat for loan creations too, we stay consistent with how it is implemented for the other record types.
- Deleted records would no longer show up if we only look at the creation date. This is not a problem for loans, as they cannot be deleted, but it is for other record types (e.g. documents).
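A minimal sketch of the second point, in plain Python rather than the invenio-stats API. It assumes a hypothetical event shape (a dict with `record_type`, `action`, `pid` and `day`) with one event per insert, update or delete signal. Counts derived from the events keep a deleted document, while counting live records by creation date loses it.

```python
from collections import Counter
from datetime import date

# Hypothetical event log: one event per record signal (insert/update/delete).
events = [
    {"record_type": "document", "action": "insert", "pid": "d1", "day": date(2024, 5, 1)},
    {"record_type": "document", "action": "insert", "pid": "d2", "day": date(2024, 5, 1)},
    {"record_type": "document", "action": "update", "pid": "d1", "day": date(2024, 5, 2)},
    {"record_type": "document", "action": "delete", "pid": "d2", "day": date(2024, 5, 3)},
]

# Hypothetical live records index after the deletion: d2 is gone.
live_documents = {"d1": {"created": date(2024, 5, 1)}}


def count_from_events(events, record_type, action):
    """Count events per day for one record type and action."""
    return Counter(
        e["day"]
        for e in events
        if e["record_type"] == record_type and e["action"] == action
    )


def count_from_live_records(records):
    """Count records per creation day, as a query on the records index would."""
    return Counter(r["created"] for r in records.values())


print(count_from_events(events, "document", "insert")[date(2024, 5, 1)])  # 2: deleted d2 still counted
print(count_from_live_records(live_documents)[date(2024, 5, 1)])          # 1: deleted d2 is missing
```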
| ### The specific KPIs are implemented as follows
| 1. Turnover rate of the Library collection:
| 1. number of new loans / number of loanable items
It would be great to note somewhere what outcome this KPI is expected to measure. For example: why do we divide by the number of loanable items?
This KPI measures the rate of use of the collection.
It is described in ISO 11620:2023 A.2.1.1. I will also mention the ISO standard in the RFC.
Please note down what we decided for implementing the number of loanable items per unit of time.
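A sketch of the turnover rate (new loans / loanable items), which reads as "how often, on average, each loanable item was lent during the period". It assumes daily aggregated counts of new loans. For the open question above it uses a placeholder: the number of loanable items is taken as the mean of daily snapshots. This is a hypothetical choice, not a decision recorded in this thread.

```python
from statistics import mean


def turnover_rate(daily_new_loans, daily_loanable_items):
    """Turnover rate over a period: total new loans / loanable items.

    daily_new_loans: per-day counts of new loans (e.g. from a daily aggregation).
    daily_loanable_items: per-day snapshots of the number of loanable items.
    Assumption (hypothetical): the denominator is the mean of the daily snapshots.
    """
    if not daily_loanable_items:
        raise ValueError("need at least one snapshot of loanable items")
    items = mean(daily_loanable_items)
    if items == 0:
        raise ValueError("no loanable items in the period")
    return sum(daily_new_loans) / items


# Three days, 30 new loans, 100 loanable items each day.
print(turnover_rate([10, 12, 8], [100, 100, 100]))  # 0.3
```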
| - aggregate:
| - count
| - daily
| - over composite field `loan_creation_method__document_availability_during_loan_creation`:
Let's go through this section together IRL.
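A rough sketch of what aggregating "over a composite field" could mean. Because an aggregation groups by a single field, two event attributes are joined into one value and counted per day. The `__` separator mirrors the field name in the RFC. The attribute values (`self_checkout`, `available`, etc.) and the helper names are hypothetical.

```python
from collections import Counter
from datetime import date

SEPARATOR = "__"


def composite_value(creation_method, availability):
    """Join two event attributes into one value a single-field grouping can use."""
    return f"{creation_method}{SEPARATOR}{availability}"


def split_composite(value):
    """Recover both parts; assumes neither part itself contains the separator."""
    creation_method, availability = value.split(SEPARATOR, 1)
    return creation_method, availability


def daily_counts(events):
    """Count events per (day, composite value), mimicking a daily count aggregation."""
    return Counter(
        (e["day"], composite_value(e["loan_creation_method"], e["document_availability"]))
        for e in events
    )


# Hypothetical loan-creation events.
events = [
    {"day": date(2024, 5, 1), "loan_creation_method": "self_checkout", "document_availability": "available"},
    {"day": date(2024, 5, 1), "loan_creation_method": "self_checkout", "document_availability": "available"},
    {"day": date(2024, 5, 1), "loan_creation_method": "desk", "document_availability": "requested"},
]
counts = daily_counts(events)
print(counts[(date(2024, 5, 1), "self_checkout__available")])  # 2
```

The dashboard can split the composite key back into its two dimensions, at the cost of forbidding the separator inside either value.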
| - we also add information about the provider to the event for interlibrary loans, so future aggregations can differentiate the waiting time based on the provider
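A sketch of the kind of future aggregation this enables: if interlibrary-loan events carry the provider, waiting times can be averaged per provider. The field names `waiting_time_days` and `provider` are hypothetical. Events without a provider are grouped under `None`.

```python
from collections import defaultdict


def average_waiting_time_by_provider(events):
    """Average waiting time per provider; events without a provider go under None."""
    totals = defaultdict(lambda: [0, 0])  # provider -> [sum of waiting times, event count]
    for e in events:
        bucket = totals[e.get("provider")]
        bucket[0] += e["waiting_time_days"]
        bucket[1] += 1
    return {provider: s / n for provider, (s, n) in totals.items()}


# Hypothetical waiting-time events.
events = [
    {"waiting_time_days": 4, "provider": "provider-a"},
    {"waiting_time_days": 6, "provider": "provider-a"},
    {"waiting_time_days": 10, "provider": "provider-b"},
    {"waiting_time_days": 1},  # not an interlibrary loan, so no provider
]
print(average_waiting_time_by_provider(events))
# {'provider-a': 5.0, 'provider-b': 10.0, None: 1.0}
```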
| 4. Number of changes to the Library collections:
I believe this section is also missing the curator's id. We are frequently asked for statistics on individual curators.
I will ask whether this is wanted.
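If the curator id were stored on the record-change events, as suggested above, per-curator statistics would simply be one more grouping key. A sketch assuming a hypothetical `curator_id` field on each event, with `record_type` and `action` as in the record signals listed earlier:

```python
from collections import Counter


def changes_per_curator(events, record_type=None):
    """Count collection changes per (curator_id, action), optionally for one record type.

    Assumption: each event carries a hypothetical `curator_id` field.
    """
    return Counter(
        (e.get("curator_id"), e["action"])
        for e in events
        if record_type is None or e["record_type"] == record_type
    )


# Hypothetical record-change events.
events = [
    {"record_type": "document", "action": "insert", "curator_id": "alice"},
    {"record_type": "document", "action": "update", "curator_id": "alice"},
    {"record_type": "item", "action": "insert", "curator_id": "bob"},
    {"record_type": "document", "action": "delete", "curator_id": "bob"},
]
print(changes_per_curator(events, "document")[("bob", "delete")])  # 1
```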
| ## Drawbacks
| ### Periodic Stats
| When periodic stats are added as events, it is easy to end up in situations where invenio-stats aggregates only one document per time period.
Not sure I understand this part, let's chat.
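A sketch of the situation this drawback describes. It assumes a hypothetical periodic job that emits one snapshot event per day (e.g. the number of loanable items). A daily aggregation then finds exactly one document in every bucket, so it only copies the events into the aggregation index instead of reducing them.

```python
from collections import defaultdict
from datetime import date, timedelta


def bucket_by_day(events):
    """Group events into daily buckets, as a daily aggregation would."""
    buckets = defaultdict(list)
    for e in events:
        buckets[e["day"]].append(e)
    return buckets


start = date(2024, 5, 1)
# Hypothetical periodic job: one snapshot event per day for a week.
snapshots = [{"day": start + timedelta(days=i), "loanable_items": 100 + i} for i in range(7)]
buckets = bucket_by_day(snapshots)
print({len(docs) for docs in buckets.values()})  # {1}: every bucket holds a single document
```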
rfcs/ils-0104-library-kpis.md
| So a loan would also contain the field `waiting_time`.
| This would allow querying the record indices directly for the KPIs, without the need for an additional stats index.
| But this would add many fields to the record indices that are only used for KPIs.
| Additionally, the dashboard would need to perform many queries and aggregations, which might overload the search system.
The stats will also be queried via search. Can you explain how this is different?
The stats served by invenio-stats are already partially aggregated (e.g. on a daily basis).
If we use invenio-stats, a request for the number of new loans in a month leads to an aggregation over at most 31 documents.
Without invenio-stats, the same query triggers an aggregation over all new loans in those 31 days, which might be a large number of documents.
I updated the RFC to describe this better.
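The difference described in this reply, sketched with in-memory data rather than a search cluster. The numbers are made up, and the pre-aggregated shape (one document per day with a `count`) is a simplification of a daily stats aggregation. Both paths give the same monthly total, but one touches a document per loan and the other a document per day.

```python
from collections import Counter
from datetime import date, timedelta

month_start = date(2024, 5, 1)
# Raw documents: one per new loan in May (what a query on the records index would scan).
raw_loans = [{"day": month_start + timedelta(days=i % 31)} for i in range(5000)]

# Pre-aggregated documents: one per day with a count (simplified daily aggregation).
daily_docs = [
    {"day": day, "count": n}
    for day, n in Counter(loan["day"] for loan in raw_loans).items()
]

# Same monthly total, very different number of documents touched.
total_from_raw = len(raw_loans)                         # touches 5000 documents
total_from_daily = sum(d["count"] for d in daily_docs)  # touches 31 documents
print(total_from_raw, total_from_daily, len(daily_docs))  # 5000 5000 31
```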
| ### KPIs
| #### KPI 3.1 - Extracting for loan creation method
Let's discuss this part.
| Alternatively, we could listen to the `after_record_insert` signal from `invenio_records`, filter for loans, and extract the creation method only during event generation or preprocessing (unsure whether this is possible).
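A rough sketch of this alternative. It uses a minimal stand-in for the signal so it runs without Invenio installed. The stand-in only mimics a `connect`/`send` interface. The receiver signature, the check of the `$schema` URL to recognise loans, and the `creation_method` lookup are all assumptions for illustration; where the creation method can actually be read from is the open question of this section.

```python
class Signal:
    """Minimal stand-in for a signal object with connect/send (not the real library)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)
        return receiver

    def send(self, sender, **kwargs):
        return [(r, r(sender, **kwargs)) for r in self._receivers]


after_record_insert = Signal()  # stand-in for the invenio_records signal
emitted_events = []


def is_loan(record):
    # Assumption: loans are recognised by their JSON schema URL.
    return "loans/" in record.get("$schema", "")


def extract_creation_method(record):
    # Hypothetical: the real source of the creation method is still undecided.
    return record.get("creation_method", "unknown")


@after_record_insert.connect
def on_record_insert(sender, record=None, **kwargs):
    """Emit a stats event for new loans only; ignore documents, items, etc."""
    if record is None or not is_loan(record):
        return
    emitted_events.append(
        {"pid": record["pid"], "loan_creation_method": extract_creation_method(record)}
    )


after_record_insert.send(None, record={
    "$schema": "https://example.org/schemas/loans/loan-v1.0.0.json",
    "pid": "1",
    "creation_method": "self_checkout",
})
after_record_insert.send(None, record={
    "$schema": "https://example.org/schemas/documents/document-v1.0.0.json",
    "pid": "2",
})
print(emitted_events)  # [{'pid': '1', 'loan_creation_method': 'self_checkout'}]
```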
| #### Median vs. Average
Was this discussed with the librarians? Is it possible to easily offer both, or to leave the choice to the dashboard "client"?
The average was what the ticket requested, but we could also add both. This also depends on this discussion.
| #### Aggregation period
| We aggregate most stats on a daily basis.
| The exceptions are loan durations and waiting times, which are aggregated monthly.
What is the motivation behind monthly aggregation? What do we gain or lose by aggregating daily too?
While it does not really matter for the average, it matters for the median (discussed here).
For the average, we offer two separate queries that allow the dashboard to compute the average x/y:
- sum of the metric in the index: x
- count of documents in the index: y
We could aggregate both numbers daily, and the dashboard could still display the average for the whole month by summing the daily x values and dividing by the sum of the daily y values.
For the median, however, we have to decide the granularity during the aggregation, and daily does not really make sense here.
As we might want to add the median, and the StatAggregator in invenio-stats currently allows only one granularity per aggregation, I decided to go for monthly.
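Why the granularity matters, sketched in plain Python with made-up loan durations. Daily sums and counts combine into an exact monthly average. Daily medians do not combine into the monthly median, so the median needs all raw values of the period at aggregation time.

```python
from statistics import median

# Made-up loan durations in days, grouped per aggregation day.
daily_durations = {
    "2024-05-01": [1, 2, 30],
    "2024-05-02": [3, 4],
    "2024-05-03": [5, 6, 7, 8],
}

# Daily partial aggregates, as the sum (x) and count (y) queries would return them.
daily_sum = {day: sum(v) for day, v in daily_durations.items()}
daily_count = {day: len(v) for day, v in daily_durations.items()}

# Monthly average from daily partials: sum of x / sum of y (exact).
monthly_average = sum(daily_sum.values()) / sum(daily_count.values())

all_durations = [d for v in daily_durations.values() for d in v]
true_average = sum(all_durations) / len(all_durations)

# Monthly median needs all raw values; the median of daily medians differs.
monthly_median = median(all_durations)
median_of_daily_medians = median([median(v) for v in daily_durations.values()])

print(monthly_average == true_average)          # True
print(monthly_median, median_of_daily_medians)  # 5 3.5
```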
9471bc3 to dd9db5a
| By the design of `invenio-stats`, all stats are aggregated.
| Currently, this aggregation is always done over a certain `field` (the field by which the documents in the events index are grouped).
| Some of our KPIs do not have such a `field`, as all documents should be grouped together.
Can you please explain better what change is expected? It is unclear what "all documents should be grouped together" means in practice.
Please see the subsection "Global Aggregation - global-aggregation" and the attached PR.
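What "grouped together" means in practice, sketched in plain Python rather than with invenio-stats. A field-based aggregation produces one bucket per value of `field` per day. The KPIs without such a field need a single bucket per day covering all documents. The `global` bucket name and the `document_pid` field are illustrative only; the actual change is the one described in the subsection and PR referenced above.

```python
from collections import Counter

GLOBAL_BUCKET = "global"  # illustrative name for the single per-day bucket

# Hypothetical events.
events = [
    {"day": "2024-05-01", "document_pid": "d1"},
    {"day": "2024-05-01", "document_pid": "d2"},
    {"day": "2024-05-01", "document_pid": "d1"},
    {"day": "2024-05-02", "document_pid": "d3"},
]


def aggregate(events, field=None):
    """Daily counts grouped by `field`; with field=None, all events of a day share one bucket."""
    return Counter(
        (e["day"], e[field] if field is not None else GLOBAL_BUCKET)
        for e in events
    )


print(aggregate(events, "document_pid")[("2024-05-01", "d1")])  # 2: one bucket per document
print(aggregate(events)[("2024-05-01", GLOBAL_BUCKET)])          # 3: one bucket for the whole day
```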
dd9db5a to c16a932
c16a932 to c1ba1eb
closes: CERNDocumentServer/cds-ils#878