Skip to content

Conversation

@Rossi-Luciano
Copy link
Collaborator

O que esse PR faz?

Adiciona 25 campos MODS (Metadata Object Description Schema) ao ArticleOAIIndex, complementando Dublin Core com metadados bibliográficos estruturados.

Campos adicionados (7 grupos):

  • name.* (4 campos) - Autoria com ORCID e afiliações
  • subject.* (4 campos) - Taxonomias controladas (WoS, CAPES, Subject Areas)
  • relatedItem.host.* (5 campos) - Contexto de publicação (journal, volume, issue)
  • part.* (3 campos) - Paginação e elocation-id
  • identifier.* (3 campos) - ISSNs estruturados (print/electronic/linking)
  • originInfo.* (3 campos) - Publisher, local, data w3cdtf
  • accessCondition.* (3 campos) - Licenças e políticas Open Access

Benefícios:

  • Métricas bibliométricas avançadas (análise por instituição, autor, área)
  • Compliance com mandatos de fomento (FAPESP, CNPq, CAPES)
  • Interoperabilidade com repositórios institucionais e sistemas CRIS
  • Mantém 100% compatibilidade com índice Dublin Core existente

Onde a revisão poderia começar?

Arquivo: article/search_indexes.py

Ordem sugerida:

  1. Declaração dos campos (linhas ~60-140) - verificar nomenclatura metadata.mods.*
  2. Métodos prepare_mods_name_* (linhas ~200-350) - extração de ORCIDs e afiliações
  3. Métodos prepare_mods_subject_* (linhas ~350-450) - integração com taxonomias

Como este poderia ser testado manualmente?

1. Reindexar Solr:

python manage.py rebuild_index --noinput
python manage.py haystack_info

2. Verificar campos no Solr:

curl "http://localhost:8983/solr/<core>/select?q=*:*&rows=1&wt=json&indent=true"
curl "http://localhost:8983/solr/<core>/admin/luke?wt=json" | grep "metadata.mods"

3. Testar queries MODS:

# Artigos com ORCID
curl "http://localhost:8983/solr/<core>/select?q=metadata.mods.name.orcid:0000*"

# Artigos Open Access Gold
curl "http://localhost:8983/solr/<core>/select?q=metadata.mods.accessCondition.openAccess:gold"

# Artigos de categorias WoS
curl "http://localhost:8983/solr/<core>/select?q=metadata.mods.subject.wos:Medicine*"

Algum cenário de contexto que queira dar?

NA

Quais são tickets relevantes?

NA

Referências

Documentação MODS:

Elementos específicos:

Padrões relacionados:

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds 25 MODS (Metadata Object Description Schema) fields to the ArticleOAIIndex to complement the existing Dublin Core metadata with more structured bibliographic information. The additions enable advanced bibliometric analysis, better interoperability with institutional repositories, and compliance with funding agency mandates.

Key Changes:

  • Added 25 new MODS field declarations organized into 7 semantic groups (name, subject, relatedItem, part, identifier, originInfo, accessCondition)
  • Implemented 25 corresponding prepare_* methods to extract and format data from Article model relationships
  • Added 2 private helper methods for formatting affiliation and location data

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 1458 to 1472
if (pub_history.organization.acronym and
pub_history.organization.acronym.strip()):
pub_name += f" ({pub_history.organization.acronym.strip()})"

# Fallback para estrutura Institution legada
elif (pub_history.institution and
pub_history.institution.institution and
pub_history.institution.institution.institution_identification and
pub_history.institution.institution.institution_identification.name):

inst_id = pub_history.institution.institution.institution_identification
pub_name = inst_id.name.strip()

if inst_id.acronym and inst_id.acronym.strip():
pub_name += f" ({inst_id.acronym.strip()})"
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Code duplication in acronym handling. Lines 1458-1460 and 1471-1472 have identical logic for appending acronyms in parentheses. Consider extracting to a helper method:

def _format_name_with_acronym(self, name, acronym):
    if not name:
        return None
    result = name.strip()
    if acronym and acronym.strip():
        result += f" ({acronym.strip()})"
    return result

Copilot uses AI. Check for mistakes.
Comment on lines 1573 to 1579
if (hasattr(obj, 'pub_date_month') and obj.pub_date_month and
str(obj.pub_date_month).strip()):
date_parts.append(str(obj.pub_date_month).zfill(2))

if (hasattr(obj, 'pub_date_day') and obj.pub_date_day and
str(obj.pub_date_day).strip()):
date_parts.append(str(obj.pub_date_day).zfill(2))
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential issue with date formatting: The code uses zfill(2) to pad month and day values (lines 1575, 1579), but doesn't validate that the values are numeric or within valid ranges. If pub_date_month or pub_date_day contain non-numeric strings, zfill will still work but produce invalid dates like "2023-ab-15". Consider adding validation or using proper date formatting.

Suggested change
if (hasattr(obj, 'pub_date_month') and obj.pub_date_month and
str(obj.pub_date_month).strip()):
date_parts.append(str(obj.pub_date_month).zfill(2))
if (hasattr(obj, 'pub_date_day') and obj.pub_date_day and
str(obj.pub_date_day).strip()):
date_parts.append(str(obj.pub_date_day).zfill(2))
# Validate and add month
if (hasattr(obj, 'pub_date_month') and obj.pub_date_month and
str(obj.pub_date_month).strip()):
try:
month_int = int(obj.pub_date_month)
if 1 <= month_int <= 12:
date_parts.append(f"{month_int:02d}")
# Validate and add day only if month is valid
if (hasattr(obj, 'pub_date_day') and obj.pub_date_day and
str(obj.pub_date_day).strip()):
try:
day_int = int(obj.pub_date_day)
if 1 <= day_int <= 31:
date_parts.append(f"{day_int:02d}")
except (ValueError, TypeError):
pass
except (ValueError, TypeError):
pass

Copilot uses AI. Check for mistakes.
)

# MODS - originInfo
# Editora(s)
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling error: "Editora(s)" should use English "Publisher(s)" to maintain consistency with the codebase (all other comments are in English except Portuguese JUSTIFICATIVA sections).

Suggested change
# Editora(s)
# Publisher(s)

Copilot uses AI. Check for mistakes.
Comment on lines 696 to 698
researchers = obj.researchers.prefetch_related(
'researcheraka_set__researcher_identifier'
).all()
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefetch_related path 'researcheraka_set__researcher_identifier' may cause N+1 queries if researcher_identifier has foreign keys. Consider using select_related for the researcher_identifier relationship as well: prefetch_related(Prefetch('researcheraka_set', queryset=ResearcherAKA.objects.select_related('researcher_identifier')))

Copilot uses AI. Check for mistakes.
Comment on lines 707 to 709
orcid = aka.researcher_identifier.identifier.strip()
if orcid and orcid not in orcids:
orcids.append(orcid)
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The logic checks if orcid and orcid not in orcids before appending, which should prevent duplicates. However, this check happens inside nested loops (researchers -> aka), and the same ORCID from the same researcher could still be checked multiple times unnecessarily. The current implementation is correct but not optimal - consider collecting into a set first, then converting to list at the end for better performance with large datasets.

Copilot uses AI. Check for mistakes.
Comment on lines 1743 to 1799
location = institution.location

# Cidade
if location.city and hasattr(location.city, 'name') and location.city.name:
parts.append(location.city.name.strip())

# Estado
if location.state:
state_text = location.state.acronym or location.state.name
if state_text:
parts.append(state_text.strip())

# País
if location.country and location.country.name:
parts.append(location.country.name.strip())

except (AttributeError, TypeError):
# Fallback para string simples
return str(affiliation) if affiliation else None

return " - ".join(parts) if parts else None

def _format_place_location(self, location):
"""
Formata localização em string: "Cidade, Estado, País"
Args:
location: objeto Location
Returns:
str: "São Paulo, SP, Brasil"
"""
if not location:
return None

parts = []

try:
# Cidade
if location.city and hasattr(location.city, 'name') and location.city.name:
parts.append(location.city.name.strip())

# Estado (preferir sigla)
if location.state:
state_text = location.state.acronym or location.state.name
if state_text:
parts.append(state_text.strip())

# País
if location.country and location.country.name:
parts.append(location.country.name.strip())

except (AttributeError, TypeError):
return None

return ", ".join(parts) if parts else None

Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Code duplication between _format_affiliation and _format_place_location. Both methods have similar logic for formatting location information (city, state, country). The main difference is the separator (- vs , ) and the inclusion of institution info in _format_affiliation. Consider extracting the common location formatting logic into a separate helper method to reduce duplication.

Suggested change
location = institution.location
# Cidade
if location.city and hasattr(location.city, 'name') and location.city.name:
parts.append(location.city.name.strip())
# Estado
if location.state:
state_text = location.state.acronym or location.state.name
if state_text:
parts.append(state_text.strip())
# País
if location.country and location.country.name:
parts.append(location.country.name.strip())
except (AttributeError, TypeError):
# Fallback para string simples
return str(affiliation) if affiliation else None
return " - ".join(parts) if parts else None
def _format_place_location(self, location):
"""
Formata localização em string: "Cidade, Estado, País"
Args:
location: objeto Location
Returns:
str: "São Paulo, SP, Brasil"
"""
if not location:
return None
parts = []
try:
# Cidade
if location.city and hasattr(location.city, 'name') and location.city.name:
parts.append(location.city.name.strip())
# Estado (preferir sigla)
if location.state:
state_text = location.state.acronym or location.state.name
if state_text:
parts.append(state_text.strip())
# País
if location.country and location.country.name:
parts.append(location.country.name.strip())
except (AttributeError, TypeError):
return None
return ", ".join(parts) if parts else None
location_str = self._format_location_parts(institution.location, separator=" - ")
if location_str:
parts.append(location_str)
except (AttributeError, TypeError):
# Fallback para string simples
return str(affiliation) if affiliation else None
return " - ".join(parts) if parts else None
def _format_location_parts(self, location, separator=", "):
"""
Helper to format location as "City, State, Country" or with custom separator.
Returns None if location is not valid.
"""
if not location:
return None
parts = []
try:
# Cidade
if location.city and hasattr(location.city, 'name') and location.city.name:
parts.append(location.city.name.strip())
# Estado (preferir sigla)
if location.state:
state_text = location.state.acronym or location.state.name
if state_text:
parts.append(state_text.strip())
# País
if location.country and location.country.name:
parts.append(location.country.name.strip())
except (AttributeError, TypeError):
return None
return separator.join(parts) if parts else None
def _format_place_location(self, location):
"""
Formata localização em string: "Cidade, Estado, País"
Args:
location: objeto Location
Returns:
str: "São Paulo, SP, Brasil"
"""
return self._format_location_parts(location, separator=", ")

Copilot uses AI. Check for mistakes.

def prepare_mods_part_elocation(self, obj):
"""
elocation-id (para artigos sem paginação tradicional)
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Inconsistent capitalization in comment: "elocation-id" starts with lowercase while all other field type comments in this section use title case or proper capitalization (e.g., "Página inicial do artigo", "ISSN da versão impressa"). Consider using "Elocation-id (para artigos sem paginação tradicional)" for consistency.

Suggested change
elocation-id (para artigos sem paginação tradicional)
Elocation-id (para artigos sem paginação tradicional)

Copilot uses AI. Check for mistakes.
Exemplo: "e370704"
"""
# Só retorna elocation se NÃO houver paginação tradicional
if not obj.first_page and obj.elocation_id:
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for elocation_id returns None if first_page exists, but doesn't check if first_page is actually a valid/non-empty value. If first_page is an empty string or whitespace-only, the method will return None even though there's a valid elocation_id. Consider checking: if not obj.first_page or not str(obj.first_page).strip() instead of just if not obj.first_page.

Suggested change
if not obj.first_page and obj.elocation_id:
if (not obj.first_page or not str(obj.first_page).strip()) and obj.elocation_id:

Copilot uses AI. Check for mistakes.
Comment on lines 1444 to 1445
if not (obj.journal and hasattr(obj.journal, 'publisher_history') and
obj.journal.publisher_history.exists()):
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Complex nested condition is hard to read. The condition spans lines 1444-1445 and checks three levels deep. Consider extracting to a helper variable for better readability:

has_publisher_history = (obj.journal and 
                         hasattr(obj.journal, 'publisher_history') and
                         obj.journal.publisher_history.exists())
if not has_publisher_history:
    return publishers
Suggested change
if not (obj.journal and hasattr(obj.journal, 'publisher_history') and
obj.journal.publisher_history.exists()):
has_publisher_history = (
obj.journal and hasattr(obj.journal, 'publisher_history') and obj.journal.publisher_history.exists()
)
if not has_publisher_history:

Copilot uses AI. Check for mistakes.
Comment on lines 1474 to 1475
if pub_name and pub_name not in publishers:
publishers.append(pub_name)
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Missing strip() call on comparison. Line 1474 checks if pub_name and pub_name not in publishers but doesn't call .strip() on pub_name before the comparison, while it was stripped when building it (lines 1456, 1469). If pub_name somehow contains only whitespace after stripping during construction (which would make it falsy), this could theoretically pass the if pub_name check. However, since the strings are already stripped during construction, this is likely safe but inconsistent with the defensive pattern used elsewhere.

Suggested change
if pub_name and pub_name not in publishers:
publishers.append(pub_name)
if pub_name:
pub_name = pub_name.strip()
if pub_name and pub_name not in publishers:
publishers.append(pub_name)

Copilot uses AI. Check for mistakes.
- Extrai métodos auxiliares para eliminar duplicação de código
- Adiciona validação de datas com verificação de ranges (mês 1-12, dia
1-31)
- Otimiza queries de ORCID com Prefetch e select_related
- Usa set para coleta de ORCIDs (O(1) vs O(n) em lookups)
- Remove verificações hasattr redundantes em blocos try-except
- Adiciona variáveis intermediárias para condições complexas
- Valida valores vazios/whitespace antes de comparações
from haystack import indexes
from legendarium.formatter import descriptive_format
from django.db.models import Prefetch
from researcher.models import ResearcherAKA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano não vai precisar de ResearcherAKA

)
).all()

for researcher in researchers:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano consulte o modelo Article. Veja como é a relação m2m com Researcher. Consulte o modelo Researcher. Veja que ele tem property orcid:

def orcid(self):

Comment on lines 674 to 683
EXEMPLO XML (MODS 3.7):
<name type="personal">
<namePart>Silva, João</namePart>
<nameIdentifier type="orcid" authority="orcid">
0000-0001-2345-6789
</nameIdentifier>
<role>
<roleTerm type="text" authority="marcrelator">author</roleTerm>
</role>
</name>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano como a partir de uma lista de orcid será possível associar cada orcid ao autor correspondente corretamente?

thematic_area = thematic_area_journal.thematic_area

# Usar hierarquia: level2 > level1 > level0
topic_text = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano qual o motivo deste critério de escolha das áreas? Por que não foi usado todos os níveis?

Comment on lines 1079 to 1081
if obj.issue and obj.issue.volume:
return str(obj.issue.volume).strip()
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano troque por:

try:
    return str(obj.issue.volume).strip()
except (AttributeError, TypeError, ValueError):
    return None

Comment on lines 1118 to 1121
if obj.issue and obj.issue.number:
return str(obj.issue.number).strip()
return None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano veja comentário anterior

str: Designação do suplemento ou None
Exemplo: "suppl.1"
"""
if obj.issue and obj.issue.supplement:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano comentário anterior

Comment on lines 1410 to 1414
has_publisher_history = (
obj.journal
and hasattr(obj.journal, "publisher_history")
and obj.journal.publisher_history.exists()
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano elimine os códigos que testem o .exists() porque isso é uma consulta. Eliminando isso, as consultas ao banco de dados caem pela metade.


parts = []

try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano observe que nos próprios modelos este tipo de questão já está resolvida:

def formatted_location(self):

Copy link
Member

@robertatakenaka robertatakenaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano Por favor, fiz alguns comentários que devem se estender a outros casos não comentados. Fazer as correções.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if official.issn_print and official.issn_print not in issns:
issns.append(official.issn_print)

if official.issn_electronic and official.electronic not in issns:
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a typo in the attribute name. Line 1038 uses official.electronic but should be official.issn_electronic to check for duplicates correctly. This will cause the condition to fail since electronic is not a valid attribute.

Suggested change
if official.issn_electronic and official.electronic not in issns:
if official.issn_electronic and official.issn_electronic not in issns:

Copilot uses AI. Check for mistakes.
Comment on lines 507 to 528
"""..."""
if obj.researchers:
researchers = obj.researchers.select_related("person_name").filter(
person_name__isnull=False
)
return set([str(researcher.person_name) for researcher in researchers])
return list(set([str(researcher.person_name) for researcher in researchers]))
return []

def prepare_collab(self, obj):
"""This is the instituional author."""
if obj.collab:
return set([collab.collab for collab in obj.collab.all()])
return list(set([collab.collab for collab in obj.collab.all()]))
return []

def prepare_kw(self, obj):
"""The keywords of the article."""
if obj.keywords:
return set([keyword.text for keyword in obj.keywords.all()])
return list(set([keyword.text for keyword in obj.keywords.all()]))
return []

def prepare_description(self, obj):
"""The abstracts of the articles
This is a property that filter by article ``DocumentAbstract.objects.filter(article=self)``
"""
"""The abstracts of the articles..."""
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The docstrings for several methods have been reduced to just "..." which loses important documentation. Consider keeping the original docstrings or providing meaningful descriptions. For example, prepare_creator previously explained the select_related optimization, and prepare_description explained the data source.

Copilot uses AI. Check for mistakes.
Comment on lines 875 to 897
def prepare_mods_subject_area(self, obj):
"""..."""
areas = []

# Padronizado: .exists() direto
if obj.journal and obj.journal.subject.exists():
for subject_area in obj.journal.subject.all():
if subject_area.value and subject_area.value.strip():
areas.append(subject_area.value.strip())

return areas

def prepare_mods_subject_wos(self, obj):
"""..."""
wos_categories = []

# Padronizado: .exists() direto
if obj.journal and obj.journal.wos_area.exists():
for wos_category in obj.journal.wos_area.all():
if wos_category.value and wos_category.value.strip():
wos_categories.append(wos_category.value.strip())

return wos_categories
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The docstrings for prepare_mods_subject_area and prepare_mods_subject_wos are abbreviated to "..." while other similar methods have comprehensive documentation. For consistency and maintainability, these methods should have full docstrings explaining their purpose, data sources, and MODS mapping like the other methods in this section.

Copilot uses AI. Check for mistakes.
return ["com_%s" % col for col in obj.collections]
try:
return ["com_%s" % col.acron3 for col in obj.collections if col.acron3]
except Exception:
Copy link

Copilot AI Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Using a bare except Exception: can hide unexpected errors and make debugging difficult. Consider catching specific exceptions like AttributeError or TypeError that might occur when accessing col.acron3. This makes the error handling more explicit and maintainable.

Suggested change
except Exception:
except (AttributeError, TypeError):

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants