CollateX: Normalize txt

Feature Requests

Description

Some manuscripts provide their text with diacritics while others don't. As a result, CollateX treats words that are actually the same (except for the diacritics) as different and reports false variants. To avoid this, we have to strip all diacritics from the text before we save the TXT files.

Unicode character groups to consider:

"Tashkil from ISO 8859-6", "Combining maddah and hamza", "Other combining marks", see https://unicode-table.com/en/blocks/arabic/ for the Arabic script
"Syriac punctuation and signs", "Syriac points (vowels)", "Syriac marks", see https://unicode-table.com/en/blocks/syrian/ for the Syriac script

This list may be extended.
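
The stripping step described above could look roughly like the following. This is a minimal sketch and not the project's actual implementation: the names `STRIP_RANGES` and `strip_marks` are hypothetical, and the code point ranges are assumptions read off the Unicode charts for the groups listed above. The groups are keyed by script so that the list can be extended or narrowed later.

```python
# Hypothetical mapping from script name to the code point ranges to strip.
# Ranges are assumptions based on the Unicode charts for the groups named above.
STRIP_RANGES = {
    "arabic": [
        (0x064B, 0x0652),  # Tashkil from ISO 8859-6
        (0x0653, 0x0655),  # Combining maddah and hamza
        (0x0656, 0x065F),  # Other combining marks
    ],
    "syriac": [
        (0x0700, 0x070D),  # Syriac punctuation and signs
        (0x0730, 0x073F),  # Syriac points (vowels)
        (0x0740, 0x074A),  # Syriac marks
    ],
}


def build_table(scripts):
    """Build a str.translate table that deletes every listed code point."""
    return {
        cp: None
        for script in scripts
        for low, high in STRIP_RANGES[script]
        for cp in range(low, high + 1)
    }


def strip_marks(text, scripts=("arabic", "syriac")):
    """Return text with the marks of the selected scripts removed.

    No Unicode normalization is applied, so precomposed letters such as
    U+0622 (alef with madda above) are left untouched.
    """
    return text.translate(build_table(scripts))


if __name__ == "__main__":
    vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf, teh, beh, each with fatha
    print(strip_marks(vocalized) == "\u0643\u062A\u0628")  # True
```

Passing `scripts=("syriac",)` leaves Arabic marks in place, which is one way to handle the scripts differently if the scholars ask for that.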

User Stories

As a scholar I need normalized text in order to get good results with CollateX. Using the texts as they are produces too many errors.

Classification

Is this feature an enhancement of existing code or a completely new feature?

enhancement
new feature

Related Tickets

/cc Mathias Göbel, Frank Schneider, Michelle Weidling

1 of 2 checklist items completed

Activity

Michelle Weidling changed milestone to %Ahikar Version 0.11.0 4 years ago
Michelle Weidling added Ahikar label 4 years ago
Michelle Weidling marked this issue as related to #50 (closed) 4 years ago
Michelle Weidling added Doing label 4 years ago
Kristine Voigt changed milestone to %Ahikar Version 0.12.0 4 years ago
Michelle Weidling @mrodzis · 4 years ago

The vocalization of the Arabic texts should be kept according to our talk with the scholars on 6th October.
Michelle Weidling mentioned in merge request !45 (merged) 4 years ago
Michelle Weidling mentioned in commit a5ad021e 4 years ago
Michelle Weidling mentioned in commit 5e2cc0c2 4 years ago
Michelle Weidling added In Review label and removed Doing label 4 years ago
Michelle Weidling added waiting for code review label and removed In Review label 4 years ago
Michelle Weidling mentioned in commit fc3d1653 4 years ago
Michelle Weidling added In Review label and removed waiting for code review label 4 years ago
Kristine Voigt closed 4 years ago
Kristine Voigt removed In Review label 4 years ago