CollateX: Normalize txt

Feature Requests

Description

Some manuscripts provide their text with diacritics while others don't. As a result, CollateX treats words that are actually the same (except for the diacritics) as different and produces false positives. To avoid this, we have to strip all diacritics from the text before we save the TXT files.
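A minimal sketch of what such a normalization step could look like, written in Python purely for illustration (the ticket does not say which language the export code uses). The function name `strip_diacritics` is hypothetical. It assumes that "diacritics" means Unicode combining marks with a non-zero canonical combining class; the actual set of characters to strip may need to be wider or narrower, depending on the code blocks considered below.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (hypothetical helper)."""
    # Decompose precomposed characters, e.g. "ö" -> "o" + U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every character with a non-zero canonical combining class.
    # Assumption: this matches what the ticket means by "diacritics".
    stripped = "".join(
        ch for ch in decomposed if unicodedata.combining(ch) == 0
    )
    # Recompose so the saved TXT file is in a single normal form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(strip_diacritics("G\u00f6bel"))
```

Applying the function to every text just before it is written to its TXT file would give CollateX the same base characters for all witnesses, whether or not the manuscript was vocalized.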

Code blocks to consider:

This list may be extended.

User Stories

As a scholar I need normalized text in order to get good results with CollateX. Using the texts as they are produces too many errors.

Classification

Is this feature an enhancement of existing code or a completely new feature?

  • enhancement
  • new feature

Related Tickets

#50 (closed)

/cc Mathias Göbel, Frank Schneider, Michelle Weidling
