Skip to content

Conversation

@juliangut
Copy link

Currently only supported Turkish, Spanish and Portuguese

Tests added only for charset validation. Message split should not be affected as National Language Shift Tables only affects charset and not number of bytes per char

@Codesleuth
Copy link
Owner

Awesome! Thanks @juliangut, please bear with me as I'm traveling and will have to review this when I get time.

@juliangut
Copy link
Author

Hello @Codesleuth, have you had time to review this PR?

@Codesleuth
Copy link
Owner

I'm currently looking at it - will post soon.

@juliangut
Copy link
Author

Hi @Codesleuth how did you find the PR? does it feel right?

Copy link
Owner

@Codesleuth Codesleuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to hold off accepting this because of the reasons outlined in the comment. I'm thinking of an API in the future that will do something like this:

splitter.split('∞')
{
  "characterSet": "GSM",
  "shiftTable": "Portuguese language (Latin script)",
  "parts": [
    {
      "content": "",
      "length": 1,
      "bytes": 1
    }
  ],
  "bytes": 1,
  "length": 1,
  "remainingInPart": 159
}

And then in the case of mixed it would be:

splitter.split('∞Ø')
{
  "characterSet": "Unicode",
  "parts": [
    {
      "content": "∞Ø",
      "length": 2,
      "bytes": 4
    }
  ],
  "bytes": 4,
  "length": 2,
  "remainingInPart": 68
}

I think this will be a better API and won't give false positives for the mixed case being detected as GSM.

return existsInArray(character.charCodeAt(0), GSM_charCodes);
}
function validateCharacterWithShiftTable(character) {
var charCodes = GSM_charCodes.concat(GSM_TR_charCodes, GSM_ES_charCodes, GSM_PT_charCodes);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how this appears to make sense if all you need to do is count all possible valid characters that exist in any shift table, but this creates an invalid situation where the whole text doesn't fit any common shift table, and should be detected as Unicode. Take the following two messages:

  1. valid in only Portuguese language (Latin script)
  2. Ø valid in either Spanish language (Latin script) or the Basic Character Set

If each example was a full message, they would each be valid. However, if we take a message containing both characters:

∞Ø

As far as we know, this is an invalid message because there is no common shift table that supports both at the same time, so it should be detected as Unicode.

This library's character set auto-detection mechanism is pretty important, and we should think about how that can be preserved.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR, when validating a whole message with this method is not used anymore, review message below

function validateMessageWithShiftTable(message) {
var charCodes = [GSM_charCodes, GSM_TR_charCodes, GSM_ES_charCodes, GSM_PT_charCodes];
for (var i = 0; i < charCodes.length; i++) {
if (validateMessageInCharCodesList(message, charCodes[i]))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validating against each shift table char-table independently solves the problem of mixed shift table characters combined within the same message

@juliangut
Copy link
Author

Hi @Codesleuth I can't argue against a new unified API, that would be great. About the messages with chars of mixed shitf tables you are right, I've just updated the code and tests accordingly, please review my comments

@Codesleuth
Copy link
Owner

I'm so sorry for the delay! I will be working on this repo this weekend, which will give me time to review and try out this PR properly. This PR is missing functional tests that cover the options and shift table output, but I can add them as I work on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants