DCT-based Fingerprinting #4
```diff
-	result[u*width+v] = float32(sum)
+	result[u*width+v] = int16(sum)
```
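For context, a naive 2D DCT-II with the `int16` result from this change might look like the sketch below. This is only an illustration of the line being discussed, with assumed names and layout; the actual function in this PR may differ (e.g. in normalization or scaling).

```go
package main

import (
	"fmt"
	"math"
)

// dct2D computes a naive, unnormalized 2D DCT-II over a width×height
// block of signed pixel values. Function name and result layout are
// assumptions for illustration; only the int16 truncation on the
// result line matches the diff under discussion.
func dct2D(pixels []int8, width, height int) []int16 {
	result := make([]int16, width*height)
	for u := 0; u < height; u++ {
		for v := 0; v < width; v++ {
			var sum float64
			for y := 0; y < height; y++ {
				for x := 0; x < width; x++ {
					sum += float64(pixels[y*width+x]) *
						math.Cos(math.Pi*float64(u)*(2*float64(y)+1)/(2*float64(height))) *
						math.Cos(math.Pi*float64(v)*(2*float64(x)+1)/(2*float64(width)))
				}
			}
			// Truncating to int16 keeps only the integer part of
			// each coefficient, as in the changed line above.
			result[u*width+v] = int16(sum)
		}
	}
	return result
}

func main() {
	// A uniform 2x2 block: all energy ends up in the DC coefficient.
	fmt.Println(dct2D([]int8{1, 1, 1, 1}, 2, 2))
}
```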
Why are the results no longer float? I am looking at the example of DCT II value that is shown below the DCT table picture in this IDCT example. The values in the matrix look like floats... and I would think that converting them to int16 would remove most of the information. 🤔 What am I missing?
I’m not sure how they’re representing the colour values in that example; they might be using 0.0–1.0, or they might be doing some scaling (there’s lots of leeway for scaling a DCT to make it easier to handle for different purposes without affecting the result).
Here are the greyscale values I get for the same 8x8 'A' character image they use, expressed as 8-bit signed (I think this will be nicest for fingerprinting purposes, for reasons we can discuss):
127 127 127 127 127 127 127 127
127 127 93 -128 42 127 127 127
127 127 8 -94 -60 127 127 127
127 127 -111 42 -111 93 127 127
127 42 -128 -128 -128 8 127 127
127 -60 8 127 59 -111 127 127
93 -128 110 127 127 -94 42 127
127 127 127 127 127 127 127 127
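(My assumption about how the signed values above were produced: recentring each 0–255 grey value around zero. A minimal sketch:)

```go
package main

import "fmt"

// toSigned recentres an 8-bit grey value (0–255) around zero, giving
// the -128..127 range used in the matrix above. This mapping is an
// assumption about how those values were produced, not code from the PR.
func toSigned(grey uint8) int8 {
	return int8(int(grey) - 128)
}

func main() {
	fmt.Println(toSigned(255), toSigned(0), toSigned(128)) // 127 -128 0
}
```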
Here’s the resulting DCT using the current function (which may benefit from some scaling, e.g. to pack the coefficients into 8 or even 4 bits; I haven’t yet tested):
4439 -491 1791 215 228 395 -104 80
318 21 459 402 -800 -447 102 -260
1503 225 -1021 -277 80 -199 285 480
-337 -39 -266 -292 647 357 -146 362
396 23 125 222 -263 -75 -208 -602
94 43 -481 -296 483 293 -29 -133
457 55 -105 -22 -5 103 -168 -153
-428 -63 199 65 -115 -105 192 147
Code used: https://gist.github.com/mandykoh/e90b72ac22c5668dcddba61d690749be
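(On the packing idea: one simple way to squeeze coefficients like those above into 8 bits is to scale and clamp. The divisor below is an arbitrary placeholder; a real scheme would pick the scale, possibly per frequency, by testing.)

```go
package main

import "fmt"

// packCoeff scales a DCT coefficient down and clamps it into the
// signed 8-bit range. The scale value is a hypothetical placeholder
// for illustration; nothing in the PR fixes this choice yet.
func packCoeff(c int16, scale int16) int8 {
	v := int(c) / int(scale)
	if v > 127 {
		v = 127
	}
	if v < -128 {
		v = -128
	}
	return int8(v)
}

func main() {
	// The DC coefficient from the matrix above, with two trial scales.
	fmt.Println(packCoeff(4439, 35), packCoeff(4439, 16))
}
```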
(Side discussion: this example isn’t doing gamma correction. Today, x/image/draw doesn’t handle gamma, but I think there are some efforts to make it do so…we may end up having to handle it ourselves here before the DCT.)
Got the relevance of the color representation 👍 and confirmed that the results are indeed ints when using the example code (thanks for the example code!). I'm playing with it to understand how the DCT coefficients weight the basis functions (the images in the matrix).
I'm not familiar with gamma correction, but I saw an example and I think I got the main idea of what the correction does, though maybe not what we would use it for.
Would that act as a sort of normalization, with a potentially different factor for each image? Would we use it to make sure we take into account as many details in the image as the "luminance range" (?) allows us to encode? (I'm not sure luminance is what we're extracting when converting the original image to grayscale.)
I observe that two plain RGB #888820 and RGB #638888 images have the same DCT - the zero matrix using the example code - despite being at different "distances" from #888888 in terms of how much color was changed to go from #888888 to each of them, resp. 88 to 20 and 88 to 63. (Does that make sense?) And I imagine that different colors are not contributing in the same way to the image luminance. (I'll just keep saying luminance, it may not be the right term.)
That makes me think that depending on the color range of the image, part of the gray scale (I think of it as the encoding space we have for the luminance) could happen not to be used. If I understand well, a carefully chosen gamma correction would expand the pixel values to occupy all that space. Am I close?
It will probably be easier to talk anyway, but I wanted to write the idea down before I forget something :P
(I'm picturing that to myself as using close-IR illumination for security cameras at night: the details are there, we just don't see them. And it happens that the camera has some unused sensibility / "encoding space" in the close IR that can be leveraged if only we expand the image color range a little bit... (?) I might be getting both things wrong ^^)
Oh sorry, I think I tricked you. If you look closely, you’ll see the example code is only using the R component. :P
Re gamma: I think it will help with those cases where there are near-white images being compared, since it might help to separate out the detail in that range. Needs testing. I don’t think it’s as valuable for doing any sort of range compression for this kind of application, but I haven’t thought too deeply about it.
Currently, Simian fingerprints are simple resamplings of the pixel values in an image. We want to move to a Discrete Cosine Transform based fingerprint, which will provide the following advantages: