WIP: Splitting a file in several packets #17
Open: sim590 wants to merge 20 commits into master from split-packets
As this change partly breaks compatibility with previous versions, it should be part of the 1.0.0 release to signal the possible break.
Up to 2 MiB of data is split into packets of 56 KiB each.
Data beyond that limit is not read from standard input and is thus ignored.
The changes are summarized in the following bullet points:
* The Bin class:
  * New public constants for better control over the size of each packet
    and the maximum allowed file size:
    DPASTE_NPACKETS_LEN, DPASTE_PIN_LEN, DPASTE_MAX_SIZE.
  * Bin::paste's main implementation now uses std::stringstream
    instead of std::vector<uint8_t>. This avoids unnecessary
    copies when creating individual packets.
  * New helper functions such as parse_code_info, code_from_pin and
    hexStrFromInt gather code that was previously scattered.
  * The Bin::Packet structure has a new EXTRA_SERIALIZATION_BYTES constant
    specifying the extra space needed for serialization of packets.
  * The Bin::Random structure encapsulates the random number generator for
    Bin.
* The AES class:
  * Removal of the unused CODE_PASS_OFFSET constant.
  * Increase of the PIN_WITH_PASS_LEN constant specifying the length of
    the PIN.
* The HttpClient class:
  * New isAvailable function.
  * GET request: make sure all values are recovered from the REST server.
* Documentation for new functions and changed function signatures has
  been updated accordingly.
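The splitting behaviour described in this commit message (at most 2 MiB read, 56 KiB per packet, excess input left unread) can be sketched roughly as follows. This is only an illustration under stated assumptions: `kPacketSize`, `kMaxSize` and `split_into_packets` are hypothetical names, not the identifiers used in Bin, and the real implementation may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumed values taken from the commit message; the real constants live in Bin.
constexpr std::size_t kPacketSize = 56 * 1024;     // 56 KiB per packet
constexpr std::size_t kMaxSize = 2 * 1024 * 1024;  // 2 MiB in total

// Reads at most kMaxSize bytes from `in` and returns them as packets of at
// most kPacketSize bytes each. Anything beyond kMaxSize is left unread.
std::vector<std::string> split_into_packets(std::istream& in) {
    std::vector<std::string> packets;
    std::size_t total = 0;
    while (total < kMaxSize) {
        const std::size_t want = std::min(kPacketSize, kMaxSize - total);
        std::string chunk(want, '\0');
        in.read(&chunk[0], static_cast<std::streamsize>(want));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;  // nothing left to read
        chunk.resize(got);
        total += got;
        packets.push_back(std::move(chunk));
        if (got < want) break;  // short read: end of input reached
    }
    return packets;
}
```

Passing `std::cin` (or any `std::istringstream`) to `split_into_packets` yields the list of chunks to publish. Reading directly from the stream into each chunk is one way to avoid the intermediate full-size buffer.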
18 hexadecimal characters is 72 bits: 32 + 8 + 32.
A PIN is a number, while a code is a string possibly encoding more than one number.
These verification tests will be applied multiple times, so it is better to put them in functions.
This prevents a conflict with OpenDHT, where the word WARN from enable_log.h would be substituted because of Catch2's brutal usage of macros. Defining the special macro CATCH_CONFIG_PREFIX_ALL helps work around this issue.
Include the number of packets in the code when testing the code as per the new code format.
Combine fields into a final code for Bin::parse_code_info to separate, and test whether the result is valid.
Bin::code_from_dpaste_uri has always been static. So should PirateBinTester::code_from_dpaste_uri.
This introduces a new feature: packet splitting. This effectively splits files bigger than the maximum size allowed by OpenDHT (which is around 56 KiB) and spreads the pieces in packets published evenly on the DHT. Let's consider the following regarding this new approach:
The paste code (used for sharing the pasted blob) now incorporates a new field for encoding the number of packets the file was split into. The format is now described as follows:
This suggests 4 possible code formats. The size for each field is given in the following table:
LOCATION_CODE, NPACKETS, PWD
For now this change is not backward compatible with prior versions such as 0.3.3 or even the latest master.
[…] NPACKETS or not.
While new versions of dpaste will support old code formats, the reverse obviously does not hold, except for values smaller than 56 KiB (unsplit values) without AES encryption.
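To illustrate how a code made of LOCATION_CODE, an optional NPACKETS and an optional PWD could be told apart by length alone, here is a minimal sketch. The field widths (8, 2 and 8 hexadecimal characters), the field order, and the names `CodeInfo`, `hex_from_int` and `parse_code` are assumptions made for this example. They are not taken from the actual Bin::parse_code_info or hexStrFromInt implementations.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>

// Assumed field widths in hexadecimal characters (not confirmed by the PR).
constexpr std::size_t kLocationLen = 8;  // LOCATION_CODE
constexpr std::size_t kNPacketsLen = 2;  // NPACKETS
constexpr std::size_t kPwdLen = 8;       // PWD

struct CodeInfo {
    std::string location;
    std::uint32_t npackets = 1;  // 1 when NPACKETS is absent
    std::string pwd;             // empty when the paste is not encrypted
};

// Zero-padded hexadecimal rendering of an integer, at least `width` wide.
std::string hex_from_int(std::uint32_t value, std::size_t width) {
    std::ostringstream os;
    os << std::hex << std::setw(static_cast<int>(width)) << std::setfill('0')
       << value;
    return os.str();
}

// Accepts the four layouts: LOC, LOC+NPACKETS, LOC+PWD, LOC+NPACKETS+PWD.
CodeInfo parse_code(const std::string& code) {
    for (unsigned char c : code)
        if (!std::isxdigit(c))
            throw std::invalid_argument("code must be hexadecimal");
    const std::size_t n = code.size();
    const bool has_npackets = n == kLocationLen + kNPacketsLen ||
                              n == kLocationLen + kNPacketsLen + kPwdLen;
    const bool has_pwd = n == kLocationLen + kPwdLen ||
                         n == kLocationLen + kNPacketsLen + kPwdLen;
    if (n != kLocationLen && !has_npackets && !has_pwd)
        throw std::invalid_argument("unexpected code length");

    CodeInfo info;
    info.location = code.substr(0, kLocationLen);
    std::size_t pos = kLocationLen;
    if (has_npackets) {
        info.npackets = static_cast<std::uint32_t>(
            std::stoul(code.substr(pos, kNPacketsLen), nullptr, 16));
        pos += kNPacketsLen;
    }
    if (has_pwd) info.pwd = code.substr(pos, kPwdLen);
    return info;
}
```

With these assumed widths the four layouts have distinct lengths (8, 10, 16 and 18 characters), so the parser needs no extra marker to decide which fields are present.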
As said above, the file is split into packets spread evenly over the DHT. Publishing all packets around the same area of the DHT could improve network performance (subsequent requests would resolve faster after the first one), but OpenDHT's rate limiting could then prevent a node from publishing the whole file.
The time issue mentioned above is in part due to the multiple gets/puts done sequentially inside Bin.
Tests have also been added to verify good behaviour of different new functions.
N.B.: OpenDHT advertises 64 KiB as its maximum value size, but that figure does not account for serialization overhead, so the effective maximum size is around 56 KiB.
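As a quick check of the sizes involved, the number of packets needed for a given file is a ceiling division by the effective packet size. The sketch below assumes exactly 56 KiB per packet and a 2 MiB cap (the figures quoted in this PR). The names `kPacketSize`, `kMaxSize` and `packets_needed` are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kKiB = 1024;
constexpr std::size_t kPacketSize = 56 * kKiB;     // assumed effective payload
constexpr std::size_t kMaxSize = 2 * 1024 * kKiB;  // assumed 2 MiB cap

// Ceiling division: how many packets a file of `size` bytes is split into.
constexpr std::size_t packets_needed(std::size_t size) {
    return (size + kPacketSize - 1) / kPacketSize;
}

static_assert(packets_needed(0) == 0, "empty input needs no packet");
static_assert(packets_needed(kPacketSize) == 1, "exact fit uses one packet");
static_assert(packets_needed(kPacketSize + 1) == 2, "one extra byte spills over");
static_assert(packets_needed(kMaxSize) == 37, "2 MiB needs 37 packets");
```

Under these assumptions a maximum-size file needs 37 packets, a count that fits in two hexadecimal characters.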