@sim590
Owner

@sim590 sim590 commented May 11, 2020

This introduces a new feature: packet splitting. Files bigger than the maximum value size allowed by OpenDHT (around 56 KiB) are split into pieces, and the resulting packets are published evenly across the DHT. Consider the following regarding this new approach:

  • The paste code (used for sharing the pasted blob) now incorporates a new field for encoding the number of packets the file was split into. The format is now described as follows:

      LOCATION_CODE + NPACKETS
      LOCATION_CODE + NPACKETS + PWD
      "dpaste:" + LOCATION_CODE + NPACKETS
      "dpaste:" + LOCATION_CODE + NPACKETS + PWD
    

    This suggests 4 possible code formats. The size for each field is given in the following table:

    Field          Size     Size (in characters)
    LOCATION_CODE  32 bits  8 hex chars
    NPACKETS       8 bits   2 hex chars
    PWD            32 bits  8 hex chars

    For now, this change is not backward compatible with prior versions such as 0.3.3, or even with the latest master.

    • Implement backward compatibility by parsing codes both with and without the NPACKETS field.

    While new versions of dpaste will support old code formats, the reverse obviously won't magically hold: older versions can only read values smaller than 56 KiB (values that were not split) and without AES encryption.

  • As said above, the file is split into packets spread evenly on the DHT. While publishing all packets around the same area of the DHT could improve network performance (subsequent requests would resolve faster after the first one), OpenDHT's rate limiting could interfere with a node's ability to publish the whole file.

  • The time issue mentioned above is in part due to the multiple gets/puts done sequentially inside Bin.

    • Parallelize the multiple get/put operations.
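One possible shape for the parallelization item above, as a minimal sketch only. `FetchFn` and `fetch_all` are hypothetical names, and `FetchFn` is a stand-in for whatever blocking per-packet get Bin performs; none of this is the actual OpenDHT or dpaste API. The sketch launches one asynchronous task per packet and reassembles the results in index order:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

using Packet = std::vector<std::uint8_t>;

// Hypothetical stand-in for one blocking DHT get:
// takes a packet index and returns that packet's bytes.
using FetchFn = std::function<Packet(std::size_t)>;

// Starts every fetch up front, then concatenates the packets in index order.
std::vector<std::uint8_t> fetch_all(std::size_t npackets, const FetchFn& fetch) {
    std::vector<std::future<Packet>> pending;
    pending.reserve(npackets);
    for (std::size_t i = 0; i < npackets; ++i) {
        // std::launch::async forces each fetch onto its own thread.
        pending.push_back(std::async(std::launch::async, fetch, i));
    }

    std::vector<std::uint8_t> data;
    for (auto& f : pending) {
        // get() blocks until this packet is available and rethrows
        // any exception raised inside the fetch.
        Packet p = f.get();
        data.insert(data.end(), p.begin(), p.end());
    }
    return data;
}
```

The same pattern would apply to puts. Whether firing all requests at once plays well with OpenDHT's rate limiting is an open question, so a real implementation might need to cap the number of requests in flight.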

Tests have also been added to verify the behaviour of the new functions.
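To make the new code format concrete, here is a minimal sketch of a parser for the four formats listed in the first bullet. It is not the actual `Bin::parse_code_info`: the names `CodeInfo`, `parse_hex` and `parse_code` are hypothetical, the field widths come from the table above, and old-format codes (without NPACKETS) are simply rejected rather than handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Field widths in hex characters, taken from the table in the description.
constexpr std::size_t LOCATION_CODE_LEN = 8;
constexpr std::size_t NPACKETS_LEN = 2;
constexpr std::size_t PWD_LEN = 8;

// Hypothetical result type for this sketch.
struct CodeInfo {
    std::uint32_t location;
    std::uint8_t npackets;
    std::optional<std::uint32_t> pwd;  // empty when the code carries no password
};

// Parses up to 8 hex characters; returns nullopt on any non-hex character.
std::optional<std::uint32_t> parse_hex(const std::string& s) {
    if (s.empty() || s.size() > 8) return std::nullopt;
    std::uint32_t v = 0;
    for (char c : s) {
        v <<= 4;
        if (c >= '0' && c <= '9') v |= static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') v |= static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') v |= static_cast<std::uint32_t>(c - 'A' + 10);
        else return std::nullopt;
    }
    return v;
}

// Accepts LOCATION_CODE + NPACKETS [+ PWD], with an optional "dpaste:" prefix.
std::optional<CodeInfo> parse_code(std::string code) {
    const std::string prefix = "dpaste:";
    if (code.compare(0, prefix.size(), prefix) == 0) code.erase(0, prefix.size());

    const std::size_t base = LOCATION_CODE_LEN + NPACKETS_LEN;
    if (code.size() != base && code.size() != base + PWD_LEN) return std::nullopt;

    auto location = parse_hex(code.substr(0, LOCATION_CODE_LEN));
    auto npackets = parse_hex(code.substr(LOCATION_CODE_LEN, NPACKETS_LEN));
    if (!location || !npackets) return std::nullopt;

    CodeInfo info{*location, static_cast<std::uint8_t>(*npackets), std::nullopt};
    if (code.size() > base) {
        auto pwd = parse_hex(code.substr(base, PWD_LEN));
        if (!pwd) return std::nullopt;
        info.pwd = pwd;
    }
    return info;
}
```

With this sketch, `parse_code("dpaste:0a1b2c3d05")` yields location `0x0a1b2c3d`, 5 packets and no password. Backward compatibility would presumably add further accepted lengths for codes without the NPACKETS field.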

N.B.: OpenDHT advertises 64 KiB as its maximum value size, but this doesn't take serialization overhead into account, so the effective maximum size is around 56 KiB.
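For illustration, the splitting itself can be sketched as follows. This is not the implementation in this PR (which works on a `std::stringstream`); `split_packets`, `PACKET_SIZE` and `MAX_PACKETS` are hypothetical names, and the 56 KiB payload size is the approximate figure quoted above, not an exact constant from the code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed payload per packet: the approximate effective limit quoted above.
constexpr std::size_t PACKET_SIZE = 56 * 1024;
// NPACKETS is 2 hex characters (8 bits), so at most 255 packets can be encoded.
constexpr std::size_t MAX_PACKETS = 0xFF;

// Cuts data into consecutive chunks of at most packet_size bytes.
std::vector<std::vector<std::uint8_t>> split_packets(
        const std::vector<std::uint8_t>& data,
        std::size_t packet_size = PACKET_SIZE) {
    if (packet_size == 0) throw std::invalid_argument("packet_size must be positive");

    // Ceiling division: number of packets needed to hold all the data.
    const std::size_t n = (data.size() + packet_size - 1) / packet_size;
    if (n > MAX_PACKETS) throw std::length_error("too many packets for an 8-bit NPACKETS field");

    std::vector<std::vector<std::uint8_t>> packets;
    packets.reserve(n);
    for (std::size_t off = 0; off < data.size(); off += packet_size) {
        const std::size_t len = std::min(packet_size, data.size() - off);
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(off);
        packets.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
    }
    return packets;
}
```

Under these assumptions, a 2 MiB input is cut into 37 packets: 36 full ones and a last one of 32 KiB.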

@sim590 sim590 self-assigned this May 11, 2020
@sim590
Owner Author

sim590 commented May 11, 2020

As this change breaks compatibility with previous versions to some extent, it should be part of the 1.0.0 release to signal the possible break.

@sim590 sim590 added this to the 1.0.0 milestone May 11, 2020
sim590 added 18 commits May 11, 2020 03:01
Up to 2MiB of storage is split among packets of size 56KiB each.
Excess data is not read from standard input and is thus ignored.
The changes are summarized in the following bullet points:

* The Bin class:
  * New public constants for better control over the parameters governing
    the size of each packet and the maximum allowed file size:
    DPASTE_NPACKETS_LEN, DPASTE_PIN_LEN, DPASTE_MAX_SIZE.
  * Bin::paste's main implementation now uses std::stringstream
    instead of std::vector<uint8_t>. This avoids unnecessary
    copies when creating individual packets.
  * New helper functions such as parse_code_info, code_from_pin and
    hexStrFromInt gather code that was previously scattered.
* The Bin::Packet structure has a new EXTRA_SERIALIZATION_BYTES constant
  specifying the extra space needed for serialization of packets.
* Bin::Random structure encapsulates the random number generator for
  Bin.
* The AES class:
  * Removal of the unused CODE_PASS_OFFSET constant.
  * Increase of the PIN_WITH_PASS_LEN constant specifying the length of
    the PIN.
* The HttpClient class:
  * new isAvailable function;
  * GET request: make sure all values are recovered from the REST server.
* Documentation for new functions and changed function signatures has
  been updated accordingly.
18 hexadecimal characters is 32 + 8 + 32 bits.
A PIN is a number while a code is a string possibly encoding more than one
number.
These verification tests will be applied multiple times, so it is better
to put them in functions.
This prevents a conflict with OpenDHT, where the word WARN from
enable_log.h would be replaced by Catch2's heavy-handed use of
macros. Defining the special macro CATCH_CONFIG_PREFIX_ALL works
around this issue.
Include the number of packets in the code when testing the code as per
the new code format.
Combine fields into a final code for Bin::parse_code_info to separate and
test whether the result is valid.
Bin::code_from_dpaste_uri has always been static. So should
PirateBinTester::code_from_dpaste_uri.
@sim590 sim590 changed the title Splitting a file in several packets WIP: Splitting a file in several packets May 13, 2020
@sim590
Owner Author

sim590 commented Nov 3, 2024

  • For now (here: 2ab7cf0), unit tests suggest that values bigger than 56KiB are not successfully pasted. Fix this.
