AutoSketch is a sketch-oriented compiler for query-driven network telemetry. It automatically compiles high-level data-stream operators into sketch instances that can be readily deployed with low resource usage and limited accuracy loss. This work has been accepted at NSDI'24.
The major contributions of AutoSketch are as follows:
- Combine the strengths of both sketch-based telemetry algorithms and query-driven network telemetry
- Extend the capacity of conventional telemetry languages to perceive and control accuracy intent
- Reduce the burden on users to select, configure, and implement sketch algorithms
- Provide a framework capable of integrating many novel sketch optimization techniques (e.g., SketchLib [NSDI'22], FlyMon [SIGCOMM'22], Sketchovsky [NSDI'23], BitSense [SIGCOMM'23], OmniWindow [SIGCOMM'23])
We require the following dependencies to run AutoSketch programs.

- Software Dependencies

  ```shell
  pip3 install ply
  pip3 install jinja2
  sudo apt install libboost-all-dev -y
  sudo apt install libjsoncpp-dev -y
  sudo apt install libpcap-dev -y
  # spdlog
  git clone https://github.com/gabime/spdlog.git
  cd spdlog && mkdir build && cd build
  cmake .. && make -j && sudo make install
  ```

- Switch SDE: Tofino SDE 9.13.1 is needed to compile the P4 code generated by AutoSketch. (Hint: older versions of the SDE should work correctly, but we have not fully verified this in other environments.)

- Trace data: We provide an archive including the pre-processed CAIDA trace file for running experiments. Due to the large size of the data, please download it from PKU Drive and extract it to the `${AutoSketch_dir}/data/` directory.
AutoSketch requires traffic data for benchmark-based searching, which identifies the sketch configuration with the minimal resource overhead that can meet the user's accuracy intent. The trace files we provide have already been preprocessed. To use other trace files, preprocess them according to the following steps.
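To build intuition for what benchmark-based searching does, here is a small, hypothetical Python sketch. It is not AutoSketch's actual implementation: a Count-Min sketch stands in for the compiled sketch instance, the error metric is a simple average relative over-count, and the candidate widths are made up for illustration.

```python
# Hypothetical illustration of benchmark-based searching: replay a trace
# against candidate sketch sizes and keep the cheapest one whose measured
# error satisfies the user's accuracy intent.

class CountMin:
    def __init__(self, width, depth=3):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        # Simple per-row hashing; real implementations use pairwise-
        # independent hash functions.
        return hash((row * 0x9E3779B1, key)) % self.width

    def update(self, key):
        for r in range(self.depth):
            self.rows[r][self._bucket(r, key)] += 1

    def query(self, key):
        return min(self.rows[r][self._bucket(r, key)] for r in range(self.depth))

def avg_rel_error(trace, width):
    """Average relative over-count of a Count-Min sketch on the trace."""
    truth, cm = {}, CountMin(width)
    for key in trace:
        truth[key] = truth.get(key, 0) + 1
        cm.update(key)
    return sum((cm.query(k) - c) / c for k, c in truth.items()) / len(truth)

def search_min_width(trace, intent, candidate_widths):
    """Return the smallest width whose benchmark error meets the intent."""
    for width in sorted(candidate_widths):
        if avg_rel_error(trace, width) <= intent:
            return width
    return None  # no candidate satisfies the intent
```

The real search additionally accounts for hardware resource constraints and verifies the chosen configuration on a separate trace, which is why the workflow below uses both `search_trace.bin` and `verify_trace.bin`.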
```shell
$ cd trace; mkdir build; cd build
$ cmake ..; make
$ ./preprocess ${AutoSketch_dir}/data/ equinix-nyc.dirB.20180419-130000.UTC.anon.pcap search_trace.bin
$ ./preprocess ${AutoSketch_dir}/data/ equinix-nyc.dirB.20180419-130100.UTC.anon.pcap verify_trace.bin
```
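Preprocessing of this kind typically amounts to extracting per-packet header fields from the pcap and writing them in a compact binary format. The following standalone Python sketch is not the actual `preprocess` tool (whose output format we do not describe here); it only illustrates the parsing step for classic little-endian pcap files captured over Ethernet.

```python
# Hypothetical sketch of pcap parsing: walk the record headers and yield
# the IPv4 five-tuple of each TCP/UDP packet.
import socket
import struct

def extract_five_tuples(pcap_bytes):
    """Yield (src_ip, dst_ip, sport, dport, proto) for IPv4 TCP/UDP packets."""
    magic, = struct.unpack("<I", pcap_bytes[:4])
    if magic != 0xA1B2C3D4:
        raise ValueError("not a little-endian microsecond pcap file")
    off = 24  # skip the 24-byte pcap global header
    while off + 16 <= len(pcap_bytes):
        _, _, incl_len, _ = struct.unpack("<IIII", pcap_bytes[off:off + 16])
        pkt = pcap_bytes[off + 16:off + 16 + incl_len]
        off += 16 + incl_len
        if len(pkt) < 34 or pkt[12:14] != b"\x08\x00":
            continue  # truncated, or not IPv4 over Ethernet
        ihl = (pkt[14] & 0x0F) * 4   # IPv4 header length
        proto = pkt[23]              # IPv4 protocol field
        l4 = 14 + ihl                # start of the TCP/UDP header
        if proto not in (6, 17) or len(pkt) < l4 + 4:
            continue                 # keep well-formed TCP/UDP only
        src = socket.inet_ntoa(pkt[26:30])
        dst = socket.inet_ntoa(pkt[30:34])
        sport, dport = struct.unpack("!HH", pkt[l4:l4 + 4])
        yield (src, dst, sport, dport, proto)
```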
- One command to generate the backend P4 program

  ```shell
  $ python compiler.py -i examples/newconn.py -p4o output/newconn/newconn.p4 -p4s [-p4v]
  ```

  - The `-i` parameter specifies the source file of the input query code.
  - The `-p4o` parameter specifies the path and filename of the generated P4 code.
  - The `-p4s` parameter indicates that benchmark-based searching is automatically executed during compilation to search for the configuration parameters.
  - The `-p4v` parameter indicates that the configuration parameters obtained during the search are verified.
- AutoSketch also supports step-by-step compilation to facilitate debugging.

  - Generate the benchmark-based searching program

    ```shell
    $ python compiler.py -i examples/newconn.py -s output/newconn
    ```

    - The `-i` parameter specifies the source file of the input query code.
    - The `-s` parameter specifies the directory in which the profiling program and related configuration files are generated.

  - Run the benchmark-based searching

    ```shell
    $ cd output/newconn
    $ ls
    autosketch-newconn.cpp  conf.json  Makefile
    $ make
    $ ./autosketch-newconn ./conf.json --search ./app-conf.json
    ```

    - The `--search` parameter writes the searched configuration to the specified file.

  - Verify the searched configuration

    ```shell
    $ ./autosketch-newconn ./conf.json --verify ./app-conf.json
    ```

    - The `--verify` parameter verifies the configuration in the specified file.

  - Generate the P4 program based on the searched configuration

    ```shell
    $ python compiler.py -i examples/newconn.py -p4o output/newconn/newconn.p4 -p4c output/newconn/app-conf.json
    ```

    - The `-p4c` parameter specifies an existing configuration file from which to generate the P4 code.
The input program consists of several modules, each of which is either a User-Defined Function (UDF) or the definition of a task. Here is an example.

```python
def remap_key(tcp.flag):
    if tcp.flag == SYNACK:
        nkey = ipv4.src_addr
    else:
        nkey = ipv4.dst_addr

def sf(tcp.flag, tcp.seq, tcp.ack):  # cnt nextseq
    if tcp.flag == SYNACK:
        nextseq = tcp.seq + 1
        cnt += 1
    elif nextseq == tcp.ack:
        cnt -= 1

syn_flood = PacketStream()
    .filter(left_value="ipv4.protocol", op="eq", right_value="IP_PROTOCOLS_TCP")
    .filter(left_value="tcp.flags", op="eq", right_value="TCP_FLAG_ACK")
    .groupby(func_name="remap_key", index=[], args=["tcp.flags"], registers=[], out=["nkey"])
    .groupby(func_name="sf", index=["nkey"], args=["tcp.flags", "tcp.seq", "tcp.ack"], registers=["nextseq", "cnt"], out=["cnt"])
    .filter(left_value="cnt", op="gt", right_value="Thrd_SYN_FLOOD")
    .distinct(distinct_keys=["nkey"])
```

Description of the UDF format
The format for User-Defined Functions (UDFs) takes inspiration from Python's function-definition syntax, with some differences.

```python
def func_name(args)  # persist_state
    statements
```

- The `args` parameter specifies the arguments passed in, usually fields from the header.
- The `persist_state` annotation defines variables that need to be saved globally across multiple operations, separated by spaces (think of it as defining a global table). This annotation cannot be omitted unless the function does not use global state. For more details, refer to the description of the `groupby` operation in the operators section.
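To make these UDF semantics concrete, here is a hypothetical host-side emulation of the `syn_flood` query from the example above. The dict `state` plays the role of the persistent state (`# cnt nextseq`) indexed by `nkey`, as the `groupby` operator would maintain it; the packet representation, the flag constants, and the treatment of the ACK filter as a bit test are illustrative assumptions, not the compiled data-plane behavior.

```python
# Hypothetical emulation of the syn_flood example in plain Python.
SYNACK, ACK = "SA", "A"  # stand-ins for the TCP flag constants

def remap_key(pkt):
    # A SYNACK comes from the (potential) victim server, so key by its
    # source address; ACKs from clients are keyed by their destination.
    return pkt["src"] if pkt["flags"] == SYNACK else pkt["dst"]

def sf(st, pkt):
    # Each SYNACK opens a half-open connection; a matching ACK closes it.
    if pkt["flags"] == SYNACK:
        st["nextseq"] = pkt["seq"] + 1
        st["cnt"] += 1
    elif st["nextseq"] == pkt["ack"]:
        st["cnt"] -= 1

def syn_flood(packets, threshold):
    state = {}  # nkey -> {"nextseq": ..., "cnt": ...}; the persist_state table
    for pkt in packets:
        if pkt["proto"] != "TCP" or "A" not in pkt["flags"]:
            continue  # the two filter operators
        nkey = remap_key(pkt)  # first groupby: remap the key
        st = state.setdefault(nkey, {"nextseq": 0, "cnt": 0})
        sf(st, pkt)            # second groupby: update per-nkey state
    # final filter (cnt > threshold) and distinct on nkey
    return [k for k, st in state.items() if st["cnt"] > threshold]
```

A host with many unacknowledged SYNACKs accumulates a high `cnt` and is reported as a SYN-flood victim.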
Description of the telemetry application format

```python
name = PacketStream()
    .operators()
```

Here, `name` is the name of the task, `PacketStream` is a fixed identifier, and `operators` stands for the sequence of operators that follows it.
Description of AutoSketch operators
Here are the conventions for each type of operator format:
- filter: The parameters are `(left_value, op, right_value)`, fundamentally acting as a conditional expression.
- map: The parameters are `(map_keys, new_import)`, where `map_keys` selects which key-value pairs from the original set continue to be processed, and `new_import` introduces new key-value pairs, formatted as `{"key": "value"}`.
- reduce: The parameters are `(reduce_keys, result)`, where `reduce_keys` indicates which key(s) to use as the reference for the reduce operation, and `result` stores the outcome of the reduce.
- zip: The parameters are `(stream_name, left_key, right_key)`, with `stream_name` indicating which stream's operator-sequence results to merge, and `left_key` and `right_key` indicating which keys to use as the basis for merging from the current stream and the incoming stream, respectively.
- distinct: The parameter is `(distinct_keys)`, indicating which keys to use as the basis for deduplication.
- groupby: The parameters are `(func_name, index, args, registers, out)`, where `func_name` corresponds to the function name in the UDF described above, `index` indicates which keys to use as the basis for building the lookup table, `args` corresponds to the arguments passed in, `registers` are the registers the function needs to save, and `out` defines the result output to the key-value pair.
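As a rough mental model of these operators, the sketch below gives a hypothetical host-side interpretation over a packet stream modeled as a list of key-value records. The field names and the treatment of `right_value` as a literal are simplifications for illustration; the compiler actually maps these operators onto sketch instances in the data plane.

```python
# Hypothetical reference semantics for filter, map, reduce, and distinct.
OPS = {"eq": lambda a, b: a == b, "gt": lambda a, b: a > b}

def op_filter(stream, left_value, op, right_value):
    # Keep records satisfying the conditional expression.
    return [rec for rec in stream if OPS[op](rec[left_value], right_value)]

def op_map(stream, map_keys, new_import):
    # Project onto map_keys and attach the new key-value pairs.
    return [{**{k: rec[k] for k in map_keys}, **new_import} for rec in stream]

def op_reduce(stream, reduce_keys, result):
    # Count records per reduce_keys group, storing the outcome in `result`.
    acc = {}
    for rec in stream:
        group = tuple(rec[k] for k in reduce_keys)
        acc[group] = acc.get(group, 0) + 1
    return [{**dict(zip(reduce_keys, g)), result: v} for g, v in acc.items()]

def op_distinct(stream, distinct_keys):
    # Deduplicate on distinct_keys, keeping the first occurrence.
    seen, out = set(), []
    for rec in stream:
        key = tuple(rec[k] for k in distinct_keys)
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out
```

`groupby` and `zip` additionally involve keyed persistent state and a second stream, respectively, as described above.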