Botacin-s-Lab · mabonmn · Mar 13, 2025 · Mar 13, 2025
diff --git a/BUILD.windows b/BUILD.windows
@@ -0,0 +1,69 @@
+Note:  These are build instructions for sdhash on Windows.
+Please use the executables we provide instead, as building
+by hand is very annoying.  
+
+To build/link sdhash on visual studio (very, very alpha.)
+
+I have tried this on: windows 7/x64, visual studio 2010 pro.
+
+boost:  
+I used 1.49.  Build two stage directories of static libraries, one 
+for 32-bit linkage, and one for 64-bit linkage.
+
+Boost's instructions for 32-bit builds are fine.  
+
+If you have all day, build everything once in 32-bit mode
+(I suggest changing the stage dir with --stagedir=.\stage32),
+and then build it all again in 64-bit. 
+
+The currently required libraries are: thread, regex, program_options,
+filesystem, date_time
+
+To build 64-bit:
+
+bjam --stagedir=.\stage64 --with-thread --with-regex --with-program_options 
+--with-filesystem --with-date_time --toolset=msvc-10.0 address-model=64
+--build-type=complete link=static
+
+openssl:
+I used version 1.0.1c.  Same thing here, build one set of 32-bit
+static libs, and one set of 64-bit static libs.  Note that
+the libraries are compiled /MD (multithreaded DLL); adjust your build
+settings accordingly, or change that.  In order to keep sdhash all static,
+build the openssl libraries with /MT (multithreaded), and build the program
+the same way. 
+
+openssl has completely adequate install instructions in their windows
+readme.  Do everything from the appropriate 32 or 64-bit visual
+studio command prompt, as they have all the environments set up.
+
+Then....
+
+Create a new msvc++ console project in visual studio.
+Drop sdbf, sdhash-src, and base64 directories into the project's folder in Explorer.
+
+Add all of these files to the project (right click project (not solution)
+add->existing items. )
+
+Set the active configuration to Release/Win32
+
+Change the project properties (right click project->properties) 
+Edit c++ options to add include directories for boost and openssl and
+turn off precompiled headers.   Edit linker options to add extra
+library directories (stage/lib from boost, and openssl's lib directory).
+Edit linker options under Input to add ssleay32.lib and libeay32.lib.
+Also add setargv.obj to this to support *.foo globbing. 
+Edit c++ options for Code Generation to change Runtime Library to /MD if
+it is not already.  To build completely static use /MT instead.
+
+Then set the active configuration to Release/x64, set all of those options
+again, and change the directories to 64-bit ones for the library includes
+where applicable.  
+
+Do ctrl-shift-b to build.  "Cannot find libraryX" means the library is
+not present where you said it was, is spelled wrong, or is 32-bit when
+you are trying to compile 64-bit.  Other errors are probably my fault. 
+
+Find the built executable in the project directory from a command prompt
+and run!
+
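Review note on the 32-bit/64-bit mismatch that BUILD.windows warns about: one way to confirm which architecture a built executable or DLL targets is to read the Machine field of its COFF header. The sketch below is a minimal illustration, not part of sdhash. The function names `pe_machine` and `fake_pe` are hypothetical, and it only handles PE images (.exe/.dll). Static .lib files are archives and would need a tool such as `dumpbin /headers` instead.

```python
import struct

# COFF machine identifiers from the PE/COFF specification.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image's COFF header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # The 32-bit field at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The 16-bit Machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04x})")


def fake_pe(machine: int) -> bytes:
    """Build a minimal synthetic header for demonstration (hypothetical helper)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature starts right after
    return bytes(dos) + b"PE\0\0" + struct.pack("<H", machine)


if __name__ == "__main__":
    import sys

    # Print the architecture of every file named on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, pe_machine(handle.read()))
```

Saved under any name and run with the path of the built executable as its argument, it prints whether the image was built for x86 or x64, which can then be compared against the active configuration (Release/Win32 or Release/x64).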
diff --git a/ChangeLog b/ChangeLog
@@ -0,0 +1,148 @@
+New in version 3.4
+
+New constructor for sdbf class to accept strings. 
+
+Major GPU program speedup via reduction changes.
+
+Bugfix for current boost version and C++11
+
+Bugfix for underflow in hash generation 
+
+Updated/tested python swig interface
+
+x86-32 bit linux compilation fix for cpuid
+
+Updated test program.
+
+Thanks to Tom Sires, Alex Nelson, Jesse Kornblum, Simon Spero, and Andrei Costin for
+patches and requests.
+
+--------
+New in version 3.3
+
+Major bugfix involving self-to-self block-based comparison scoring,
+we are now evenly discounting unfilled filters.
+
+Added --separator change option.
+
+Clarified indexing options/formats.
+
+Enabled support for multiple lists-of-files to hash.
+
+Now passing -Wall on gcc on Linux in main program.
+
+--------
+New in version 3.2.5
+
+Significant speedup of CPU comparisons via openMP threads.  Now 
+requiring openMP. 
+
+Program will automatically use the maximum number of cores/threads
+as given by openMP, on generation and comparison.
+--------
+New in version 3.2
+
+CUDA-accelerated GPU comparison program available - sdhash-gpu. 
+
+Changes to indexing to auto-generate indexes of larger, more 
+usable size (640MB) to search much more information.
+
+Small change to bloom filter fullness threshold for comparison,
+we now require 16 elements instead of 6.
+--------
+New in version 3.1
+
+New comparison code using the POPCNT instruction.  Marked speed improvements 
+on intel, cache performance improvements on AMD.
+
+Refined the searching/index components, including auto-generation of reference
+set from large quantity of small files, search shortcuts, etc.  Indexes are
+now stored compressed on disk using lz4.
+
+More additions to --verbose as progress indicators.
+
+Overhaul of sdhash-cli options, matching sdhash as closely as possible and much 
+more flexible.  Also removing last traces of posix from sdhash-cli for inclusion 
+into windows port.
+--------
+New in version 3.0
+
+sdbf_set indexing method, to be generated with --index while hashing
+and searched with --search-index while hashing.
+
+temp-bugfix: some speedup for very small reference data
+bugfix: segmenting was seeing too-large final segment on windows
+--------
+New in version 2.3
+
+Web interface using python/jquery -- talks to thrift
+--validate option
+tons of --verbose debugging output
+New thrift-server sources directory
+bugfix: reading an invalid sdbf file no longer crashes
+bugfix: windows recursive search filesystem permissions crash fixed
+
+--------
+New in version 2.2
+
+Very beta native win32/win64 port
+Threads updated to use boost::thread instead of pthreads directly
+Program options updated to support all command line options in sdhash.cfg
+Filesystem access now primarily c++/boost not posix
+Thrift client/server now supports asynchronous mode comparisons
+output file option -o --output 
+
+--------
+
+New in version 2.1
+
+recursive directory hashing with --deep -r option
+sdhash supports input from stdin
+argument --segment-size -z to customize read-in segment size
+argument --name -n to rename stdin hashes
+new client/server programs using Thrift
+api updates to use const and clearer naming conventions
+api updated with formal 'set' class
+sdhash auto-switching to block mode when files are >16MB
+new base64 encoding library
+
+--------
+
+
+New in version 2.0
+
+gnu-style long-options for standalone and client programs
+configuration file capability for standalone sdhash sdhash.cfg
+fixed: sdhash buffers large files instead of memory mapping the entire thing
+sdhash server program sdhashd, and sdhash-cli client program.
+Client and server should be considered beta software.
+sdhash-dd program removed and changed to -b option for block size
+sdbf version upgraded to add 'original size' into the header information
+"view" option to create subsets of sets matching an expression
+added: Boost libraries for regular expressions
+fixed: import as file list now can use DD mode
+fixed: sampling now non-destructive
+swig-python bindings now exist with a sample program
+Now large block hashes are "streamed" in chunks instead of as one giant hash.
+
+---------
+
+New in version 1.8
+
+-i file.txt option to generate hashes from a file list. 
+conversion to C++ 
+beta libsdbf API 
+fixed: off-by-one error on last line of query file
+
+---------
+
+New in version 1.7
+
+-c query.sdbf target.sdbf option
+-s sample option
+Support for cygwin
+manpage for sdhash
+make install supported
+fixed: spaces in filenames now readable on input sdbf
+
+---------
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,44 @@
+FROM ubuntu:24.04
+
+RUN apt-get update && apt-get install -y \
+    software-properties-common \
+    apt-transport-https \
+    wget \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN apt-get update && \
+    apt-get install -y openjdk-11-jdk
+
+ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
+ENV PATH=$JAVA_HOME/bin:$PATH
+
+RUN apt-get update && \
+    apt-get install -y maven
+
+RUN java -version
+RUN mvn -version
+
+WORKDIR /usr/src/app
+
+RUN apt-get update && apt-get install -y \
+    vim \
+    python3 \
+    python3-pip \
+    python3.12-venv \
+    libfuzzy-dev
+ENV VIRTUAL_ENV=/opt/venv
+RUN python3 -m venv $VIRTUAL_ENV
+ENV PATH="$VIRTUAL_ENV/bin:$PATH"
+# COPY requirements.txt /usr/src/app/requirements.txt
+# RUN pip3 install -r requirements.txt
+
+# wget releases
+
+# Copy the current directory contents into the container
+# COPY *.py /usr/src/app/
+# COPY utils utils
+# COPY augmented_predictor augmented_predictor
+
+# Run the application
+CMD ["bash"]