69 changes: 69 additions & 0 deletions BUILD.windows
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
Note: These are build instructions for sdhash on windows. Please use
the executables we provide instead, as building by hand is very
annoying.

To build/link sdhash on visual studio (very, very alpha.)

I have tried this on: windows 7/x64, visual studio 2010 pro.

boost:
I used 1.49. Build two stage directories of static libraries, one
for 32-bit linkage, and one for 64-bit linkage.

Boost's instructions for 32-bit builds are fine.

If you have all day, build everything once in 32-bit mode, changing
the stage dir to something like --stagedir=.\stage32, and then build
it all again in 64-bit.

The currently required libraries are: thread, regex, program_options,
filesystem, date_time

To build 64-bit:

bjam --stagedir=.\stage64 --with-thread --with-regex --with-program_options
--with-filesystem --with-date_time --toolset=msvc-10.0 address-model=64
--build-type=complete link=static
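
As a rough sketch (not part of the sdhash sources), the 32-bit and 64-bit
invocations can be generated from one place so the two stage directories
stay consistent. The flags are copied from the command above; the helper
name bjam_command is hypothetical, and address-model=32 for the 32-bit
pass is an assumption, since only the 64-bit line is given here.

```python
# Sketch only: assemble the bjam argument list for one address model.
# Library names and flags come from the 64-bit command in this file;
# bjam_command is a hypothetical helper, not something sdhash ships.
LIBRARIES = ["thread", "regex", "program_options", "filesystem", "date_time"]


def bjam_command(address_model):
    """Return the bjam arguments for a 32- or 64-bit static build."""
    if address_model not in (32, 64):
        raise ValueError("address_model must be 32 or 64")
    # One stage directory per address model: .\stage32 or .\stage64.
    args = ["bjam", "--stagedir=.\\stage%d" % address_model]
    args += ["--with-" + lib for lib in LIBRARIES]
    args += [
        "--toolset=msvc-10.0",
        "address-model=%d" % address_model,  # 32 is assumed for the 32-bit pass
        "--build-type=complete",
        "link=static",
    ]
    return args
```

Printing " ".join(bjam_command(64)) gives a single line that can be pasted
into the visual studio command prompt.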

openssl:
I used version 1.0.1c. Same thing here, build one set of 32-bit
static libs, and one set of 64-bit static libs. Note that the libraries
are compiled /MD (multithreaded DLL), so adjust the build settings
accordingly, or change that. In order to keep sdhash all static,
build the openssl libraries /MT (multithreaded), and build the program
the same way.

openssl has completely adequate install instructions in their windows
readme. Do everything from the appropriate 32 or 64-bit visual
studio command prompt, as they have all the environments set up.

Then....

Create a new msvc++ console project in visual studio.
Drop sdbf, sdhash-src, and base64 directories into the project's folder in Explorer.

Add all of these files to the project (right click the project, not the
solution: add->existing items).

Set the active configuration to Release/Win32

Change the project properties (right click project->properties)
Edit c++ options to add include directories for boost and openssl and
turn off precompiled headers. Edit linker options to add extra
library directories (stage/lib from boost, and openssl's lib directory).
Edit linker options under Input to add ssleay32.lib and libeay32.lib.
Also add setargv.obj to this to support *.foo globbing.
Edit c++ options for Code Generation to change Runtime Library to /MD if
it is not already. To build completely static use /MT instead.

Then set the active configuration to Release/x64, set all of those options
again, and change the directories to the 64-bit ones for the library includes
where applicable.

Do ctrl-shift-b to build. "Cannot find libraryX" means either the library
is not present where you said it was, it is spelled wrong, or it is 32-bit
when you are trying to compile 64-bit. Other errors are probably my fault.

Find the built executable in the project directory from a command prompt
and run!
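
To check whether a built executable came out 32-bit or 64-bit, the PE
header can be read directly. The following is a minimal sketch, assuming
a standard PE file; the function name pe_machine is hypothetical and this
is not part of sdhash. It only handles .exe/.dll files, not .lib archives.

```python
import struct

# COFF machine values for the two targets discussed in this file.
MACHINE_NAMES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}


def pe_machine(data):
    """Describe the machine type found in the leading bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The 4-byte little-endian offset of the PE header sits at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The machine field is the first 2 bytes after the PE signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, "unknown (0x%04x)" % machine)
```

Pass it the first few kilobytes of the file, read in binary mode.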

148 changes: 148 additions & 0 deletions ChangeLog
@@ -0,0 +1,148 @@
New in version 3.4

New constructor for sdbf class to accept strings.

Major GPU program speedup via reduction changes.

Bugfix for current boost version and c++ 11

Bugfix for underflow in hash generation

Updated/tested python swig interface

x86-32 bit linux compilation fix for cpuid

Updated test program.

Thanks to Tom Sires, Alex Nelson, Jesse Kornblum, Simon Spero, and Andrei Costin for
patches and requests.

--------
New in version 3.3

Major bugfix involving self-to-self block-based comparison scoring,
we are now evenly discounting unfilled filters.

Added --separator change option.

Clarified indexing options/formats.

Enabled support for multiple lists-of-files to hash.

Now passing -Wall on gcc on Linux in main program.

--------
New in version 3.2.5

Significant speedup of CPU comparisons via openMP threads. Now
requiring openMP.

Program will automatically use the maximum number of cores/threads
as given by openMP, on generation and comparison.
--------
New in version 3.2

CUDA-accelerated GPU comparison program available - sdhash-gpu.

Changes to indexing to auto-generate indexes of larger, more
usable size (640MB) to search much more information.

Small change to bloom filter fullness threshold for comparison,
we now require 16 elements instead of 6.
--------
New in version 3.1

New comparison code using the POPCNT instruction. Marked speed improvements
on intel, cache performance improvements on AMD.

Refined the searching/index components, including auto-generation of reference
set from large quantity of small files, search shortcuts, etc. Indexes are
now stored compressed on disk using lz4.

More additions to --verbose as progress indicators.

Overhaul of sdhash-cli options, matching sdhash as closely as possible and much
more flexible. Also removing last traces of posix from sdhash-cli for inclusion
into windows port.
--------
New in version 3.0

sdbf_set indexing method, to be generated with --index while hashing
and searched with --search-index while hashing.

temp-bugfix: some speedup for very small reference data
bugfix: segmenting was seeing too-large final segment on windows
--------
New in version 2.3

Web interface using python/jquery -- talks to thrift
--validate option
tons of --verbose debugging output
New thrift-server sources directory
bugfix: reading an invalid sdbf file no longer crashes
bugfix: windows recursive search filesystem permissions crash fixed

--------
New in version 2.2

Very beta native win32/win64 port
Threads updated to use boost::thread instead of pthreads directly
Program options updated to support all command line options in sdhash.cfg
Filesystem access now primarily c++/boost not posix
Thrift client/server now supports asynchronous mode comparisons
output file option -o --output

--------

New in version 2.1

recursive directory hashing with --deep -r option
sdhash supports input from stdin
argument --segment-size -z to customize read-in segment size
argument --name -n to rename stdin hashes
new client/server programs using Thrift
api updates to use const and clearer naming conventions
api updated with formal 'set' class
sdhash auto-switching to block mode when files are >16MB
new base64 encoding library

--------


New in version 2.0

gnu-style long-options for standalone and client programs
configuration file capability for standalone sdhash sdhash.cfg
fixed: sdhash buffers large files instead of memory mapping the entire thing
sdhash server program sdhashd, and sdhash-cli client program.
Client and server should be considered beta software.
sdhash-dd program removed and changed to -b option for block size
sdbf version upgraded to add 'original size' into the header information
"view" option to create subsets of sets matching an expression
added: Boost libraries for regular expressions
fixed: import as file list now can use DD mode
fixed: sampling now non-destructive
swig-python bindings now exist with a sample program
Now large block hashes are "streamed" in chunks instead of as one giant hash.

---------

New in version 1.8

-i file.txt option to generate hashes from a file list.
conversion to C++
beta libsdbf API
fixed: off-by-one error on last line of query file

---------

New in version 1.7

-c query.sdbf target.sdbf option
-s sample option
Support for cygwin
manpage for sdhash
make install supported
fixed: spaces in filenames now readable on input sdbf

---------
44 changes: 44 additions & 0 deletions Dockerfile
@@ -0,0 +1,44 @@
FROM ubuntu:latest

RUN apt-get update && apt-get install -y \
software-properties-common \
apt-transport-https \
wget \
git \
&& rm -rf /var/lib/apt/lists/*

RUN apt-get update && \
apt-get install -y openjdk-11-jdk

ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
ENV PATH=$JAVA_HOME/bin:$PATH

RUN apt-get update && \
apt-get install -y maven

RUN java -version
RUN mvn -version

WORKDIR /usr/src/app

RUN apt-get update && apt-get install -y \
vim \
python3 \
python3-pip \
python3.12-venv \
libfuzzy-dev
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# COPY requirements.txt /usr/src/app/requirements.txt
# RUN pip3 install -r requirements.txt

# wget releases

# Copy the current directory contents into the container
# COPY *.py /usr/src/app/
# COPY utils utils
# COPY augmented_predictor augmented_predictor

# Run the application
CMD ["bash"]
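
Once inside the container, it can be useful to confirm that python3
resolves to the venv created above. A minimal sketch, assuming the
/opt/venv location from this Dockerfile; both function names are
hypothetical and nothing here is part of the image itself.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
    return sys.prefix != sys.base_prefix


def venv_first_on_path(venv_dir, path_value):
    """True when venv_dir/bin is the first entry of a POSIX-style PATH."""
    entries = path_value.split(":")
    return entries[0] == venv_dir.rstrip("/") + "/bin"
```

For example, venv_first_on_path("/opt/venv", os.environ["PATH"]) (after
import os) should hold if the ENV PATH line took effect.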