This project implements a sophisticated search algorithm over a plain TSV file of input entries.
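The entries in the TSV file can be loaded into Python dicts, roughly the way `data_parsers.py` does it. This is only a sketch: the column names below are hypothetical, since the README does not show the dataset's real schema.

```python
import csv
import io

# Hypothetical sketch of parsing the input TSV into dicts; the column
# names ("id", "name") are assumptions, not the dataset's real schema.
SAMPLE_TSV = "id\tname\n1159309\tКonos фильтр бумажный для кофеварок №4\n"

def parse_tsv(text):
    """Read TSV text and return one dict per data row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

rows = parse_tsv(SAMPLE_TSV)
print(rows[0]["name"])
```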
Start the REST API server:
python rest-api.py --verbose --w2v /home/dima/models/ArModel100w2v.txt --port 8080 --debug
Parameters:
- `--verbose`: controls how much is printed while processing the query (e.g. the output of tokens and lemmas)
- `--w2v`: path to the `w2v` model used for looking up similar words (if not set, the `w2v` model is not loaded and, as a consequence, the server starts much faster); other embeddings (such as BERT, ELMo, fastText, and so on) may become available in the future
- `--port`: port on which the web server runs
- `--debug`: passed directly to the `app.run` method of the `flask` module
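A minimal sketch of how `rest-api.py` might wire these flags up with `argparse`; the actual option handling in the repository may differ, and the Flask startup is only indicated in a comment:

```python
import argparse

def parse_args(argv=None):
    """Parse the command-line flags accepted by the server (sketch)."""
    parser = argparse.ArgumentParser(description="Search REST API server")
    parser.add_argument("--verbose", action="store_true",
                        help="print tokens and lemmas while processing queries")
    parser.add_argument("--w2v", default=None,
                        help="path to the w2v model; omit to start much faster")
    parser.add_argument("--port", type=int, default=8080,
                        help="port the web server listens on")
    parser.add_argument("--debug", action="store_true",
                        help="forwarded to Flask's app.run()")
    return parser.parse_args(argv)

args = parse_args(["--verbose", "--port", "8080"])
# app.run(port=args.port, debug=args.debug)  # how rest-api.py would start Flask
```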
curl --request POST \
--url http://localhost:8080/search \
--header 'content-type: application/json' \
--data '{
"query": "грачи улятели",
"fix-misspellings": true,
"use-embeddings": false,
"similar-tokens-score-weight": 1.5,
"similar-tokens-quantity": 2,
"products-quantity": 10,
"similar-products-quantity": 5,
"min-word-difference-ratio": 90
}' | python -m json.tool | ascii2uni -a U -q
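The same request can be issued from Python using only the standard library. This assumes the server above is running locally on port 8080; the network call itself is left commented out:

```python
import json
from urllib.request import Request, urlopen

# Same request body as the curl example above.
payload = {
    "query": "грачи улятели",
    "fix-misspellings": True,
    "use-embeddings": False,
    "similar-tokens-score-weight": 1.5,
    "similar-tokens-quantity": 2,
    "products-quantity": 10,
    "similar-products-quantity": 5,
    "min-word-difference-ratio": 90,
}
req = Request(
    "http://localhost:8080/search",
    data=json.dumps(payload).encode("utf-8"),
    headers={"content-type": "application/json"},
    method="POST",
)
# Uncomment when the server is running:
# with urlopen(req) as resp:
#     results = json.load(resp)
#     for url, product in results.items():
#         print(url, product["score"], product["name"])
```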
Parameters:
- `query`: the search query to use
- `fix-misspellings`: whether or not to use the module that fixes spelling mistakes by applying Levenshtein distance
- `min-word-difference-ratio`: the minimal similarity score between a word to fix and a candidate replacement that must be reached for the replacement to make sense
- `use-embeddings`: whether or not to apply embeddings when searching for similar tokens
- `similar-tokens-score-weight`: how much score one match of a lemma or token contributes when searching for similar words
- `similar-tokens-quantity`: how many similar tokens to consider
- `products-quantity`: how many products to return as the result of the direct search
- `similar-products-quantity`: how many products to return as the result of the search by similar words found using embeddings

Response example:
{
"http://tender.hack.spb/1159309": {
"name": "Кonos фильтр бумажный для кофеварок №4, 80 шт (Folie)",
"score": 1,
"keywords": [
"бумажный"
]
},
"http://tender.hack.spb/1100321": {
"name": "Полотенца бумажные 110 шт., KIMBERLY-CLARK Scott, комплект 16 шт., Slimfold, белые, 29,5х19 см, М-fold, диспенсер 601535, АРТ.5856",
"score": 1,
"keywords": [
"бумажный"
]
},
"http://tender.hack.spb/1159308": {
"name": "Кonos фильтр бумажный для кофеварок №4, 100 шт (Folie)",
"score": 1,
"keywords": [
"бумажный"
]
},
"http://tender.hack.spb/1159304": {
"name": "Кonos фильтр бумажный для кофеварки №4, 100 шт",
"score": 1,
"keywords": [
"бумажный"
]
},
"http://tender.hack.spb/1161231": {
"name": "Бумажные фильтр-мешки Karcher 5 шт. 6.904-322.0",
"score": 1,
"keywords": [
"бумажный"
]
}
}

Scripts:

- `make-graph.py`: converts data from TSV format to TTL
- `good_inqury.py`: fixes spelling mistakes
- `data_parsers.py`: parses TSV rows and converts them to dicts
- `calculate-name-embeddings.py`: calculates embedding vectors of product names with ELMo as the base model (requires a lot of computing power)
- `load_elmo_embeddings.py`: tests that the ELMo embeddings were saved properly
- `preprocess-user-input.py`: the main module that processes the user's query: splits it into tokens, lemmatizes them, and connects to the other modules
- `rest-api.py`: code for starting the `flask` HTTP server
- `run-remote-sparql-query.py`: basic example of running a simple SPARQL query against a remote Blazegraph instance
- `run-sparql-query.py`: basic example of running a simple SPARQL query against a local Blazegraph instance
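As an illustration of the idea behind `good_inqury.py` and the `min-word-difference-ratio` parameter, here is a sketch using `difflib` from the standard library. The toy vocabulary, the 0-100 score scaling, and the replacement rule are assumptions for demonstration, not the module's real implementation (which uses Levenshtein distance):

```python
from difflib import SequenceMatcher

# Toy vocabulary for demonstration only; the real module would look up
# words in the product dataset.
VOCABULARY = ["грачи", "улетели", "фильтр", "бумажный"]

def similarity(a, b):
    """Return a 0-100 similarity score between two words (assumption:
    the real module scores with Levenshtein distance, not difflib)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def fix_token(token, min_ratio=90):
    """Replace token with its closest vocabulary word if that word's
    score reaches min_ratio; otherwise keep the token unchanged."""
    best = max(VOCABULARY, key=lambda w: similarity(token, w))
    return best if similarity(token, best) >= min_ratio else token

print(fix_token("улятели", min_ratio=80))  # prints улетели
```

With a stricter threshold (e.g. 90, as in the request example above) the same misspelling would be left as-is, which is the trade-off the `min-word-difference-ratio` parameter controls.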