Text-Search-engine

Based on Hadoop and Django

Project name: Text Search Engine based on Hadoop

Goal: Retrieve text related to the query and order the results by relevance.

Major components:

1. Text file collection: a static crawler collects 99 English articles and a 200 MB collection of blogs, then splits the text into small articles.
2. Index and rank: local code, Hadoop, and Lucene index the crawled text. Hadoop calculates the TF-IDF values, and Python sorts them and outputs the results.
3. Web server and interface: the web server and search interface are implemented with Django.
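To make the index and rank step more concrete, here is a minimal single-machine sketch of TF-IDF scoring followed by a Python sort. It is not the project's Hadoop or Lucene code. The function names (`tokenize`, `map_term_counts`, `reduce_doc_freq`, `tf_idf`, `search`), the sample documents, and the weighting `tf * log(N / df)` are all assumptions chosen for illustration; the actual jobs may split the work and weight terms differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def map_term_counts(docs):
    """Map-like step (hypothetical): count term occurrences per document."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def reduce_doc_freq(term_counts):
    """Reduce-like step (hypothetical): number of documents containing each term."""
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return doc_freq


def tf_idf(docs):
    """Return {doc_id: {term: weight}} using tf * log(N / df) (assumed variant)."""
    term_counts = map_term_counts(docs)
    doc_freq = reduce_doc_freq(term_counts)
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        scores[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def search(query, scores):
    """Sum the weights of the query terms per document and sort descending."""
    terms = tokenize(query)
    ranked = []
    for doc_id, weights in scores.items():
        score = sum(weights.get(term, 0.0) for term in terms)
        if score > 0:
            ranked.append((doc_id, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Sample documents invented for this sketch, not taken from the crawl.
    sample_docs = {
        "a": "hadoop map reduce hadoop",
        "b": "django web server",
        "c": "hadoop web interface",
    }
    for doc_id, score in search("hadoop", tf_idf(sample_docs)):
        print(doc_id, round(score, 4))
```

In this sketch, documents that contain none of the query terms are dropped from the results, and a term that appears in every document gets weight zero because `log(N / df)` is zero.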

Topics covered: Hadoop, web crawling, text processing, ranking in Python, MapReduce, web front end, and Django.

Group information:
Dongna CHEN (Linna): implemented the web server and interface
Yuchen JIANG (Marco): implemented the text file collection
Dingtao YANG (Young): implemented the index and rank

BNU-HKBU UIC 2019 Summer

