- clone this repo with:

```shell
git clone https://github.com/bor1e/regios.git
cd regios
pip install -r requirements.txt
```

- creating superuser:

```shell
python3 manage.py createsuperuser
```
- first start the scrapy server on `localhost:19860` with:

```shell
cd scrapy_app/ && scrapyd
```

- go back to the regios folder and start the Django server:

```shell
cd .. && python3 manage.py runserver
```
If you want to change the port of scrapyd, you need to update the `http_port` value in `scrapy_app/scrapy.cfg` and in `api/views.py`.
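A minimal sketch of one way to keep those two places in sync: `api/views.py` could read the port from `scrapy_app/scrapy.cfg` instead of hard-coding it. This assumes the port is stored as `http_port` under a `[scrapyd]` section and falls back to 19860, the port used in the setup steps; the helper name `scrapyd_url` is hypothetical.

```python
import configparser
from pathlib import Path

DEFAULT_PORT = 19860  # port scrapyd is started on in the setup steps


def scrapyd_url(cfg_path="scrapy_app/scrapy.cfg", host="localhost"):
    """Build the scrapyd base URL from scrapy.cfg (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read(Path(cfg_path))  # a missing file is silently ignored
    # Assumption: the port lives in the file as "[scrapyd] http_port = ..."
    port = parser.getint("scrapyd", "http_port", fallback=DEFAULT_PORT)
    return f"http://{host}:{port}"
```

Requests in the views would then go to e.g. `scrapyd_url() + "/schedule.json"`, so changing the port only means editing `scrapy.cfg`.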
- check if scrapyd is running (simple post for list)
- ISSUE on page reload while the botspider is running: the domain is in the DB but has no info yet, so zip and similar details cannot be displayed because of a `RelatedObjectDoesNotExist` error. Handled with dummy data so far; needs further checking.
- error block in the layout
- get the url if it is set in the session inside start inputfilter, then return back to display
- check the TODOs inside the code
- scrape meta tags
- for infoscan, try to run it on a simple URL without targeting a specific webpage
- pandas on db
- update scrapy to 1.6.0?
- what about second-level references? If level 0 references level 1, which in turn references level 2, which in turn references children of level 0 (i.e. siblings of level 1), how should that be handled?
- after the fullscan of the selected ones, list all potential partners found (infoscan on all `external_links` of the combined selection), then run the next level on the selected ones and display the results.
- include contact ("Kontakt") pages for zip code findings
- time required for fullscan update
- display strange behaviour (e.g. zero external links) as black nodes in the sigma graph
- try to get links from JavaScript (`process_value`, see the Scrapy docs)
- analyse the structure of the website with the most external links, using urllib, for text analysis / site classification
- selenium acceptance tests?!
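For the TODO about checking whether scrapyd is running, a minimal standard-library sketch. Scrapyd exposes a `daemonstatus.json` endpoint that answers a plain GET with `{"status": "ok", ...}` (`listprojects.json` is also a GET). The function name and the default base URL are assumptions, and any network or parsing error is treated as "not running".

```python
import json
import urllib.request


def is_scrapyd_running(base_url="http://localhost:19860", timeout=2.0):
    """Return True if scrapyd answers daemonstatus.json with status "ok"."""
    url = base_url.rstrip("/") + "/daemonstatus.json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError, HTTPError, refused connections and timeouts are OSError
        # subclasses; ValueError covers a body that is not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

A view could call this before scheduling a spider and show an error message instead of failing silently.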
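For the page-reload issue, one possible alternative to dummy data: Django's `RelatedObjectDoesNotExist` exceptions (raised by one-to-one and foreign-key descriptors) subclass both the related model's `DoesNotExist` and `AttributeError`, so `getattr(obj, name, None)` returns `None` while the related row has not been written yet. The sketch uses a stand-in class so it runs without Django; the attribute names `info` and `zip` are hypothetical and would need to match the real models.

```python
def related_or_none(obj, name):
    """Return obj.<name>, or None if the related object does not exist yet.

    Works with Django relation descriptors because their
    RelatedObjectDoesNotExist exceptions subclass AttributeError.
    """
    return getattr(obj, name, None)


class RelatedObjectDoesNotExist(AttributeError):
    """Stand-in for Django's exception so this example runs on its own."""


class Domain:
    """Stand-in for a Domain whose related info row is still missing."""

    @property
    def info(self):
        raise RelatedObjectDoesNotExist("Domain has no info.")


domain = Domain()
info = related_or_none(domain, "info")
# The template can then show "unknown" instead of crashing.
zip_code = info.zip if info is not None else None
```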
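For scraping meta tags: inside a Scrapy spider the natural route is a selector such as `response.xpath("//meta")`. The sketch below swaps in the standard library's `html.parser` so it runs without Scrapy; it collects `<meta>` tags keyed by their `name` or `property` attribute (e.g. `description`, `keywords`, `og:title`). Which tags are actually useful for regios is an assumption.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name|property="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            # Lower-case keys so "Description" and "description" collide.
            self.meta[key.lower()] = content


def extract_meta_tags(html):
    """Return a {name: content} dict of the page's meta tags."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta
```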
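For the "pandas on db" idea, a minimal sketch assuming the project runs on Django's default SQLite backend: `pandas.read_sql_query` accepts a `sqlite3` connection directly. The table and column names (`domains`, `name`, `external_links` as a count) are hypothetical; real Django tables are named `<app>_<model>`.

```python
import sqlite3

import pandas as pd


def load_domains(con):
    """Load the (hypothetical) domains table into a DataFrame."""
    return pd.read_sql_query("SELECT name, external_links FROM domains", con)


# Demo with an in-memory database; in the project this would be
# sqlite3.connect("db.sqlite3") instead.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE domains (name TEXT, external_links INTEGER)")
con.executemany(
    "INSERT INTO domains VALUES (?, ?)",
    [("a.example", 12), ("b.example", 0), ("c.example", 5)],
)
df = load_domains(con)
# Example analysis: domains ordered by number of external links.
top = df.sort_values("external_links", ascending=False)
```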
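For the question about second-level references, one way to make such cycles harmless is to give each domain the level of its first discovery, i.e. a breadth-first search with a visited set: when a level-2 domain links back to a child of level 0, nothing new is added because that child already has level 1. This is a sketch of the idea, not how regios currently assigns levels; the adjacency dict stands in for the crawled `external_links`.

```python
from collections import deque


def assign_levels(start, links, max_level=2):
    """Breadth-first search: level = shortest number of hops from start.

    links maps a domain to the domains it references (its external links).
    Domains already seen keep their first (lowest) level, so back-references
    to level 0 or to siblings of level 1 never create duplicates or loops.
    """
    levels = {start: 0}
    queue = deque([start])
    while queue:
        domain = queue.popleft()
        if levels[domain] == max_level:
            continue  # do not expand beyond the deepest requested level
        for target in links.get(domain, ()):
            if target not in levels:
                levels[target] = levels[domain] + 1
                queue.append(target)
    return levels
```

With `{"root": ["a", "b"], "a": ["c"], "c": ["b", "root", "d"]}`, `c` ends up on level 2 and its back-references to `b` and `root` are ignored.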
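For listing potential partners after the fullscan of the selected ones, a sketch of the set logic: combine the external links of all selected domains, drop the ones that are already selected, and count how many selected domains point at each candidate so the most-referenced ones can be infoscanned first. The data shapes and the function name are assumptions.

```python
from collections import Counter


def potential_partners(selected, external_links):
    """Rank candidates found in the external links of the selected domains.

    selected: iterable of domain names that were fullscanned.
    external_links: dict mapping a domain to the domains it links to.
    Returns (candidate, number_of_selected_domains_linking_to_it) pairs,
    most referenced first; ties are broken alphabetically.
    """
    selected = set(selected)
    counts = Counter()
    for domain in selected:
        # set() so one domain linking twice to the same target counts once
        for target in set(external_links.get(domain, ())):
            if target not in selected:
                counts[target] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```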
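For showing strange behaviour such as zero external links as black nodes, a sketch that builds graph data in the node/edge JSON shape commonly used with sigma.js v1 (`nodes` with `id`, `label`, `x`, `y`, `size`, `color`; `edges` with `id`, `source`, `target`). The default colour, the random layout and the input dict are assumptions; only the rule "scanned but no external links means black" comes from the TODO.

```python
import random

NORMAL_COLOR = "#1f77b4"  # assumption: default node colour
STRANGE_COLOR = "#000000"  # black marks suspicious domains


def sigma_graph(external_links, seed=0):
    """Build sigma.js-style nodes/edges from {domain: [linked domains]}."""
    rng = random.Random(seed)  # deterministic positions for the demo
    domains = set(external_links)
    for targets in external_links.values():
        domains.update(targets)
    nodes = []
    for domain in sorted(domains):
        links = external_links.get(domain)
        # Scanned (present as a key) but zero external links -> strange.
        strange = links is not None and len(links) == 0
        nodes.append({
            "id": domain,
            "label": domain,
            "x": rng.random(),
            "y": rng.random(),
            "size": 1,
            "color": STRANGE_COLOR if strange else NORMAL_COLOR,
        })
    edges = []
    for source in sorted(external_links):
        for target in external_links[source]:
            edges.append({"id": f"e{len(edges)}", "source": source, "target": target})
    return {"nodes": nodes, "edges": edges}
```

The result can be passed to the template with `json.dumps` and loaded on the client side.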
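For getting links from JavaScript, the Scrapy documentation for `LinkExtractor` describes a `process_value` callable that receives each extracted attribute value and returns the URL to follow, or `None` to drop it. A sketch in that style; the `goToPage(...)` pattern is only an example and would need to match whatever JavaScript the target sites actually use.

```python
import re

# Matches e.g. javascript:goToPage('../other/page.html'); return false
JS_LINK = re.compile(r"javascript:\s*\w+\(\s*['\"]([^'\"]+)['\"]", re.IGNORECASE)


def process_value(value):
    """Turn a javascript: href into a plain URL; keep ordinary links as-is."""
    if value.lower().startswith("javascript:"):
        match = JS_LINK.search(value)
        return match.group(1) if match else None  # None drops the link
    return value
```

It would be passed in as `LinkExtractor(process_value=process_value)`.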