A Python module that supports another project, bububa.Lego, by providing several advanced web-scraping functions.
bububa/SuperMario
= About SuperMario =
SuperMario is an advanced web crawler library written in Python. It
provides a number of methods for mining data from various kinds of sites.
== License ==
BSD License
See 'LICENSE' for details.
== Requirements ==
Platform: *nix-like systems (Unix, Linux, Mac OS X, etc.)
Python: 2.5+
Storage: MongoDB
Other required Python modules:
- simplejson
- BeautifulSoup
- eventlet
- PIL
- pycurl
- chardet
- feedparser
- mongokit
- templatemaker
- flickrapi
- pyyaml
- MySQLdb
- dateutil
== Features ==
+ robots.txt protocol support;
+ caching of each URL's HTML;
+ URL normalization;
+ conversion of all content to Unicode;
+ main-text extraction from HTML using a specified *link-threshold*;
+ conversion of partial RSS feeds to full RSS feeds;
+ proxy list support;
+ persistent cookie support;
+ login support.
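
Two of the features above, URL normalization and robots.txt handling, can be illustrated with a minimal sketch. It uses only the Python 3 standard library and is not SuperMario's own API, which this README does not document. The function names normalize_url and is_allowed are hypothetical, and the normalization rules shown (lowercase scheme and host, drop default ports and fragments) are one common choice, not necessarily the ones SuperMario applies.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical helper, not part of SuperMario.
def normalize_url(url):
    """Lowercase the scheme and host, drop default ports and fragments,
    and make sure the path is non-empty. Any user:password part is dropped."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    default_ports = {"http": 80, "https": 443}
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and default_ports.get(scheme) != parts.port:
        host = "%s:%d" % (host, parts.port)
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))

# Hypothetical helper, not part of SuperMario.
def is_allowed(robots_lines, user_agent, url):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    rules = ["User-agent: *", "Disallow: /private/"]
    url = normalize_url("HTTP://Example.COM:80/private/page?x=1#top")
    print(url)
    print(is_allowed(rules, "examplebot", url))
```

Normalizing before the robots.txt check and before any cache lookup means that equivalent spellings of the same URL are treated as one resource.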