10 changes: 5 additions & 5 deletions .gitattributes
@@ -10,13 +10,13 @@
 *.dbproj merge=union

 # Standard to msysgit
-*.doc diff=astextplain
-*.DOC diff=astextplain
+*.doc diff=astextplain
+*.DOC diff=astextplain
 *.docx diff=astextplain
 *.DOCX diff=astextplain
 *.dot diff=astextplain
 *.DOT diff=astextplain
 *.pdf diff=astextplain
-*.PDF diff=astextplain
-*.rtf diff=astextplain
-*.RTF diff=astextplain
+*.PDF diff=astextplain
+*.rtf diff=astextplain
+*.RTF diff=astextplain
16 changes: 8 additions & 8 deletions README.txt
@@ -5,7 +5,7 @@ BoilerPy
 About
 ---------------------------------------

-BoilerPy is a native Python port of Christian Kohlschütter's Boilerpipe library, released under the Apache 2.0 Licence. (http://code.google.com/p/boilerpipe/
+BoilerPy is a native Python port of Christian Kohlschutter's Boilerpipe library, released under the Apache 2.0 Licence. (http://code.google.com/p/boilerpipe/
 )

 I created this port since I don't have access to Java on my webhost and I wanted to create a pure Python version. Another Python version which consists of Python hooks to the original Java library can be found here : (https://github.com/misja/python-boilerpipe
@@ -20,19 +20,19 @@ Installation

 BoilerPy was packaged with distutils. In can be installed from the command-line with the following line:

-``>python setup.py install``
+``>python setup.py install``

 Usage
 ---------------------------------------

-``import boilerpy``
+``import boilerpy``

-``boilerpy.extractors.ARTICLE_EXTRACTOR.getContentFromUrl('http://www.example.com/')``
+``boilerpy.extractors.ARTICLE_EXTRACTOR.getContentFromUrl('http://www.example.com/')``

-``boilerpy.extractors.ARTICLE_EXTRACTOR.getContentFromFile('site/example.html')``
+``boilerpy.extractors.ARTICLE_EXTRACTOR.getContentFromFile('site/example.html')``

-``htmlText='<html><body><h1>Example</h1></body></html>'``
-``boilerpy.extractors.ARTICLE_EXTRACTOR.getContent(htmlText)``
+``htmlText='<html><body><h1>Example</h1></body></html>'``
+``boilerpy.extractors.ARTICLE_EXTRACTOR.getContent(htmlText)``



@@ -83,4 +83,4 @@ A full-text extractor which is tuned towards extracting sentences from news arti
 Version
 ---------------------------------------

-1.0 - Created 14 Feb 2013
+1.0 - Created 14 Feb 2013
5 changes: 2 additions & 3 deletions boilerpy/__init__.py
@@ -8,13 +8,12 @@
 # * (the "License"); you may not use this file except in compliance with
 # * the License. You may obtain a copy of the License at
 # *
-# * http://www.apache.org/licenses/LICENSE-2.0
+# * http://www.apache.org/licenses/LICENSE-2.0
 # *
 # * Unless required by applicable law or agreed to in writing, software
 # * distributed under the License is distributed on an "AS IS" BASIS,
 # * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # * See the License for the specific language governing permissions and
 # * limitations under the License.
-#

-import extractors,filters,parser,document
+from . import extractors, filters, parser, document
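
Note on the import change in boilerpy/__init__.py: on Python 3 a bare `import extractors` inside a package is always resolved as an absolute import, so it looks for a top-level module rather than the sibling submodule, while `from . import extractors` resolves relative to the package. The sketch below is only an illustration of that behaviour, not code from this PR. It builds a throwaway package in a temporary directory and tries both forms. The names `demo_pkg`, `build_demo_package` and `try_import` are hypothetical, and it assumes Python 3 with no unrelated top-level module called `extractors` installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path, init_source: str) -> None:
    """Create a throwaway package 'demo_pkg' (hypothetical name) with one submodule."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "extractors.py").write_text("NAME = 'extractors'\n")
    (pkg / "__init__.py").write_text(init_source)


def try_import(init_source: str) -> str:
    """Import demo_pkg with the given __init__.py body and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp), init_source)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_pkg")
            return "ok: " + module.extractors.NAME
        except ImportError as exc:
            return "failed: " + type(exc).__name__
        finally:
            sys.path.remove(tmp)
            # Drop cached modules so the next call starts from a clean state.
            stale = [n for n in sys.modules
                     if n == "demo_pkg" or n.startswith("demo_pkg.")]
            for name in stale:
                del sys.modules[name]


# Old form from the diff: treated as an absolute import on Python 3.
old_style = try_import("import extractors\n")
# New form from the diff: explicit relative import of the sibling submodule.
new_style = try_import("from . import extractors\n")

print(old_style)
print(new_style)
```

Under those assumptions the first call should report a failed import and the second should succeed, which is consistent with the motivation for switching to the explicit relative form.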