Extractor problem #54

Hi there,

First of all, fullyfeedly is an awesome browser extension: it saves time and helps me focus on the content I'm interested in, with less switching between browser tabs!

I just noticed an issue: the recommended default **Mercury** extractor isn't powerful enough for many websites, and it looks like it needs a lot of custom extractors/parsers to handle different sites. The non-default **Boilerpipe** extractor is very powerful, but besides the limited quota mentioned in the README.md, I also found that requests from fullyfeedly to the Boilerpipe web service fail with [CORS errors](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSMissingAllowOrigin), so it isn't working right now. Taken together, fullyfeedly currently works fully on only a limited set of websites.
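To illustrate what that CORS error means: for a simple, non-credentialed cross-origin request, the browser only lets the calling script read the response if it carries an `Access-Control-Allow-Origin` header equal to `*` or to the caller's origin. Below is a simplified sketch of that check; the function names and the `chrome-extension://abc` origin are hypothetical, and preflight and credential rules are deliberately left out.

```typescript
// Simplified model of the browser's CORS check for a simple,
// non-credentialed cross-origin request (no preflight, no credentials).
type ResponseHeaders = Record<string, string>;

function getHeader(headers: ResponseHeaders, name: string): string | undefined {
  // HTTP header names are case-insensitive.
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === wanted) return value;
  }
  return undefined;
}

function corsAllows(headers: ResponseHeaders, requestOrigin: string): boolean {
  const allowOrigin = getHeader(headers, "Access-Control-Allow-Origin");
  if (allowOrigin === undefined) {
    // The "CORS header 'Access-Control-Allow-Origin' missing" case:
    // the browser refuses to hand the response to the calling script.
    return false;
  }
  const value = allowOrigin.trim();
  return value === "*" || value === requestOrigin;
}

// A response without the header is blocked for any foreign origin.
console.log(corsAllows({ "Content-Type": "text/html" }, "chrome-extension://abc"));
```

In this model the request itself may well reach the server; it is reading the response that fails, which is why the fix has to come from the service's headers (or from bypassing the browser's check, as CORS Unblock does).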

I'm not sure whether it's a coincidence that Mercury can't properly parse the websites I visit most often, but I did compare its extracted results with Boilerpipe's: Boilerpipe works noticeably better, whereas Mercury sometimes extracts nothing but meaningless HTML tags.
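One way to detect the "meaningless HTML tags" case automatically would be a text-to-markup heuristic on the extracted HTML. The sketch below is a hypothetical helper, not part of fullyfeedly or Mercury; its thresholds (`minTextLength = 200`, `minRatio = 0.2`) are guesses, and the regex-based tag stripping is deliberately crude.

```typescript
// Crude heuristic (hypothetical thresholds) for spotting an extraction
// result that is mostly markup with little readable text.
interface ExtractionQuality {
  textLength: number;
  textToHtmlRatio: number;
  looksEmpty: boolean;
}

function visibleText(html: string): string {
  return html
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, " ") // drop script/style bodies
    .replace(/<[^>]*>/g, " ") // drop all remaining tags
    .replace(/&nbsp;/g, " ") // treat non-breaking spaces as spaces
    .replace(/\s+/g, " ") // collapse whitespace runs
    .trim();
}

function assessExtraction(
  html: string,
  minTextLength = 200,
  minRatio = 0.2,
): ExtractionQuality {
  const text = visibleText(html);
  const ratio = html.length === 0 ? 0 : text.length / html.length;
  return {
    textLength: text.length,
    textToHtmlRatio: ratio,
    // Flag results that are too short or dominated by markup.
    looksEmpty: text.length < minTextLength || ratio < minRatio,
  };
}
```

A check like this could, for example, decide when to retry a page with a second extractor; the thresholds would need tuning against real pages before being trusted.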

For the first problem, I guess I could write custom extractors and send pull requests to Mercury, but that would be really time-consuming and doesn't scale well.
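For reference, a site-specific extractor in the Mercury Parser repository is a plain object keyed by domain, with CSS selectors per field. The sketch below approximates that shape; the TypeScript interfaces are an illustrative typing of my own, and `www.example.com` and every selector are placeholders. The `[selector, "value"]` pairs mirror how Mercury's own extractors read `<meta>` tags, but check the repository's existing extractors before relying on the exact fields.

```typescript
// Illustrative typing approximating Mercury's custom extractor objects.
// A selector is either a CSS selector or a [selector, attribute] pair.
type Selector = string | [string, string];

interface FieldSpec {
  selectors: Selector[];
}

interface ContentSpec extends FieldSpec {
  // Selectors for elements to strip from the extracted content.
  clean?: string[];
}

interface CustomExtractor {
  domain: string;
  title?: FieldSpec;
  author?: FieldSpec;
  date_published?: FieldSpec;
  lead_image_url?: FieldSpec;
  content?: ContentSpec;
}

// Example for a made-up site; all selectors are placeholders.
const wwwExampleComExtractor: CustomExtractor = {
  domain: "www.example.com",
  title: { selectors: ["h1.article-title"] },
  author: { selectors: [".byline .author-name"] },
  date_published: {
    selectors: [["meta[name=\"article:published_time\"]", "value"]],
  },
  lead_image_url: { selectors: [["meta[name=\"og:image\"]", "value"]] },
  content: {
    selectors: ["div.article-body"],
    clean: [".ad", ".related-links", ".newsletter-signup"],
  },
};
```

Each additional site needs its own object like this, plus upkeep whenever the site changes its markup, which is where the scalability concern comes from.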

For the second problem, I've opened an issue at https://github.com/kohlschutter/boilerpipe/issues/28. If anyone is looking for a workaround in the meantime, here it is: https://add0n.com/access-control.html (CORS Unblock).

I'm not sure what we can do to improve this. Would hosting a separate Boilerpipe web service be a viable option? Or would it be better to look for alternatives?
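If a self-hosted service were the route, the CORS part is small: the service answers with an `Access-Control-Allow-Origin` header for the extension's origin. Below is a minimal Node sketch of such a front end; `ALLOWED_ORIGINS`, `BACKEND_URL`, its `?url=` parameter, and the helper names are all hypothetical, and the actual extraction is assumed to happen in a separately hosted backend (for example a Boilerpipe instance).

```typescript
import { createServer, IncomingMessage, ServerResponse, Server } from "node:http";

// Hypothetical values: the extension origin(s) allowed to call this
// service, and the URL of a self-hosted extraction backend.
const ALLOWED_ORIGINS = ["chrome-extension://your-extension-id"];
const BACKEND_URL = "http://localhost:8080/extract";

function corsHeadersFor(
  requestOrigin: string | undefined,
  allowedOrigins: readonly string[],
): Record<string, string> {
  // Vary: Origin keeps caches from reusing a response for another origin.
  const headers: Record<string, string> = { Vary: "Origin" };
  if (requestOrigin !== undefined && allowedOrigins.includes(requestOrigin)) {
    headers["Access-Control-Allow-Origin"] = requestOrigin;
    headers["Access-Control-Allow-Methods"] = "GET, OPTIONS";
  }
  return headers;
}

async function handle(req: IncomingMessage, res: ServerResponse): Promise<void> {
  const cors = corsHeadersFor(req.headers.origin, ALLOWED_ORIGINS);
  if (req.method === "OPTIONS") {
    // Answer preflight requests without touching the backend.
    res.writeHead(204, cors);
    res.end();
    return;
  }
  const target = new URL(req.url ?? "/", "http://localhost").searchParams.get("url");
  if (!target) {
    res.writeHead(400, { ...cors, "Content-Type": "text/plain" });
    res.end("missing ?url= parameter");
    return;
  }
  try {
    // Forward to the (hypothetical) extraction backend and relay its body.
    const upstream = await fetch(`${BACKEND_URL}?url=${encodeURIComponent(target)}`);
    const body = await upstream.text();
    res.writeHead(upstream.status, {
      ...cors,
      "Content-Type": upstream.headers.get("content-type") ?? "text/html; charset=utf-8",
    });
    res.end(body);
  } catch {
    res.writeHead(502, { ...cors, "Content-Type": "text/plain" });
    res.end("extraction backend unreachable");
  }
}

function startServer(port: number): Server {
  return createServer((req, res) => {
    void handle(req, res);
  }).listen(port);
}
```

Calling `startServer(3000)` would serve requests like `/?url=https%3A%2F%2Fexample.com%2Fpost`; fullyfeedly would presumably then need a setting pointing at this host instead of the public service. Echoing the caller's origin from an allow-list instead of sending `*` keeps arbitrary web pages from reading the responses in a browser, though CORS does not restrict non-browser clients, so a public deployment would still need its own rate limiting.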

Thanks a lot!
