Scraping from Arabic news #2016

Draft
RabeaAffan24 wants to merge 6 commits into data-for-change:dev from RabeaAffan24:dev

@RabeaAffan24

Next steps:

  • Get the whole archive (~300 pages)
  • Generate JSON (JSON serialisation)
  • Integrate Panet scraping into the Anyway ETL
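A minimal sketch of the JSON serialisation step, assuming each scraped Panet article is held in a hypothetical `NewsItem` dataclass (the field names are illustrative, not the PR's actual schema). `ensure_ascii=False` keeps the Arabic text readable in the output instead of escaping it as `\uXXXX`.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import List


@dataclass
class NewsItem:
    # Hypothetical fields; the real scraper may collect different ones.
    title: str
    url: str
    published: datetime
    body: str


def items_to_json(items: List[NewsItem]) -> str:
    """Serialise scraped items to a JSON array string."""
    records = []
    for item in items:
        record = asdict(item)
        # datetime is not JSON-serialisable, so store it as ISO 8601 text.
        record["published"] = item.published.isoformat()
        records.append(record)
    # ensure_ascii=False keeps Arabic characters as-is instead of \uXXXX escapes.
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    sample = NewsItem(
        title="حادث طرق",
        url="https://example.com/article/1",  # placeholder URL
        published=datetime(2021, 11, 30, 8, 15),
        body="...",
    )
    print(items_to_json([sample]))
```

Returning a string (rather than writing to a local file) leaves it up to the caller, e.g. an ETL step, to decide where the data goes.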

@atalyaalon
Collaborator

@BusinessLanguage looks good!
However, right now this code writes to a local file, so I'm not sure we want to merge it this way.
We could merge it just to make sure the code is in our repo, but it isn't code that will run in prod.
@ziv17 @shaysw any thoughts?

@ziv17
Collaborator

ziv17 commented Nov 30, 2021

Very nice!

Do we want to use these accidents for our reports and infographics?
If so, we need to add them to our database. To do this, I think we need to:

  • Coordinate the entity values in the results file with those used in our database (e.g. injury severity, street name, accident severity, etc.). Currently, for these entities we use codes (numbers) and English names in the code, with translation to Hebrew via pybabel.
  • Then add the data (accidents, injured, etc.) to our database.
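One possible shape for the coordination step, sketched under assumptions: the `InjurySeverity` enum, its numeric values and the label mapping below are hypothetical placeholders, and the real codes must be taken from the tables the database already uses. Unknown labels map to `None` so they can be flagged for review instead of guessed.

```python
from enum import IntEnum
from typing import Dict, Optional


class InjurySeverity(IntEnum):
    # Placeholder codes for illustration only; the real values must come
    # from the code tables already used in the database.
    KILLED = 1
    SEVERE_INJURED = 2
    LIGHT_INJURED = 3


# Hypothetical mapping from scraped (translated) labels to database codes.
_LABEL_TO_SEVERITY: Dict[str, InjurySeverity] = {
    "killed": InjurySeverity.KILLED,
    "dead": InjurySeverity.KILLED,
    "seriously injured": InjurySeverity.SEVERE_INJURED,
    "severely injured": InjurySeverity.SEVERE_INJURED,
    "lightly injured": InjurySeverity.LIGHT_INJURED,
    "minor injuries": InjurySeverity.LIGHT_INJURED,
}


def to_injury_severity(label: str) -> Optional[InjurySeverity]:
    """Map a free-text label to a severity code, or None if it is unknown."""
    # Lowercase and collapse repeated whitespace before the lookup.
    normalised = " ".join(label.lower().split())
    return _LABEL_TO_SEVERITY.get(normalised)
```

For example, `to_injury_severity("  Seriously  Injured ")` returns `InjurySeverity.SEVERE_INJURED`, whose integer value is what would be stored; the English enum names could then go through the existing pybabel translation.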

@atalyaalon atalyaalon marked this pull request as draft January 23, 2022 17:44
Collaborator

Hi, I would like us to use type hints in function parameters and variables.
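For illustration, a small helper with type hints on its parameters, return value and a local variable. The function and the `?page=N` URL scheme are hypothetical, not taken from the PR.

```python
from typing import List


def build_archive_urls(base_url: str, num_pages: int) -> List[str]:
    """Return one URL per archive page (the '?page=N' scheme is hypothetical)."""
    urls: List[str] = []
    for page_number in range(1, num_pages + 1):
        urls.append(f"{base_url}?page={page_number}")
    return urls
```

Here `build_archive_urls("https://example.com/archive", 2)` returns `['https://example.com/archive?page=1', 'https://example.com/archive?page=2']`, and a type checker such as mypy can flag a call that passes, say, a string for `num_pages`.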

Comment on lines +32 to +34
Collaborator

Hi,

  • In Python we use lowercase with underscores for variable and function/method names.
  • I prefer not to change the loop variable inside the loop. It is better to use a different variable.
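Because the commented lines themselves are not shown here, this is only an illustrative sketch (the `clean_titles` function is hypothetical). It uses snake_case names (rather than e.g. `cleanTitles`) and binds the processed value to a new variable instead of reassigning the loop variable.

```python
from typing import List


def clean_titles(raw_titles: List[str]) -> List[str]:
    """Strip whitespace from each title and drop empty ones."""
    cleaned_titles: List[str] = []
    for raw_title in raw_titles:
        # Instead of `raw_title = raw_title.strip()`, bind the result to a new
        # name so `raw_title` always holds the original value.
        clean_title = raw_title.strip()
        if clean_title:
            cleaned_titles.append(clean_title)
    return cleaned_titles
```

Keeping the loop variable unchanged makes it easier to see, while reading or debugging, which value came from the input and which was derived from it.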

Collaborator

Are we OK with the API key being in our code, in a public repo?

Author

Still working on this one.
Accidentally pushed it to the PR.
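A common alternative to hard-coding the key is to read it from an environment variable at runtime. This is a minimal sketch; the variable name `GOOGLE_TRANSLATE_API_KEY` is an assumption, not something defined in the PR.

```python
import os


def get_api_key(env_var: str = "GOOGLE_TRANSLATE_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail loudly rather than sending requests without a key.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return api_key
```

This keeps the key out of the repository. Note that deleting a key in a later commit does not remove it from the Git history, so a key that was already pushed is still exposed.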

Collaborator

Hi, the style we use for file names is lowercase with underscores between words.

Collaborator

@ziv17 ziv17 left a comment

Hi, well done!
See some technical comments below.
How is this code going to be incorporated into our application? I think it is worth a discussion.

@RabeaAffan24
Author

@ziv17 Thank you for your comments. Will amend those issues soon.

Regarding your question: the scraped data will be incorporated into the database after the newsflash has been translated (using the Google API), and will then undergo the same process as your mainstream data.

4 participants
