
Conversation


@rosle commented Apr 3, 2020

What happened

✅ Store the keyword search result

  • Total results
  • Non-AdWords Link
  • AdWords Link

✅ Display the result on the keyword page

Insight

Add a new links table

User -(has many)-> Keywords -(has many)-> Links

A Link stores the result URL, whether that URL is an AdWords link or not, and where it appears on the page (top or bottom). A rough sketch of the schema is below.
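
As an illustration only - this is not the PR's actual code, and the field names and module paths are assumptions - the links schema could look something like:

```elixir
# Illustrative sketch of the proposed links table, assuming Ecto.
# Field names and module paths are assumptions, not the PR's code.
defmodule GoogleCrawler.Google.Link do
  use Ecto.Schema

  schema "links" do
    # The result URL scraped from the search page
    field :url, :string
    # Whether this URL is an AdWords result
    field :is_adwords, :boolean, default: false
    # Where the link appears on the page: "top" or "bottom"
    field :position, :string

    # User -(has many)-> Keywords -(has many)-> Links
    belongs_to :keyword, GoogleCrawler.Keyword

    timestamps()
  end
end
```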

Proof of Work

[Screenshot: Screen Shot 2563-04-16 at 12 07 03]

```elixir
end

defmodule GoogleCrawler.Google.Scrapper do
  alias GoogleCrawler.Google.ScrapperResult
```

@olivierobert commented

Where is ScrapperResult defined?

@rosle (Owner Author) replied

@olivierobert In this file above here 😆 ⬆️

@olivierobert replied

Oh I see. Not sure about having more than one module per file though 🤔

@rosle (Owner Author) replied

Agree 👍 I think it's better to separate it 🤔
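
For reference, a one-module-per-file split could look like this (a rough sketch; the file paths and struct fields are assumptions, not the PR's actual code):

```elixir
# lib/google_crawler/google/scrapper_result.ex  (path is an assumption)
defmodule GoogleCrawler.Google.ScrapperResult do
  # Struct fields are illustrative only
  defstruct total_results: 0, links: []
end

# lib/google_crawler/google/scrapper.ex  (path is an assumption)
defmodule GoogleCrawler.Google.Scrapper do
  alias GoogleCrawler.Google.ScrapperResult

  # ... scraping functions build and return a %ScrapperResult{} ...
end
```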

@rosle force-pushed the feature/scrap-content branch from 236cd35 to 007238c on April 14, 2020 11:17
```diff
@@ -1,12 +1,74 @@
 defmodule GoogleCrawler.Google.Scrapper do
```
@olivierobert commented Apr 15, 2020

I wondered from the beginning why you added two p's to "scrapper". Only today did I check whether you were right about it, as I was making some improvements to my implementation as well 🙈


Well, according to Google, "scrapper" is probably not what you are looking for:

[Screenshot: Google search results for "scrapper"]


While scraping/scraper is:

[Screenshot: Google search results for "scraping"/"scraper"]

@rosle (Owner Author) replied

Wait, which one is correct 😂?? So it's scraper, right? Now I'm confused 😲

@olivierobert replied Apr 15, 2020

Yes, scraper (the one below). A scrapper is a tool to scrap things off, not related to web scraping.

@rosle (Owner Author) replied

I'll fix the name 😂 👍

@rosle force-pushed the feature/scrap-content branch 2 times, most recently from 2b145dc to 26dbb21 on April 15, 2020 08:34
@rosle force-pushed the feature/scrap-content branch from 26dbb21 to 4690bfb on April 15, 2020 08:36
@rosle force-pushed the feature/scrap-content branch from 9119f50 to d5d2274 on April 15, 2020 11:43
```diff
@@ -1,4 +1,9 @@
 defmodule GoogleCrawler.SearchKeywordWorker do
```
@rosle (Owner Author) commented

I am really not confident about this GenServer 😂

I think it's because:

  1. The state is quite complex, I think 🤔 💭 - it now stores a map of %{task_ref => {%Keyword{}, retry_count}}, so I don't know whether it is hard to read or not.

  2. I am not sure about the error handling part. There are 2 cases now:

  • When the task fails -> this is handled by handle_info({:DOWN, ...}, ...).
  • When something goes wrong while inserting the record -> this also needs to retry, so I extracted the retry logic out into a new function. It looks a bit weird.

The main problem is that I'm not sure whether it should be like this or not - is it the right way to do it? I probably need to learn more about GenServer 😵 (A rough sketch of the shape I mean is below.)
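
As an illustration only, here is a rough sketch of the pattern described above - not the PR's actual code; the supervisor name, @max_retries, the client API, and the task body are all assumptions:

```elixir
# Rough sketch, not the PR's code: a worker whose state maps each task's
# monitor ref to the keyword being searched and its retry count.
defmodule GoogleCrawler.SearchKeywordWorker do
  use GenServer

  @max_retries 3

  def start_link(opts), do: GenServer.start_link(__MODULE__, %{}, opts)

  def search(pid, keyword), do: GenServer.cast(pid, {:search, keyword})

  # State shape: %{task_ref => {%Keyword{}, retry_count}}
  def init(state), do: {:ok, state}

  def handle_cast({:search, keyword}, state) do
    {:noreply, start_task(keyword, 0, state)}
  end

  # The task crashed (or the insert raised inside it): retry up to @max_retries.
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    {{keyword, retry_count}, state} = Map.pop(state, ref)
    {:noreply, maybe_retry(keyword, retry_count, state)}
  end

  # The task succeeded; demonitor with :flush so no :DOWN message arrives.
  def handle_info({ref, _result}, state) when is_reference(ref) do
    Process.demonitor(ref, [:flush])
    {:noreply, Map.delete(state, ref)}
  end

  defp start_task(keyword, retry_count, state) do
    task =
      Task.Supervisor.async_nolink(GoogleCrawler.TaskSupervisor, fn ->
        # Scrape the search result and insert the records here;
        # raising produces the :DOWN message handled above.
        keyword
      end)

    Map.put(state, task.ref, {keyword, retry_count})
  end

  defp maybe_retry(keyword, retry_count, state) when retry_count < @max_retries do
    start_task(keyword, retry_count + 1, state)
  end

  defp maybe_retry(_keyword, _retry_count, state), do: state
end
```

The monitor ref returned by Task.Supervisor.async_nolink/2 doubles as the map key, so a :DOWN message can be traced back to the keyword and its retry count.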

@rosle marked this pull request as ready for review on April 16, 2020 05:35