
@GYFX35 (Owner) commented Sep 23, 2025

This commit integrates the Google Safe Browsing API to enhance the scam detection capabilities of the social media analyzer.

The key changes include:

  • A new function check_google_safe_browsing is added to scam_detector.py to check URLs against the Google Safe Browsing API.
  • The is_url_suspicious function is updated to use the new Google Safe Browsing check.
  • The main application now retrieves the GOOGLE_API_KEY from environment variables and passes it to the analysis functions.
  • A new heuristic weight GOOGLE_SAFE_BROWSING_HIT is added to give a high score to URLs flagged by the API.
  • A requirements.txt file is added for the social_media_analyzer project with the requests dependency.
  • Unit tests with mocking are added to test_runner.py to verify the integration.
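The new heuristic weight slots in next to the existing URL weight. A minimal sketch of what the `heuristics.py` entry might look like, using the default values (3.0 and 10.0) that appear in the fallback `HEURISTIC_WEIGHTS.get(...)` calls later in this review; the surrounding keys are illustrative:

```python
# Illustrative shape of the weights table in heuristics.py; the two
# values match the .get() fallbacks used in scam_detector.py.
HEURISTIC_WEIGHTS = {
    "SUSPICIOUS_URL_PATTERN": 3.0,      # local pattern match
    "GOOGLE_SAFE_BROWSING_HIT": 10.0,   # confirmed hit from the API scores much higher
}
```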

Summary by Sourcery

Integrate Google Safe Browsing API for real-time URL threat detection in the scam detector, update CLI to handle API key, adjust threat scoring, and add verification tests.

New Features:

  • Add check_google_safe_browsing function to call Google Safe Browsing API and integrate results into URL analysis.
  • Retrieve GOOGLE_API_KEY from environment and pass it through CLI commands for scam detection.
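A minimal sketch of the environment lookup (the PR's `get_api_key` in `main.py` is described but not shown here; this is an illustrative version):

```python
import os

def get_api_key():
    # Fetch the Google Safe Browsing key from the environment.
    # Returns None when unset, so callers can warn the user and
    # fall back to local heuristics only.
    return os.environ.get("GOOGLE_API_KEY")
```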

Enhancements:

  • Update is_url_suspicious and analyze_text_for_scams to use Safe Browsing checks and apply a new GOOGLE_SAFE_BROWSING_HIT heuristic weight.
  • Define GOOGLE_SAFE_BROWSING_HIT weight in heuristics.

Build:

  • Add requirements.txt with requests dependency.

Tests:

  • Add unit tests mocking Google Safe Browsing API responses to verify integration.

@sourcery-ai bot commented Sep 23, 2025

Reviewer's Guide

This PR integrates the Google Safe Browsing API into the scam detection flow by adding a dedicated check function, extending URL analysis logic to leverage real-time threat data, propagating the API key through the main application, updating heuristics for flagged URLs, adding the required HTTP library, and covering the new logic with unit tests.

File-Level Changes

Change Details Files
Integrated Google Safe Browsing check into URL analysis
  • Added check_google_safe_browsing function with request payload, error handling, and threat match parsing
  • Extended is_url_suspicious signature to accept api_key and perform API check before existing heuristics
  • Updated analyze_text_for_scams to pass api_key to URL check and apply higher score for Google hits
social_media_analyzer/scam_detector.py
Propagated API key through main application
  • Created get_api_key to fetch GOOGLE_API_KEY from environment
  • Modified analyze_website_url and analyze_social_media to accept and forward api_key
  • Added user warnings in main when API key is missing
social_media_analyzer/main.py
Added heuristic weight for Google Safe Browsing hits
  • Introduced GOOGLE_SAFE_BROWSING_HIT with high score in heuristic weights
social_media_analyzer/heuristics.py
Declared HTTP dependency for Google API integration
  • Created requirements.txt with requests entry
social_media_analyzer/requirements.txt
Covered Safe Browsing integration with unit tests
  • Added TestScamDetector class with mocks for malicious and clean API responses
  • Refactored test_runner to include unittest setup and test execution
social_media_analyzer/test_runner.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@GYFX35 GYFX35 merged commit fbb601e into main Sep 23, 2025
0 of 6 checks passed
@guardrails bot commented Sep 23, 2025

⚠️ We detected 1 security issue in this pull request:

Vulnerable Libraries (1)
Severity: Medium
Details: pkg:pypi/requests@0.0.0 (upgrade to: 2.32.4)

More info on how to fix Vulnerable Libraries in Python.
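One way to resolve the finding (and the unpinned entry behind the `0.0.0` version shown above) is to constrain `requests` at or above the patched release in `requirements.txt`; the exact range is a suggestion, not the PR's content:

```
requests>=2.32.4,<3
```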


👉 Go to the dashboard for detailed results.

📥 Happy? Share your feedback with us.

@sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Consider adding caching or batching for Safe Browsing API results to prevent performance bottlenecks and API rate limit exhaustion when checking multiple URLs.
  • The test_runner module mixes manual example runs with the unit test suite; consider splitting demonstration code into a separate script to keep the unit tests clean and focused.
  • Pin the requests dependency to a specific version or version range in requirements.txt to avoid unexpected breaking changes in future releases.
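The caching idea in the first point can be sketched with `functools.lru_cache`, which gives per-process memoisation with no extra dependencies. The wrapper below is hypothetical; it assumes the PR's `check_google_safe_browsing(url, api_key)` signature:

```python
from functools import lru_cache

def make_cached_checker(check_fn, maxsize=1024):
    """Wrap a (url, api_key) -> (is_suspicious, reason) checker so that
    repeated lookups of the same URL are answered from an in-process cache
    instead of hitting the Safe Browsing API again."""
    @lru_cache(maxsize=maxsize)
    def cached(url, api_key):
        return check_fn(url, api_key)
    return cached
```

Note that `lru_cache` only helps within one process run; batching multiple `threatEntries` per request would be the complementary fix for rate limits.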
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider adding caching or batching for Safe Browsing API results to prevent performance bottlenecks and API rate limit exhaustion when checking multiple URLs.
- The test_runner module mixes manual example runs with the unit test suite; consider splitting demonstration code into a separate script to keep the unit tests clean and focused.
- Pin the requests dependency to a specific version or version range in requirements.txt to avoid unexpected breaking changes in future releases.

## Individual Comments

### Comment 1
<location> `social_media_analyzer/scam_detector.py:43-44` </location>
<code_context>
         try:
             choice = int(input("Enter your choice (1-4): "))
             if choice == 1:
</code_context>

<issue_to_address>
**suggestion:** Consider handling non-JSON responses from Google Safe Browsing API.

Catching ValueError alongside RequestException will ensure the code handles unexpected response formats without crashing.
</issue_to_address>
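A sketch of the suggested change: `response.json()` raises `ValueError` on a non-JSON body, so catching it preserves the function's `(is_suspicious, reason)` contract. The helper name and factoring are illustrative; the parsing logic mirrors the implementation quoted later in this review:

```python
def parse_safe_browsing_response(response):
    # Hypothetical helper: tolerate a non-JSON body instead of crashing.
    try:
        data = response.json()
    except ValueError as e:
        return False, f"Unexpected response from Google Safe Browsing: {e}"
    if "matches" in data:
        threat_type = data["matches"][0]["threatType"]
        return True, f"Flagged by Google Safe Browsing as {threat_type}."
    return False, "Clean according to Google Safe Browsing."
```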

### Comment 2
<location> `social_media_analyzer/scam_detector.py:155-156` </location>
<code_context>
         if is_susp:
-            score += HEURISTIC_WEIGHTS.get("SUSPICIOUS_URL_PATTERN", 3.0)
+            # Increase score significantly if flagged by Google
+            if "Google Safe Browsing" in reason:
+                score += HEURISTIC_WEIGHTS.get("GOOGLE_SAFE_BROWSING_HIT", 10.0)
+            else:
+                score += HEURISTIC_WEIGHTS.get("SUSPICIOUS_URL_PATTERN", 3.0)
</code_context>

<issue_to_address>
**suggestion (bug_risk):** String matching for Google Safe Browsing reason may be brittle.

Using substring matching for 'Google Safe Browsing' may lead to errors if the message format changes. It's better to have is_url_suspicious return a dedicated flag for Safe Browsing hits.

Suggested implementation:

```python
    for url_str in found_urls:
        is_susp, reason, is_google_safe_browsing = is_url_suspicious(url_str, platform, api_key)
        url_analysis = {"url": url_str, "is_suspicious": is_susp, "reason": reason}
        if is_susp:
            # Increase score significantly if flagged by Google Safe Browsing
            if is_google_safe_browsing:
                score += HEURISTIC_WEIGHTS.get("GOOGLE_SAFE_BROWSING_HIT", 10.0)
            else:
                score += HEURISTIC_WEIGHTS.get("SUSPICIOUS_URL_PATTERN", 3.0)
            indicators_found.append(f"Suspicious URL found: {url_str} (Reason: {reason})")
        urls_analyzed_details.append(url_analysis)

```

You must also update the `is_url_suspicious` function definition and all its call sites to return a third value: `is_google_safe_browsing` (a boolean). This flag should be set to True if the URL was flagged by Google Safe Browsing, and False otherwise.
</issue_to_address>

### Comment 3
<location> `social_media_analyzer/test_runner.py:54-63` </location>
<code_context>
+class TestScamDetector(unittest.TestCase):
</code_context>

<issue_to_address>
**suggestion (testing):** Missing unit tests for error conditions and edge cases in Google Safe Browsing integration.

Please add tests for missing API key, non-200 status codes, request exceptions, and malformed responses to improve error handling coverage.

Suggested implementation:

```python
class TestScamDetector(unittest.TestCase):
    @patch('social_media_analyzer.scam_detector.requests.post')
    def test_google_safe_browsing_malicious(self, mock_post):
        # Mock the API response for a malicious URL
        mock_response = Mock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "matches": [
                {
                    "threatType": "MALWARE",
                    "platformType": "ANY_PLATFORM",
                }
            ]
        }
        mock_post.return_value = mock_response
        # Call the function under test (assume check_url_with_google_safe_browsing exists)
        from social_media_analyzer.scam_detector import check_url_with_google_safe_browsing
        result = check_url_with_google_safe_browsing("http://malicious.com", api_key="fake-key")
        self.assertTrue(result['is_suspicious'])
        self.assertIn("MALWARE", result['reason'])

    def test_google_safe_browsing_missing_api_key(self):
        from social_media_analyzer.scam_detector import check_url_with_google_safe_browsing
        # Call with missing API key
        result = check_url_with_google_safe_browsing("http://example.com", api_key=None)
        self.assertFalse(result['is_suspicious'])
        self.assertIn("Missing Google Safe Browsing API key", result['reason'])

    @patch('social_media_analyzer.scam_detector.requests.post')
    def test_google_safe_browsing_non_200_status(self, mock_post):
        mock_response = Mock()
        mock_response.status_code = 500
        mock_response.json.return_value = {}
        mock_post.return_value = mock_response
        from social_media_analyzer.scam_detector import check_url_with_google_safe_browsing
        result = check_url_with_google_safe_browsing("http://example.com", api_key="fake-key")
        self.assertFalse(result['is_suspicious'])
        self.assertIn("Google Safe Browsing API error", result['reason'])

    @patch('social_media_analyzer.scam_detector.requests.post')
    def test_google_safe_browsing_request_exception(self, mock_post):
        mock_post.side_effect = Exception("Network error")
        from social_media_analyzer.scam_detector import check_url_with_google_safe_browsing
        result = check_url_with_google_safe_browsing("http://example.com", api_key="fake-key")
        self.assertFalse(result['is_suspicious'])
        self.assertIn("Exception during Google Safe Browsing check", result['reason'])

    @patch('social_media_analyzer.scam_detector.requests.post')
    def test_google_safe_browsing_malformed_response(self, mock_post):
        mock_response = Mock()
        mock_response.status_code = 200
        mock_response.json.side_effect = ValueError("Malformed JSON")
        mock_post.return_value = mock_response
        from social_media_analyzer.scam_detector import check_url_with_google_safe_browsing
        result = check_url_with_google_safe_browsing("http://example.com", api_key="fake-key")
        self.assertFalse(result['is_suspicious'])
        self.assertIn("Malformed response from Google Safe Browsing", result['reason'])

```

These tests assume that your `check_url_with_google_safe_browsing` function in `scam_detector`:
- Returns a dict with keys `is_suspicious` (bool) and `reason` (str)
- Handles missing API key, non-200 status, exceptions, and malformed JSON as described

If your function does not currently handle these cases, you will need to update its implementation to do so.
</issue_to_address>

### Comment 4
<location> `social_media_analyzer/test_runner.py:77-91` </location>
<code_context>
+        self.assertTrue(any("Google Safe Browsing" in reason for reason in result["indicators_found"]))
+        self.assertEqual(result['urls_analyzed'][0]['is_suspicious'], True)
+
+    @patch('social_media_analyzer.scam_detector.requests.post')
+    def test_google_safe_browsing_clean(self, mock_post):
+        # Mock the API response for a clean URL
+        mock_response = Mock()
+        mock_response.status_code = 200
+        mock_response.json.return_value = {}
+        mock_post.return_value = mock_response
+
+        message = "this is a clean site http://www.google.com"
+        result = analyze_text_for_scams(message, api_key="fake_key")
+
+        self.assertFalse(any("Google Safe Browsing" in reason for reason in result["indicators_found"]))
+        self.assertEqual(result['urls_analyzed'][0]['is_suspicious'], False)
+
+if __name__ == '__main__':
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding test coverage for multiple URLs in a single message.

Please add a test with a message containing multiple URLs, both malicious and clean, to ensure all are correctly analyzed and flagged.

```suggestion
    @patch('social_media_analyzer.scam_detector.requests.post')
    def test_google_safe_browsing_clean(self, mock_post):
        # Mock the API response for a clean URL
        mock_response = Mock()
        mock_response.status_code = 200
        mock_response.json.return_value = {}
        mock_post.return_value = mock_response

        message = "this is a clean site http://www.google.com"
        result = analyze_text_for_scams(message, api_key="fake_key")

        self.assertFalse(any("Google Safe Browsing" in reason for reason in result["indicators_found"]))
        self.assertEqual(result['urls_analyzed'][0]['is_suspicious'], False)

    @patch('social_media_analyzer.scam_detector.requests.post')
    def test_multiple_urls_mixed_suspicion(self, mock_post):
        # Prepare mock responses for two URLs: one malicious, one clean
        def side_effect(url, *args, **kwargs):
            mock_resp = Mock()
            mock_resp.status_code = 200
            if "malicious.com" in kwargs['json']['threatInfo']['threatEntries'][0]['url']:
                mock_resp.json.return_value = {
                    "matches": [
                        {
                            "threatType": "MALWARE",
                            "platformType": "ANY_PLATFORM",
                            "threat": {
                                "url": "http://malicious.com"
                            }
                        }
                    ]
                }
            else:
                mock_resp.json.return_value = {}
            return mock_resp

        mock_post.side_effect = side_effect

        message = "Check these: http://malicious.com and http://www.google.com"
        result = analyze_text_for_scams(message, api_key="fake_key")

        urls = {url_info['url']: url_info for url_info in result['urls_analyzed']}
        self.assertIn("http://malicious.com", urls)
        self.assertIn("http://www.google.com", urls)
        self.assertTrue(urls["http://malicious.com"]['is_suspicious'])
        self.assertFalse(urls["http://www.google.com"]['is_suspicious'])
        self.assertIn("Google Safe Browsing", urls["http://malicious.com"]['reason'])
        self.assertNotIn("Google Safe Browsing", urls["http://www.google.com"]['reason'])

if __name__ == '__main__':
```
</issue_to_address>

### Comment 5
<location> `social_media_analyzer/main.py:66` </location>
<code_context>
def analyze_social_media(api_key):
    """Handles the analysis of social media platforms."""
    platforms = sorted([
        "facebook", "instagram", "whatsapp", "tiktok", "tinder", "snapchat",
        "wechat", "telegram", "twitter", "pinterest", "linkedin", "line",
        "discord", "teams", "zoom", "amazon", "alibaba", "youtube", "skype",
        "vk", "reddit", "email", "viber", "signal", "badoo", "binance",
        "sharechat", "messenger", "qzone", "qq", "vimeo", "musical.ly", "kuaishou", "douyin"
    ])

    while True:
        print("\nSelect the social media platform you want to analyze:")
        for i, p in enumerate(platforms, 1):
            print(f"{i}. {p.capitalize()}")

        try:
            choice = int(input(f"Enter your choice (1-{len(platforms)}): "))
            if 1 <= choice <= len(platforms):
                platform = platforms[choice - 1]
                break
            else:
                print("Invalid choice. Please try again.")
        except ValueError:
            print("Invalid input. Please enter a number.")

    while True:
        print(f"\nWhat do you want to do for {platform.capitalize()}?")
        print("1. Analyze a profile for signs of being fake.")
        print("2. Analyze a profile for identity usurpation.")
        print("3. Analyze a message for phishing or scam attempts.")

        try:
            analysis_choice = int(input("Enter your choice (1-3): "))
            if analysis_choice == 1:
                profile_url = input(f"Enter the {platform.capitalize()} profile URL to analyze: ").strip()
                if profile_url:
                    fake_profile_detector.analyze_profile_based_on_user_input(profile_url, platform)
                else:
                    print("No profile URL entered.")
                break
            elif analysis_choice == 2:
                profile_url = input(f"Enter the {platform.capitalize()} profile URL to analyze for impersonation: ").strip()
                if profile_url:
                    fake_profile_detector.analyze_identity_usurpation(profile_url, platform)
                else:
                    print("No profile URL entered.")
                break
            elif analysis_choice == 3:
                message = input("Paste the message you want to analyze: ").strip()
                if message:
                    result = scam_detector.analyze_text_for_scams(message, platform, api_key=api_key)
                    print("\n--- Scam Analysis Results ---")
                    print(f"Score: {result['score']} (Higher is more suspicious)")
                    print("Indicators Found:")
                    if result['indicators_found']:
                        for indicator in result['indicators_found']:
                            print(f"- {indicator}")
                    else:
                        print("No specific scam indicators were found.")
                else:
                    print("No message entered.")
                break
            else:
                print("Invalid choice. Please try again.")
        except ValueError:
            print("Invalid input. Please enter a number.")

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use named expression to simplify assignment and conditional [×3] ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Low code quality found in analyze\_social\_media - 21% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>


The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>
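One way to act on the length/nesting finding is to extract the three validate-and-retry input loops into a single helper. This `prompt_choice` function is a hypothetical sketch, not part of the PR; `read`/`write` are injectable so it stays testable without patching builtins:

```python
def prompt_choice(prompt, num_options, read=input, write=print):
    # Keep asking until the user enters an integer in [1, num_options].
    while True:
        try:
            choice = int(read(prompt))
        except ValueError:
            write("Invalid input. Please enter a number.")
            continue
        if 1 <= choice <= num_options:
            return choice
        write("Invalid choice. Please try again.")
```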

### Comment 6
<location> `social_media_analyzer/main.py:139` </location>
<code_context>
def main():
    """Main function to run the security analyzer."""
    api_key = get_api_key()
    print("--- Universal Security Analyzer ---")
    print("This tool helps you analyze social media, messages, and websites for potential scams and fake news.")
    if not api_key:
        print("\n[!] Google Safe Browsing API key not found.")
        print("    To enable real-time URL checking against Google's threat database,")
        print("    please set the GOOGLE_API_KEY environment variable.")

    while True:
        print("\n--- Main Menu ---")
        print("1. Analyze a Social Media Platform")
        print("2. Analyze a Website URL for Scams")
        print("3. Analyze a News URL for Fake News")
        print("4. Exit")

        try:
            choice = int(input("Enter your choice (1-4): "))
            if choice == 1:
                analyze_social_media(api_key)
            elif choice == 2:
                analyze_website_url(api_key)
            elif choice == 3:
                analyze_news_url()
            elif choice == 4:
                print("Exiting. Stay safe!")
                break
            else:
                print("Invalid choice. Please try again.")
        except ValueError:
            print("Invalid input. Please enter a number.")

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Extract duplicate code into function ([`extract-duplicate-method`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/extract-duplicate-method/))
- Simplify conditional into switch-like form ([`switch`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/switch/))
</issue_to_address>
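The switch-like form can be sketched as a dispatch table over the validated menu choices. The handler names come from `main.py` above; `build_menu` itself is hypothetical, and choice 4 (exit) stays in the loop as the break condition:

```python
def build_menu(analyze_social_media, analyze_website_url, analyze_news_url, api_key):
    # Map each menu number to a zero-argument callable, replacing the
    # if/elif chain in main(); missing keys mean "invalid choice".
    return {
        1: lambda: analyze_social_media(api_key),
        2: lambda: analyze_website_url(api_key),
        3: analyze_news_url,
    }
```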

### Comment 7
<location> `social_media_analyzer/scam_detector.py:45-53` </location>
<code_context>
def check_google_safe_browsing(url, api_key):
    """
    Checks a URL against the Google Safe Browsing API.
    Returns a tuple: (is_suspicious, reason)
    """
    if not api_key:
        return False, "Google Safe Browsing API key not configured."

    api_url = f"https://safebrowsing.googleapis.com/v4/threatMatches:find?key={api_key}"
    payload = {
        "client": {
            "clientId": "social-media-analyzer",
            "clientVersion": "1.0.0"
        },
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE", "POTENTIALLY_HARMFUL_APPLICATION"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": url}]
        }
    }
    try:
        response = requests.post(api_url, json=payload, timeout=10)
        if response.status_code == 200:
            data = response.json()
            if "matches" in data:
                threat_type = data["matches"][0]["threatType"]
                return True, f"Flagged by Google Safe Browsing as {threat_type}."
            else:
                return False, "Clean according to Google Safe Browsing."
        else:
            return False, f"Google Safe Browsing API error: {response.status_code}"
    except requests.RequestException as e:
        return False, f"Could not connect to Google Safe Browsing: {e}"

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
</issue_to_address>
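Both findings amount to restructuring the response handling with guard clauses: fail out early on a bad status, handle the clean case, and let the hit fall through last. Sketched here as a hypothetical helper over the status code and decoded body from the function above:

```python
def interpret_response(status_code, data):
    # Guard-clause restructuring of the response handling in
    # check_google_safe_browsing: no else branches needed.
    if status_code != 200:
        return False, f"Google Safe Browsing API error: {status_code}"
    if "matches" not in data:
        return False, "Clean according to Google Safe Browsing."
    threat_type = data["matches"][0]["threatType"]
    return True, f"Flagged by Google Safe Browsing as {threat_type}."
```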

### Comment 8
<location> `social_media_analyzer/scam_detector.py:77` </location>
<code_context>
def is_url_suspicious(url, platform=None, api_key=None):
    """
    Checks if a URL is suspicious based on various patterns and lists,
    including Google Safe Browsing.
    Returns a tuple: (bool_is_suspicious, reason_string)
    """
    # 1. Google Safe Browsing Check
    if api_key:
        is_susp, reason = check_google_safe_browsing(url, api_key)
        if is_susp:
            return True, reason

    # 2. Local Heuristics
    normalized_url = url.lower()
    domain = get_domain_from_url(url)
    legitimate_domains = get_legitimate_domains(platform)

    # Check if the domain is in the legitimate list for the platform
    if domain in legitimate_domains:
        # Still check for impersonation patterns that might include the legit domain
        for pattern in SUSPICIOUS_URL_PATTERNS:
            if re.search(pattern, normalized_url, re.IGNORECASE):
                if not domain.endswith(tuple(legitimate_domains)):
                    return True, f"URL impersonates a legitimate domain: {pattern}"
        return False, "URL domain is on the legitimate list."

    # Check against known suspicious patterns
    for pattern in SUSPICIOUS_URL_PATTERNS:
        if re.search(pattern, normalized_url, re.IGNORECASE):
            return True, f"URL matches suspicious pattern: {pattern}"

    # Check for suspicious TLDs
    suspicious_tld_regex = re.compile(r"\.(" + "|".join(tld.lstrip('.') for tld in SUSPICIOUS_TLDS) + r")$", re.IGNORECASE)
    if suspicious_tld_regex.search(domain):
        return True, f"URL uses a potentially suspicious TLD."

    # Check if a known legitimate service name is part of the domain, but it's not official
    for service in LEGITIMATE_DOMAINS.keys():
        if service != "general" and service in domain:
            return True, f"URL contains the name of a legitimate service ('{service}') but is not an official domain."

    return False, "URL does not match common suspicious patterns."

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use the built-in function `next` instead of a for-loop ([`use-next`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-next/))
- Replace f-string with no interpolated values with string ([`remove-redundant-fstring`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-redundant-fstring/))
</issue_to_address>
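The `use-next` finding can be sketched as below (the pattern list here is illustrative, not the project's real `SUSPICIOUS_URL_PATTERNS`); the `remove-redundant-fstring` fix is simply dropping the `f` prefix from `f"URL uses a potentially suspicious TLD."`, which interpolates nothing:

```python
import re

# Illustrative stand-in for the project's SUSPICIOUS_URL_PATTERNS list.
SUSPICIOUS_URL_PATTERNS = [r"login-?verify", r"free-?gift"]

def first_matching_pattern(normalized_url):
    # next() over a generator replaces the explicit for-loop;
    # returns None when no pattern matches.
    return next(
        (p for p in SUSPICIOUS_URL_PATTERNS
         if re.search(p, normalized_url, re.IGNORECASE)),
        None,
    )
```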

### Comment 9
<location> `social_media_analyzer/scam_detector.py:118` </location>
<code_context>
def analyze_text_for_scams(text_content, platform=None, api_key=None):
    """
    Analyzes a block of text content for various scam indicators.
    """
    if not text_content:
        return {"score": 0.0, "indicators_found": [], "urls_analyzed": []}

    text_lower = text_content.lower()
    score = 0.0
    indicators_found = []
    urls_analyzed_details = []

    # 1. Keyword-based checks
    keyword_checks = {
        "URGENCY": URGENCY_KEYWORDS,
        "SENSITIVE_INFO": SENSITIVE_INFO_KEYWORDS,
        "TOO_GOOD_TO_BE_TRUE": TOO_GOOD_TO_BE_TRUE_KEYWORDS,
        "GENERIC_GREETING": GENERIC_GREETINGS,
        "TECH_SUPPORT": TECH_SUPPORT_SCAM_KEYWORDS,
        "PAYMENT_REQUEST": PAYMENT_KEYWORDS,
    }

    for category, keywords in keyword_checks.items():
        for keyword in keywords:
            if keyword in text_lower:
                message = f"Presence of '{category.replace('_', ' ').title()}' keyword: '{keyword}'"
                if message not in indicators_found:
                    indicators_found.append(message)
                    score += HEURISTIC_WEIGHTS.get(category, 1.0)

    # 2. Regex-based checks
    found_urls = URL_PATTERN.findall(text_content)
    for url_str in found_urls:
        is_susp, reason = is_url_suspicious(url_str, platform, api_key)
        url_analysis = {"url": url_str, "is_suspicious": is_susp, "reason": reason}
        if is_susp:
            # Increase score significantly if flagged by Google
            if "Google Safe Browsing" in reason:
                score += HEURISTIC_WEIGHTS.get("GOOGLE_SAFE_BROWSING_HIT", 10.0)
            else:
                score += HEURISTIC_WEIGHTS.get("SUSPICIOUS_URL_PATTERN", 3.0)
            indicators_found.append(f"Suspicious URL found: {url_str} (Reason: {reason})")
        urls_analyzed_details.append(url_analysis)

    # 3. Financial Identifiers
    for id_name, pattern in FINANCIAL_ADDRESS_PATTERNS.items():
        if pattern.search(text_content):
            message = f"Potential {id_name} identifier found."
            if message not in indicators_found:
                indicators_found.append(message)
                score += HEURISTIC_WEIGHTS.get(f"{id_name}_ADDRESS", 2.5)

    # 4. Phone Numbers
    if PHONE_NUMBER_PATTERN.search(text_content):
        message = "Phone number detected in text."
        if message not in indicators_found:
            indicators_found.append(message)
            score += HEURISTIC_WEIGHTS.get("PHONE_NUMBER_UNSOLICITED", 1.0)

    return {
        "score": round(score, 2),
        "indicators_found": indicators_found,
        "urls_analyzed": urls_analyzed_details
    }

</code_context>

<issue_to_address>
**issue (code-quality):** Low code quality found in analyze\_text\_for\_scams - 25% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
