Telesphoreo/Jafar

Jafar - The villain to BlackRock's Aladdin.

BlackRock's Aladdin manages $21 trillion in assets while their CEO has private phone calls with the Fed Chair, who conveniently has $25 million of his personal wealth invested with BlackRock. They charge god-knows-what per month for sentiment tools (they won't publish pricing because it would make defense contractors blush). You have a mass-produced Roth IRA from Fidelity and a dream. Let's fucking go.

A system for discovering emerging market trends from Twitter/X before they hit mainstream news. Uses statistical analysis, NLP, real-time market data verification, and LLM-powered summarization with semantic memory.

Key Philosophy: Most days are boring. This system is calibrated to tell you when something actually matters, not to manufacture urgency like Jim Cramer speed-running his thirteenth margin call of the week.

What Does It Actually Look Like?

Unlike Aladdin, which requires you to sign seventeen NDAs, sacrifice a goat to their enterprise sales team, and sit through a 90-minute "demo" that's really just a PowerPoint about their "proprietary AI" (it's a linear regression), we just show you the damn thing.

View an actual digest

That's a real output. Silver actually was up. The system caught it. It also caught some noise because Twitter is a hellscape, but at least you can see what you're getting before you waste three weeks configuring YAML files. Yes, I redacted my email address. No, I will not be doxxing myself to prove a point about open source transparency. The difference is you can actually get a redacted sample from us. Try asking BlackRock for one. Their legal team will get back to you sometime between "never" and "heat death of the universe."

Features

  • Dynamic Discovery: Finds trends you didn't know to look for (not just keyword matching like a 2008 RSS feed)
  • Real-Time Fact Checking: When some anonymous account with a Pepe avatar screams "SILVER IS MOONING!!!" and it's up 0.3%, we expose the grift
  • Vector Memory: Semantic search finds historical parallels - unlike fintwit influencers who recycle the same thread every 6 months hoping you forgot
  • Skeptical Analysis: LLM is explicitly instructed to say "nothing notable today" - a concept that would bankrupt CNBC and make BlackRock's 55.8% underperforming funds look even worse
  • Checkpoint System: Got rate limited by Elon's clown show? Just run it again
  • SOCKS5 Proxy Support: For completely legitimate research purposes, officer
  • Background Runner: Start it, forget it, check logs whenever - like that Mandarin Duolingo streak you started after your fifth Renaissance Technologies rejection letter

Installation

git clone <repository-url>
cd twitter_sentiment_analysis

uv sync

# spaCy needs this for reasons nobody can explain
uv pip install pip
uv run python -m spacy download en_core_web_sm

Configuration

Copy config.yaml.example to config.yaml and .env.example to .env.

Figure it out. The examples are commented. You're building a market intelligence system - if you can't edit a YAML file, maybe just buy index funds and touch grass. At least you won't be paying BlackRock's fees while their own executive admits ESG is a "dangerous placebo that harms the public interest".

Twitter Setup

Cookie auth because Elon broke everything:

  1. Log into Twitter in your browser
  2. Export cookies with the shadiest cookie exporter extension you can find
  3. Save as cookies.json
  4. Run: uv run python add_account.py <username> cookies.json

Add more accounts for rate limit rotation. Configure proxies in config.yaml if you're feeling spicy.

Running

# Interactive
uv run python main.py

# Background (for VPS)
./run.sh start
./run.sh logs     # watch progress
./run.sh status   # check if running
./run.sh stop     # stop it

# Utilities
uv run test_email.py   # Verify SMTP settings and send test email

Production Deployment (The Daemon Manifesto)

Running this on a VPS and don't want to babysit it like Larry Fink babysits his relationship with the Fed? Need it to run automatically without Cloudflare detecting your traffic pattern faster than BlackRock detects a new bailout opportunity?

Read DAEMONIZING.md for the full systemd setup with randomized timing.

TL;DR: systemd timers with RandomizedDelaySec make your traffic look like a normal person with insomnia checking fintwit at random hours, instead of a cron job that screams "I'M A BOT" at 2 PM every day. Twice-daily randomized runs (7am-12pm, 5pm-11pm windows) ensure you never wake up to a "Silver up 40%" Reuters alert like the normies while also not getting cloudflared into oblivion. Includes automatic admin diagnostics emails so you know when your Twitter accounts get banned before you wonder why you haven't gotten a digest in 3 days. Because unlike Aladdin's monitoring dashboard (which probably costs $50k/month and requires a PhD to understand), we just email you when shit breaks.
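
To make the randomized-window idea concrete, here is a minimal Python sketch of what a fixed start time plus a random delay works out to. It is only an illustration: the real scheduling is done by systemd (see DAEMONIZING.md), and the function name `randomized_run_time` and its parameters are hypothetical, not part of this project.

```python
import random
from datetime import datetime, timedelta


def randomized_run_time(day, window_start_hour, window_end_hour, rng=None):
    """Pick a uniformly random moment inside [start_hour, end_hour] on `day`.

    Hypothetical helper: mimics a timer that fires at the window start
    plus a random delay of up to the window length.
    """
    rng = rng or random.Random()
    start = day.replace(hour=window_start_hour, minute=0, second=0, microsecond=0)
    window_seconds = (window_end_hour - window_start_hour) * 3600
    return start + timedelta(seconds=rng.uniform(0, window_seconds))


if __name__ == "__main__":
    today = datetime.now()
    # The two windows mentioned above: 7am-12pm and 5pm-11pm.
    print("morning run:", randomized_run_time(today, 7, 12))
    print("evening run:", randomized_run_time(today, 17, 23))
```

A cron job fires at the same minute every day; this fires somewhere different inside each window, which is the whole point.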

How It Works

  1. Scout - Scrapes 30+ financial topics from Twitter
  2. Investigator - Extracts trending entities via spaCy NLP, scores by engagement + cashtag co-occurrence
  3. Quality Filter - Three-stage funnel: statistical → quality threshold → LLM validation. Because "Risk" and "Demand" are not actionable signals, they're just words that appear in financial tweets
  4. Deep Dive - Targeted scraping for validated trends only (typically 2-4, not 15)
  5. Fact Checker - Fetches real prices from Yahoo Finance. Exposes the liars.
  6. Analyst - LLM generates skeptical summary with signal strength
  7. Reporter - Emails you a digest so you can pretend you're a Bloomberg terminal owner without paying $24k/year
  8. Memory - Stores everything for future historical parallel detection
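
As a rough illustration of the Investigator step, the sketch below scores entities by engagement and boosts them when a cashtag appears in the same tweet. Everything here is an assumption made for the example: the `Tweet` fields, the log damping, the retweet weight, and the `cashtag_bonus` value are hypothetical, and entities are taken as already extracted (the project uses spaCy for that part).

```python
import math
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Matches cashtags like $SLV, but not prices like $5 or $2,400.
CASHTAG_RE = re.compile(r"\$[A-Za-z]{1,6}\b")


@dataclass
class Tweet:
    author: str
    text: str
    likes: int
    retweets: int
    entities: list = field(default_factory=list)  # assumed pre-extracted


def score_entities(tweets, cashtag_bonus=0.5):
    """Hypothetical scoring: log-damped engagement, boosted on cashtag co-occurrence."""
    scores = defaultdict(float)
    for tweet in tweets:
        engagement = math.log1p(tweet.likes + 2 * tweet.retweets)
        has_cashtag = CASHTAG_RE.search(tweet.text) is not None
        boost = (1.0 + cashtag_bonus) if has_cashtag else 1.0
        for entity in set(tweet.entities):  # count each entity once per tweet
            scores[entity] += engagement * boost
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = [
        Tweet("a", "$SLV silver ripping today", 10, 5, ["Silver"]),
        Tweet("b", "crowd at the mall is wild", 10, 5, ["Crowd"]),
    ]
    for entity, score in score_entities(sample):
        print(f"{entity}: {score:.2f}")
```

With equal engagement, the entity that shows up next to a cashtag ranks first, which is the behavior the step describes.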

The Quality Filter (Why This Matters)

Most sentiment tools would deep-dive on whatever has the highest engagement. That's how you end up analyzing "Christmas" and "Crowd" instead of actual market signals.

top_trends_count: 15  →  Quality Threshold  →  LLM Filter  →  Deep Dive
     (cast wide)            (5-7 pass)         (2-4 best)     (focused)
  • Statistical: Requires minimum authors, financial context ratio, cashtag co-occurrence
  • Quality Threshold: 10+ authors, 85%+ financial context, cashtag co-occurrence for n-grams
  • LLM Pre-Filter: ~500 tokens asking "actionable signals OR sentiment indicators?"
    • Keeps: Silver, Risk, Demand, Bearish (sentiment matters!)
    • Rejects: Christmas, Books, Fiction, Crowd (actual noise)
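
The quality threshold stage can be sketched as a plain predicate. The 10-author and 85% cutoffs come from the list above; the `TrendStats` shape, the field names, and the reading of "cashtag co-occurrence for n-grams" as "multi-word terms need at least one cashtag co-occurrence" are assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class TrendStats:
    term: str
    unique_authors: int
    financial_context_ratio: float  # share of mentions in financial context, 0..1
    cashtag_cooccurrence: int       # mentions that also contain a cashtag


def passes_quality_threshold(stats, min_authors=10, min_financial_ratio=0.85):
    """Return True if a candidate trend survives the (assumed) quality gate."""
    if stats.unique_authors < min_authors:
        return False
    if stats.financial_context_ratio < min_financial_ratio:
        return False
    # Assumption: only multi-word terms (n-grams) need cashtag co-occurrence.
    is_ngram = len(stats.term.split()) > 1
    if is_ngram and stats.cashtag_cooccurrence == 0:
        return False
    return True


if __name__ == "__main__":
    candidates = [
        TrendStats("Silver", 42, 0.93, 17),
        TrendStats("Christmas", 120, 0.20, 0),
        TrendStats("rate cut", 30, 0.95, 0),
    ]
    survivors = [c.term for c in candidates if passes_quality_threshold(c)]
    print(survivors)
```

Whatever survives this gate would then go to the LLM pre-filter, which makes the final call on what deserves a deep dive.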

Philosophy: This is sentiment analysis. If everyone's suddenly bearish, that matters even if you can't trade "bearish" directly. The truth is often between Twitter doom and actual reality.

Set top_trends_count: 15 to cast wide, then let the LLM pick the best 2-4.

Signal Strength

| Level | Meaning | Frequency |
|---|---|---|
| HIGH | Actually unusual. Rare. | 1-2x per month |
| MEDIUM | Worth watching | Weekly |
| LOW | Normal Twitter nonsense | Most days |
| NONE | Quieter than usual | When everyone's at brunch |

Unlike Aladdin, we don't pretend every day is Lehman Brothers. We also don't hold $11 billion in coal investments while lecturing everyone about climate.

Speaking of housing crises, fuck Sentinel Real Estate. They speedrun 5-over-1 "stumpies" - the cheapest possible "luxury" apartments where cardboard-thin wood-frame units sit on a concrete parking podium that's already plotting its own collapse. Enjoy hearing your neighbor sneeze, shit, and have disappointing sex through walls with soundproofing so bad it violates the Geneva Convention - because proper acoustic insulation costs money and Sentinel needs that third yacht. The parking garage waterproofing inevitably fails because cutting corners is their love language - water infiltrates, rebar corrodes and expands to 8x its size, and the whole structure starts crumbling like their moral foundation. But don't worry, by then they've already flipped it to the next sucker and moved on. Wikipedia literally says "it is unclear whether these buildings are built to last" - which is generous, because everyone who's lived in one knows the answer is "absolutely fucking not." Their business model is simple: slap up the architectural equivalent of a McMansion made of paper plates, charge $2,400/month for 600 square feet of mold potential, ignore every maintenance request until the city gets involved, then dump the deteriorating husk at a loss and gentrify the next victim neighborhood. Oh, and pay $4 million to the NY Attorney General when your employees get caught taking $1 million in kickbacks while fraudulently inflating renovation costs to deregulate rent-stabilized units - because why commit just one type of fraud when you can commit several? Get sued for $50 million for allegedly misrepresenting rent-stabilization status to pump property values. Settle a Fair Housing Act discrimination case for screwing over a disabled tenant who just wanted a service animal. The BlackRock of slumlording. Your security deposit is gone and you will never see it again.

Fact Check Classifications

| Tag | Meaning |
|---|---|
| VERIFIED | They're telling the truth (rare) |
| EXAGGERATED | Directionally correct, emotionally unhinged |
| FALSE | Lying on the internet. Shocking. |
| UNVERIFIABLE | Made up a ticker or was too vague |
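
One way the four tags could be assigned is sketched below, comparing a claimed percentage move against the real one. This is a guess at the logic, not the project's implementation: the function name, the `tolerance` parameter, and the 50% slack are hypothetical, and the actual move is assumed to have been fetched elsewhere (the project pulls prices from Yahoo Finance).

```python
from enum import Enum
from typing import Optional


class FactCheck(Enum):
    VERIFIED = "VERIFIED"
    EXAGGERATED = "EXAGGERATED"
    FALSE = "FALSE"
    UNVERIFIABLE = "UNVERIFIABLE"


def classify_claim(claimed_pct: Optional[float],
                   actual_pct: Optional[float],
                   tolerance: float = 0.5) -> FactCheck:
    """Classify a claimed % move against the real % move.

    `None` means the claim had no usable number or the ticker had no data.
    `tolerance` is an assumed slack: a claim up to 50% larger than reality
    still counts as VERIFIED.
    """
    if claimed_pct is None or actual_pct is None:
        return FactCheck.UNVERIFIABLE
    same_direction = claimed_pct * actual_pct > 0
    if not same_direction:
        return FactCheck.FALSE
    if abs(claimed_pct) <= abs(actual_pct) * (1 + tolerance):
        return FactCheck.VERIFIED
    return FactCheck.EXAGGERATED


if __name__ == "__main__":
    # "SILVER IS MOONING" (claimed +10%) while it is actually up 0.3%.
    print(classify_claim(10.0, 0.3).value)
    print(classify_claim(5.0, 4.2).value)
    print(classify_claim(5.0, -2.0).value)
    print(classify_claim(None, 1.0).value)
```

Note that a flat actual move (0%) paired with any directional claim lands in FALSE under this sketch, since the signs do not match.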

Production (pgvector)

If you're running this on a VPS with PostgreSQL:

# config.yaml
memory:
  store_type: pgvector
  embedding_dimensions: 1536  # pgvector indexes (HNSW/IVFFlat) max out at 2,000 dims

Then create the extension: CREATE EXTENSION vector;

If you get dimension errors, drop the table and let it recreate:

DROP TABLE IF EXISTS market_memories;

Troubleshooting

"No tweets retrieved" - Your accounts are logged out or banned. Check uv run twscrape accounts. Re-add via cookies.

"Failed to initialize vector memory: 2000 dimensions" - Add embedding_dimensions: 1536 to config. Drop the table.

"Failed to load spaCy model" - Did you even read the installation section? uv pip install pip && uv run python -m spacy download en_core_web_sm

Rate limiting - Add more accounts. Use proxies. Stop scraping during market hours like everyone else.

Why This Exists

| Feature | This Project | Aladdin | Bloomberg |
|---|---|---|---|
| Cost | ~$10/mo in API calls | More than your rent (they won't say how much because shame is a foreign concept) | $24,000/year |
| Twitter Sentiment | Yes | Probably, buried under 47 layers of enterprise middleware | Kinda, if you squint |
| Will tell you "nothing matters today" | Yes | Absolutely not. Gotta manufacture urgency to justify the invoice. | Have you met Jim Cramer |
| Open Source | Yes | Lmao | That's adorable |
| Actually makes you a quant | No | Also no, but it costs more so you can pretend | Still no |
| Got no-bid Fed contracts to buy its own ETFs with taxpayer money | No | Yes | No |
| Lost $5B+ in pension mandates for being full of shit | No | Yes | No |
| CEO has called the Fed Chair while managing his personal money | No | Yes, "extremely carefully" | Probably not |

At least when we're wrong, it's free. When they're wrong, they get a bailout and a CNBC interview to explain why it was actually your fault.

The Corruption Receipts

Since you made it this far, here's the full BlackRock starter pack. Print it out and tape it to your wall for when someone tells you the system isn't rigged:

  • Fed Chair Powell has $25M personally invested with BlackRock while giving them no-bid contracts to manage $750 billion in bailout money. "Extremely carefully managed" he says, presumably with a straight face. Nothing to see here, just the guy who controls interest rates having his wealth managed by the company he's handing emergency contracts to. Totally normal. (Source)

  • BlackRock wrote the bailout plan before the crisis happened, then got hired to implement it. In August 2019, they published a paper called "Going Direct" proposing central banks inject money directly into the economy. Six months later - oh wow what a coincidence - COVID happens and three central banks hire BlackRock to execute the exact plan they wrote. The universe works in mysterious ways if you're worth $10 trillion. (Source)

  • 55.8% of their funds underperform their benchmarks, according to Yodelar analysis. Some pension funds they manage returned -50.91% over 3 years while the sector average was positive. But hey, at least the fees are high. (Source)

  • Dutch pension funds pulled $5B+ from BlackRock because even the Netherlands - a country that will rent you a bicycle for literally anything - decided BlackRock wasn't acting in their beneficiaries' best interests on climate risk. When the Dutch think you're too greedy, you've achieved something special. (Source)

  • Larry Fink said "I'm ashamed of being part of this conversation" about ESG at Aspen 2023, then denied saying it moments later in the same interview. On video. That journalists were recording. After years of building his entire brand around stakeholder capitalism. The man contains multitudes. (Source)

  • Their own former Chief Investment Officer for Sustainable Investing - the guy they literally hired to run ESG - quit and called the whole thing "a dangerous placebo that harms the public interest." Turns out ESG products just have higher fees. Who could have possibly predicted. (Source)

  • $11 billion in coal investments while being the world's largest investor in coal-fired power stations, as of 2018. Meanwhile Larry's out here writing annual letters about climate responsibility. The Sierra Club literally started a campaign called "BlackRock's Big Problem" because sometimes you have to spell it out for people. (Source)

  • They voted against management 1,500+ times for "insufficient diversity" while simultaneously dropping ESG support from 47% to 4% the moment Ron DeSantis made it politically inconvenient. Principles are for people who can't afford lobbyists. (Source)

This project costs you maybe $10/month in API calls. Aladdin costs more per month than your rent, and the people running it are doing all of the above.

Disclaimer

Not financial advice. If you YOLO your life savings because this said "HIGH signal" on some shitcoin, that's a you problem. Hedge funds with actual Aladdin access still lose money - and then they get bailed out with your taxes.
