Can we group tarot cards algorithmically based on how beneficial or harmful their meanings are — both upright and reversed?
People have long been fascinated by drawing and interpreting tarot cards. Major Arcana cards typically represent major life changes, while the Minor Arcana reveal day-to-day circumstances. Cards drawn in reverse provide a counterpoint to the card’s traditional upright meaning.
This project applies natural language processing and unsupervised machine learning to the 78-card Rider-Waite-Smith tarot deck. Each card is scored on positivity, severity (i.e. intensity of meaning), and domain relevance (love, career, health, etc.) based on its traditional interpretations, both upright and reversed, then plotted on an interactive 2D map. With this project, I wanted to see whether a K-Means clustering algorithm could group cards by their historical meanings.
The pipeline combines web scraping (BeautifulSoup), VADER sentiment analysis, and the Claude API to produce a 156-row dataset (78 cards × 2 orientations), which is then explored using K-Means clustering and scatter plot visualization.
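As a rough illustration of where the 156 rows come from, the sketch below enumerates the standard deck (22 Major Arcana plus 4 suits × 14 ranks) and crosses it with the two orientations. The column names `card`, `orientation`, and `arcana` are assumptions for the example, not necessarily the headers used in the project's CSV.

```python
from itertools import product

MAJOR_ARCANA = [
    "The Fool", "The Magician", "The High Priestess", "The Empress",
    "The Emperor", "The Hierophant", "The Lovers", "The Chariot",
    "Strength", "The Hermit", "Wheel of Fortune", "Justice",
    "The Hanged Man", "Death", "Temperance", "The Devil", "The Tower",
    "The Star", "The Moon", "The Sun", "Judgement", "The World",
]
SUITS = ["Wands", "Cups", "Swords", "Pentacles"]
RANKS = ["Ace", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
         "Nine", "Ten", "Page", "Knight", "Queen", "King"]
ORIENTATIONS = ["upright", "reversed"]


def build_rows():
    """Return one dict per (card, orientation) pair."""
    minor_arcana = [f"{rank} of {suit}" for suit, rank in product(SUITS, RANKS)]
    rows = []
    for card in MAJOR_ARCANA + minor_arcana:
        arcana = "major" if card in MAJOR_ARCANA else "minor"
        for orientation in ORIENTATIONS:
            # Column names here are illustrative assumptions.
            rows.append({"card": card, "orientation": orientation, "arcana": arcana})
    return rows


rows = build_rows()
print(len(rows))  # 156
```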
The full interactive map loads when you first visit the page — hover over any dot to see the card name, keywords, scores, and Rider-Waite-Smith image. Use the filter buttons to explore by suit, orientation, or arcana type. Green = upright, red = reversed, gold ring = Major Arcana. Use the ✦ View Map button (bottom-right) to return to it at any time.
Keyword meanings were scraped from Labyrinthos, targeting only the structured keyword table on each card page — not the full prose descriptions.
Why keywords only? Prose text is editorially biased toward positive framing even for difficult cards. Keywords are more neutral and discriminative — “reckless, careless, distracted” scores very differently from “beginnings, freedom, innocence”.
Source: labyrinthos.co/blogs/tarot-card-meanings-list

URL pattern:
https://labyrinthos.co/blogs/tarot-card-meanings-list/{card-slug}
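A small helper can fill in that pattern. This is only a sketch: the function name `card_url` is hypothetical, and the slug passed in below is a placeholder, since the real slugs have to be taken from the site itself.

```python
BASE_URL = "https://labyrinthos.co/blogs/tarot-card-meanings-list"


def card_url(card_slug: str) -> str:
    """Substitute a slug into the {card-slug} position of the URL pattern."""
    return f"{BASE_URL}/{card_slug.strip('/')}"


# "example-card-slug" is a placeholder, not a real page on the site.
print(card_url("example-card-slug"))
```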
Each card row was scored across 7 features using a blend of VADER sentiment analysis and Claude API scoring:
| Feature | Type | Source | Description |
|---|---|---|---|
| `positivity` | float −1 to +1 | VADER + LLM blend | Overall sentiment of the keywords |
| `positivity_vader` | float −1 to +1 | VADER | Raw VADER compound score |
| `positivity_llm` | float −1 to +1 | Claude API | LLM score grounded in keywords only |
| `severity` | float 0 to 1 | Claude API | How intense or life-changing the card is |
| `domain_love` | binary 0/1 | Claude API | Associated with relationships or love |
| `domain_career` | binary 0/1 | Claude API | Associated with work or finances |
| `domain_health` | binary 0/1 | Claude API | Associated with physical or mental health |
| `domain_spirit` | binary 0/1 | Claude API | Associated with spirituality or inner life |
| `energy_active` | binary 0/1 | Claude API | Action/change vs passive/receptive |
The final positivity score is the average of VADER and LLM scores, giving a more robust signal than either alone.
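In code, the blend is a plain arithmetic mean of the two component scores. The sketch below uses the column names from the feature table as parameter names; the function name `blend_positivity` and the input values are made up for illustration.

```python
def blend_positivity(positivity_vader: float, positivity_llm: float) -> float:
    """Average the VADER and LLM scores; both are expected in [-1, 1]."""
    for name, value in (("positivity_vader", positivity_vader),
                        ("positivity_llm", positivity_llm)):
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range: {value}")
    return (positivity_vader + positivity_llm) / 2.0


# Illustrative inputs only, not values from the dataset.
print(blend_positivity(-0.75, -0.25))  # -0.5
```

Because both inputs share the same −1 to +1 scale, the average stays on that scale without any rescaling.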
VADER (Valence Aware Dictionary and sEntiment Reasoner) was applied directly to the raw keyword strings. VADER works well on comma-separated word lists since it evaluates sentiment without requiring sentence structure.
Claude API was prompted with the same keywords to produce structured JSON scores. The prompt explicitly required traditional Rider-Waite-Smith interpretation — not modern positive-psychology reframing — and included explicit scoring benchmarks:
Scoring reminders:
- ALL reversed cards should score negative or near zero on positivity
- The Tower, Ten of Swords, Three of Swords upright should score below -0.6
- Severity must be spread across the full 0.0-1.0 range
- Minor everyday cards score 0.1-0.3 severity
- Catastrophic cards (The Tower) score 0.9-1.0 severity
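Since the model returns JSON, each reply can be range-checked before it is written to the dataset. The sketch below assumes the reply uses the field names from the feature table and treats "near zero" as at most 0.1. Both are assumptions, and the sample reply is invented for the example.

```python
import json

FLAG_FIELDS = ("domain_love", "domain_career", "domain_health",
               "domain_spirit", "energy_active")


def parse_scores(raw_json: str) -> dict:
    """Parse one model reply and check every field against its allowed range."""
    scores = json.loads(raw_json)
    if not -1.0 <= scores["positivity_llm"] <= 1.0:
        raise ValueError("positivity_llm must be in [-1, 1]")
    if not 0.0 <= scores["severity"] <= 1.0:
        raise ValueError("severity must be in [0, 1]")
    for field in FLAG_FIELDS:
        if scores[field] not in (0, 1):
            raise ValueError(f"{field} must be 0 or 1")
    return scores


def violates_reversed_rule(scores: dict, orientation: str,
                           tolerance: float = 0.1) -> bool:
    """Flag reversed cards whose positivity is clearly above zero.

    The 0.1 tolerance is an assumed reading of "near zero".
    """
    return orientation == "reversed" and scores["positivity_llm"] > tolerance


# Invented reply, for illustration only.
sample_reply = ('{"positivity_llm": -0.4, "severity": 0.3, "domain_love": 0, '
                '"domain_career": 1, "domain_health": 0, "domain_spirit": 0, '
                '"energy_active": 1}')
scores = parse_scores(sample_reply)
print(violates_reversed_rule(scores, "reversed"))  # False
```

Rows that fail a check could be sent back for rescoring or logged for manual review.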
Key data quality finding: Early iterations using full prose descriptions inflated positivity scores dramatically — reversed cards averaged +0.255 instead of negative values, and The Tower upright scored as “Very Positive”. Switching to keywords-only fixed this. Final dataset: reversed cards average −0.504, upright cards average +0.353.
K-Means clustering was explored across k=2 through k=15 with multiple feature set combinations. Silhouette score was the primary evaluation metric.
| Feature Set | Best k | Best Silhouette | Outcome |
|---|---|---|---|
| `positivity` only | 2 | 0.7699 | Clean split but trivially positive/negative |
| `positivity` + `severity` | 15 | 0.5790 | Still climbing, no clear peak |
| `positivity` + domains | 15 | 0.4032 | Domain flags adding noise |
| All 7 features | 15 | 0.2911 | Worst: severity variance too low |
| Without severity | 15 | 0.4877 | Better but still no peak |
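A sweep of this kind can be sketched with scikit-learn as follows. The data here is a synthetic stand-in (random positivity and severity columns), so the scores it prints will not match the table. Standardizing with `StandardScaler` and using `n_init=10` are assumptions about the setup, not confirmed details of the original runs.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_sweep(features, k_values=range(2, 16), seed=0):
    """Fit K-Means for each k and return {k: silhouette score}."""
    X = StandardScaler().fit_transform(features)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    return scores


# Synthetic stand-in for the 156-row dataset (NOT the real scores).
rng = np.random.default_rng(0)
positivity = np.concatenate([rng.normal(0.35, 0.3, 78),
                             rng.normal(-0.5, 0.3, 78)]).clip(-1, 1)
severity = rng.uniform(0, 1, 156)
demo = np.column_stack([positivity, severity])

scores = silhouette_sweep(demo)
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 4))
```

A best k that lands on the upper edge of the search range, as in most rows of the table, usually means the score is still rising rather than that the data has that many natural groups.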
Why K-Means didn’t produce a clean result:
Given these findings, the scatter plot map was adopted as the primary output. This is a more honest representation of the data’s actual structure than forcing arbitrary cluster labels onto it.
The interactive map plots all 156 cards on a positivity × severity plane using HTML5 Canvas.
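The core of such a plot is a mapping from scores to pixel coordinates. The map itself is written in JavaScript; the sketch below shows the same arithmetic in Python, with a canvas size and margin chosen arbitrarily for the example. Canvas y grows downward, so severity is inverted to put intense cards at the top.

```python
def to_canvas(positivity: float, severity: float,
              width: int = 800, height: int = 600, margin: int = 40):
    """Map positivity in [-1, 1] and severity in [0, 1] to canvas pixels.

    Width, height, and margin are illustrative assumptions.
    """
    plot_w = width - 2 * margin
    plot_h = height - 2 * margin
    x = margin + (positivity + 1.0) / 2.0 * plot_w  # -1 is the left edge, +1 the right
    y = margin + (1.0 - severity) * plot_h          # severity 1.0 is the top edge
    return x, y


print(to_canvas(-1.0, 1.0))  # (40.0, 40.0), top-left corner of the plot area
print(to_canvas(0.0, 0.5))   # (400.0, 300.0), centre of the plot area
```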
Features:
| Card | Positivity | Severity |
|---|---|---|
| Ten of Swords (upright) | −0.940 | 0.9 |
| Three of Swords (upright) | −0.934 | 0.8 |
| Queen of Swords (reversed) | −0.925 | 0.7 |
| The Tower (upright) | −0.918 | 1.0 |
| Five of Swords (upright) | −0.883 | 0.7 |
| Card | Positivity | Severity |
|---|---|---|
| The Sun (upright) | +0.964 | 0.4 |
| Nine of Cups (upright) | +0.923 | 0.2 |
| Six of Wands (upright) | +0.921 | 0.6 |
| The Empress (upright) | +0.909 | 0.2 |
| Strength (upright) | +0.907 | 0.5 |
| Quadrant | Representative Cards |
|---|---|
| Intensely Negative (top-left) | The Tower, Ten of Swords, Three of Swords, Five of Swords, Eight of Swords |
| Mildly Negative (bottom-left) | Most reversed cards, The Moon, The Hanged Man, Seven of Cups |
| Intensely Positive (top-right) | The World, Judgement, Wheel of Fortune (upright), The Chariot |
| Mildly Positive (bottom-right) | Nine of Cups, Ten of Cups, The Empress, Six of Pentacles |
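The quadrant labels can be reproduced from the two scores with a simple rule. The cut points used below (positivity at 0, severity at 0.5) are assumptions chosen for the sketch; the cards in the check are ones whose scores appear in the tables above.

```python
def quadrant(positivity: float, severity: float,
             severity_cut: float = 0.5) -> str:
    """Label a card by the sign of its positivity and an assumed severity cut."""
    intensity = "Intensely" if severity >= severity_cut else "Mildly"
    tone = "Positive" if positivity >= 0 else "Negative"
    return f"{intensity} {tone}"


print(quadrant(-0.918, 1.0))  # The Tower (upright): Intensely Negative
print(quadrant(0.923, 0.2))   # Nine of Cups (upright): Mildly Positive
```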
Full prose descriptions from Labyrinthos inflated positivity scores dramatically. Even The Tower (a notoriously challenging card) was framed in growth-oriented language. Keywords proved far more neutral and reliable for sentiment analysis than prose.
Without explicit anchor examples in the prompt, the LLM defaulted 46% of cards to a severity of 0.6. Adding named reference cards (e.g. “Two of Cups = 0.2, The Tower = 1.0”) in the prompt instructions spread the distribution, though variance remained low (0.025–0.028).
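A quick way to catch this kind of collapse is to measure how much of the column sits on its most common value, alongside the variance. The sketch below uses an invented list of 100 scores in which 46 equal 0.6; it is not the project's data, and the helper name `modal_share` is hypothetical.

```python
from collections import Counter
from statistics import pvariance


def modal_share(values):
    """Return the most common value (rounded to 1 decimal) and its share."""
    counts = Counter(round(v, 1) for v in values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)


# Invented severity scores: 46 of 100 sit on 0.6.
severities = [0.6] * 46 + [0.4] * 27 + [0.8] * 27
value, share = modal_share(severities)
print(value, share)  # 0.6 0.46
print(round(pvariance(severities), 3))
```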
The strongest clustering signal in the data is upright vs. reversed orientation — upright cards average +0.353 positivity, reversed average −0.504. This isn’t a flaw; it reflects the fundamental structure of tarot interpretation.
| Tool / Library | Purpose |
|---|---|
| Python 3 | Primary language |
| `requests` + `BeautifulSoup` | Web scraping of Labyrinthos keyword tables |
| `vaderSentiment` | NLP sentiment scoring on keyword strings |
| `anthropic` (Python SDK) | Claude API calls for structured JSON scoring |
| `pandas` | Data manipulation and CSV management |
| `scikit-learn` | `StandardScaler`, `KMeans`, `silhouette_score`, `PCA` |
| `matplotlib` + `seaborn` | Exploratory scatter plots and elbow/silhouette charts |
| HTML5 Canvas (vanilla JS) | Interactive map, no frontend framework required |
| File | Description |
|---|---|
| `tarot_keywords.csv` | Full 156-row dataset with all features |
| `tarot_map_full.html` | Standalone interactive visualization |
| `tarot_failed_cards.csv` | Cards that failed scraping or scoring |
Rider-Waite-Smith tarot imagery is in the public domain. Keyword data sourced from Labyrinthos for research purposes.