# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
SORMARK is an AI-powered bookmark management application that analyzes and categorizes Twitter bookmarks using LLMs. It fetches bookmarks from Twitter, processes them with Google's Gemini AI to categorize content, and provides a workflow to review and organize bookmarks by category.
## Tech Stack
- **Frontend**: Waku (React framework with server components)
- **Runtime**: Bun
- **Styling**: Tailwind CSS v4
- **Language**: TypeScript with strict configuration
- **Development**: Devenv for environment management
- **LLM**: Google Gemini 2.5 Flash (`gemini-2.5-flash`) for categorization
- **Data Storage**: JSON files and potential Obsidian integration
## Development Commands
### Getting Started
```bash
# Enter development environment
direnv allow # or devenv shell
# Install dependencies
bun install
```
### Running the Application
```bash
# Development server
bun dev
# Build for production (plain `bun build` invokes Bun's bundler, not the package.json script)
bun run build
# Start production server
bun start
```
### Environment Setup
The project uses devenv for environment management. Key environment variables:
- `GEMINI_API_KEY`: Google Gemini API key for LLM categorization
- `TWITTER_COKI`: Authentication cookie copied from the x.com web frontend, used for bookmark access
- `TWATTER_COKI`: Alternative name for the same Twitter cookie variable
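The variables above could be resolved at startup along these lines. This is a minimal sketch, not the repository's actual code: `resolveConfig`, `AppConfig`, and `Env` are hypothetical names, and giving `TWITTER_COKI` precedence over `TWATTER_COKI` is an assumption.

```typescript
// Hypothetical helper: names here are illustrative, not taken from the repository.
type Env = Record<string, string | undefined>;

interface AppConfig {
  geminiApiKey: string;
  twitterCookie: string;
}

function resolveConfig(env: Env): AppConfig {
  const geminiApiKey = env["GEMINI_API_KEY"];
  // Assumption: TWITTER_COKI wins when both cookie variables are set.
  const twitterCookie = env["TWITTER_COKI"] ?? env["TWATTER_COKI"];

  // Collect every missing variable so one error message lists them all.
  const missing: string[] = [];
  if (!geminiApiKey) missing.push("GEMINI_API_KEY");
  if (!twitterCookie) missing.push("TWITTER_COKI (or TWATTER_COKI)");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  return {
    geminiApiKey: geminiApiKey as string,
    twitterCookie: twitterCookie as string,
  };
}
```

Calling `resolveConfig(process.env)` once at server start would fail fast with a readable message instead of surfacing a missing key later as an opaque API error.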
## Project Structure
```
app/
├── src/
│ ├── pages/ # Waku pages (file-based routing)
│ │ ├── _layout.tsx # Root layout component
│ │ ├── index.tsx # Home page - displays all bookmarks
│ │ ├── categorize.tsx # Bookmark categorization workflow
│ │ └── about.tsx # About page
│ ├── components/ # React components
│ │ ├── counter.tsx # Client-side counter example
│ │ ├── header.tsx # Site header
│ │ └── footer.tsx # Site footer
│ ├── lib/ # Core libraries
│ │ ├── twitter-api.ts # Twitter API integration
│ │ ├── categorization.ts # Category definitions and types
│ │ ├── llm-service.ts # Google Gemini integration
│ │ ├── llm-prompts.ts # LLM categorization prompts
│ │ ├── bookmark-storage.ts # Bookmark storage utilities
│ │ └── testData.json # Test bookmark data
│ └── styles.css # Global styles with Tailwind
├── public/ # Static assets
└── package.json # Dependencies and scripts
```
## Core Features Implemented
### ✅ Twitter Integration
- **Bookmark Fetching**: Fetches all bookmarks from Twitter using authenticated API calls
- **Media Processing**: Extracts images and video thumbnails from bookmarks
- **Rate Limiting**: Respects Twitter API limits with 1-second delays between requests
- **Error Handling**: Comprehensive error handling for API failures
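The rate-limited fetching described above can be sketched as a cursor-based pagination loop. This is an illustration under assumptions: `Page`, `FetchPage`, and `fetchAllPages` are hypothetical names, and the real `twitter-api.ts` may paginate differently.

```typescript
// Hypothetical types and names; the real twitter-api.ts may look different.
interface Page<T> {
  items: T[];
  nextCursor?: string; // undefined when there are no more pages
}

type FetchPage<T> = (cursor?: string) => Promise<Page<T>>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function fetchAllPages<T>(
  fetchPage: FetchPage<T>,
  delayMs = 1000, // the 1-second delay mentioned above
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.nextCursor;
    // Pause only when another request will follow.
    if (cursor !== undefined) await wait(delayMs);
  } while (cursor !== undefined);
  return all;
}
```

Making `wait` injectable keeps the loop testable without real one-second pauses.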
### ✅ AI Categorization
- **LLM Integration**: Uses Google Gemini 2.5 Flash (`gemini-2.5-flash`) for intelligent content categorization
- **Image Analysis**: Analyzes bookmark images to enhance categorization accuracy
- **Custom Categories**: User-defined category system with criteria
- **Multi-category Support**: Allows bookmarks to belong to multiple categories
- **Confidence Scoring**: Provides confidence levels for categorization suggestions
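Multi-category support and confidence scoring could be modeled roughly as follows. The shapes and the `selectSuggestions` helper are hypothetical (`categorization.ts` may define them differently), and the 0..1 confidence range and 0.5 threshold are assumptions for illustration.

```typescript
// Hypothetical shapes; categorization.ts may define these differently.
interface Category {
  id: string;
  name: string;
  criteria: string; // free-text description the LLM is asked to match against
}

interface CategorySuggestion {
  categoryId: string;
  confidence: number; // assumed to be in the range 0..1
}

// Keep suggestions for known categories at or above the threshold,
// ordered from most to least confident.
function selectSuggestions(
  suggestions: CategorySuggestion[],
  categories: Category[],
  minConfidence = 0.5, // illustrative threshold
): CategorySuggestion[] {
  const known = new Set(categories.map((c) => c.id));
  return suggestions
    .filter((s) => known.has(s.categoryId) && s.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}
```

Dropping suggestions for unknown category IDs guards against the model inventing categories the user never defined.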
### ✅ Server-Rendered Categorization Workflow
- **Progressive Processing**: Processes bookmarks one-by-one with server-side LLM calls
- **Progress Tracking**: Shows "Bookmark X of Y" with visual progress bar
- **Category Selection**: Checkbox-based interface for selecting categories
- **Save & Next**: Form-based navigation to next bookmark after categorization
- **Skip Functionality**: Allows skipping bookmarks without categorization
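The "Bookmark X of Y" label and the progress bar width can be derived from a zero-based index. A minimal sketch, assuming a hypothetical `describeProgress` helper; counting the current bookmark as already in progress (so the first of four shows 25%) is a design assumption, not confirmed by the source.

```typescript
// Hypothetical helper for the "Bookmark X of Y" label and progress bar width.
interface Progress {
  label: string;
  percent: number; // 0..100, rounded to the nearest integer
}

function describeProgress(index: number, total: number): Progress {
  if (total <= 0) return { label: "No bookmarks to categorize", percent: 0 };
  // `index` is zero-based; clamp it so out-of-range values cannot break the UI.
  const current = Math.min(Math.max(index, 0), total - 1) + 1;
  return {
    label: `Bookmark ${current} of ${total}`,
    percent: Math.round((current / total) * 100),
  };
}
```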
### ✅ Bookmark Management
- **Complete Data**: Stores tweet text, author info, media, hashtags, and URLs
- **Search & Filter**: Filter bookmarks by categories and search text
- **Remove Bookmarks**: API endpoint to remove bookmarks from Twitter
- **Export Options**: Integration with Obsidian vault for note storage
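Filtering by category and search text might look like the sketch below. The `Bookmark` shape is reduced to the fields needed here, and `BookmarkFilter` and `filterBookmarks` are hypothetical names; matching any selected category (rather than all of them) is an assumption.

```typescript
// Hypothetical bookmark shape, limited to the fields needed for filtering.
interface Bookmark {
  id: string;
  text: string;
  author: string;
  categories: string[];
}

interface BookmarkFilter {
  query?: string; // case-insensitive match on tweet text or author
  categories?: string[]; // bookmark must have at least one of these
}

function filterBookmarks(bookmarks: Bookmark[], filter: BookmarkFilter): Bookmark[] {
  const query = filter.query?.trim().toLowerCase() ?? "";
  const wanted = filter.categories ?? [];
  return bookmarks.filter((b) => {
    const matchesQuery =
      query === "" ||
      b.text.toLowerCase().includes(query) ||
      b.author.toLowerCase().includes(query);
    const matchesCategory =
      wanted.length === 0 || wanted.some((c) => b.categories.includes(c));
    return matchesQuery && matchesCategory;
  });
}
```

An empty filter returns every bookmark, so the same function can back both the unfiltered home page and a filtered view.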
## Next Steps & Roadmap
### 🔧 Immediate Improvements
1. **Database Integration**: Replace JSON file storage with SQLite/PostgreSQL for more robust persistence
2. **Category Management**: Add UI for managing custom categories
3. **Search/Filter**: Implement advanced filtering by date, author, categories
4. **Bulk Operations**: Allow bulk categorization of similar bookmarks
### 🚀 Advanced Features
1. **Smart Suggestions**: Improve LLM prompts based on user feedback
2. **Content Analysis**: Extract and analyze linked article content
3. **Calendar Integration**: Create events from bookmarks with dates
4. **RSS Export**: Generate RSS feeds for categorized bookmarks
5. **Browser Extension**: Chrome/Firefox extension for bookmarking directly
### 🎯 User Experience
1. **Bookmark Queues**: Create queues for different processing priorities
2. **Keyboard Shortcuts**: Add keyboard navigation for faster categorization
3. **Dark Mode**: Implement dark theme support
4. **Mobile Responsive**: Optimize for mobile devices
### 📝 Data Persistence
1. **Save Configuration**: Store user category preferences
2. **Processing State**: Save categorization progress across sessions
3. **Export Formats**: Add CSV, JSON, and Markdown export options
4. **Backup/Restore**: Implement bookmark backup and restore functionality
## Environment Variables Required
```bash
# Required
GEMINI_API_KEY=your_google_gemini_api_key
TWITTER_COKI=your_twitter_authentication_cookie
# Optional
LLM_BASE_URL=custom_llm_endpoint_if_needed
LLM_API_KEY=custom_llm_api_key_if_needed
```
## Development Tips
- **Test Data**: Use `src/lib/testData.json` to develop without making Twitter API calls
- **Cookie Refresh**: Twitter cookies expire frequently; refresh the cookie value as needed
- **Rate Limits**: Be mindful of Twitter API rate limits during development
- **LLM Costs**: Monitor Gemini API usage during development