Cross-Hashlist Crack Synchronization
Overview
KrakenHashes synchronizes cracks across hashlists: when a hash is cracked, every hashlist containing that hash is updated automatically. This keeps hashlists consistent with one another, avoids redundant cracking work, and ensures agents work from current data.
Core Concepts
Hash Deduplication Model
KrakenHashes stores hashes in a central hashes table with two key fields:
- hash_value: The canonical hash value (e.g., 5F4DCC3B5AA765D61D8327DEB882CF99)
- original_hash: The complete original line from upload (e.g., Administrator:500:...:5F4DCC3B5AA765D61D8327DEB882CF99:::)
Key Insight: Multiple users can share the same password hash but have different original_hash values. The system deduplicates by hash_value for cracking efficiency while preserving all original entries.
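The deduplication idea can be sketched in a few lines of Go. This is only an in-memory illustration: the HashRow struct, both function names, and the "Backup" account line are hypothetical and not taken from the KrakenHashes code base.

```go
package main

import (
	"fmt"
	"sort"
)

// HashRow mirrors the two fields described above. The struct is hypothetical.
type HashRow struct {
	HashValue    string // canonical hash value
	OriginalHash string // complete original line from the upload
}

// groupByHashValue collects every original line under its canonical hash value.
func groupByHashValue(rows []HashRow) map[string][]string {
	groups := make(map[string][]string)
	for _, r := range rows {
		groups[r.HashValue] = append(groups[r.HashValue], r.OriginalHash)
	}
	return groups
}

// distinctHashValues returns the unique hash values in sorted order,
// which is the only work a cracker actually needs to do.
func distinctHashValues(rows []HashRow) []string {
	groups := groupByHashValue(rows)
	values := make([]string, 0, len(groups))
	for v := range groups {
		values = append(values, v)
	}
	sort.Strings(values)
	return values
}

func main() {
	rows := []HashRow{
		{"5F4DCC3B5AA765D61D8327DEB882CF99", "Administrator:500:...:5F4DCC3B5AA765D61D8327DEB882CF99:::"},
		{"5F4DCC3B5AA765D61D8327DEB882CF99", "Backup:503:...:5F4DCC3B5AA765D61D8327DEB882CF99:::"}, // hypothetical second user
		{"E10ADC3949BA59ABBE56E057F20F883E", "Guest:502:...:E10ADC3949BA59ABBE56E057F20F883E:::"},
	}
	fmt.Println(len(rows), "rows,", len(distinctHashValues(rows)), "distinct hash values")
}
```

All three original lines are preserved in the groups, while the distinct list holds only two values to crack.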
Many-to-Many Relationship
The hashlist_hashes join table links hashlists to hashes:
When hash_123 is cracked, all three hashlists need their files regenerated.
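As a rough illustration of that relationship, the sketch below models the join table as a slice of ID pairs. The HashlistHash struct, the hashlistsForHash helper, and the IDs 98, 99, 100, and 456 are assumptions made up for the example; the real schema may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// HashlistHash is one row of a hypothetical in-memory hashlist_hashes table.
type HashlistHash struct {
	HashlistID int64
	HashID     int64
}

// hashlistsForHash returns the sorted, de-duplicated hashlist IDs linked to hashID.
func hashlistsForHash(links []HashlistHash, hashID int64) []int64 {
	seen := make(map[int64]bool)
	var ids []int64
	for _, l := range links {
		if l.HashID == hashID && !seen[l.HashlistID] {
			seen[l.HashlistID] = true
			ids = append(ids, l.HashlistID)
		}
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	links := []HashlistHash{
		{HashlistID: 98, HashID: 123},
		{HashlistID: 99, HashID: 123},
		{HashlistID: 100, HashID: 123},
		{HashlistID: 100, HashID: 456},
	}
	// Hash 123 is shared by three hashlists, so all three are affected.
	fmt.Println(hashlistsForHash(links, 123)) // prints [98 99 100]
}
```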
How It Works
1. Crack Detection
When an agent reports cracked hashes via the crack batch mechanism:
// Each entry is "hash_value:plaintext"
crackedHashes := []string{
	"5F4DCC3B5AA765D61D8327DEB882CF99:password", // MD5 of "password"
	"098F6BCD4621D373CADE4E832627B4F6:test",     // MD5 of "test"
}
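Each entry then has to be split into a hash value and a plaintext. A minimal sketch, assuming the hash value itself contains no colon (which does not hold for every hash format) and that parseCrack is a hypothetical helper rather than a KrakenHashes function:

```go
package main

import (
	"fmt"
	"strings"
)

// parseCrack splits "HASH:plaintext" at the first colon.
// Assumption: the hash value contains no colon; the plaintext may.
func parseCrack(entry string) (hashValue, plaintext string, ok bool) {
	hashValue, plaintext, ok = strings.Cut(entry, ":")
	if !ok || hashValue == "" {
		return "", "", false
	}
	return hashValue, plaintext, true
}

func main() {
	crackedHashes := []string{
		"5F4DCC3B5AA765D61D8327DEB882CF99:password",
		"098F6BCD4621D373CADE4E832627B4F6:test",
	}
	for _, entry := range crackedHashes {
		if h, p, ok := parseCrack(entry); ok {
			fmt.Printf("%s -> %q\n", h, p)
		}
	}
}
```

Splitting at the first colon rather than the last keeps plaintexts that contain colons intact.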
2. Hash Update
The system updates the central hashes table:
- Sets is_cracked = true
- Stores the plaintext password
- Updates last_updated timestamp
Important: ALL hashes with the same hash_value are marked as cracked, regardless of original_hash or hashlist association.
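The effect of this step can be sketched against an in-memory slice. The Hash struct and markCracked function below are hypothetical stand-ins for the database update, and the "User2" line is invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Hash is a hypothetical in-memory stand-in for a row in the hashes table.
type Hash struct {
	ID           int64
	HashValue    string
	OriginalHash string
	IsCracked    bool
	Password     *string
	LastUpdated  time.Time
}

// markCracked flags every not-yet-cracked row whose HashValue matches,
// regardless of OriginalHash, and returns how many rows changed.
func markCracked(rows []Hash, hashValue, plaintext string) int {
	updated := 0
	for i := range rows {
		if rows[i].HashValue != hashValue || rows[i].IsCracked {
			continue
		}
		p := plaintext // separate copy so each row owns its pointer
		rows[i].IsCracked = true
		rows[i].Password = &p
		rows[i].LastUpdated = time.Now()
		updated++
	}
	return updated
}

func main() {
	rows := []Hash{
		{ID: 1, HashValue: "098F6BCD4621D373CADE4E832627B4F6", OriginalHash: "User1:501:...:098F6BCD4621D373CADE4E832627B4F6:::"},
		{ID: 2, HashValue: "098F6BCD4621D373CADE4E832627B4F6", OriginalHash: "User2:504:...:098F6BCD4621D373CADE4E832627B4F6:::"}, // hypothetical
		{ID: 3, HashValue: "E10ADC3949BA59ABBE56E057F20F883E", OriginalHash: "Guest:502:...:E10ADC3949BA59ABBE56E057F20F883E:::"},
	}
	n := markCracked(rows, "098F6BCD4621D373CADE4E832627B4F6", "test")
	fmt.Println("rows updated:", n)
}
```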
3. Affected Hashlist Identification
The system queries which hashlists contain the cracked hashes:
SELECT DISTINCT hl.*
FROM hashlists hl
JOIN hashlist_hashes hh ON hl.id = hh.hashlist_id
JOIN hashes h ON hh.hash_id = h.id
WHERE h.hash_value = ANY($1)
This identifies ALL hashlists that need file regeneration.
4. Counter Updates
For each affected hashlist, the system increments its cracked_hashes counter:
// Example: If 2 cracked hashes belong to hashlists [98, 98, 99, 100]:
// - Hashlist 98 increments by 2
// - Hashlist 99 increments by 1
// - Hashlist 100 increments by 1
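The tally in that example can be reproduced with a small map-based counter. The function name is hypothetical; the input is the membership list from the comment above.

```go
package main

import "fmt"

// countCracksPerHashlist tallies how many newly cracked hashes fall in each
// hashlist. memberships holds one hashlist ID per (cracked hash, hashlist) pair.
func countCracksPerHashlist(memberships []int64) map[int64]int {
	counts := make(map[int64]int)
	for _, id := range memberships {
		counts[id]++
	}
	return counts
}

func main() {
	counts := countCracksPerHashlist([]int64{98, 98, 99, 100})
	fmt.Println(counts) // prints map[98:2 99:1 100:1]
}
```

Aggregating first means each hashlist needs only one counter update per batch instead of one per cracked hash.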
5. File Regeneration
Each affected hashlist file is regenerated from scratch:
Process:
1. Query all uncracked hashes for the hashlist
2. Write to temporary file: {hashlist_id}.hash.tmp
3. Atomically rename to {hashlist_id}.hash
4. Calculate new MD5 hash of the file
Example:
Before crack:
Administrator:500:...:5F4DCC3B5AA765D61D8327DEB882CF99:::
User1:501:...:098F6BCD4621D373CADE4E832627B4F6:::
Guest:502:...:E10ADC3949BA59ABBE56E057F20F883E:::
After cracking 5F4DCC3B... and 098F6BCD...:
Guest:502:...:E10ADC3949BA59ABBE56E057F20F883E:::
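The write, rename, and hash steps might look roughly like the following. This is a sketch under assumptions: regenerateHashlistFile is a hypothetical name, the directory layout is invented, and the query step is replaced by a slice passed in by the caller.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// regenerateHashlistFile writes the uncracked lines to <dir>/<id>.hash.tmp,
// renames that file over <dir>/<id>.hash, and returns the MD5 of the new contents.
func regenerateHashlistFile(dir string, hashlistID int64, uncracked []string) (string, error) {
	finalPath := filepath.Join(dir, fmt.Sprintf("%d.hash", hashlistID))
	tmpPath := finalPath + ".tmp"

	var sb strings.Builder
	for _, line := range uncracked {
		sb.WriteString(line)
		sb.WriteByte('\n')
	}
	data := []byte(sb.String())

	if err := os.WriteFile(tmpPath, data, 0o644); err != nil {
		return "", fmt.Errorf("write temp file: %w", err)
	}
	// On Unix-like systems, renaming within one directory replaces the
	// destination atomically, so readers never see a half-written file.
	if err := os.Rename(tmpPath, finalPath); err != nil {
		os.Remove(tmpPath)
		return "", fmt.Errorf("rename temp file: %w", err)
	}
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	dir, err := os.MkdirTemp("", "hashlists")
	if err != nil {
		fmt.Println("mkdir temp:", err)
		return
	}
	defer os.RemoveAll(dir)

	sum, err := regenerateHashlistFile(dir, 98, []string{
		"Guest:502:...:E10ADC3949BA59ABBE56E057F20F883E:::",
	})
	if err != nil {
		fmt.Println("regenerate:", err)
		return
	}
	fmt.Println("new file MD5:", sum)
}
```

Hashing the bytes that were just written avoids reading the file back; a real implementation might prefer to hash the file on disk instead.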
6. Agent Synchronization
For each affected hashlist, the system updates all agent records:
1. Updates agent_hashlists.file_hash to new MD5
2. This marks agent copies as outdated
3. On next connection, agents detect the mismatch
4. Agents automatically download the updated file
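The mismatch check in step 3 amounts to comparing two MD5 values. The sketch below works on byte slices to stay self-contained; md5Hex and needsDownload are hypothetical helpers, and the actual agent logic may take a different route.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// md5Hex returns the lowercase hex MD5 of data.
func md5Hex(data []byte) string {
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:])
}

// needsDownload reports whether an agent's local copy differs from the file
// hash recorded by the backend. A missing local copy (nil) always needs one.
func needsDownload(local []byte, recordedMD5 string) bool {
	if local == nil {
		return true
	}
	return !strings.EqualFold(md5Hex(local), recordedMD5)
}

func main() {
	before := []byte("User1:501:...\nGuest:502:...\n")
	after := []byte("Guest:502:...\n")
	recorded := md5Hex(after) // what the backend stored after regeneration

	fmt.Println(needsDownload(before, recorded)) // prints true
	fmt.Println(needsDownload(after, recorded))  // prints false
}
```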
Benefits
1. Consistency Across Hashlists
If the same hash appears in multiple hashlists (e.g., corporate environments with shared passwords), cracking it once updates all:
Scenario: Password "Summer2024!" used by:
- Hashlist A: john@domain.com
- Hashlist B: john.doe@otherdomain.com
- Hashlist C: jdoe@thirddomain.com
Result: Cracking ANY of these updates ALL three hashlists automatically
2. Efficient Cracking
Hashcat never receives duplicate hashes:
- Uses DISTINCT hash_value when generating hashlist files
- Even if 1000 users share password "Password1", hashcat only cracks it once
- System propagates the crack to all 1000 entries automatically
3. Real-Time Updates
Agents always work with current data:
- Stale hashlists automatically detected via MD5 mismatch
- Fresh files downloaded before task execution
- Prevents wasted work on already-cracked hashes
Implementation Details
Code Flow
Backend: HandleCrackBatch in job_websocket_integration.go
// Track affected hashlists (map[hashlist_id]crack_count)
affectedHashlists := make(map[int64]int)

// Process each crack
for _, crackedEntry := range crackedHashes {
	// Update hash in database
	hash.IsCracked = true
	hash.Password = &plaintext

	// Find which hashlists contain this hash
	hashlistIDs, _ := s.hashRepo.GetHashlistIDsForHash(ctx, hash.ID)

	// Increment counter for each affected hashlist
	for _, hashlistID := range hashlistIDs {
		affectedHashlists[hashlistID]++
	}
}

// Update counters and regenerate files
for hashlistID, count := range affectedHashlists {
	s.hashlistRepo.IncrementCrackedCount(ctx, hashlistID, count)
}

// Trigger file regeneration for all affected hashlists
s.hashlistSyncService.UpdateHashlistAfterCracks(ctx, hashlistID, crackedHashValues)
Backend: UpdateHashlistAfterCracks in hashlist_sync_service.go
// Find ALL hashlists containing these cracked hashes
affectedHashlists := s.hashlistRepo.GetHashlistsContainingHashes(ctx, hashValues)

// Regenerate each hashlist file
for _, hashlist := range affectedHashlists {
	// Get uncracked hashes
	uncrackedHashes := s.hashRepo.GetUncrackedHashValuesByHashlistID(ctx, hashlist.ID)

	// Write each uncracked hash to the temp file, one per line
	for _, hash := range uncrackedHashes {
		file.WriteString(hash + "\n")
	}

	// Atomic replace
	os.Rename(tempFile, actualFile)

	// Update agent records
	for _, agentHashlist := range distribution {
		agentHashlist.FileHash = &newMD5
		s.agentHashlistRepo.CreateOrUpdate(ctx, agentHashlist)
	}
}
Agent: ensureHashlist in jobs.go
// ALWAYS re-download for each task to ensure fresh copy
if _, err := os.Stat(localPath); err == nil {
	debug.Info("Removing existing hashlist to download fresh copy")
	os.Remove(localPath)
}

// Download fresh copy from backend
fileInfo := &filesync.FileInfo{
	Name:     fmt.Sprintf("%d.hash", hashlistID),
	FileType: "hashlist",
	ID:       int(hashlistID),
	MD5Hash:  "", // Skip verification for speed
}
s.fileSync.DownloadFileFromInfo(ctx, fileInfo)
Performance Considerations
Scalability
File Regeneration Cost: O(U) where U = uncracked hashes per hashlist
- Small hashlists (< 10k): Instant regeneration
- Medium hashlists (10k-100k): < 1 second
- Large hashlists (100k-1M): 1-5 seconds
- Very large (> 1M): 5-30 seconds
Multi-Hashlist Impact: If 10 hashlists share hashes, ALL 10 regenerate
- Sequential processing prevents race conditions
- Failures on one hashlist don't block others
- Agents notified asynchronously
Optimization Strategies
- Batched Processing: Cracks processed in batches (default: 50 per batch)
- Atomic Updates: Temp files prevent partial writes
- Lazy Agent Sync: Agents discover updates on-demand, not pushed
- Distinct Queries: Hashcat receives deduplicated hashes
Database Efficiency
Counter Updates: Batch increments reduce transaction overhead
// Single update per hashlist, not per hash
IncrementCrackedCount(hashlistID, totalCracksForThisHashlist)
Index Utilization:
- hash_value indexed for fast duplicate detection
- hashlist_hashes join table indexed on both FKs
- is_cracked index for uncracked hash queries
Edge Cases
1. Hash in Multiple Hashlists
Scenario: Same hash in 5 different hashlists
Behavior:
- Hash marked cracked once in hashes table
- All 5 hashlists get counter increments
- All 5 hashlist files regenerated
- All agents with any of the 5 hashlists notified
2. Partially Failed Regeneration
Scenario: Hashlist file regeneration fails for 1 of 5 affected hashlists
Behavior:
- Error logged but processing continues
- Other 4 hashlists still regenerated successfully
- Failed hashlist can retry on next crack
- Database counters still updated correctly
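The "log and continue" behavior can be sketched as a loop that collects errors instead of returning on the first one. The regenerateAll function, the regen callback, and errDiskFull are hypothetical names; errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// errDiskFull is a made-up failure used only for the demonstration.
var errDiskFull = errors.New("disk full")

// regenerateAll runs regen for each hashlist ID in order. A failure is
// recorded and the loop continues, so one bad hashlist does not block the rest.
func regenerateAll(ids []int64, regen func(int64) error) (succeeded []int64, err error) {
	var errs []error
	for _, id := range ids {
		if e := regen(id); e != nil {
			errs = append(errs, fmt.Errorf("hashlist %d: %w", id, e))
			continue
		}
		succeeded = append(succeeded, id)
	}
	// errors.Join returns nil when there is nothing to join.
	return succeeded, errors.Join(errs...)
}

func main() {
	ids := []int64{96, 97, 98, 99, 100}
	ok, err := regenerateAll(ids, func(id int64) error {
		if id == 98 {
			return errDiskFull
		}
		return nil
	})
	fmt.Println("regenerated:", ok)
	fmt.Println("errors:", err)
}
```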
3. Agent Offline During Update
Scenario: Agent offline when hashlist updated
Behavior:
- Agent's file_hash still updated in database
- On reconnection, file sync detects mismatch
- Agent automatically downloads fresh file
- No manual intervention required
4. Empty Hashlist After Cracks
Scenario: All hashes in a hashlist get cracked
Behavior:
- Hashlist file becomes empty (0 bytes)
- File still exists (prevents 404 errors)
- Hashlist status remains "ready"
- Progress shows 100% cracked
Monitoring and Debugging
Key Metrics
Monitor these for cross-hashlist sync health:
- File Regeneration Time: Track duration per hashlist
- Affected Hashlist Count: How many hashlists per crack batch
- Agent Sync Lag: Time between file update and agent download
Debug Logging
Enable debug logging to trace sync flow:
DEBUG: Found affected hashlists for cross-hashlist update
DEBUG: affected_count=3, hashlist_ids=[98,99,100]
DEBUG: Regenerating hashlist file 98
DEBUG: Found 4523 uncracked hashes for hashlist 98
DEBUG: Updated hashlist 98 file and marked 5 agents for sync
Common Issues
Issue: Agents keep downloading same hashlist repeatedly
Cause: File regeneration producing different MD5 each time
Solution: Ensure consistent hash ordering in queries (ORDER BY hash_value)
Issue: Hashlist counters don't match file contents
Cause: Counter increments succeeded but file regeneration failed
Solution: Check backend logs for file write errors, verify disk space
Issue: Cross-hashlist updates slow
Cause: Many hashlists sharing same hashes
Solution: Normal behavior, consider separating unrelated hashlists
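For the first issue above, the effect of consistent ordering on the file hash can be shown with a short sketch. The deterministicFileMD5 function is hypothetical and sorts in memory; in the database the same effect would come from ORDER BY hash_value.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// deterministicFileMD5 sorts a copy of the hash values, joins them one per
// line, and returns the hex MD5 of the result. The input slice is not modified.
func deterministicFileMD5(hashValues []string) string {
	sorted := append([]string(nil), hashValues...)
	sort.Strings(sorted)

	var sb strings.Builder
	for _, h := range sorted {
		sb.WriteString(h)
		sb.WriteByte('\n')
	}
	sum := md5.Sum([]byte(sb.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := []string{"E10ADC3949BA59ABBE56E057F20F883E", "098F6BCD4621D373CADE4E832627B4F6"}
	b := []string{"098F6BCD4621D373CADE4E832627B4F6", "E10ADC3949BA59ABBE56E057F20F883E"}

	// The same set of hashes in a different order yields the same file hash.
	fmt.Println(deterministicFileMD5(a) == deterministicFileMD5(b)) // prints true
}
```

Without the sort, two regenerations of an unchanged hashlist could produce different MD5 values and trigger needless agent downloads.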
Best Practices
For Users
- Separate Unrelated Hashlists: Don't combine disparate hash sources if not needed
- Monitor Crack Rates: Expect brief spikes in file I/O during large crack batches
- Agent Connectivity: Keep agents connected for timely file updates
For Administrators
- Disk I/O Monitoring: Watch for I/O spikes during high-volume cracking
- Database Indexing: Ensure indexes on join tables are maintained
- Log Review: Periodically check for file regeneration failures
For Developers
- Transaction Boundaries: Always wrap counter updates and file operations together
- Error Handling: Log failures but continue processing other hashlists
- Atomic File Operations: Use temp files + rename for atomic updates
- Consistent Ordering: Always ORDER BY hash_value for deterministic output
Related Systems
- Crack Batching System: How cracks are collected and sent
- Job Update System: How jobs adapt to file changes
- File Sync: Agent file synchronization mechanism
Summary
Cross-hashlist crack synchronization ensures consistency across the entire KrakenHashes system. By automatically propagating cracks to all affected hashlists and regenerating files, the system eliminates redundant work while keeping all components synchronized. This architecture scales efficiently from small deployments to enterprise environments with thousands of hashlists.