Crack Batching System¶
Overview¶
The crack batching system is a critical performance optimization that prevents agent crashes and message flooding during high-volume password cracking operations. By batching cracked passwords before transmission and implementing a dual-channel architecture, the system achieves a 100x reduction in message volume while maintaining zero data loss.
The Problem¶
Prior to the crack batching implementation, agents faced severe issues during high-volume cracking:
Message Flood Scenario¶
- Volume: When thousands of hashes crack simultaneously (e.g., 4,000+ in seconds)
- Individual Messages: Each crack generated a separate WebSocket message
- Channel Overflow: 256-message outbound buffer would fill instantly
- Agent Crashes: Buffer overflow caused panic and agent disconnection
- Data Loss Risk: Cracked passwords could be lost during crashes
Real-World Impact¶
Before: 4,000 cracks = 8,000+ WebSocket messages
After: 4,000 cracks = 80 batched messages (100x reduction)
Architecture¶
Dual-Channel Message System¶
The system separates status updates from crack data transmission:
1. Status Channel (Synchronous)¶
Purpose: Real-time job progress without crack payload
Message Type: JobStatus
{
  "task_id": "uuid",
  "keyspace_processed": 1234567,
  "progress_percent": 45.2,
  "hash_rate": 1500000000,
  "cracked_count": 150,
  "device_metrics": [...],
  "status": "running"
}
Characteristics:

- Sent every progress update interval
- Contains only the crack count, not the actual data
- Lightweight for real-time UI updates
- Includes all performance metrics
2. Crack Channel (Asynchronous)¶
Purpose: Batched transmission of actual cracked passwords
Message Type: CrackBatch
{
  "task_id": "uuid",
  "is_retransmit": false,
  "cracked_hashes": [
    {
      "hash": "5f4dcc3b5aa765d61d8327deb882cf99",
      "plaintext": "password",
      "original_line": "user:5f4dcc3b5aa765d61d8327deb882cf99"
    },
    ...
  ]
}
Characteristics:

- Sent in batches (10,000 cracks or a 500ms window)
- Independent from progress updates
- Optimized for bulk database operations
- Preserves all crack metadata
Batching Parameters¶
| Parameter | Value | Purpose |
|---|---|---|
| Batch Window | 500ms | Time to accumulate cracks before flush |
| Buffer Size | 10,000 cracks | Maximum cracks per batch |
| Channel Buffer | 4,096 messages | Agent outbound message queue |
| WebSocket Max | 50MB | Maximum message size (backend) |
Batching Logic¶
graph TD
    A[Hashcat Outputs Crack] --> B{Buffer Full?}
    B -->|Yes 10k cracks| C[Flush Immediately]
    B -->|No| D[Add to Buffer]
    D --> E{Timer Expired?}
    E -->|Yes 500ms| C
    E -->|No| F[Continue Accumulating]
    C --> G[Send CrackBatch Message]
    G --> H[Reset Buffer & Timer]
    F --> I[Wait for More Cracks]
    I --> B

Implementation Details¶
Agent-Side Processing¶
Crack Detection and Buffering¶
Location: agent/internal/jobs/hashcat_executor.go
- Outfile Monitoring
  - Hashcat writes cracks to `--outfile`
  - Agent monitors the file with offset tracking
  - New lines detected via polling (every 100ms)
- Deduplication
- Batch Buffer Management
- Timer-Based Flushing
Message Transmission¶
Connection Management: agent/internal/agent/connection.go
- Increased Channel Buffer
- Channel Monitoring
- Graceful Drop Handling
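Graceful drop handling can be sketched with Go's non-blocking channel send. This is a minimal model, not the connection manager's actual code; `trySend` and `fillDemo` are illustrative names.

```go
package main

import "fmt"

// trySend performs a non-blocking send on the outbound channel. When
// the channel is full, the message is dropped and counted rather than
// blocking the job loop or panicking. (Sketch only; the real agent
// also logs fullness warnings as the buffer approaches capacity.)
func trySend(out chan string, msg string, dropped *int) bool {
	select {
	case out <- msg:
		return true
	default:
		*dropped++ // graceful drop: record and continue
		return false
	}
}

// fillDemo pushes n messages into a channel of the given capacity and
// reports how many were queued versus dropped.
func fillDemo(n, capacity int) (queued, dropped int) {
	out := make(chan string, capacity) // real agent buffer is 4,096
	for i := 0; i < n; i++ {
		trySend(out, fmt.Sprintf("msg-%d", i), &dropped)
	}
	return len(out), dropped
}

func main() {
	q, d := fillDemo(6, 4)
	fmt.Println("queued:", q, "dropped:", d)
}
```

The `select` with a `default` case is the idiomatic Go pattern for "send if there's room, otherwise take the fallback path," which is what prevents buffer overflow from crashing the agent.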
Backend-Side Processing¶
Crack Batch Handler¶
Location: backend/internal/integration/job_websocket_integration.go
New Handler Function:
func (s *JobWebSocketIntegration) HandleCrackBatch(
    ctx context.Context,
    agentID int,
    crackBatch *models.CrackBatch,
) error {
    // Validate task exists and belongs to agent
    task, err := s.jobTaskRepo.GetByID(ctx, crackBatch.TaskID)
    if err != nil {
        return fmt.Errorf("task lookup failed: %w", err)
    }
    if task.AgentID != agentID {
        return fmt.Errorf("task %s does not belong to agent %d", crackBatch.TaskID, agentID)
    }
    // Process cracks in optimized batches
    return s.processCrackedHashes(ctx, crackBatch.TaskID, crackBatch.CrackedHashes)
}
Optimized Bulk Processing¶
- Single Bulk Lookup
- Mini-Batch Transactions
- Pre-Loaded Settings

// OLD: Query settings for every crack (N+1 problem)
for _, crack := range crackedHashes {
    potfileEnabled := getSystemSetting("potfile_enabled")
}

// NEW: Load once before loop (includes client potfile settings)
potfileEnabled := getSystemSetting("potfile_enabled")
clientPotfilesEnabled := getSystemSetting("client_potfiles_enabled")
hashlistExcludeGlobal := hashlist.ExcludeFromPotfile
hashlistExcludeClient := hashlist.ExcludeFromClientPotfile
for _, crack := range crackedHashes {
    // Use pre-loaded values for routing to global/client potfiles
}
WebSocket Configuration¶
Location: backend/internal/handlers/websocket/handler.go
Capacity Calculation:

- 10,000 cracks/batch
- ~500 bytes/crack (with metadata)
- ~5MB total per batch
- 50MB max provides a 10x safety margin
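The sizing arithmetic above can be checked directly; this snippet just encodes the stated assumptions (10,000 cracks, ~500 bytes each, a 50MB limit) as constants:

```go
package main

import "fmt"

// Sizing assumptions from the capacity calculation above.
const (
	cracksPerBatch = 10_000
	bytesPerCrack  = 500              // ~500 bytes per crack with metadata
	maxMessage     = 50 * 1024 * 1024 // 50MB backend WebSocket limit
)

// batchBytes is the approximate serialized size of a full batch.
func batchBytes() int { return cracksPerBatch * bytesPerCrack }

// safetyMargin is how many full batches fit under the message limit.
func safetyMargin() int { return maxMessage / batchBytes() }

func main() {
	fmt.Printf("batch ≈ %.1f MB, safety margin ≈ %dx\n",
		float64(batchBytes())/(1024*1024), safetyMargin())
}
```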
Performance Impact¶
Message Reduction¶
| Scenario | Before | After | Reduction |
|---|---|---|---|
| 4,000 cracks | 8,000+ messages | 80 messages | 100x |
| 100,000 cracks | 200,000+ messages | 100 messages | 2000x |
| 1M cracks | 2M+ messages | 100 messages | 20000x |
Network Efficiency¶
- WebSocket Frame Batching
  - Multiple small messages → frequent frame overhead
  - Large batched messages → amortized frame cost
  - Compression works better on larger payloads
- Database Efficiency
  - Individual inserts → N round trips
  - Bulk operations → single round trip
  - Transaction overhead reduced by 99%+
- CPU Utilization
  - Serialization overhead reduced
  - Fewer context switches
  - Better cache locality
Monitoring and Observability¶
Agent-Side Metrics¶
Channel Fullness Warnings:
[WARNING] Outbound channel filling up (78.2%)
[ERROR] Outbound channel critically full (92.5%)
[ERROR] Dropped message - channel full (95.0%)
Batch Flush Events:
[INFO] Crack batch buffer reached size limit for task abc-123 (10000 cracks), flushing immediately
[DEBUG] Flushing crack batch for task abc-123: 523 cracks
Backend-Side Metrics¶
Batch Processing Logs:
[INFO] Processing crack batch from agent 5: task=abc-123, crack_count=8472
[DEBUG] Bulk lookup found 8472 hashes in database
[INFO] Processed 8472 cracked hashes in 2.3 seconds
Performance Tracking:
-- Monitor batch processing times
SELECT
    DATE_TRUNC('minute', timestamp) AS minute,
    COUNT(*) AS batch_count,
    AVG(crack_count) AS avg_cracks_per_batch,
    AVG(processing_time_ms) AS avg_processing_ms
FROM crack_batch_metrics
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY minute
ORDER BY minute DESC;
Error Handling and Recovery¶
Agent Crash Protection¶
- Outfile Offset Tracking
  - Persistent offset prevents duplicate sends
  - Recovery from agent restart
  - No crack data loss
- Timer Cleanup
- Buffer Cleanup
Backend Resilience¶
- Task Validation
  - Verify task exists before processing
  - Check agent ownership
  - Ignore orphaned batches
- Transaction Safety
- Graceful Degradation
  - Individual crack failures don't stop the batch
  - Partial success logged and tracked
  - Retry logic for transient errors
Interaction with Other Systems¶
Potfile Integration¶
Crack batches feed into the potfile staging system with client routing context:
// Stage entire batch at once - includes client context and exclusion flags
entries := []PotfileStagingEntry{
    {
        Password:          "password123",
        HashValue:         "5f4dcc3b...",
        ClientID:          &clientUUID, // nil if hashlist has no client
        ExcludeFromGlobal: false,       // from hashlists.exclude_from_potfile
        ExcludeFromClient: false,       // from hashlists.exclude_from_client_potfile
    },
    // ...
}
potfileService.StageBatch(ctx, entries)
The background worker processes staged entries and routes them to the global potfile and/or the client's potfile based on the three-level cascade (System → Client → Hashlist). See Potfile Management for details on the cascade system.
Job Completion Detection¶
The AllHashesCracked flag is sent via status messages, not crack batches:
// Status message includes completion flag
status := JobStatus{
    TaskID:           taskID,
    AllHashesCracked: true, // Detected from hashcat exit code 6
    // ...
}
See Job Completion System for details.
Progress Tracking¶
Crack counts in status messages provide real-time feedback:
// UI shows count without waiting for batch
status.CrackedCount = 1523 // Updated immediately
// Actual crack data arrives asynchronously
Processing Status Integration¶
The crack batching system integrates with the processing status workflow to ensure jobs don't complete before all crack batches are received.
Workflow:
1. Task Completes Execution:
   - Agent sends final progress message with `Status="completed"` and the `CrackedCount` field
   - Backend transitions task to `processing` status
   - `expected_crack_count` set from the progress message
2. Crack Batches Transmitted:
   - Agent sends crack batches via `crack_batch` WebSocket messages
   - Backend increments `received_crack_count` for each batch
   - Batches processed and stored in database
3. Batch Completion Signal:
   - Agent sends `crack_batches_complete` WebSocket message
   - Backend sets `batches_complete_signaled` to true
   - Agent is free to accept new work
4. Task Completion Check:
   - Backend checks: `received_crack_count >= expected_crack_count AND batches_complete_signaled == true`
   - When conditions met: task transitions from `processing` to `completed`
   - Job completion check triggered
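The task completion check above reduces to a pure predicate; this is an illustrative restatement, not the repository's actual code:

```go
package main

import "fmt"

// taskReadyToComplete mirrors the completion condition: all expected
// cracks received AND the agent has signaled that every batch was sent.
func taskReadyToComplete(received, expected int, signaled bool) bool {
	return received >= expected && signaled
}

func main() {
	fmt.Println(taskReadyToComplete(1523, 1523, true))  // complete
	fmt.Println(taskReadyToComplete(1400, 1523, true))  // cracks still in flight
	fmt.Println(taskReadyToComplete(1523, 1523, false)) // signal not yet received
}
```

Requiring both conditions is what prevents the race: neither the last crack batch nor the completion signal alone is enough to finish the task.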
New WebSocket Message: `crack_batches_complete` (Agent → Backend)
New Database Fields (job_tasks):

- expected_crack_count (INTEGER): Expected cracks from the final progress message
- received_crack_count (INTEGER): Cracks received via batches
- batches_complete_signaled (BOOLEAN): Agent signaled all batches sent
Backend Handler:
func (s *JobWebSocketIntegration) HandleCrackBatchesComplete(
    ctx context.Context,
    agentID int,
    message *models.CrackBatchesComplete,
) error {
    // Mark batches complete
    if err := s.jobTaskRepo.MarkBatchesComplete(ctx, message.TaskID); err != nil {
        return err
    }
    // Check if task is ready to complete
    ready, err := s.jobTaskRepo.CheckTaskReadyToComplete(ctx, message.TaskID)
    if err != nil {
        return err
    }
    if ready {
        // Complete the task
        s.checkTaskCompletion(ctx, message.TaskID)
    }
    return nil
}
Repository Methods:
// JobTaskRepository
SetTaskProcessing(taskID, expectedCracks) // Transition to processing
IncrementReceivedCrackCount(taskID, count) // Track received batches
MarkBatchesComplete(taskID) // Signal batches done
CheckTaskReadyToComplete(taskID) // Verify completion conditions
Benefits:

- ✅ Jobs don't complete prematurely
- ✅ Completion emails have accurate crack counts
- ✅ No race conditions between crack batches and job completion
- ✅ Agent can accept new work immediately after signaling completion
See Job Completion System for full processing status workflow.
Outfile Acknowledgment Protocol¶
The Outfile Acknowledgment Protocol ensures reliable crack recovery when an agent reconnects after a disconnection. This prevents data loss from outfiles that weren't fully transmitted before the agent went offline.
Problem Solved¶
When an agent disconnects mid-job: - Outfiles may contain cracks that weren't transmitted - Agent restart could lose these cracks without proper recovery - Manual recovery is error-prone and time-consuming
Protocol Flow¶
sequenceDiagram
    participant Agent
    participant Backend
    Note over Agent: Agent connects/reconnects
    Agent->>Backend: pending_outfiles (list of outfile paths)
    loop For each outfile
        Backend->>Agent: request_crack_retransmit (task_id, outfile_path)
        Agent->>Backend: crack_batch (is_retransmit=true)
        Agent->>Backend: crack_batches_complete (is_retransmit=true)
        alt Retransmit successful
            Backend->>Agent: outfile_delete_approved (task_id, outfile_path)
            Note over Agent: Agent deletes outfile
        else Retransmit failed
            Backend->>Agent: outfile_delete_rejected (task_id, reason)
            Note over Agent: Agent retains outfile for retry
        end
    end

WebSocket Messages¶
1. pending_outfiles (Agent → Backend)
Sent when agent connects with existing outfiles from previous sessions:
{
  "type": "pending_outfiles",
  "outfiles": [
    {
      "task_id": "uuid-1",
      "outfile_path": "/data/agent/outfiles/task-uuid-1.out",
      "line_count": 1523
    },
    {
      "task_id": "uuid-2",
      "outfile_path": "/data/agent/outfiles/task-uuid-2.out",
      "line_count": 847
    }
  ]
}
2. request_crack_retransmit (Backend → Agent)
Backend requests agent to retransmit cracks from a specific outfile:
{
  "type": "request_crack_retransmit",
  "task_id": "uuid-1",
  "outfile_path": "/data/agent/outfiles/task-uuid-1.out"
}
3. outfile_delete_approved (Backend → Agent)
Backend confirms all cracks received, agent can delete the outfile:
{
  "type": "outfile_delete_approved",
  "task_id": "uuid-1",
  "outfile_path": "/data/agent/outfiles/task-uuid-1.out"
}
4. outfile_delete_rejected (Backend → Agent)
Backend indicates retransmission incomplete, agent should retain outfile:
{
  "type": "outfile_delete_rejected",
  "task_id": "uuid-1",
  "outfile_path": "/data/agent/outfiles/task-uuid-1.out",
  "reason": "Line count mismatch: expected 1523, received 1400"
}
Retransmit Flag Behavior¶
When `is_retransmit: true` is set on `crack_batch` and `crack_batches_complete` messages:

- Duplicate Prevention: Backend uses the flag to identify retransmitted cracks
- Idempotent Processing: Cracks are matched against existing records
- Count Tracking: `retransmit_count` incremented in the `job_tasks` table
- Timestamp Recording: `last_retransmit_at` updated for monitoring
// Backend handling of retransmit batches
if crackBatch.IsRetransmit {
    // Increment retransmit tracking
    repo.IncrementRetransmitCount(ctx, taskID)
    // Use idempotent upsert for crack processing
    // Duplicate cracks are silently ignored
}
Safety Checks¶
Line Count Verification:

- Agent reports `line_count` in pending_outfiles
- Backend tracks received crack count
- Mismatch triggers outfile_delete_rejected
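The approve/reject decision can be sketched as a small function. Names here are illustrative, not the actual backend API:

```go
package main

import "fmt"

// verifyRetransmit decides whether the backend can approve outfile
// deletion after a retransmission: every reported outfile line must
// have arrived as a crack.
func verifyRetransmit(expectedLines, receivedCracks int) (approved bool, reason string) {
	if receivedCracks < expectedLines {
		return false, fmt.Sprintf("Line count mismatch: expected %d, received %d",
			expectedLines, receivedCracks)
	}
	return true, ""
}

func main() {
	ok, _ := verifyRetransmit(1523, 1523)
	fmt.Println(ok) // → outfile_delete_approved
	ok, reason := verifyRetransmit(1523, 1400)
	fmt.Println(ok, reason) // → outfile_delete_rejected with reason
}
```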
Race Condition Prevention:

- Backend validates task ownership before processing
- Retransmits only processed for tasks in `processing` or `completed` state
- Concurrent retransmits from the same outfile are serialized
O(1) Hashlist Lookup:

- Backend pre-loads the hashlist into a memory map for fast crack matching
- Prevents O(n) scans during large retransmissions
- Map keyed by hash value for instant lookup
// Optimized hashlist lookup during crack processing
hashlistMap := make(map[string]*models.Hash)
for _, hash := range hashlist.Hashes {
    hashlistMap[hash.HashValue] = hash
}

// O(1) lookup per crack
if existingHash, ok := hashlistMap[crack.Hash]; ok {
    // Process crack
}
Database Fields for Retransmit Tracking¶
Added to the job_tasks table:

- retransmit_count (INTEGER, default 0): Number of retransmission attempts
- last_retransmit_at (TIMESTAMP): When the last retransmission was requested
These fields help identify problematic agents or network issues that cause frequent retransmissions.
Benefits¶
- ✅ Zero data loss during agent disconnections
- ✅ Automatic recovery without manual intervention
- ✅ Idempotent processing prevents duplicate cracks
- ✅ Verification ensures complete retransmission
- ✅ Agent cleanup only after backend confirmation
Configuration and Tuning¶
Default Settings¶
The default configuration is optimized for most deployments:
| Setting | Default | Rationale |
|---|---|---|
| Batch window | 500ms | Balance latency vs efficiency |
| Buffer size | 10,000 | Safe under 50MB message limit |
| Channel buffer | 4,096 | Handles burst traffic |
Custom Tuning¶
For specialized environments, adjust parameters:
// Low-latency environment (faster batches)
crackBatchInterval = 100 * time.Millisecond
// High-volume environment (larger batches)
crackBatchBufferSize = 50000 // Requires testing max message size
// Constrained memory (smaller buffers)
channelBufferSize = 1024
⚠️ Warning: Custom tuning requires thorough testing to prevent message drops or memory exhaustion.
Best Practices¶
For Administrators¶
- Monitor Channel Fullness
  - Set up alerting for 75%+ warnings
  - Investigate persistent high fullness
  - Check network bandwidth limitations
- Watch Batch Sizes
  - Typical batches: 500-2,000 cracks
  - Large batches (5k+): high-volume cracking (normal)
  - Max batches (10k): possible tuning needed
- Database Performance
  - Monitor crack processing time
  - Should be <5 seconds for a 10k batch
  - Longer times indicate a DB bottleneck
For Developers¶
- Preserve Deduplication
  - Always check `OutfileSentHashes` before adding
  - Use consistent key format: `"hash:plaintext"`
  - Clear the map on task completion
- Respect Buffer Limits
  - Never skip the flush when the buffer is full
  - Always flush remaining cracks on cleanup
  - Handle timer cleanup properly
- Error Handling
  - Log but don't crash on send failures
  - Track dropped messages for debugging
  - Implement retry logic carefully
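The deduplication rule above can be sketched as follows; `dedupe` is an illustrative function, with `outfileSentHashes` mirroring the agent's `OutfileSentHashes` map:

```go
package main

import "fmt"

// dedupe filters cracks that were already transmitted, using the
// consistent "hash:plaintext" key format described above.
func dedupe(cracks [][2]string, outfileSentHashes map[string]bool) [][2]string {
	var fresh [][2]string
	for _, c := range cracks {
		key := c[0] + ":" + c[1] // "hash:plaintext"
		if outfileSentHashes[key] {
			continue // already sent; skip
		}
		outfileSentHashes[key] = true
		fresh = append(fresh, c)
	}
	return fresh
}

func main() {
	sent := map[string]bool{}
	cracks := [][2]string{
		{"5f4dcc3b", "password"},
		{"5f4dcc3b", "password"}, // duplicate from a re-read outfile
		{"d8578edf", "qwerty"},
	}
	fmt.Println(len(dedupe(cracks, sent))) // duplicates filtered before batching
}
```

Because the map persists across polls (until task completion), a crack re-read from the outfile after a partial read is never queued twice.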
Testing and Validation¶
Test Scenarios¶
- High-Volume Cracking
- Agent Crash Recovery
- Network Disruption
Success Criteria¶
- ✅ No agent crashes during high-volume cracking
- ✅ Message volume reduced by >90%
- ✅ Zero crack data loss
- ✅ Channel fullness stays <50% under normal load
- ✅ Batch processing time <5 seconds for 10k cracks
Troubleshooting¶
Channel Fullness Warnings¶
Symptom: Regular warnings about channel filling up
Causes:

- Backend processing too slow
- Network bandwidth insufficient
- Database bottleneck

Solutions:

1. Check backend logs for slow crack processing
2. Monitor database query times
3. Verify network bandwidth
4. Consider reducing the batch size
Missing Cracks¶
Symptom: Crack count doesn't match expected
Diagnostic:
# Check agent logs for drops
grep -i "dropped message" agent.log
# Check outfile for all cracks
wc -l /path/to/hashcat.outfile
# Compare with database count
SELECT COUNT(*) FROM hashes WHERE is_cracked = true AND hashlist_id = ?
Common Causes:

- Deduplication working correctly (not an issue)
- Channel overflow (check fullness warnings)
- Database transaction failures (check backend logs)
Large Batch Performance¶
Symptom: Slow processing of large batches
Solutions:

1. Increase the database connection pool
2. Add database indexes on hash_value
3. Optimize bulk update queries
4. Consider reducing the batch size
Future Enhancements¶
Potential improvements under consideration:
- Adaptive Batching
  - Adjust window based on crack rate
  - Smaller batches for low volume
  - Larger batches for sustained high volume
- Priority Queuing
  - Critical messages bypass batching
  - Completion notifications sent immediately
  - Error reports prioritized
- Compression
  - Compress batch payloads
  - Reduce network bandwidth
  - Trade CPU for I/O savings
- Persistent Queuing
  - Disk-backed buffer for extreme volumes
  - Survive agent crashes
  - Replay on reconnection
Related Documentation¶
- Job Completion System - Hashlist completion detection
- Potfile Management - Crack storage and reuse
- Agent Troubleshooting - Connection and stability issues
- System Monitoring - Performance tracking