Agent Troubleshooting Guide
This guide helps diagnose and resolve common issues with KrakenHashes agents. Use this reference when agents fail to connect, register, sync files, detect hardware, or execute jobs.
Quick Diagnostic Commands
Before diving into specific issues, run these commands to gather diagnostic information:
# Check agent status
systemctl status krakenhashes-agent
# View recent agent logs
journalctl -u krakenhashes-agent -f --since "5 minutes ago"
# Check agent version and configuration
/path/to/krakenhashes-agent --version
cat ~/.krakenhashes/agent/.env
# Test connectivity to backend
curl -k https://your-backend:31337/api/health
# Verify certificate files
ls -la ~/.krakenhashes/agent/config/
openssl x509 -in ~/.krakenhashes/agent/config/client.crt -text -noout
Admin Diagnostics Panel
If you have admin access to the KrakenHashes web interface, the Diagnostics page (Admin Menu → Diagnostics) provides:
- Remote debug mode toggle for agents
- Remote log viewing without SSH access
- One-click log purging
- Downloadable diagnostic packages
This is often the fastest way to diagnose agent issues without direct machine access. See System Diagnostics for details.
Connection Issues
Agent Cannot Connect to Backend
Symptoms:
- Agent logs show "failed to connect to WebSocket server"
- Repeated connection retry attempts
- Certificate verification errors
Common Causes:
- Incorrect Backend URL Configuration
- Certificate Issues
- Network Firewall Blocking
Solutions:
- Update Backend URL
- Renew Certificates
- Fix Network/Firewall
Connection Drops Frequently
Symptoms:
- Agent connects but disconnects after short periods
- WebSocket ping/pong timeouts
- Frequent reconnection attempts
Causes and Solutions:
- Network Instability
- Backend Overload
- Aggressive Firewall/NAT
Registration and Authentication Issues
Agent Registration Fails
Symptoms:
- "Registration failed" errors
- "Invalid claim code" messages
- "Registration request failed" in logs
Common Causes:
- Invalid or Expired Claim Code
  - Check admin panel for active vouchers
  - Generate new voucher if expired
- Certificate Download Issues
- Clock Synchronization Issues
Solutions:
- Get Valid Claim Code
  - Access backend admin panel
  - Go to Agent Management → Generate Voucher
  - Use the new claim code immediately
- Manual Registration
Authentication Errors After Registration
Symptoms:
- "Failed to load API key" errors
- "Authentication failed" messages
- Agent connected but backend rejects requests
Diagnostic Steps:
# Check credentials files
ls -la ~/.krakenhashes/agent/config/
cat ~/.krakenhashes/agent/config/agent.key
# Verify agent.key format (should be <UUID>:<agent ID>)
grep -E '^[0-9a-f-]{36}:[0-9]+$' ~/.krakenhashes/agent/config/agent.key
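The format check above can be wrapped in a small helper that avoids printing the secret. This is a minimal sketch, assuming (as the grep pattern and the `API_KEY`/`AGENT_ID` extraction commands elsewhere in this guide suggest) that `agent.key` holds a single `<api-key>:<agent-id>` line. `check_agent_key` is a hypothetical name and is not part of the agent.

```bash
# Hypothetical helper: validate agent.key contents without echoing the API key.
# Assumption: the file holds "<36-char lowercase UUID>:<numeric agent ID>".
check_agent_key() {
  contents=$1
  if printf '%s\n' "$contents" | grep -Eq '^[0-9a-f-]{36}:[0-9]+$'; then
    # Strip everything up to the last ':' to leave only the agent ID
    echo "ok agent_id=${contents##*:}"
  else
    echo "invalid agent.key format" >&2
    return 1
  fi
}
```

A typical call would be `check_agent_key "$(cat ~/.krakenhashes/agent/config/agent.key)"`; a non-zero exit status suggests the credentials should be regenerated.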
Solutions:
- Regenerate Credentials
    # Remove existing credentials
    rm ~/.krakenhashes/agent/config/agent.key
    rm ~/.krakenhashes/agent/config/*.crt ~/.krakenhashes/agent/config/*.key
    # Re-register
    systemctl stop krakenhashes-agent
    /path/to/krakenhashes-agent --register --claim-code NEW_CLAIM_CODE --host your-backend:31337
    systemctl start krakenhashes-agent
- Fix Permissions
Hardware Detection Issues
No Devices Detected
Symptoms:
- Agent shows "0 devices detected"
- Missing GPU information in admin panel
- Hashcat fails to find OpenCL/CUDA devices
Diagnostic Steps:
# Check if hashcat binary exists
ls -la ~/.krakenhashes/agent/data/binaries/
# Manually test hashcat device detection
find ~/.krakenhashes/agent/data/binaries -name "hashcat*" -type f -executable | head -1 | xargs -I {} {} -I
# Check for GPU drivers
nvidia-smi # NVIDIA
rocm-smi # AMD
intel_gpu_top # Intel
lspci | grep -i vga # General
Common Solutions:
- Install GPU Drivers
- Install OpenCL Runtime
- Fix Hashcat Binary Issues
Devices Detected But Not Usable
Symptoms:
- Agent shows devices in hardware detection output
- Devices appear in the admin panel
- However, devices are not available for job execution
- Jobs fail with "no devices available" errors
Common Cause:
Hashcat 7.x compatibility issues with older GPU drivers. Hashcat 7.x may detect devices but fail to initialize them for compute operations with certain driver versions.
Solutions:
- Use Hashcat 6.x Binary (Recommended)
  - Navigate to your Agent Details page in the web UI
  - Enable "Binary Override" toggle
  - Select a Hashcat 6.x version (e.g., 6.2.6 or 6.2.5) from the dropdown
  - Click "Save"
  - The agent will automatically download the binary
  - Device detection will re-run with the 6.x binary
- Update GPU Drivers
- Verify Driver Compatibility
- Check Agent Logs
Partial Device Detection
Symptoms:
- Some GPUs detected, others missing
- Device count mismatch
- Specific GPU types not showing
Solutions:
- Mixed GPU Environment
- PCIe/Power Issues
File Synchronization Problems
Files Not Downloading
Symptoms:
- Wordlists/rules not available for jobs
- "File not found" errors during job execution
- Sync requests timing out
Diagnostic Steps:
# Check data directories
ls -la ~/.krakenhashes/agent/data/
ls ~/.krakenhashes/agent/data/wordlists/
ls ~/.krakenhashes/agent/data/rules/
ls ~/.krakenhashes/agent/data/binaries/
# Test file download manually
curl -k -H "X-API-Key: YOUR_API_KEY" -H "X-Agent-ID: YOUR_AGENT_ID" \
https://your-backend:31337/api/agent/files/wordlists/rockyou.txt \
-o /tmp/test_download.txt
Common Solutions:
- Fix Authentication
    # Verify API key is valid
    grep -o '^[^:]*' ~/.krakenhashes/agent/config/agent.key | head -1
    # Test API authentication
    API_KEY=$(grep -o '^[^:]*' ~/.krakenhashes/agent/config/agent.key | head -1)
    AGENT_ID=$(grep -o '[^:]*$' ~/.krakenhashes/agent/config/agent.key)
    curl -k -H "X-API-Key: $API_KEY" -H "X-Agent-ID: $AGENT_ID" \
      https://your-backend:31337/api/agent/info
- Fix Directory Permissions
- Clear Corrupted Downloads
Binary Extraction Failures
Symptoms:
- Downloaded .7z files not extracted
- Hashcat binary not executable
- "No such file or directory" when running hashcat
Solutions:
- Install 7-Zip Support
- Fix Extraction Permissions
Job Execution Failures
Jobs Not Starting
Symptoms:
- Tasks assigned but never start
- Agent shows as idle despite task assignment
- "No enabled devices" errors
Diagnostic Steps:
# Check agent task status
journalctl -u krakenhashes-agent | grep -i "task\|job" | tail -10
# Verify enabled devices in backend
# (Check admin panel Agent Details page)
# Test hashcat manually
HASHCAT=$(find ~/.krakenhashes/agent/data/binaries -name "hashcat*" -type f -executable | head -1)
$HASHCAT --help
Solutions:
- Enable Devices
  - Go to backend Admin Panel
  - Navigate to Agent Management
  - Select agent and enable required devices
- Fix Hashcat Path
Jobs Crash or Stop Unexpectedly
Symptoms:
- Jobs start but terminate quickly
- "Process killed" messages
- Hashcat segmentation faults
Diagnostic Steps:
# Check system resources
free -h
df -h ~/.krakenhashes/agent/data/
ps aux | grep hashcat
# Check for OOM kills
dmesg | grep -i "killed process\|out of memory" | tail -5
journalctl -f | grep -i "oom\|memory"
Solutions:
- Resource Issues
- Driver/Hardware Issues
Job Progress Not Reporting
Symptoms:
- Jobs running but no progress updates
- Backend shows tasks as "running" indefinitely
- No crack notifications
Solutions:
- Check WebSocket Connection
- Restart Agent Connection
Agent Stability and Connection Issues
Agent Crashes During High-Volume Cracking
Symptoms:
- Agent crashes when thousands of hashes crack rapidly
- Panic errors related to closed channels
- Connection drops during large password discoveries
- "send on closed channel" errors
Root Cause: High-volume cracking (e.g., 4,000+ cracks in seconds) can overwhelm the WebSocket message system if not properly buffered.
Solutions Implemented (v1.2.1+):
The system now includes automatic protections:
- Crack Batching System
  - Cracks are batched in 500ms windows or 10,000-crack groups
  - Reduces message volume by 100x (8,000 messages → 80 messages)
  - See Crack Batching System for details
- Increased Channel Buffers
- Channel Monitoring
  - Automatic warnings when buffer reaches 75% capacity
  - Critical alerts at 90% capacity
  - Graceful message dropping instead of crashes
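The 100x figure can be sanity-checked with a small simulation. The sketch below is not the agent's implementation. It assumes fixed windows aligned to the first crack, and a flush whenever the window changes or the batch reaches its size cap. `count_batches` is a hypothetical helper that reads one crack timestamp (in milliseconds, ascending) per line and prints how many batch messages would be sent.

```bash
# Hypothetical simulation of window/size batching (assumed semantics, see above).
# $1 = window length in ms, $2 = maximum cracks per batch
# stdin = one crack timestamp in ms per line, in ascending order
count_batches() {
  awk -v win="$1" -v max="$2" '
    NR == 1 { t0 = $1 }                      # align windows to the first crack
    {
      b = int(($1 - t0) / win)               # index of the window this crack falls in
      if (n > 0 && (b != bucket || n >= max)) {
        batches++                            # flush: window changed or batch is full
        n = 0
      }
      bucket = b
      n++
    }
    END {
      if (n > 0) batches++                   # flush the final partial batch
      print batches + 0
    }
  '
}
```

For example, 8,000 cracks arriving one every 5 ms span 40 seconds, so `seq 0 5 39995 | count_batches 500 10000` prints `80`, matching the 8,000 → 80 reduction quoted above under these assumptions.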
Monitoring for Issues:
# Check for channel fullness warnings
journalctl -u krakenhashes-agent | grep -i "channel.*full\|fullness"
# Look for dropped messages (indicates overload)
journalctl -u krakenhashes-agent | grep -i "dropped message"
# Monitor batch sizes (should be 500-10000 cracks)
journalctl -u krakenhashes-agent | grep -i "flush.*batch"
Expected Log Messages (Normal Operation):
Warning Signs:
[WARNING] Outbound channel filling up (78.2%)
[ERROR] Outbound channel critically full (92.5%)
[ERROR] Dropped message - channel full (95.0%)
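To triage a long log quickly, the peak fullness can be extracted and compared against the 75% and 90% thresholds described above. This is a rough sketch that assumes the percentage appears in parentheses exactly as in the sample lines; `max_channel_fullness` is a hypothetical helper, and real log formats may differ.

```bash
# Hypothetical helper: report the highest channel fullness seen on stdin.
# Assumption: log lines carry the value as "(NN.N%)", as in the samples above.
max_channel_fullness() {
  awk '
    match($0, /\([0-9]+(\.[0-9]+)?%\)/) {
      # Drop the leading "(" and the trailing "%)" to leave the number
      pct = substr($0, RSTART + 1, RLENGTH - 3) + 0
      if (pct > max) max = pct
      seen = 1
    }
    END {
      if (!seen) { print "none"; exit }
      level = (max >= 90) ? "critical" : ((max >= 75) ? "warning" : "ok")
      printf "%.1f %s\n", max, level
    }
  '
}
```

It could be fed from the journal, for instance `journalctl -u krakenhashes-agent | max_channel_fullness`.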
Recovery Actions:
If you see persistent channel fullness warnings:
- Check Backend Performance
- Monitor Network Bandwidth
- Verify Database Performance
Double-Close Panic Prevention
Historical Issue (Fixed in v1.2.1): Agents could crash with "close of closed channel" panics during connection cleanup.
Symptoms (if using older version):
panic: close of closed channel
goroutine 123 [running]:
agent/internal/agent.(*AgentConnection).cleanup()
Solution: Update to v1.2.1+ which includes:
- Mutex-protected channel closing
- Close-once semantics with sync.Once
- Graceful shutdown during connection cleanup
If Still Experiencing Issues:
# Verify agent version
/path/to/krakenhashes-agent --version
# Should show v1.2.1 or later
# If older, update agent binary
Channel Overflow Protection
How the System Protects You:
- Automatic Batching
  - Individual cracks accumulated in memory
  - Sent in bulk every 500ms or when 10k accumulated
  - Reduces network traffic and message count
- Buffer Monitoring
  - System tracks outbound channel capacity
  - Warnings logged before critical levels reached
  - Allows proactive investigation
- Graceful Degradation
  - If channel is full, message is dropped (not crashed)
  - Drop events are logged for investigation
  - Agent remains operational
Performance Tuning:
For environments with extremely high crack rates:
# Increase channel buffer (requires agent rebuild)
# Edit agent/internal/agent/connection.go:
# outbound: make(chan []byte, 8192) // Double the default
# Or reduce batch window for more frequent smaller batches
# Edit agent/internal/jobs/hashcat_executor.go:
# crackBatchInterval: 250 * time.Millisecond // Half the window
⚠️ Warning: Custom tuning is rarely needed. The defaults handle >99% of scenarios including extremely high-volume cracking.
Agent Stuck State Recovery
Symptoms:
- Agent shows as "busy" in the admin panel but has no running task
- Agent completed a task but can't accept new work
- Backend shows agent with current_task_id set but task is completed
- Agent logs show "stuck in completing state" warnings
Root Cause (GH Issue #12): Prior to v1.3.1, a race condition could occur where:
1. Agent completes task and sends completion message
2. Message is lost or backend doesn't process it
3. Agent remains in "completing" state indefinitely
4. Backend still shows agent as busy
Automatic Recovery (v1.3.1+):
The system now includes multiple automatic recovery mechanisms:
- Completion ACK Protocol
  - Backend acknowledges every task completion
  - Agent waits up to 30 seconds for ACK (3 retries)
  - If no ACK received, marks completion as pending
- Stuck Detection
  - Agent monitors its own state every 30 seconds
  - If stuck in "completing" state for > 2 minutes, forces recovery
  - Automatically transitions to idle and accepts new work
- State Sync Protocol
  - Backend requests state sync every 5 minutes
  - Agent reports current state and any pending completions
  - Backend resolves mismatches automatically
Manual Recovery (If Automatic Fails):
- Restart the Agent
- Check Agent State
- Force Backend State Reset
Diagnostic Log Messages:
Normal operation:
[INFO] Task abc-123 completed, waiting for ACK
[INFO] ACK received for task abc-123
[INFO] Transitioning to idle state
ACK timeout (triggers retry):
[WARNING] No ACK received for task abc-123, retrying (attempt 2/3)
[INFO] Resending completion for task abc-123
Stuck detection triggered:
[WARNING] Stuck detection: Agent in COMPLETING state for 2m30s
[INFO] Force recovery initiated for task abc-123
[INFO] Marking completion as pending, transitioning to idle
State sync resolution:
[INFO] State sync requested by backend
[INFO] Reporting pending completion for task abc-123
[INFO] Backend confirmed task completion resolved
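When reviewing a longer log, it can help to count how often each of these events occurs. The following sketch matches on the sample message texts shown above, which is an assumption: the wording in a given agent version may differ. `summarize_ack_events` is a hypothetical helper name.

```bash
# Hypothetical helper: count ACK-related events in agent log lines on stdin.
# Assumption: messages use the wording of the sample log lines in this guide.
summarize_ack_events() {
  awk '
    /No ACK received/       { timeouts++; next }  # checked first: it also contains "ACK received"
    /waiting for ACK/       { sent++ }
    /ACK received for task/ { acked++ }
    /Stuck detection:/      { stuck++ }
    END {
      printf "completions_sent=%d acks=%d ack_timeouts=%d stuck_recoveries=%d\n",
             sent, acked, timeouts, stuck
    }
  '
}
```

Usage might look like `journalctl -u krakenhashes-agent --since "1 hour ago" | summarize_ack_events`. A high `ack_timeouts` count relative to `acks` would point toward the connection or backend issues covered in the next section.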
Task Completion ACK Troubleshooting
ACK Never Received:
Possible causes:
- WebSocket connection dropped during completion
- Backend crashed while processing
- Network partition during ACK transmission
Diagnostic Steps:
# Check for WebSocket connection issues
journalctl -u krakenhashes-agent | grep -i "websocket\|connection.*closed\|disconnect"
# Check backend logs for processing errors
docker logs krakenhashes-backend | grep -i "task.*complete\|ack\|error"
# Verify network stability
ping -c 10 your-backend-host
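For scripted checks, the loss figure can be pulled out of the ping summary. This is a small sketch assuming the common summary wording `N% packet loss` (as printed by iputils and BSD ping); `packet_loss_pct` is a hypothetical helper.

```bash
# Hypothetical helper: extract the packet-loss percentage from ping output on stdin.
# Assumption: the summary line contains "<number>% packet loss".
packet_loss_pct() {
  sed -n 's/.*[ ,]\([0-9.]*\)% packet loss.*/\1/p'
}
```

For example, `ping -c 10 your-backend-host | packet_loss_pct` prints only the number, so a wrapper script can alert when it is non-zero.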
Solutions:
1. Agent will auto-recover via stuck detection (2 min timeout)
2. Backend will resolve via state sync (5 min interval)
3. Restart agent for immediate recovery: systemctl restart krakenhashes-agent
Duplicate Completion Messages:
The system handles duplicates gracefully:
- Backend caches completions for 1 hour
- Duplicate messages receive ACK without reprocessing
- No double-counting of cracks or keyspace
Monitoring:
# Check for duplicate completion handling
journalctl -u krakenhashes-agent | grep -i "duplicate\|already processed"
# Backend logs show cache hits
docker logs krakenhashes-backend | grep -i "completion.*cached\|already completed"
Connection Stability Best Practices
- Monitor Logs Proactively
- Network Quality
  - Ensure stable, low-latency connection to backend
  - Avoid Wi-Fi for production agents (use wired connections)
  - Monitor for packet loss:
        mtr your-backend-host
- Backend Capacity
  - Ensure backend can process batches quickly (<5 seconds)
  - Monitor backend CPU/memory during high-volume jobs
  - Scale backend resources if consistent warnings appear
- Update Regularly
  - Keep agent binary up to date for latest stability fixes
  - Review release notes for performance improvements
  - Test updates in dev environment first
Performance Problems
Slow Hash Rates
Symptoms:
- Lower than expected H/s rates
- GPU underutilization
- Benchmark speeds don't match job speeds
Solutions:
- GPU Optimization
- Cooling and Throttling
- Hashcat Parameters
High System Load
Symptoms:
- System becomes unresponsive
- Other applications slow down
- CPU usage constantly high
Solutions:
- Limit Resource Usage
- System Tuning
Error Message Reference
Common Error Patterns
| Error Message | Cause | Solution |
|---|---|---|
| failed to connect to WebSocket server | Network/TLS issues | Check connectivity, renew certificates |
| failed to load API key | Missing/corrupt credentials | Re-register agent |
| registration failed | Invalid claim code | Generate new voucher |
| failed to detect devices | Missing drivers/OpenCL | Install GPU drivers |
| no enabled devices | Devices disabled in backend | Enable devices in admin panel |
| file sync timeout | Network/authentication issues | Check API credentials |
| hashcat not found | Missing/corrupt binary | Re-download binaries |
| certificate verify failed | Expired/invalid certificates | Renew certificates |
| connection refused | Backend not accessible | Check backend status |
| permission denied | File/directory permissions | Fix ownership/permissions |
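The table can also be used as a lookup when scanning logs. The sketch below maps a log line to the suggested solution using case-insensitive substring matching on the patterns above. `suggest_fix` is a hypothetical helper, and the exact text of real log lines is an assumption.

```bash
# Hypothetical helper: map a log line to the solution column of the table above.
# $1 = a single log line; matching is case-insensitive.
suggest_fix() {
  line=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$line" in
    *"failed to connect to websocket server"*) echo "Check connectivity, renew certificates" ;;
    *"failed to load api key"*)                echo "Re-register agent" ;;
    *"registration failed"*)                   echo "Generate new voucher" ;;
    *"failed to detect devices"*)              echo "Install GPU drivers" ;;
    *"no enabled devices"*)                    echo "Enable devices in admin panel" ;;
    *"file sync timeout"*)                     echo "Check API credentials" ;;
    *"hashcat not found"*)                     echo "Re-download binaries" ;;
    *"certificate verify failed"*)             echo "Renew certificates" ;;
    *"connection refused"*)                    echo "Check backend status" ;;
    *"permission denied"*)                     echo "Fix ownership/permissions" ;;
    *) echo "No known pattern matched"; return 1 ;;
  esac
}
```

For instance, `suggest_fix "$(journalctl -u krakenhashes-agent -n 1 --no-pager)"` would check the most recent log line. The first matching pattern wins, so a line containing several patterns reports only one suggestion.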
Debug Logging
Enable detailed logging for troubleshooting:
# Enable debug logging
echo "DEBUG=true" >> ~/.krakenhashes/agent/.env
systemctl restart krakenhashes-agent
# View detailed logs
journalctl -u krakenhashes-agent -f
# Disable debug logging after troubleshooting
sed -i '/DEBUG=true/d' ~/.krakenhashes/agent/.env
systemctl restart krakenhashes-agent
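Appending with `>>` adds another `DEBUG=true` line each time it is run. Where that matters, the flag can be set idempotently instead. The helper below is a sketch under the assumption that the `.env` file uses plain `KEY=value` lines; `set_env_flag` is a hypothetical name, not an agent feature.

```bash
# Hypothetical helper: set KEY=value in an env file, replacing any existing assignment.
# $1 = env file path, $2 = key, $3 = value
set_env_flag() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  {
    if [ -f "$file" ]; then
      # Keep every line except existing assignments of this key
      grep -v "^${key}=" "$file" || true
    fi
    echo "${key}=${value}"
  } > "$tmp"
  # Overwrite in place so the original file's ownership and mode are kept
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

With this, `set_env_flag ~/.krakenhashes/agent/.env DEBUG true` enables debug logging and `set_env_flag ~/.krakenhashes/agent/.env DEBUG false` turns it off again, followed in both cases by a service restart as shown above. Whether the agent treats `DEBUG=false` the same as an absent line is an assumption worth verifying.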
Recovery Procedures
Complete Agent Reset
When all else fails, completely reset the agent:
# Stop agent
systemctl stop krakenhashes-agent
# Backup current configuration
cp -r ~/.krakenhashes/agent ~/.krakenhashes/agent.backup.$(date +%Y%m%d)
# Remove all agent data
rm -rf ~/.krakenhashes/agent/
# Re-register with new claim code
/path/to/krakenhashes-agent --register --claim-code NEW_CLAIM_CODE --host your-backend:31337
# Start agent
systemctl start krakenhashes-agent
Emergency Job Cleanup
Force cleanup of stuck hashcat processes:
# Kill all hashcat processes
pkill -f hashcat
# Clean temporary files
find ~/.krakenhashes/agent/data/ -name "*.tmp" -delete
find ~/.krakenhashes/agent/data/ -name "*.restore" -delete
# Restart agent to reset job state
systemctl restart krakenhashes-agent
Certificate Recovery
Recover from certificate issues:
# Stop agent
systemctl stop krakenhashes-agent
# Download CA certificate manually
curl -k https://your-backend:31337/ca.crt -o ~/.krakenhashes/agent/config/ca.crt
# Use API key to renew client certificates
API_KEY=$(grep -o '^[^:]*' ~/.krakenhashes/agent/config/agent.key | head -1)
AGENT_ID=$(grep -o '[^:]*$' ~/.krakenhashes/agent/config/agent.key)
curl -k -X POST -H "X-API-Key: $API_KEY" -H "X-Agent-ID: $AGENT_ID" \
https://your-backend:31337/api/agent/renew-certificates
# Start agent
systemctl start krakenhashes-agent
When to Restart vs Reinstall
Restart Agent Service
- Connection drops
- Configuration changes
- Minor authentication issues
- After enabling/disabling devices
Restart System
- GPU driver updates
- System resource exhaustion
- Hardware changes
- Kernel updates
Reinstall Agent
- Corrupt binary files
- Persistent authentication failures after certificate renewal
- File system permission issues that can't be resolved
- Agent binary corruption
Complete Reset (Last Resort)
- Multiple interconnected issues
- System contamination from previous installations
- Unknown configuration corruption
- When restart and reinstall don't resolve issues
Use the diagnostic commands at the beginning of this guide to determine the appropriate recovery level.