Agent Debugging Guide¶
This guide covers debugging the KrakenHashes agent, including enabling debug mode, interpreting logs, and using development tools to diagnose issues.
Quick Start - Enable Debug Mode¶
The fastest way to enable debug logging:
# Method 1: Command line flag
./krakenhashes-agent --debug --host backend.example.com:31337
# Method 2: Environment variable
export DEBUG=true
export LOG_LEVEL=DEBUG
./krakenhashes-agent --host backend.example.com:31337
# Method 3: Edit .env file
echo "DEBUG=true" >> .env
echo "LOG_LEVEL=DEBUG" >> .env
./krakenhashes-agent --host backend.example.com:31337
Debug Configuration Options¶
Command Line Flags¶
The agent supports several debugging-related command line flags:
./krakenhashes-agent --help
--debug Enable debug logging (default: false)
--host string Backend server host (e.g., localhost:31337)
--tls Use TLS for secure communication (default: true)
--interface string Network interface to listen on (optional)
--heartbeat int Heartbeat interval in seconds (default: 5)
--config-dir string Configuration directory for certificates and credentials
--data-dir string Data directory for binaries, wordlists, rules, and hashlists
--hashcat-params string Extra parameters to pass to hashcat (e.g., '-O -w 3')
Environment Variables¶
Configure debugging through environment variables in .env file:
# Logging Configuration
DEBUG=true # Enable debug logging
LOG_LEVEL=DEBUG # Set minimum log level (DEBUG, INFO, WARNING, ERROR)
# Server Configuration
KH_HOST=backend.example.com # Backend hostname
KH_PORT=31337 # Backend port
USE_TLS=true # Use secure connections
# WebSocket Timing Configuration
KH_WRITE_WAIT=10s # WebSocket write timeout
KH_PONG_WAIT=60s # Server pong timeout
KH_PING_PERIOD=54s # Ping interval
# File Transfer Configuration
KH_MAX_CONCURRENT_DOWNLOADS=3 # Concurrent download limit
KH_DOWNLOAD_TIMEOUT=1h # Download timeout
KH_MAX_DOWNLOAD_RETRIES=3 # Download retry attempts
# Development Configuration
HEARTBEAT_INTERVAL=5 # Heartbeat frequency (seconds)
Log Levels¶
The agent supports four log levels in order of severity:
- DEBUG - Detailed diagnostic information
- INFO - General operational messages
- WARNING - Potential issues that don't stop operation
- ERROR - Serious problems that may cause failures
Set the minimum level with LOG_LEVEL environment variable:
# Show all messages
LOG_LEVEL=DEBUG
# Show info, warnings, and errors
LOG_LEVEL=INFO
# Show only warnings and errors
LOG_LEVEL=WARNING
# Show only errors
LOG_LEVEL=ERROR
Remote Debug Management¶
Starting with v1.4.0, administrators can remotely manage agent debug settings through the web interface without requiring direct access to the agent machine.
Admin-Controlled Debug Toggle¶
Administrators can enable/disable debug mode on agents remotely:
- Navigate to Admin Menu → Diagnostics
- Find the agent in the Agent Debug Management table
- Click the toggle switch to enable/disable debug mode
This sends a WebSocket message to the agent, which immediately:
- Enables/disables debug logging
- Starts/stops file logging if configured
- Begins/stops buffering log entries for remote retrieval
Remote Log Viewing¶
Administrators can view recent agent logs directly from the web interface:
- In the Diagnostics page, click View next to an agent
- Select the log level filter (ALL, DEBUG, INFO, WARNING, ERROR)
- Choose hours to retrieve (1-168 hours)
- Click Refresh to update
Log Availability
Remote log viewing requires:
- Agent to be connected via WebSocket
- Debug mode to be enabled (for DEBUG level logs)
- Agent to have disk space for file logging (if configured)
Remote Log Purging¶
Administrators can clear agent logs remotely:
- In the Diagnostics page, click Purge next to an agent
- Confirm the purge action
- Both buffered logs and file logs are cleared
Including Agent Logs in Diagnostics Package¶
When downloading a diagnostic package, check "Include Agent Logs" to collect logs from all connected agents. This uses WebSocket to request logs, so agents must be online during download.
For more details on the admin diagnostics interface, see the System Diagnostics documentation.
Debug Output Interpretation¶
Log Message Format¶
Debug messages follow this format:
Example:
[DEBUG] [2025-01-10 15:04:05.123] [/path/to/file.go:42] [package.Function] Connecting to backend server
Common Debug Messages¶
Startup and Configuration¶
[INFO] Debug logging initialized - Debug enabled: true
[INFO] Current working directory: /path/to/agent
[INFO] Loading agent configuration...
[INFO] Using config directory: /path/to/config
[INFO] Using data directory: /path/to/data
WebSocket Connection¶
[DEBUG] Starting WebSocket connection process
[INFO] Connection attempt 1 of 3
[DEBUG] WebSocket connected to wss://backend.example.com:31337/ws/agent
[INFO] Connection attempt 1 successful
Hardware Detection¶
[INFO] Detecting compute devices at startup...
[DEBUG] Found GPU: NVIDIA RTX 4090 (Device ID: 0)
[DEBUG] Found GPU: NVIDIA RTX 4080 (Device ID: 1)
[INFO] Successfully detected and sent device information to server
File Synchronization¶
[INFO] Initializing file sync with max downloads: 3, timeout: 1h0m0s, max retries: 3
[DEBUG] Scanning wordlists directory: /path/to/data/wordlists
[INFO] File sync: Found 15 local files, backend has 23 files
[DEBUG] Downloading missing file: rockyou.txt (14344391 bytes)
Job Execution¶
[INFO] Received job assignment for task: task_abc123
[DEBUG] Starting hashcat with command: hashcat -m 1000 -a 0 hashes.txt wordlist.txt
[DEBUG] Hashcat process started with PID: 12345
[INFO] Job progress: Task task_abc123, Keyspace 1000000, Hash rate 2500000 H/s
Task State Machine (v1.3.1+)¶
[DEBUG] State transition: Idle → Running (task: task_abc123)
[DEBUG] State transition: Running → Completing (task: task_abc123)
[INFO] Waiting for ACK from backend
[DEBUG] ACK received, timestamp: 2024-01-15T10:30:00Z
[DEBUG] State transition: Completing → Idle
Completion ACK Protocol¶
[INFO] Task task_abc123 completed, sending progress update
[DEBUG] Entering Completing state, starting ACK wait
[INFO] ACK wait attempt 1/3, timeout: 30s
[DEBUG] ACK received successfully
[INFO] Task completion acknowledged, transitioning to idle
Stuck Detection Recovery¶
[WARNING] Stuck detection check: state=Completing, duration=2m30s
[WARNING] Stuck timeout exceeded (2m), initiating recovery
[INFO] Force recovery: marking completion_pending for task_abc123
[DEBUG] State transition: Completing → Idle (forced)
[INFO] Agent recovered, now idle and available for work
Component-Specific Debugging¶
WebSocket Connection Debugging¶
Enable verbose WebSocket debugging:
# Add to .env file
DEBUG=true
LOG_LEVEL=DEBUG
# Monitor WebSocket messages
tail -f agent.log | grep -E "(WebSocket|WSMessage|connection)"
Common WebSocket issues and debugging:
# Connection timeout
[ERROR] Failed to create connection on attempt 1: dial tcp: i/o timeout
# Certificate issues
[ERROR] Failed to load CA certificate: certificate signed by unknown authority
# Authentication failures
[ERROR] WebSocket handshake failed: HTTP 401 Unauthorized
File Synchronization Debugging¶
Monitor file sync operations:
# Filter sync-related logs
tail -f agent.log | grep -E "(sync|download|FileInfo)"
# Debug specific file types
tail -f agent.log | grep -E "(wordlist|rule|binary)"
File sync debug messages:
[DEBUG] Calculating MD5 hash for file: /path/to/wordlist.txt
[INFO] File sync: Downloading wordlist: rockyou.txt (14MB)
[WARNING] Download retry 2/3 for file: large_wordlist.txt
[ERROR] Failed to download file after 3 attempts: connection timeout
Job Execution Debugging¶
Monitor hashcat job execution:
# Job-specific logs
tail -f agent.log | grep -E "(job|task|hashcat|progress)"
# Real-time hashcat output
tail -f agent.log | grep "hashcat_output"
Job debug messages:
[INFO] Starting hashcat executor with extra params: -O -w 3
[DEBUG] Hashcat working directory: /tmp/krakenhashes/task_abc123
[DEBUG] Hashcat stdout: Session..........: hashcat
[DEBUG] Hashcat stdout: Status...........: Running
[INFO] Job completed successfully, found 15 cracked hashes
Task State Machine Debugging (v1.3.1+)¶
The agent uses an explicit state machine to track task status. Debug state transitions:
# Monitor state transitions
tail -f agent.log | grep -E "(state|State|transition|Transition)"
# Monitor ACK protocol
tail -f agent.log | grep -E "(ACK|ack|waiting|acknowledge)"
# Monitor stuck detection
tail -f agent.log | grep -E "(stuck|Stuck|recovery|Recovery|pending)"
State Machine Debug Messages:
Normal task lifecycle:
[DEBUG] TaskStateManager: TransitionTo(Running, task_abc123)
[DEBUG] TaskStateManager: state=Running, taskID=task_abc123
[DEBUG] TaskStateManager: TransitionTo(Completing, task_abc123)
[DEBUG] ACK wait started for task_abc123
[DEBUG] ACK received: task_abc123, success=true
[DEBUG] TaskStateManager: TransitionToIdle()
ACK timeout and retry:
[WARNING] ACK wait timeout for task_abc123 (attempt 1/3)
[INFO] Retrying completion message for task_abc123
[DEBUG] Resending job_progress for task_abc123
[WARNING] ACK wait timeout for task_abc123 (attempt 2/3)
[WARNING] ACK wait timeout for task_abc123 (attempt 3/3)
[WARNING] ACK retries exhausted, setting completion_pending
[DEBUG] TaskStateManager: SetCompletionPending(task_abc123)
[DEBUG] TaskStateManager: TransitionToIdle()
Stuck detection triggered:
[DEBUG] Stuck detection check: state=Completing, taskID=task_abc123, duration=2m15s
[DEBUG] Stuck detection check: state=Completing, taskID=task_abc123, duration=2m45s
[WARNING] Stuck detection: duration 2m45s exceeds timeout 2m
[INFO] Initiating force recovery for task_abc123
[DEBUG] TaskStateManager: SetCompletionPending(task_abc123)
[DEBUG] TaskStateManager: TransitionToIdle() [forced]
[INFO] Force recovery complete, agent now idle
Querying Current State:
Add diagnostic logging by checking state:
Common State Issues:
| Issue | Log Pattern | Solution |
|---|---|---|
| Stuck in Completing | state=Completing for >2 min | Wait for auto-recovery or restart agent |
| ACK never received | Multiple ACK wait timeout | Check network/backend connectivity |
| Rapid state changes | Fast Idle→Running→Idle | Check for task assignment rejections |
| Completion pending | completion_pending=true | Backend resolves on next state sync |
Hardware Detection Debugging¶
Monitor GPU and hardware detection:
# Hardware detection logs
tail -f agent.log | grep -E "(hardware|GPU|device|monitor)"
# Device capabilities
tail -f agent.log | grep -E "(OpenCL|CUDA|compute)"
Hardware debug messages:
[DEBUG] Detecting NVIDIA GPUs using nvidia-ml-py
[INFO] Found NVIDIA GPU: GeForce RTX 4090 (12GB VRAM)
[DEBUG] GPU compute capability: 8.9
[WARNING] GPU temperature high: 85°C
Development Environment Setup¶
Building Debug Builds¶
Create debug builds with additional debugging information:
# Build with debug symbols (no optimization)
cd agent
make clean
# Set debug build flags
export GOFLAGS="-gcflags=-N -gcflags=-l"
make build
# Or build with race detection
go build -race -o debug-agent ./cmd/agent
Using Go Debugger (Delve)¶
Install and use Delve for interactive debugging:
# Install delve
go install github.com/go-delve/delve/cmd/dlv@latest
# Build and debug
cd agent
go build -gcflags="all=-N -l" -o debug-agent ./cmd/agent
# Start debugging session
dlv exec ./debug-agent -- --debug --host localhost:31337
# Common delve commands:
# (dlv) break main.main
# (dlv) continue
# (dlv) print cfg
# (dlv) step
# (dlv) next
Using pprof for Performance Profiling¶
Add profiling endpoints for performance analysis:
// Add to main.go for profiling (development only)
import _ "net/http/pprof"
// Start profiling server (development builds only)
go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()
Then profile the running agent:
# CPU profiling
go tool pprof http://localhost:6060/debug/pprof/profile
# Memory profiling
go tool pprof http://localhost:6060/debug/pprof/heap
# Goroutine profiling
go tool pprof http://localhost:6060/debug/pprof/goroutine
Memory and Goroutine Analysis¶
Monitor resource usage during development:
# Monitor memory usage
while true; do
ps -p $(pgrep krakenhashes-agent) -o pid,rss,vsz,pcpu,pmem,cmd
sleep 5
done
# Monitor goroutines (with pprof endpoint)
curl http://localhost:6060/debug/pprof/goroutine?debug=1
Common Debugging Scenarios¶
1. Agent Won't Connect to Backend¶
Symptoms: - Connection timeout errors - Authentication failures - Certificate errors
Debugging steps:
# Enable debug logging
export DEBUG=true
export LOG_LEVEL=DEBUG
# Test network connectivity
telnet backend.example.com 31337
curl -k https://backend.example.com:31337/health
# Check certificate issues
openssl s_client -connect backend.example.com:31337
# Verify API key and agent ID
cat config/agent_credentials.json
cat config/api_key.json
2. File Sync Issues¶
Symptoms: - Files not downloading - Constant re-downloading - MD5 hash mismatches
Debugging steps:
# Check file permissions
ls -la data/wordlists/
ls -la data/rules/
# Verify network connectivity for downloads
curl -I https://backend.example.com:31337/api/files/download/wordlist/1
# Check available disk space
df -h data/
# Manual MD5 verification
md5sum data/wordlists/rockyou.txt
3. Job Execution Problems¶
Symptoms: - Jobs not starting - Hashcat errors - No progress updates
Debugging steps:
# Check hashcat installation
which hashcat
hashcat --version
# Test hashcat manually
hashcat -m 1000 -a 3 --stdout ?d?d?d?d | head -10
# Check GPU availability
hashcat -I
# Monitor system resources
top -p $(pgrep hashcat)
nvidia-smi # For NVIDIA GPUs
4. High Memory Usage¶
Symptoms: - Agent consuming excessive RAM - System becoming slow - Out of memory errors
Debugging steps:
# Enable memory profiling
export DEBUG=true
go tool pprof http://localhost:6060/debug/pprof/heap
# Check for memory leaks
# Monitor over time with:
watch "ps -p $(pgrep krakenhashes-agent) -o pid,rss,vsz"
# Reduce concurrent operations
# In .env file:
KH_MAX_CONCURRENT_DOWNLOADS=1
HEARTBEAT_INTERVAL=10
Debugging Tools and Utilities¶
Log Analysis Scripts¶
Create helper scripts for log analysis:
#!/bin/bash
# debug-helper.sh
# Show only error messages
show_errors() {
grep "\[ERROR\]" agent.log | tail -20
}
# Show WebSocket connection events
show_websocket() {
grep -E "(WebSocket|connection|disconnect)" agent.log | tail -20
}
# Show file sync activity
show_sync() {
grep -E "(sync|download|upload)" agent.log | tail -20
}
# Show job execution
show_jobs() {
grep -E "(job|task|hashcat)" agent.log | tail -20
}
# Usage: ./debug-helper.sh show_errors
$1
Real-time Monitoring¶
Monitor agent activity in real-time:
# Multi-pane monitoring with tmux
tmux new-session -d -s agent-debug
# Pane 1: Agent output
tmux send-keys -t agent-debug "tail -f agent.log" Enter
# Pane 2: System resources
tmux split-window -v -t agent-debug
tmux send-keys -t agent-debug "htop" Enter
# Pane 3: Network connections
tmux split-window -h -t agent-debug
tmux send-keys -t agent-debug "watch 'netstat -an | grep :31337'" Enter
# Attach to session
tmux attach-session -t agent-debug
Configuration Validation¶
Validate agent configuration:
#!/bin/bash
# validate-config.sh
echo "=== Agent Configuration Validation ==="
# Check required directories
echo "Checking directories..."
[ -d "config" ] && echo "✅ config/" || echo "❌ config/ missing"
[ -d "data" ] && echo "✅ data/" || echo "❌ data/ missing"
# Check .env file
echo "Checking .env configuration..."
if [ -f ".env" ]; then
echo "✅ .env file exists"
grep -q "KH_HOST=" .env && echo "✅ KH_HOST set" || echo "❌ KH_HOST missing"
grep -q "DEBUG=" .env && echo "✅ DEBUG set" || echo "❌ DEBUG missing"
else
echo "❌ .env file missing"
fi
# Check certificates
echo "Checking certificates..."
[ -f "config/agent.crt" ] && echo "✅ Agent certificate" || echo "❌ Agent certificate missing"
[ -f "config/ca.crt" ] && echo "✅ CA certificate" || echo "❌ CA certificate missing"
# Check API key
[ -f "config/api_key.json" ] && echo "✅ API key" || echo "❌ API key missing"
echo "=== Validation Complete ==="
Automated Testing and Debugging¶
Unit Tests with Debug Output¶
Run tests with verbose output:
cd agent
# Run all tests with verbose output
go test -v ./...
# Run specific package tests
go test -v ./internal/config
go test -v ./internal/agent
go test -v ./pkg/debug
# Run tests with race detection
go test -race -v ./...
# Generate test coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html
Integration Testing¶
Test agent integration with a local backend:
# Start local backend for testing
cd ../backend
docker-compose -f docker-compose.dev-local.yml up -d
# Test agent connection
cd ../agent
./krakenhashes-agent --debug --host localhost:31337
Performance Profiling¶
CPU Profiling¶
Profile CPU usage during job execution:
# Start agent with profiling
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30
# During heavy computation (hashcat jobs)
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60
Memory Profiling¶
Identify memory usage patterns:
# Heap profiling
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap
# Allocation profiling
go tool pprof http://localhost:6060/debug/pprof/allocs
Goroutine Analysis¶
Monitor concurrent operations:
# Goroutine dump
curl http://localhost:6060/debug/pprof/goroutine?debug=1
# Interactive analysis
go tool pprof http://localhost:6060/debug/pprof/goroutine
Logging Best Practices¶
Custom Debug Messages¶
Add debug messages to your code:
import "github.com/ZerkerEOD/krakenhashes/agent/pkg/debug"
// Different log levels
debug.Debug("Detailed diagnostic: variable=%v", someVar)
debug.Info("Operation started: %s", operation)
debug.Warning("Potential issue detected: %s", issue)
debug.Error("Critical error: %v", err)
Structured Logging¶
Organize debug output by component:
// Component-specific logging
debug.Info("[CONNECTION] WebSocket connected to %s", url)
debug.Info("[SYNC] Downloaded file: %s (%d bytes)", filename, size)
debug.Info("[JOB] Task started: %s", taskID)
debug.Info("[HARDWARE] GPU detected: %s", gpuName)
Contributing and Bug Reports¶
Preparing Debug Information¶
When reporting bugs, include:
-
Agent version and build info:
-
Complete configuration:
-
Debug logs:
-
System information:
-
Network connectivity:
Debug Build for Development¶
Create debug builds for development:
# Clean build with debug symbols
cd agent
make clean
# Build with debug flags
go build -gcflags="all=-N -l" -ldflags="-X main.BuildMode=debug" -o debug-agent ./cmd/agent
# Run with additional debugging
./debug-agent --debug --host backend.example.com:31337
This debug build includes: - No compiler optimizations - Full symbol information - Additional runtime checks - Enhanced logging
Troubleshooting Quick Reference¶
| Issue | Debug Steps | Key Files |
|---|---|---|
| Won't start | Check .env, verify paths | .env, config/ |
| Connection fails | Test network, check certs | config/ca.crt, config/agent.crt |
| Auth errors | Verify API key and agent ID | config/api_key.json, config/agent_credentials.json |
| Files not syncing | Check permissions, disk space | data/wordlists/, data/rules/ |
| Jobs not running | Test hashcat, check GPU | System hashcat, nvidia-smi |
| High memory | Enable profiling, reduce concurrency | .env (set lower limits) |
| Slow performance | CPU/memory profiling | pprof endpoints |
Advanced Debugging Techniques¶
Custom Debug Builds¶
Build with custom debug flags:
# Build with additional debugging
go build -tags debug -gcflags="all=-N -l" ./cmd/agent
# Build with memory debugging
go build -gcflags="all=-m" ./cmd/agent
# Build with race detection (development only)
go build -race ./cmd/agent
Remote Debugging¶
Debug agent running on remote systems:
# On remote system
dlv exec ./krakenhashes-agent --listen=:2345 --headless=true --api-version=2 -- --debug
# From local system
dlv connect remote-host:2345
Container Debugging¶
Debug agent running in containers:
# Build debug container
docker build -f Dockerfile.debug -t agent-debug .
# Run with debug enabled
docker run -e DEBUG=true -e LOG_LEVEL=DEBUG agent-debug
# Attach debugger to container
docker exec -it <container_id> dlv attach <pid>
This comprehensive debugging guide should help developers and advanced users effectively debug agent issues, profile performance, and contribute to the project development.