Verification
After installation, verify the agent is healthy:
- Check Pod Status: the agent pod should be Running and Ready.
- Check Logs: look for “Graph published successfully”.
- Check Metrics: port-forward the agent to access metrics, then visit http://localhost:8080/metrics.
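Instead of reading the metrics page by eye, the endpoint can be fetched and parsed with a short script. The sketch below is a minimal illustration, not part of the agent: it assumes the endpoint serves the standard Prometheus text exposition format, and the helper names `parse_metrics` and `fetch_metrics` are hypothetical.

```python
from urllib.request import urlopen

# URL from the verification step; it only responds while the port-forward runs.
METRICS_URL = "http://localhost:8080/metrics"


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition into {sample name (with labels): value}."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample looks like "<name>{labels} <value> [timestamp]".
        # Splitting at the last "}" keeps spaces inside label values intact.
        close = line.rfind("}")
        if close != -1:
            name, rest = line[: close + 1], line[close + 1:].split()
        else:
            name, *rest = line.split()
        if not rest:
            continue  # malformed line with no value
        try:
            samples[name] = float(rest[0])
        except ValueError:
            continue  # value is not numeric; ignore the line
    return samples


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> dict[str, float]:
    """Download the metrics page and return the parsed samples."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With the port-forward active, `fetch_metrics()` returns a dictionary keyed by metric name, so a value such as `dnstap_active_connections` can be looked up directly.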
Monitoring
Key metrics to monitor:
- dnstap_active_connections: Should be > 0 (indicates CoreDNS is connected).
- dnstap_frames_received_total: Should be increasing (indicates DNS traffic).
- graph_nodes_total: Should roughly match the number of resources in your cluster.
- publisher_errors_total: Should be 0.
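The four expectations above can be expressed as a small check over two metric snapshots taken some time apart. This is a sketch under stated assumptions: the function name `check_agent_health` is hypothetical, the snapshots are plain dictionaries of metric name to value, and the 20% tolerance used for "roughly match" is an illustrative choice, not a documented threshold.

```python
def check_agent_health(
    previous: dict[str, float],
    current: dict[str, float],
    expected_nodes: int,
    tolerance: float = 0.2,  # assumed meaning of "roughly match"; tune as needed
) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems: list[str] = []

    # CoreDNS must hold at least one open DNSTap connection.
    if current.get("dnstap_active_connections", 0) <= 0:
        problems.append("dnstap_active_connections is 0: CoreDNS is not connected")

    # The frame counter must have grown between the two snapshots.
    before = previous.get("dnstap_frames_received_total", 0)
    after = current.get("dnstap_frames_received_total", 0)
    if after <= before:
        problems.append("dnstap_frames_received_total is not increasing: no DNS traffic")

    # The graph size should be within the tolerance of the expected resource count.
    nodes = current.get("graph_nodes_total", 0)
    if expected_nodes > 0 and abs(nodes - expected_nodes) > tolerance * expected_nodes:
        problems.append(
            f"graph_nodes_total is {nodes:g}, expected about {expected_nodes}"
        )

    # Any publisher error is worth investigating.
    if current.get("publisher_errors_total", 0) != 0:
        problems.append("publisher_errors_total is not 0")

    return problems
```

Comparing two snapshots matters for the frame counter: a single reading cannot show whether the value is increasing.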
Troubleshooting
Common Issues
1. DNSTap Not Connected
Symptoms: dnstap_active_connections is 0, no dependency edges in the graph.
Fix:
- Verify CoreDNS config has the correct Edge IP.
- Check if a NetworkPolicy is blocking traffic from kube-system to nofire-system on port 6000.
- Ensure CoreDNS was restarted after config change.
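A quick way to test whether port 6000 is reachable is a plain TCP connection attempt. The sketch below assumes the DNSTap listener accepts TCP connections, which this guide does not state explicitly, and the helper name `can_connect` is hypothetical. To reflect what CoreDNS sees, it would need to run from a pod in kube-system, since a NetworkPolicy may treat other sources differently.

```python
import socket


def can_connect(host: str, port: int = 6000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Call it with the Edge IP from the CoreDNS config as `host`. A `False` result is consistent with a blocking NetworkPolicy or a wrong IP, but it does not distinguish between the two.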
2. Publisher Errors
Symptoms: Logs show “Failed to publish graph”.
Fix:
- Check outbound internet access.
- Verify API Key is correct.
- Check if the NOFire AI endpoint is reachable (curl -v https://api.nofire.ai/graph).
3. High Memory Usage
Symptoms: Pod OOMKilled.
Fix:
- Increase memory limit in Helm values.
- Reduce graph.maxPruneAge to keep the graph smaller.