Netdata and some free AI searches saved me a ton of resource usage on my desktop and server
I recently installed the free Netdata in a container on my homelab server to see what it would show in terms of resource usage, bottlenecks, etc. When you start out with Netdata you get a 2-week trial of the Business subscription, which includes 10 free AI credits (for analysis reporting).
I let it run for 24 hours and then asked Netdata's AI for its key findings and recommendations. What it showed me pretty quickly already blew me away. It's the old story of things "working" but being far from optimal. Basically the issue was too many tasks firing off concurrently (or before others had finished), causing major bottlenecks (disk backlogs) on my different drives. I have three different backups, S.M.A.R.T. drive checks, Timeshift, RAID array scrubs, etc. all running.
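If you want to see the same kind of backlog outside of Netdata, the standard sysstat tools will show it too; this is just an illustrative check, not something from the Netdata report itself:

    # extended per-device stats every 5 seconds (sysstat package);
    # high await and aqu-sz values while jobs overlap are the disk backlog
    iostat -x 5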
I took the report from Netdata and fed it in as-is to Google Gemini (Perplexity has been leading me down very long rabbit holes the last few months) and asked what to do next. To cut a long story short, Gemini took me through various tests and recommendations around spacing all the tasks out far better, advising which should be daily, weekly, or monthly. It also suggested tweaking settings for the drives as well as the rsync jobs. For example, when exporting to an external USB drive, it showed how to slow the rsync transfer down so that neither the drive nor the server CPU was choking. It also gave a nice summary table of how all the tasks were now spaced out over days and weeks.
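The numbers and script names below are purely made up for illustration, but this is roughly the shape of what came out of it: a throttled, low-priority rsync for the USB export, and the jobs staggered so nothing overlaps.

    # run the USB backup at idle I/O and CPU priority, and cap the transfer rate
    # (the bandwidth limit is in KiB/s and needs tuning to what the drive can sustain)
    ionice -c3 nice -n19 rsync -aH --delete --bwlimit=30000 /srv/data/ /mnt/usb-backup/

    # illustrative crontab stagger: daily, weekly, and monthly jobs no longer collide
    15 1 * * *   /usr/local/bin/backup-local.sh     # daily at 01:15
    30 3 * * 0   /usr/local/bin/backup-usb.sh       # weekly, Sunday 03:30
    0  4 1 * *   /usr/local/bin/raid-scrub.sh       # monthly, 1st at 04:00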
I then decided to install Netdata on my desktop PC, and I'm glad I did. The PC now boots quicker, terminal screens open instantly (especially the Atuin history), etc. Again the issue identified by Netdata was massive disk backlogs. It turns out my main /home data disk is 5.8 years old and has a 161 ms response time, where it should be 10x quicker. I need to replace this drive soon, but the optimisations Gemini suggested have eased a lot of the strain I was putting on it.
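If you want to check your own drive's age the same way, it comes straight out of SMART (the device name here is just an example):

    # Power_On_Hours gives the drive's running age; 5.8 years is roughly 50,000 hours
    sudo smartctl -A /dev/sda | grep -i power_on_hours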
My Manjaro desktop configuration is a good 8 or 9 years old with tons of crud. I used to use VirtualBox for VMs but switched to KVM a while back, yet the old VirtualBox vboxnet0 network and kernel hooks were still in my system. I have a beautiful Conky window on my desktop, but I did not realise how many resources it was wasting through massive inefficiencies: firing off sudo smartctl every 3 seconds to check drive temperatures (polling the drive controller 28,800 times a day), if/then statements that each fired off the same query three times, outdated network calls, etc. Gemini helped optimise that dramatically by collapsing the queries, caching results in memory, and stretching many checks out to 30 seconds or longer where the data does not change quickly.

There were also rsync jobs that were made less intense, so the CPU load was smoothed out. Some old snapd stuff that was still loading into memory, even though I no longer used it, got cleared out as well. And since I was using SAMBA shares with a Windows VM running in KVM, it advised ditching those in favour of the faster Virtio-FS folder sharing, along with the VirtIO network mode in KVM.
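As a rough sketch of the Conky fix (the helper script and devices here are hypothetical, and calling smartctl this way assumes a passwordless sudo rule for it): one script queries each drive once, and Conky re-runs it every few minutes instead of every few seconds.

    #!/bin/bash
    # drive-temps.sh (hypothetical helper): query each drive once, print all temperatures
    for dev in /dev/sda /dev/sdb; do
        temp=$(sudo smartctl -A "$dev" | awk '/Temperature_Celsius/ {print $10}')
        echo "${dev##*/}: ${temp}°C"
    done

    # in the Conky config: run the script every 300 seconds instead of every 3
    ${execi 300 /usr/local/bin/drive-temps.sh}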
As Gemini pointed out initially, some events were coming together on my homelab server to create a perfect storm. My desktop PC is now booting up again in seconds, network acquisition is quicker, and with less intensive polling, my browsers are also more responsive.
I'm actually scaling back my Grafana, Prometheus, Telegraf, InfluxDB stack on my server too. Netdata collects tons of data every second, and since it is already running, I'd rather optimise around that, as the information I get is a lot more useful. Netdata requires basically no configuration, unlike Grafana, InfluxDB, Telegraf, and Prometheus, which all have to be wired up to work together. There are some things Grafana must still do, like pulling my Home Assistant stats through into graphs. The free Netdata tier only gives you 5 nodes in their cloud service, but you can view more locally if you host it yourself. Obviously after the trial period I will also lose the AI credits.
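Worth noting that the agent on each machine serves its own dashboard locally regardless of the cloud account, so nodes beyond the 5-node limit can still be looked at directly:

    # each agent's built-in dashboard, no cloud login needed
    http://localhost:19999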
Netdata is open source on the client agent (data collection) and source-available for the client dashboard. The cloud side, and the AI, is proprietary. I'll see how it goes on the free tier after 2 weeks, and what sort of reporting I can still export. But the changes so far have made a dramatic difference, and will likely also give my hard drives a longer and healthier life.
Netdata running in a Docker container on my homelab server is consuming 2.1% CPU and 327 MB of RAM. Disk space is now at 1.3 GB, so I'll need to keep an eye on that. Retention sizes can be set for each tier of data being stored (per-second, per-minute, and per-hour tiers).
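Retention is adjusted in netdata.conf under the [db] section; the exact option names have changed between Netdata versions, so check the file itself rather than taking this as gospel. On a native install the bundled helper opens it:

    # open the main config with Netdata's edit-config helper
    cd /etc/netdata && sudo ./edit-config netdata.conf
    # then look in the [db] section for the per-tier retention size/time settings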
A tip on installing for Arch-based systems: Netdata's install script had all sorts of network permission issues on my PC. In the end I just did a plain pacman/AUR package install and everything worked.
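For reference (assuming the netdata package is available through your distro's repositories, or via an AUR helper), the whole install was just:

    sudo pacman -S netdata
    sudo systemctl enable --now netdata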
See: GitHub - netdata/netdata (https://github.com/netdata/netdata): The fastest path to AI-powered full stack observability, even for lean teams.
#technology #optimisation #dashboards