Troubleshooting

This page lists some general troubleshooting strategies and methods for Munin.

Check node agent

Is the munin-node process (daemon) running on the host you want to monitor?

Did you restart the munin-node process after you made changes to its configuration?

Check connectivity

The examples show a munin-node agent running on 127.0.0.1; replace it with your node address.

Note

You can use netcat to connect to port 4949.

Telnet used to be the recommended tool because it was installed almost everywhere. We no longer recommend it: netcat is now nearly as ubiquitous as telnet and offers a real native TCP connection, which telnet does not. socat also works perfectly, but it is less common.

Does the munin-node agent allow connections from your munin master?

Here we try to connect manually to the munin-node that runs on the Munin master host. It can be reached via IP address 127.0.0.1 or hostname localhost and port 4949.

Output of a netcat session should be something like this:

# nc localhost 4949
# munin node at [your hostname]

Does the hostname in the output above match the one configured in munin.conf?

Note

If you have a fully qualified domain name (FQDN) in munin-node.conf, the host you’re monitoring has to identify itself with FQDN as well.

E.g. if the master's node tree has the following entry:

[foo.example.com]
  address foo.example.com

...then a netcat session to the node should give you the following output:

# munin node at foo.example.com

Note

If the connection test fails, check the allow directive in munin-node.conf and make sure any firewalls allow contact on destination port 4949.
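The hostname comparison above can be scripted. The following is a minimal sketch, not part of Munin: banner_host is a hypothetical helper name, and the nc -w timeout flag in the usage comment is assumed to be supported by your netcat variant.

```shell
# Extract the hostname from a munin-node greeting banner read on stdin.
# The banner has the form "# munin node at <hostname>".
banner_host() {
    sed -n 's/^# munin node at //p' | head -n 1
}

# Hypothetical usage against the node from the examples above:
#   printf 'quit\n' | nc -w 5 127.0.0.1 4949 | banner_host
# Compare the printed name with the section name in munin.conf.
```

If the helper prints nothing, the node either refused the connection or never sent its banner, which points back to the allow directive or a firewall.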

Debugging Plugins

Which plugins are enabled on the node?

Does munin-node recognize any plugins? While connected to the agent, issue the list command; a (long) list of plugins should be shown.

# nc localhost 4949
# munin node at foo.example.com
list
open_inodes irqstats if_eth0 df uptime [...]
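When the list is long, a small filter can tell you whether one particular plugin is present. This is only a sketch: has_plugin is a hypothetical helper, and the nc invocation in the comment assumes your netcat accepts -w.

```shell
# Succeed if plugin $1 appears in the reply to "list" read on stdin.
# Banner lines (starting with "#") are ignored.
has_plugin() {
    grep -v '^#' | tr -s ' ' '\n' | grep -qxF -- "$1"
}

# Hypothetical usage:
#   printf 'list\nquit\n' | nc -w 5 localhost 4949 | has_plugin df \
#       && echo "df is enabled" || echo "df is missing"
```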

Check a particular plugin

Check on agent host

Note

All the commands here need to be run as user root. A common way to become root is the sudo command, but refer to your local documentation for more specific instructions.

Restart munin-node, as it only reads the plugin list at startup. (This makes it possible to test a plugin with munin-run without enabling it right away.)

/etc/init.d/munin-node restart

Call munin-run on the monitored host to see whether the plugin runs through.

Try with and without the config plugin argument. Neither run should emit an error message.

Note

You can also use the --debug flag, which shows whether the configuration file is parsed correctly, mainly with regard to the UID and environment variables.

Regular run:

# munin-run df
_dev_hda1.value 83

Config run:

# munin-run df config
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /
_dev_hda1.info / (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98

Check from Munin master

Does the plugin run through munin-node, with and without config?

Regular run:

# nc foo.example.com 4949
# munin node at foo.example.com
fetch df
_dev_hda1.value 83
[...]
.

With config:

# nc foo.example.com 4949
# munin node at foo.example.com
config df
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /boot
_dev_hda1.info /boot (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98
[...]
.

If the plugin works for munin-run but not through netcat, you might have a $PATH problem.

Note

Set env.PATH for the plugin in the plugin’s environment file.
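To see exactly where the two code paths diverge, it can help to compare what munin-run prints locally with what the node returns over the network. The sketch below strips the protocol framing so the two outputs can be diffed; strip_protocol and the /tmp file names are hypothetical, and nc -w is assumed to be available.

```shell
# Remove the greeting banner and the lone "." terminator from a
# munin-node reply read on stdin, leaving only the plugin output.
strip_protocol() {
    grep -v -e '^# munin node at ' -e '^\.$'
}

# Hypothetical usage on the monitored host (file names are examples):
#   munin-run df config > /tmp/df.local
#   printf 'config df\nquit\n' | nc -w 5 localhost 4949 \
#       | strip_protocol > /tmp/df.node
#   diff /tmp/df.local /tmp/df.node
```

Comparing the config output rather than the fetch output avoids spurious differences from values that change between the two runs.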

Check Munin Master

Do the directories set by dbdir, htmldir, logdir and rundir in munin.conf have the correct permissions? (If you first ran Munin as root, they may not be readable or writable by the user that runs the cron job.)
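A quick way to test this is to run a permission check as the Munin user. The following is a sketch only: check_writable is a hypothetical helper, and the directory paths in the usage comment are examples that you should replace with the values from your munin.conf.

```shell
# Report every argument that is not a directory the current user can
# both read and write. Returns non-zero if any directory fails.
check_writable() {
    rc=0
    for d in "$@"; do
        if [ ! -d "$d" ] || [ ! -r "$d" ] || [ ! -w "$d" ]; then
            printf 'not readable/writable: %s\n' "$d"
            rc=1
        fi
    done
    return "$rc"
}

# Hypothetical usage, run as the Munin user (paths are assumptions):
#   check_writable /var/lib/munin /var/log/munin /var/run/munin
```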

Is munin-cron established as a cron controlled process, run as the Munin user?

Does running munin-update as the Munin user on the master show any errors?

Try running "munin-cron --debug > /tmp/munin-cron.debug" and check the output file /tmp/munin-cron.debug.

Check data collection

This step will tell you whether munin-update (the master) is able to communicate with munin-node (the agent).

Run munin-update as user munin on the Munin master machine.

# su -s /bin/bash munin
$ /usr/share/munin/munin-update --debug --nofork --host foo.example.com --service df

You should get a line like this:

Aug 11 22:39:51 - [6846] Updating /var/lib/munin/example.com/foo.example.com-df-_dev_hda1-g.rrd with 57

After this, replace df with the service you want to check, such as hddtemp_smartctl.

If one of these steps does not work, something is probably wrong with the plugin or how munin-node talks to the plugin.

  1. Does the plugin run when executed directly? If it runs when executed as root but not through munin-run (as described above), the plugin has a permission problem. See this article on environment files.
  2. Does the plugin output contain too few, too many and/or illegal characters?
  3. Does Munin (munin-cron and its children) write values into RRD files? Hint: rrdtool fetch [rrd file] AVERAGE
  4. Does the plugin use legal field names? See Notes on Field names.
  5. If you borrow data from other graphs, check that the fieldname.type is set properly. See Munin file names for a quick reference on what any error messages in the logs might indicate.
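For the rrdtool hint in step 3, a short filter makes it easier to see at a glance whether real values are arriving. This is a sketch under the assumption that rrdtool fetch prints data rows as "timestamp: value"; rrd_summary is a hypothetical helper name.

```shell
# Summarise "rrdtool fetch" output read on stdin: count the data rows
# and how many of them hold NaN in the first value column.
rrd_summary() {
    awk '
        /^[0-9]+:/ {
            total++
            if (tolower($2) ~ /nan/) nan++
        }
        END { printf "%d rows, %d NaN\n", total, nan }
    '
}

# Hypothetical usage with the RRD file from the example above:
#   rrdtool fetch /var/lib/munin/example.com/foo.example.com-df-_dev_hda1-g.rrd AVERAGE \
#       | rrd_summary
```

If every row is NaN long after the plugin was enabled, the master is not storing values for that field.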

Frequent Incidents

SELinux blocks Munin plugins

RRD files are filled with 0

although munin-node seems to show sane values.

  • The plugin’s output contains GAUGE values, but the fields were declared as COUNTER or DERIVE in the plugin’s config.

Note

GAUGE is the default data type in Munin! Any other data type for a field must be explicitly declared.
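To check what a plugin actually declares, you can list each field with its effective type. The sketch below reads the plugin's config output and falls back to GAUGE where no type is given; field_types is a hypothetical helper, and multigraph plugins are not handled.

```shell
# Print "<field> <type>" for every field in a plugin's config output
# read on stdin. Fields without a .type line are reported as GAUGE.
field_types() {
    awk '
        $1 ~ /\.label$/ {
            f = $1; sub(/\.label$/, "", f)
            if (!(f in type)) type[f] = "GAUGE"
        }
        $1 ~ /\.type$/ {
            f = $1; sub(/\.type$/, "", f)
            type[f] = $2
        }
        END { for (f in type) print f, type[f] }
    ' | sort
}

# Hypothetical usage on the monitored host:
#   munin-run df config | field_types
```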

RRD files are filled with NaN

although munin-node seems to show sane values.

  • Check that there are no invalid characters in the plugin’s output.
  • For new plugins, let Munin gather data for about 20 minutes and things will unwrinkle.

munin-node won’t give any data

although it is configured properly.

  • Check that there is a .value directive for each of the plugin’s field names (yes, I managed to forget that recently).
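This check can be automated by comparing the config output with the regular output. The following is a minimal sketch: missing_values is a hypothetical helper, the /tmp file names are examples, and multigraph plugins are not handled.

```shell
# Print every field that has a .label in the config output (file $1)
# but no .value line in the regular output (file $2).
missing_values() {
    awk '
        FILENAME == ARGV[1] {
            if ($1 ~ /\.value$/) { f = $1; sub(/\.value$/, "", f); seen[f] = 1 }
            next
        }
        $1 ~ /\.label$/ {
            f = $1; sub(/\.label$/, "", f)
            if (!(f in seen)) print f
        }
    ' "$2" "$1"
}

# Hypothetical usage on the monitored host:
#   munin-run df config > /tmp/df.config
#   munin-run df > /tmp/df.fetch
#   missing_values /tmp/df.config /tmp/df.fetch
```

No output means every labelled field also reported a value.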

munin-node only temporarily returns valid data

  • Check that no race conditions occur. A typical race condition is updating a file with crontab while the plugin is trying to read the file.
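One common way to avoid this kind of race is to have the cron job write to a temporary file and then rename it into place, since a rename within one filesystem is atomic. The sketch below illustrates that general technique; it is not something Munin provides, and atomic_write, my-collector and the example path are hypothetical.

```shell
# Write stdin to file $1 atomically: fill a temporary file in the same
# directory, then rename it over the target in a single step.
atomic_write() {
    target=$1
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if cat > "$tmp"; then
        # mktemp creates the file with mode 0600; widen it so that the
        # user running the plugin can read it (assumption: 0644 is fine).
        chmod 644 "$tmp"
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Hypothetical usage in the cron job that feeds the plugin:
#   my-collector | atomic_write /var/tmp/plugin-data.txt
```

With this pattern the plugin sees either the complete old file or the complete new one, never a half-written file.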

The graphs are empty

  • The plugin’s output contains GAUGE values, but the fields were declared as COUNTER or DERIVE in the plugin’s config. (GAUGE is the default data type in Munin.)
  • The files to be updated by Munin are owned by root or another user account
  • The local user browser cache may be corrupt, especially if “most” graphs are displayed correctly and “some” graphs are blank. In Firefox (or your browser of choice) go to tools and clear recent history, then check to see if the graphs are now properly displayed.

Other mumbo-jumbo

  • Run the different stages in munin-cron manually, using --debug, --nofork, something like this:
# su - munin -c "/usr/lib/munin/munin-update \
    --debug --nofork \
    --host foo.example.com \
    --service df"