Troubleshooting the provisioning

  • by Patrick Ogenstad
  • April 25, 2018

It feels better to stay positive and assume that everything will work, but in practice it’s better to plan for failure. Several things can break down or go wrong in a ZTP solution.

Bricking the device

Ok, bricking is perhaps taking it too far, but what you want to avoid is saving an invalid configuration to flash and making the device inaccessible. Once someone on site is forced to console into the device, you end up wasting time. Up until now, we haven’t saved any of the configurations on any of the provisioned devices, so a device that received a bad configuration can still be recovered with a reload. The typical mistakes are applying the configuration to the wrong device, shutting down interfaces which you need, or assigning incorrect IP subnets.

Try to validate as much as possible after the device has received its initial configuration before writing the changes to disk.
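As a minimal sketch of what such a validation step could look like, the function below compares facts collected from a freshly configured device against what the inventory expects, and only an empty result would justify saving the configuration. Everything here is an assumption made for illustration: the function name, the dictionary keys and the sample values are hypothetical, and how you collect the facts from the device is left out entirely.

```python
import ipaddress


def find_problems(facts, expected):
    """Return a list of reasons NOT to save the configuration.

    Both arguments are plain dictionaries with hypothetical keys;
    adapt them to whatever your own inventory and fact collection provide.
    """
    problems = []

    # Guard against applying a configuration to the wrong device.
    if facts.get("serial_number") != expected["serial_number"]:
        problems.append("serial number does not match the inventory")

    # Guard against having shut down interfaces we depend on.
    down = sorted(set(expected["uplinks"]) - set(facts.get("interfaces_up", [])))
    if down:
        problems.append("uplinks not up: " + ", ".join(down))

    # Guard against an incorrect IP subnet on the management interface.
    mgmt_ip = ipaddress.ip_address(facts["management_ip"])
    if mgmt_ip not in ipaddress.ip_network(expected["management_subnet"]):
        problems.append("management IP is outside the expected subnet")

    return problems
```

If the returned list is empty you can go ahead and write the configuration to flash. Otherwise, leave the configuration unsaved and log the problems so that you can investigate before anyone has to visit the site.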

During development

As you are creating the ZTP app, make sure to keep an eye on the output from the TFTP server and the RQ worker. If there’s something wrong in the code, these are the places where you will get a clear idea of what’s wrong.

Test things in small components. You don’t have to wait for your devices to boot up to see which configuration is sent from the TFTP server; it’s faster to test the TFTP server locally and request a file.

tftp 127.0.0.1
get og-sw-01__fa0_7.cfg
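If you would rather script this check, for example to run it after every code change, the same request can be made from Python. The following is a rough sketch of a bare-bones TFTP read using only the standard library. It assumes the server speaks plain TFTP as described in RFC 1350, without option negotiation, and it does no retransmission. The names `build_rrq` and `tftp_get` are made up for this example.

```python
import socket
import struct

BLOCK_SIZE = 512  # standard TFTP data block size


def build_rrq(filename, mode="octet"):
    """Build a read request: opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", 1) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def tftp_get(host, filename, port=69, timeout=5.0):
    """Fetch a file over TFTP and return its content as bytes."""
    chunks = []
    expected_block = 1
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, port))
        while True:
            # The server answers from its own port, so reply to that address.
            packet, server = sock.recvfrom(4 + BLOCK_SIZE)
            opcode, block = struct.unpack("!HH", packet[:4])
            if opcode == 5:  # ERROR packet: error code followed by a message
                message = packet[4:-1].decode("ascii", errors="replace")
                raise RuntimeError("TFTP error %d: %s" % (block, message))
            if opcode != 3 or block != expected_block:
                continue  # ignore anything that isn't the DATA block we want
            sock.sendto(struct.pack("!HH", 4, block), server)  # ACK the block
            chunks.append(packet[4:])
            if len(packet) - 4 < BLOCK_SIZE:
                break  # a short block marks the end of the file
            expected_block += 1
    return b"".join(chunks)


# Example (needs a TFTP server listening locally):
# print(tftp_get("127.0.0.1", "og-sw-01__fa0_7.cfg").decode())
```

A socket timeout here tells you the same thing a booting device would experience: the server didn’t answer in time.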

Timeouts and the inventory

In this tutorial we’ve been using a very simple inventory which loads almost instantly. If your inventory is hidden behind a web API and the TFTP server needs to collect data from it before serving a configuration file to the client, the lookup might take so long that the TFTP request times out. One way around this is to turn the process around and generate the configuration files from the inventory ahead of time, so that the TFTP server just reads static files that you have already created.
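To illustrate the idea of generating the files ahead of time, here is a small sketch that renders one static configuration file per inventory entry. It is not the code used earlier in this tutorial: the template, the inventory layout (a dictionary mapping a file name to template values) and the function name `render_configs` are all assumptions made for the example.

```python
from pathlib import Path
from string import Template

# Hypothetical, heavily simplified configuration template.
CONFIG_TEMPLATE = Template(
    "hostname $hostname\n"
    "!\n"
    "interface Vlan1\n"
    " ip address $ip $mask\n"
    "!\n"
    "end\n"
)


def render_configs(inventory, out_dir):
    """Write one static configuration file per inventory entry.

    `inventory` maps a file name to the values used in the template.
    Returns the list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, values in inventory.items():
        path = out / filename
        # substitute() raises KeyError if a value is missing, which is
        # preferable to serving a half-filled configuration.
        path.write_text(CONFIG_TEMPLATE.substitute(values))
        written.append(path)
    return written
```

You could run something like this on a schedule or whenever the inventory changes. The slow API call then happens well before any device asks for its file.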

Test in a lab first

Even though everything should work and looks easy, make sure you test this in a lab environment first. Unfortunately, the behavior of devices can differ between versions and platforms. It’s much easier to know what to expect if you’ve run a test provisioning before sending out devices, rather than having to console into them through an installation technician’s laptop.