Characteristics of a Well Run Network

2018-01-09 networking operations

Design

Keep it simple. Just because you can, doesn’t mean you should
Complexity is not just a networking attribute, it’s an overall system attribute
Proper design leads to simplicity (most of the time)
What technologies are simple? How do you recognize complexity? Is vendor lock-in one indicator? Network management software / API lock-in another growing one?
There are some pretty simple campus/user, datacenter, Internet Edge, and WAN approaches. Big organizations can handle, and may need, a bit more complexity (or not)
Modularity: too many moving parts that have to work together = complex

Transparent Network. It just works
Able to easily implement changes
Agnostic to today’s needs and flexible enough to absorb tomorrow’s
Up-to-date diagrams and documentation matter
Organized around OSI layers
Documented naming conventions with fixed fields
Track MTTR (mean time to repair), e.g. from NetMRI and some other tools
Up-to-date code
Common code across any given router or switch model

Measurement

There is no way to tell how the network is performing currently as compared to previous performance unless data is collected at regular intervals
Ideally this is done via more sophisticated tooling that not only gathers the data but also performs benchmarks and reports on anomalies
Network awareness among engineers
Interpersonal communication and team collaboration
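
The measurement items above (collect data at regular intervals, then compare against past performance and report anomalies) can be illustrated with a minimal sketch. It assumes samples of some metric, say interface utilization in percent, have already been collected at a fixed polling interval. The function name `find_anomalies`, the window size, and the three-standard-deviation threshold are illustrative assumptions, not taken from any particular tool.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=12, threshold=3.0):
    """Flag samples that deviate from a rolling baseline.

    samples   -- metric values collected at a fixed interval (hypothetical data)
    window    -- number of previous samples that form the baseline (assumption)
    threshold -- deviation, in standard deviations, treated as anomalous (assumption)
    Returns a list of (index, value, baseline_mean) tuples.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        # A perfectly flat history has zero spread; any change is then notable.
        if spread == 0:
            if samples[i] != baseline:
                anomalies.append((i, samples[i], baseline))
            continue
        if abs(samples[i] - baseline) / spread > threshold:
            anomalies.append((i, samples[i], baseline))
    return anomalies


if __name__ == "__main__":
    # Hypothetical utilization readings (percent), one per polling interval.
    utilization = [40, 42, 41, 39, 40, 43, 41, 40, 42, 41, 40, 39, 41, 95, 42]
    for index, value, baseline in find_anomalies(utilization):
        print(f"sample {index}: {value} vs baseline {baseline:.1f}")
```

Without the stored history there is nothing to compare the current reading against, which is the point of collecting at regular intervals.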

Testing used to be sacrificed as cost-prohibitive CAPEX or OPEX because of the vast resources needed to provide test points on the outer edges of the network
All network designers and operators should expect/select OEMs that provide a virtual edition of their hardware
These software-only solutions should be used to re-create at least a subset if not all of the infrastructure. Then this virtual environment can be used to rehearse upcoming changes and their possible effects/outcomes.
This should also be used to verify that backups are not only occurring but are actually useful. Much like an application or database backup should be regularly tested, so should the network backups.
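
As a first step toward the backup verification described above, here is a minimal sketch that checks each expected device has a recent, non-empty backup. It assumes, hypothetically, that backups are stored as one file per device named `<device>.cfg` in a single directory; the function name `check_backups` and the 24-hour default are also assumptions.

```python
import time
from pathlib import Path


def check_backups(backup_dir, expected_devices, max_age_hours=24, now=None):
    """Return a dict mapping device name -> problem description.

    Assumes (hypothetically) one file per device named '<device>.cfg'
    inside backup_dir. An empty dict means every check passed.
    """
    now = time.time() if now is None else now
    problems = {}
    for device in expected_devices:
        path = Path(backup_dir) / f"{device}.cfg"
        if not path.is_file():
            problems[device] = "backup missing"
            continue
        stat = path.stat()
        if stat.st_size == 0:
            problems[device] = "backup is empty"
        elif now - stat.st_mtime > max_age_hours * 3600:
            problems[device] = "backup is stale"
    return problems
```

A clean result only shows that backups are occurring. Whether they are actually useful still has to be proven by restoring them onto the virtual devices.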

Wear a black concert T-shirt during all Maintenance Windows
Listen to Grunge music as IP truly came of money-making age during that era and the music soothes the network.
If in a datacenter, noise-cancelling headphones and a conference call help. Run a separate call/WebEx with 30-minute updates for execs (prevents constant second-guessing and distraction), with a liaison on both the tech and exec calls jotting notes for the periodic exec update
Have an Implementation Plan
Use tabs in an XLS workbook for contact info, pre-change configs, configlets to blow in, the test plan, and backout configlets. Open all SSH connections prior to the change window.
Peer Review and Sign Off of Implementation Plan
Take Pre-Change Snapshots/Health Checks
Take Post-Change Snapshots/Health Checks
Insist that user/app owners perform User Acceptance Testing both before and after the change.
Use lab network
When selecting between OEM solutions you absolutely must conduct a Proof-of-Concept, as reading data sheets and white papers does not a sound decision make.
Regularly train the Ops team
User Acceptance Testing after changes are made
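
The pre-change and post-change snapshots mentioned above are most useful when they are compared. Here is a minimal sketch, assuming each snapshot has been saved as plain text (for example, the captured output of health-check commands). It uses `difflib.unified_diff` from the Python standard library; the function name `snapshot_diff` and the sample data are hypothetical.

```python
import difflib


def snapshot_diff(pre_text, post_text):
    """Return the lines that differ between two text snapshots."""
    diff = difflib.unified_diff(
        pre_text.splitlines(),
        post_text.splitlines(),
        fromfile="pre-change",
        tofile="post-change",
        lineterm="",
        n=0,  # no context lines; show only what changed
    )
    return list(diff)


if __name__ == "__main__":
    # Hypothetical health-check output captured before and after a change.
    pre = "Gi0/1 up\nGi0/2 up\nBGP neighbors: 4\n"
    post = "Gi0/1 up\nGi0/2 down\nBGP neighbors: 4\n"
    for line in snapshot_diff(pre, post):
        print(line)
```

An empty result means the two snapshots match. Any lines returned are candidates for either an expected effect of the change or a reason to back out.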