Facebook (Meta) Production Network Engineer
July 2015 - June 2024 (9 years), Menlo Park, CA, USA
Easily the pinnacle of my career so far. Spent 9 years building and exercising my skills in Network Engineering, Software Development and Data Analytics, working across multiple teams to help build and scale Meta’s network infrastructure.
- Built alarms and remediation to scale the automated management across Facebook’s Datacenter, Backbone and Edge Networks.
- Built the drain and audit frameworks to ensure the safe removal and insertion of network devices in all device roles across the production network.
- Built data pipelines to show issues with the convergence timing of our MPLS/RSVP network after fiber cut events.
- Ran incident response for issues across both network and software tools.
- Deployed and managed multiple Terragraph instances to operationalize and harden the product.
- Partnered with edge and backbone teams to build tooling and workflows that made their turnup and migration processes more consistent and reliable.
- Worked with partner software development teams building workflow orchestration systems to help productionize their service: added monitoring, found bottlenecks, and integrated with wider Facebook tooling to increase the responsiveness and reliability of the service.
- Built tooling to integrate different teams' databases to detect physical cable faults and create the appropriate follow-up actions.
- Designed and implemented RFC 4884 and RFC 5837 on the FBOSS network OS to retain IPv4 traceroute functionality across newer IPv6-only deployments.
- Built tooling to parse verbose FBOSS switch ASIC state logs, extracting millisecond-granular data on resource usage, convergence timing, and routing micro-loops. This reduced time to triage hard-to-root-cause incidents, aided qualification efforts, and helped identify bottlenecks to inform future network design roadmaps.
- Implemented an action plan to bring on-call alarm volume down to acceptable levels.
- Ran multiple datacenter-related incident investigations and mitigations.
Independent Contractor
October 2014 - June 2015 (9 months), Sydney, Australia
Contracted primarily for two companies: Cinenet Systems (since acquired by Superloop) and Rise.ph, a Philippine ISP that was starting up at the time.
- Deployed new services over a passive DWDM network spanning Sydney and Melbourne.
- Troubleshot MPLS (VLL and VPLS) issues for existing customers.
- Built and provisioned backend systems (Chef, RADIUS and BIND) to support initial deployments.
- Designed BGP communities and policies to influence traffic through the network and its peers.
- Managed initial ASN and IP allocations through APNIC.
University of Wollongong Network Engineer
June 2012 - September 2014 (2 years 4 months), Wollongong, NSW, Australia
Worked primarily as a Network Engineer and Software Developer to keep the university campus and datacenter networks operating smoothly.
- Managed, designed and implemented upgrades to the multi-campus MPLS VPN core.
- Deployed open-source tools such as Netdisco and RANCID, with custom scripting, to improve change management processes.
- Implemented a new quota and proxy “Free Internet” deployment, with BGP community-based shaping rules on network appliances to meet the business's financial needs, along with traffic attribution tools that let the business understand usage and costs down to a per-subscriber breakdown.
- Designed the network migration and lab changes for the move to newer Palo Alto-based firewalls, which allowed inter-VRF routing.
University of Wollongong Academic Tutor
March 2009 - June 2013, Wollongong, NSW, Australia
Assisted students with lab tasks and provided class assistance and demonstrations to help teach course material in procedural programming, interacting systems, and systems administration.
UTBox Systems Engineer
January 2011 - June 2012 (1 year 6 months), Sydney, NSW, Australia
Upgraded the infrastructure so that everything, from the network stack to the database and web servers, was highly available and handled through configuration management, so that a failure would not result in lost revenue. Supported clients, fixed bugs, and developed new features in the product codebase.