Learning Fundamentals of Data Engineering

Appendix B. Cloud Networking

Data engineers must understand cloud networking basics to design performant and cost-efficient systems.

Cloud networks impact latency, cost (especially due to data egress fees), and system architecture.

Key Concepts

Network Topology & Resource Hierarchy

Public clouds (AWS, GCP, Azure) organize resources in a similar hierarchy: zones (the smallest unit), regions (groups of zones), and, in GCP’s case, multiregions (groups of regions).

Engineers must align data systems with this topology for high performance and resilience.
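To make the hierarchy concrete, here is a minimal sketch that models where a resource runs and classifies the network path between two resources. The `Location` type, the `traffic_scope` function, and the scope labels are hypothetical names chosen for illustration, not part of any provider's API. The region and zone strings are sample values only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where a resource runs (hypothetical model, not a provider API)."""
    provider: str
    region: str
    zone: str


def traffic_scope(src: Location, dst: Location) -> str:
    """Classify the path between two resources, widest boundary first."""
    if src.provider != dst.provider:
        return "cross-provider"
    if src.region != dst.region:
        return "cross-region"
    if src.zone != dst.zone:
        return "cross-zone"
    return "same-zone"


# Sample placements; the names are illustrative values.
warehouse = Location("aws", "us-east-1", "us-east-1a")
etl_worker = Location("aws", "us-east-1", "us-east-1b")
replica = Location("aws", "eu-west-1", "eu-west-1a")

print(traffic_scope(warehouse, etl_worker))
print(traffic_scope(warehouse, replica))
```

Labeling each data flow this way is a quick check on whether a pipeline crosses more boundaries than it needs to.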

Data Egress Fees

Cloud providers generally do not charge for inbound traffic but do charge for outbound traffic, especially traffic that crosses regions or leaves for the internet.

This pricing model can create vendor lock-in and affect architecture choices.

Direct network connections and CDNs can reduce these costs.
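A rough monthly estimate shows how outbound traffic dominates the bill. The sketch below is illustrative only: the rates in `ASSUMED_RATE_PER_GB` are placeholders rather than any provider's published prices, and the function name and path labels are hypothetical.

```python
# Placeholder rates in US dollars per GB. Real prices vary by provider,
# region, and volume tier, so substitute figures from a current price list.
ASSUMED_RATE_PER_GB = {
    "inbound": 0.00,
    "cross-region": 0.02,
    "internet": 0.09,
}


def estimate_egress_cost(gb_by_path, rate_per_gb=ASSUMED_RATE_PER_GB):
    """Return (charge per path, total charge) in dollars for one month."""
    per_path = {path: gb * rate_per_gb[path] for path, gb in gb_by_path.items()}
    return per_path, sum(per_path.values())


# Hypothetical monthly traffic volumes in GB.
monthly_traffic_gb = {"inbound": 5_000, "cross-region": 2_000, "internet": 1_000}
per_path, total = estimate_egress_cost(monthly_traffic_gb)

for path, cost in per_path.items():
    print(f"{path}: ${cost:,.2f}")
print(f"total: ${total:,.2f}")
```

Under these assumed rates, the 5,000 GB of inbound traffic costs nothing, while the much smaller outbound volumes account for the whole charge.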

Zones vs. Regions

GCP’s Premium Networking

Google offers premium-tier networking, where inter-region traffic stays on its private network, improving reliability and speed.

Direct Connect

Providers like AWS, Azure, and GCP offer direct network connections (e.g., AWS Direct Connect) that link a customer's network to the cloud, lowering latency and significantly cutting egress costs, for example from 9¢/GB to 2¢/GB.
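The 9¢/GB and 2¢/GB figures above can be turned into a quick break-even check. A dedicated connection typically carries a fixed charge of its own, so the sketch below includes one. The $300 monthly fee is a hypothetical value chosen for illustration, as are the function names. Amounts are kept in whole cents to avoid floating-point rounding.

```python
INTERNET_RATE_CENTS_PER_GB = 9  # figure quoted in the text
DIRECT_RATE_CENTS_PER_GB = 2    # figure quoted in the text
PER_GB_SAVING_CENTS = INTERNET_RATE_CENTS_PER_GB - DIRECT_RATE_CENTS_PER_GB

# Hypothetical fixed monthly fee for the dedicated link ($300), an assumption.
ASSUMED_FIXED_FEE_CENTS = 30_000


def monthly_savings_cents(gb_transferred, fixed_fee_cents=ASSUMED_FIXED_FEE_CENTS):
    """Net monthly saving from moving egress onto the direct connection."""
    return gb_transferred * PER_GB_SAVING_CENTS - fixed_fee_cents


def break_even_gb(fixed_fee_cents=ASSUMED_FIXED_FEE_CENTS):
    """Smallest whole-GB monthly volume at which the link pays for itself."""
    # Integer ceiling division.
    return (fixed_fee_cents + PER_GB_SAVING_CENTS - 1) // PER_GB_SAVING_CENTS


print(f"net saving at 50,000 GB: ${monthly_savings_cents(50_000) / 100:,.2f}")
print(f"break-even volume: {break_even_gb():,} GB per month")
```

With these numbers the link saves 7¢ on every GB, so it pays for itself only once monthly volume passes a few thousand GB. Below that volume, the fixed fee outweighs the per-GB saving.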

CDNs (Content Delivery Networks)

CDNs like Cloudflare and cloud-native options cache data closer to users, improving delivery speed and reducing load on origin servers. However, CDNs are not available everywhere, and some countries block or restrict CDN traffic.
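The caching idea can be sketched with a toy edge cache that contacts the origin only on a miss. This is a simplified illustration, not how any particular CDN is implemented: `EdgeCache`, `fetch_from_origin`, and the sample paths are hypothetical, and real CDNs also handle expiry, invalidation, and eviction.

```python
class EdgeCache:
    """Toy edge node: serves repeat requests locally, fetches misses from origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.origin_requests = 0

    def get(self, path):
        if path not in self._store:
            # Cache miss: this is the only case that touches the origin.
            self.origin_requests += 1
            self._store[path] = self._fetch(path)
        return self._store[path]


def fetch_from_origin(path):
    """Stand-in for a slow, billable request to the origin server."""
    return f"contents of {path}"


edge = EdgeCache(fetch_from_origin)
requests = ["/report.csv", "/logo.png", "/report.csv", "/report.csv", "/logo.png"]
for path in requests:
    edge.get(path)

print(f"{len(requests)} user requests, {edge.origin_requests} origin requests")
```

Five user requests for two distinct objects reach the origin only twice. The other three are served from the edge, which is where the speed and origin-load benefits come from.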

The Future of Data Egress

Data egress fees restrict cloud portability and multi-cloud adoption.

Competitive pressure and customer demand may push providers to reduce or eliminate egress fees in the near future, just as telecom pricing models evolved.

Takeaway

Cloud networking shapes system performance, resilience, and cost.

Data engineers must be aware of how their data moves within and across zones, regions, and providers. They should design architectures that balance latency, cost, and reliability while keeping an eye on evolving cloud pricing models.

