Fundamentals Of Data Engineering
These are my notes from the book Fundamentals Of Data Engineering.
Although you can access the content through the GitHub page, it is also served with mkdocs-material.
Why?
This is an amazing book for everyone involved in data.
By the end of the book you'll be better equipped to:
- Understand how data engineering fits into roles like data scientist, analyst, or engineer
- Cut through hype to choose the right tools, architectures, and processes
- Design robust systems using the data engineering lifecycle
- Apply data engineering principles in your day-to-day work
- Solve data problems using a lifecycle-based framework
Which is a pretty good deal.
I thought I could share some of my highlights from it. If you want to discover more about any of the topics, please check out the book.
If you're interested in the book, you can purchase a copy. It was previously available for free via Redpanda, but that free copy is no longer offered. The link now redirects to a guide, which is still useful.
The Structure
The book consists of 3 parts, made up of 11 chapters and 2 appendices.
Here is the tree of the book.
My notes below follow this structure.
So grateful that this book exists. Thanks to Joe Reis and Matt Housley.
Fundamentals of Data Engineering
├── Part 1 - Foundation and Building Blocks
│   ├── 1. Data Engineering Described
│   ├── 2. The Data Engineering Lifecycle
│   ├── 3. Designing Good Data Architecture
│   └── 4. Choosing Technologies Across the Data Engineering Lifecycle
├── Part 2 - The Data Engineering Lifecycle in Depth
│   ├── 5. Data Generation in Source Systems
│   ├── 6. Storage
│   ├── 7. Ingestion
│   ├── 8. Queries, Modeling, and Transformation
│   └── 9. Serving Data for Analytics, Machine Learning, and Reverse ETL
└── Part 3 - Security, Privacy, and the Future of Data Engineering
    ├── 10. Security and Privacy
    └── 11. The Future of Data Engineering
Contents
Part 1 - Foundation and Building Blocks
1. Data Engineering Described
2. The Data Engineering Lifecycle
3. Designing Good Data Architecture
4. Choosing Technologies Across the Data Engineering Lifecycle
Part 2 - The Data Engineering Lifecycle in Depth
5. Data Generation in Source Systems
6. Storage
7. Ingestion
8. Queries, Modeling, and Transformation
9. Serving Data for Analytics, Machine Learning, and Reverse ETL
Part 3 - Security, Privacy, and the Future of DE
10. Security and Privacy
11. The Future of Data Engineering
Appendices
- Appendix A - Serialization and Compression
- Appendix B - Cloud Networking