Posts

Udacity - Data Engineering - Cloud Data Warehousing

So I've roughly navigated this course, and the theory was a challenge to get through: once the architectures were discussed, it was mostly a whole bunch of clicking through AWS, which can get boring.

Architecture: the idea is a back office and a front office. The back office is all the sources and the individual processes that bring in data. The data warehouse simplifies these schemas into (possibly) a star model, making the data easier and faster to use for the analytics / BI division (the front office). There are a few variants: BI can directly access the main source, the DW can be department specific, or each department can have its own DW that still maintains integrity among common columns.

Cloud / AWS: doing this in the cloud provides quicker start-up, elasticity, and scalability. The general idea is to read from the sources, move the data to a staging S3 bucket, and then push it to a DW. For smaller tables it might be possible to directly use an EC2 instance
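A minimal sketch of the S3-to-warehouse load step described above, assuming a Redshift-style COPY command. The bucket, table, and IAM role names are invented for illustration; only the SQL string is built here, so no AWS connection is needed.

```python
# Sketch: build the COPY statement that loads staged S3 files into a
# warehouse staging table. All names below are hypothetical examples.

def build_copy_statement(table, bucket, prefix, iam_role):
    """Return a Redshift COPY command loading CSV files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM 's3://{bucket}/{prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS CSV IGNOREHEADER 1;"
    )

sql = build_copy_statement(
    table="staging_events",
    bucket="my-staging-bucket",
    prefix="events/2024/",
    iam_role="arn:aws:iam::123456789012:role/redshift-load",
)
print(sql)
```

In practice this string would be executed through a database driver after the extract files land in the staging bucket.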

Designing Data Intensive Applications - Ch 2 - Data Models and Query Languages

Relational Model. Others that competed and did not last: the network model, the hierarchical model, XML databases, object databases.

NOSQL: specialized query options, expressive data models, polyglot persistence.

Impedance mismatch: the friction when moving from object-oriented applications to relational databases. Impedance is reduced by a translation layer like JSON.

JSON (document databases): flexible schema; better locality - sub-categories kept in one place instead of reassembled through complex joins; hard for many-to-many relations (easier in SQL); closer to the data structures used by the application layer; schema on read instead of schema on write.

Network model: a tree structure like the hierarchical model, but allowing multiple parents and therefore many-to-many relations. It used pointers and access paths instead of SQL-style joins - complicated code, even though it was efficient on the small drives of the time.

SQL: no hand-coded access paths, just individual tables - access paths are chosen on the fly by the query optimizer; indexes can change without changing the tables; concurrent; fault tolerant; shredding (splitting a document-like structure across tables).

Graph Model. Query Languages
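The locality-versus-joins point above can be shown with a toy comparison: the same record as one self-contained document versus shredded into flat relational tables that need a join to reassemble. The field names and values are made up.

```python
# Document model: one JSON blob, good locality, schema-on-read --
# the reader interprets the structure. (Illustrative data only.)
import json

user_doc = {
    "user_id": 251,
    "name": "Ada",
    "positions": [
        {"title": "Engineer", "company_id": 7},
        {"title": "Analyst", "company_id": 9},
    ],
}

# Relational model: the same data shredded into flat tables;
# reassembling it requires a join on user_id.
users = [(251, "Ada")]
positions = [(251, "Engineer", 7), (251, "Analyst", 9)]

joined = [
    {"name": name, "title": title}
    for (uid, name) in users
    for (puid, title, _) in positions
    if puid == uid
]
print(json.dumps(joined))
```

The document keeps everything for one user in one place; the relational form makes the many-to-many case (several users at the same company) much easier to express.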

Designing Data Intensive Applications - Ch 1 - Reliability, Scalability, Maintainability

Reliability, Scalability, Maintainability. Data engines as developers see them: databases, caches, search indexes, stream processing, batch processing. How the data is distributed on disks; encoding data.

Reliability: anticipate faults and tolerate them. A fault is a component failure; a failure is a system failure. Hardware faults, software faults, human errors. Mitigations: telemetry, well-designed interfaces, decoupling, testing.

Scalability: load parameters - requests to a web server, reads or writes to a database, cache hit rate, active users. These could be described by the average case or by a small number of extreme cases. Twitter example: 4.6K average to 12K peak writes per second, but 300K reads per second. So the work is done at write time, pushing each tweet into individual followers' caches, so that read time can be faster. Writes become a challenge when they involve so much leg work, but tweets are still delivered within 5 seconds. Twitter now uses a hybrid model: most tweets follow the above approach, but celebrity tweets are merged in at read time.

Performance: throughput, response tim
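The fan-out-on-write idea in the Twitter example can be sketched in a few lines: each tweet is pushed into every follower's home-timeline cache at write time, so a read is just a cache lookup. The names and data structures are invented for the sketch.

```python
# Toy fan-out-on-write: expensive writes, cheap reads.
from collections import defaultdict

followers = {"alice": ["bob", "carol"]}   # alice is followed by bob and carol
timelines = defaultdict(list)             # per-user home-timeline cache

def post_tweet(author, text):
    # Write-time leg work: one cache insert per follower.
    for follower in followers.get(author, []):
        timelines[follower].append((author, text))

def read_timeline(user):
    # Read time is now a single cache hit.
    return timelines[user]

post_tweet("alice", "hello")
print(read_timeline("bob"))   # [('alice', 'hello')]
```

The hybrid model mentioned above would skip the write-time fan-out for accounts with huge follower lists and merge their tweets in at read time instead.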

Learning Journal

 https://docs.google.com/document/d/11DCwE8qZ6wI9fruTsrTjsAxQWBqivJPGwgNvQ1wmjek/edit

Udacity Data Engineering - Data Modeling

<2 years back - need to brush up> This was a good intro to relational modeling and to data warehousing with facts and dimensions, using grouping sets and cubes for faster analytics: slicing, dicing, drill-down and roll-up. The project was a whole bunch of transformations converting a relational model into a data warehouse model. There were also a bunch of methods used to create schemas repeatedly. When I brush up on this course, I will update the learning here along with code snippets.
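As a rough reminder of what grouping sets and cubes compute, here is a plain-Python sketch: a CUBE over two dimensions aggregates at every combination of the grouping columns, including the grand total. The sales rows and dimension names are made up.

```python
# Sketch of what SQL's CUBE(year, region) produces, in plain Python.
from itertools import combinations
from collections import defaultdict

rows = [
    ("2024", "US", 100),
    ("2024", "EU", 150),
    ("2025", "US", 200),
]
dims = ("year", "region")

def cube(rows):
    totals = defaultdict(int)
    for year, region, amount in rows:
        values = {"year": year, "region": region}
        # Aggregate at every subset of the dimensions,
        # from the grand total () up to (year, region).
        for r in range(len(dims) + 1):
            for subset in combinations(dims, r):
                key = tuple((d, values[d]) for d in subset)
                totals[key] += amount
    return dict(totals)

totals = cube(rows)
print(totals[()])                   # grand total: 450
print(totals[(("year", "2024"),)])  # roll-up for 2024: 250
```

In the warehouse, the same result comes from one GROUP BY CUBE query, with the optimizer sharing work across the grouping sets instead of rescanning the table per level.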

Udacity - Data Engineering - Intro

This is a challenging course I embarked on two years back and could not complete at the time. I'm back in an attempt to widen my skills, improve my mental stamina, and give myself something that can make me feel accomplished. It has 4 sections - Data Modeling - Cloud Data Warehousing - Spark and Data Lakes - Automating Data Pipelines. There is a challenging project for each of these sections and a final capstone project combining them. The technologies that span these courses are SQL, NOSQL, basic Python, AWS, Spark, and Airflow.

About me and Why read this blog?

About me: Hello there, I'm Ananya Jayakumar - 33 years old as I start this blog, 5 months into my pregnancy, with 10 years of experience in tech: operations / data analysis / ETL / dash-boarding / managing teams / solving problems.

Pregnancy - slowing your career: I just completed a personally fulfilling year onboarding a team and training them to perform, tackling fraud, building dashboards and creating the automated ETL behind those dashboards. But since I'm going on maternity leave, I'm training folks on my job - making it easy to replace myself and jeopardizing a promo that I'd otherwise been well placed for - and I'll have to come back from maternity leave and start from scratch, as if all the work I did before the leave does not count. It's also a time when my company is being acquired and the business and products are changing - not the best time to be planning maternity. I feel like I have the potential to be in a better place.

Pregnancy - the baby's most i