If you want to secure a long-lasting and successful career in data engineering, it's not enough to merely acquire proficiency in using a particular tool. Instead, you should strive to understand its inner workings at a fundamental level, as well as the first principles that underlie it.
"But how?" you may ask. "There are so many tools out there: Spark, Trino, BigQuery, Snowflake, etc."
That's certainly true. However, it might surprise you to discover that all of these systems share strikingly similar foundations.
For instance, they all depend on some variation of the MapReduce model to process data. They all require data shuffling between nodes for tasks like joining or grouping. They all rely on column-oriented data formats. They are all susceptible to issues such as skewed keys, the small object problem, uneven partitioning, and so on.
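To make the shuffle and skewed-key points concrete, here is a toy Python sketch of my own (not taken from any of these engines) of the hash-based routing step they all perform, and of how one hot key overloads a single partition:

```python
from collections import defaultdict

def shuffle_by_key(records, num_partitions=4):
    """Toy shuffle step: route each record to a partition by the hash
    of its key, the way distributed engines route rows to worker
    nodes before a join or group-by."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

# A skewed dataset: one "hot" key accounts for 90% of the rows.
records = [("hot", i) for i in range(900)] + [(f"k{i}", i) for i in range(100)]
sizes = {p: len(rows) for p, rows in shuffle_by_key(records).items()}
print(sizes)
```

Whichever partition receives the "hot" key ends up holding at least 900 of the 1,000 rows, so one worker does almost all the work. That, in miniature, is the skewed-key problem every one of these engines has to contend with.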
The key, naturally, lies in the details. What sets each tool apart is a unique set of trade-offs that its developers have chosen to make it particularly suited to address specific use cases. For instance, Trino terminates queries that exceed memory limits to prevent costly disk spills, as one of its primary objectives is low latency. Snowflake automatically handles data partitioning for a more user-friendly experience but relinquishes fine-grained control from end users. Spark offers maximum user control but may come across as a more complex tool, and so forth.
Nonetheless, if you dig into how these tools move data around, you'll discover that they are not so different after all. Plus, their feature sets continue to overlap and converge over time.
Therefore, my advice is to run an 'EXPLAIN' or equivalent command for every query you write and invest time in understanding the resulting output. Ensure you grasp how each part of your query maps to a specific stage within a physical plan. Use this knowledge to debug your queries. I can assure you that the expertise and experience acquired this way will be transferable to other similar tools or data warehouse vendors.
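This habit transfers all the way down to embedded engines. Here is a minimal sketch using Python's built-in sqlite3 module (the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, cust TEXT, amt REAL)")
conn.execute("CREATE TABLE custs (cust TEXT PRIMARY KEY, country TEXT)")

# EXPLAIN QUERY PLAN reports how SQLite will execute the statement:
# which table is scanned, which is probed via an index, whether a
# temporary B-tree is built for the GROUP BY, and so on.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT c.country, SUM(o.amt) "
    "FROM orders o JOIN custs c ON o.cust = c.cust "
    "GROUP BY c.country"
).fetchall()
for _, _, _, detail in plan:
    print(detail)
```

In Spark the equivalent is `df.explain()` or `EXPLAIN` in Spark SQL; Trino, Snowflake, and BigQuery all expose `EXPLAIN` as well. The plan shapes differ, but the reading skill is the same.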
Individual tools may come and go at a rapid pace, but fundamental principles endure and change far less frequently.
Source https://www.linkedin.com/posts/izeigerman_dataengineering-activity-7110648980732080128-DKmn
"But how?" you may ask. "There are so many tools out there: Spark, Trino, BigQuery, Snowflake, etc."
That's certainly true. However, it might surprise you to discover that all of these systems share strikingly similar foundations.
For instance, they all depend on some variation of the MapReduce model to process data. They all require data shuffling between nodes for tasks like joining or grouping. They all rely on column-oriented data formats. They are all susceptible to issues such as skewed keys, the small object problem, uneven partitioning, and so on.
The key, naturally, lies in the details. What sets each tool apart is a unique set of trade-offs that its developers have chosen to make it particularly suited to address specific use cases. For instance, Trino terminates queries that exceed memory limits to prevent costly disk spills, as one of its primary objectives is low latency. Snowflake automatically handles data partitioning for a more user-friendly experience but relinquishes fine-grained control from end users. Spark offers maximum user control but may come across as a more complex tool, and so forth.
Nonetheless, if you dig into how these tools move data around you'll discover that they are not that different after all. Plus their functionalities continue to overlap and converge over time.
Therefore, my advice is to run an 'EXPLAIN' or equivalent command for every query you write and invest time in understanding the resulting output. Ensure you grasp how each part of your query maps to a specific stage within a physical plan. Use this knowledge to debug your queries. I can assure you that the expertise and experience acquired this way will be transferable to other similar tools or data warehouse vendors.
Individual tools may come and go at a rapid pace, but fundamental principles endure and change far less frequently.
Source https://www.linkedin.com/posts/izeigerman_dataengineering-activity-7110648980732080128-DKmn
🐳1
In analytics, you should always know what to measure and why: at least 2-3 key metrics for your business domain.
P.S. Avoid vanity metrics 🤨
Is a university degree important for data jobs? Not at all. No one cares what degree you have; skills are more important.
Today I talked with a colleague who paid $50k for one year of a Master's in Business Analytics at a third-tier US university, plus a year of living costs: roughly $80k overall, largely wasted. Yes, she got the job and some skills, but at what cost? With the right focus and content, she could have "faked it and made it" in 4-5 months. Now imagine a two-year degree at a first- or second-tier university, plus the cost of living 🫨
🌟 Parquet:
Advantages: Columnar, compressed, schema evolution support!
Disadvantages: Not for write-heavy workloads.
Use Cases: Analytical querying & data warehousing.
🌟 Avro:
Advantages: Row-based, schema evolution, efficient serialization.
Disadvantages: Slower for analytical queries.
Use Cases: Data serialization & data interchange.
🌟 JSON:
Advantages: Human-readable & schema flexible.
Disadvantages: Inefficient storage.
Use Cases: Web data interchange & configuration.
🌟 Delta Lake:
Advantages: ACID transactions, schema enforcement.
Disadvantages: Closely tied to the Spark/Databricks ecosystem.
Use Cases: ACID transactions & schema enforcement in Data Lakes.
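To make the row-vs-columnar storage trade-off concrete, here is a toy pure-Python comparison. It is only a sketch of the idea behind Parquet's layout (contiguous, compressible columns), not a real Parquet implementation:

```python
import json
import struct
import zlib

# 10,000 rows of (id, price), stored two different ways.
rows = [{"id": i, "price": float(i % 100)} for i in range(10_000)]

# Row-oriented and human-readable: JSON text.
row_oriented = json.dumps(rows).encode()

# Column-oriented: pack each column into a contiguous binary array,
# then compress. Similar values sit next to each other, which is
# why columnar formats compress so much better.
ids = struct.pack(f"<{len(rows)}i", *(r["id"] for r in rows))
prices = struct.pack(f"<{len(rows)}d", *(r["price"] for r in rows))
columnar = zlib.compress(ids) + zlib.compress(prices)

print(f"JSON:            {len(row_oriented):,} bytes")
print(f"JSON + zlib:     {len(zlib.compress(row_oriented)):,} bytes")
print(f"Columnar + zlib: {len(columnar):,} bytes")
```

Even after compressing the JSON, the columnar encoding comes out far smaller, because each compressed stream contains only one kind of value.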
🚀 Tips for Maximizing Benefits in #Spark:
- Choosing Format: Select data format based on read-write patterns, query performance, and storage efficiency.
- Partitioning: Properly partition data to optimize read performance, especially for large datasets.
- Compression: Choose an appropriate compression codec considering the trade-off between storage space and CPU usage.
- Caching: Leverage Spark’s caching features for frequently accessed datasets.
- Schema Evolution: Design schemas thoughtfully to allow for evolution over time without causing data inconsistency or requiring expensive migrations.
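The partitioning tip can be sketched in plain Python. A Hive-style directory layout (the convention Spark follows when you call `partitionBy`) lets readers that filter on the partition key skip whole directories; the file names and paths below are illustrative:

```python
import json
import os
import tempfile

def write_partitioned(rows, key, out_dir):
    """Write rows as newline-delimited JSON in a Hive-style layout:
    out_dir/<key>=<value>/part-0000.json."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    for value, grp in groups.items():
        part_dir = os.path.join(out_dir, f"{key}={value}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-0000.json"), "w") as f:
            for row in grp:
                f.write(json.dumps(row) + "\n")

out = tempfile.mkdtemp()
write_partitioned(
    [{"country": "US", "amt": 10},
     {"country": "CA", "amt": 7},
     {"country": "US", "amt": 3}],
    "country",
    out,
)
print(sorted(os.listdir(out)))  # ['country=CA', 'country=US']
```

A query that only needs `country = 'US'` can now open a single directory instead of scanning everything, which is exactly what partition pruning buys you at scale.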
From rockyourdata.cloud: "We have awesome news! We've launched education programs for Data Engineer, Data Analyst, and BI Engineer positions. We are going to distill years of experience into our curriculum and help people move into the data industry and land their first job."
Rock Your Data is a North American consulting company focused on Cloud Analytics.
Please share https://www.linkedin.com/posts/rock-your-data_dataengineer-dataanalyst-biengineer-activity-7118664122300321792-mXWV
Are you planning to move from an analyst role to a data engineering role and don't know where to start? I'd bet that for almost any question out there, an ideal book exists, and this question is no exception.
The "The Missing README" is the best book for anyone looking for the foundational software engineering knowledge. Even you don't plan to work as a data engineer right now, you can still learn basic concepts and communicate effectively with backend engineer team.
Personally, this book has helped me tremendously in my career, and I highly recommend it to anyone lacking a Computer Science degree 🤗
Book link: https://lnkd.in/dQxNe3dm
The "The Missing README" is the best book for anyone looking for the foundational software engineering knowledge. Even you don't plan to work as a data engineer right now, you can still learn basic concepts and communicate effectively with backend engineer team.
Personally, this book has helped me tremendously in my career and I highly recommend it to anyone who are lacking Computer Science degree 🤗
Book link: https://lnkd.in/dQxNe3dm
⚡4🔥2❤🔥1
This media is not supported in your browser
VIEW IN TELEGRAM
Club 500 has started: https://surfalytics.com/pages/club500/
Hello! I've started a Discord server for Surfalytics. It's the best place for collaboration, resume and job-hunting progress updates, and sharing your successes with the rest of the community. Join us here: https://discord.gg/yEQkFerr
The success of your career often depends on your impact. While you might find yourself building numerous things, such as reports and pipelines, ingesting additional data sources, merging pull requests, or producing many lines of code, you may realize that these activities alone aren't propelling your career forward.
You're producing outputs and gauging your work by these outputs. However, output isn't synonymous with outcome. Your outputs might have limited business value and negligible impact. In essence, you're caught in the "building trap."
This is why I highly recommend the book "Escaping the Build Trap: How Effective Product Management Creates Real Value." This book will introduce you to the fundamentals of product management and the outcome-driven approach. It aims to help you avoid the building trap, create value for businesses, and focus on meaningful impacts that can genuinely advance your career.
P.S. Naturally, this will shine brightest when paired with a team and leadership that truly have their eyes on outcomes and can deftly distinguish between "output" and "outcome." Wouldn't that be refreshing?
Link to book: https://www.goodreads.com/book/show/42611483-escaping-the-build-trap