DataEng

Whoa! Kafka is now available on AWS as a managed service. Here is the news about it. You can try the service itself here.

Amazon Web Services reveals a managed Kafka service for streaming data

LAS VEGAS – Yet another popular open-source project is now available as a managed service from Amazon Web Services with the addition of Amazon Managed Streaming for Kafka, announced Thursday… Read More

686 views, 11:00

DataEng

The best introduction to building data pipelines with Apache Beam in Python: Hands on Apache Beam, building data pipelines in Python

Towards Data Science

Hands on Apache Beam, building data pipelines in Python

Apache Beam is an open-source SDK which allows you to build multiple data pipelines from batch or stream based integrations and run it in…

739 views, 07:30

DataEng

A new episode of the Data Engineering Podcast is out. This time the guest is Patrick Hunt, tech lead on the Apache Zookeeper project. The conversation covers Zookeeper and its role in building distributed systems: Apache Zookeeper As A Building Block For Distributed Systems

Data Engineering Podcast

Apache Zookeeper As A Building Block For Distributed Systems with Patrick Hunt - Episode 59

Distributed systems are complex to build and operate, and there are certain primitives that are common to a majority of them. Rather than re-implement the same capabilities every time, many projects build on top of Apache Zookeeper. In this episode Patrick…

714 views, 08:01

DataEng

Found a great post on LinkedIn about building your own data warehouse on open-source software (Druid, Airflow, and Superset BI): https://bit.ly/2EaCETX

Open-Source Data Warehousing – Druid, Apache Airflow & Superset

These days everyone talks about open-source, however still not common in the Data Warehouse (DWH) field. Why is this? In my recent blog, I researched OLAP technologies and what’s coming next, in this blog I choose one of the open-source technologies and build…

4.59K views, 11:00

DataEng

A solid Twitter thread about problems in distributed systems: https://twitter.com/janl/status/1072442448893358081?s=20

Twitter

Jan Lehnardt

A thread about handling deletes in distributed systems. You'd think that deleting a piece of data would be straightforward. As long as you're talking about deleting something on a single computer, it's not that hard. But once you add a network, the fun begins.
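
The difficulty the thread alludes to can be illustrated with a minimal sketch. It assumes a last-writer-wins store replicated across two nodes, which is one common setup and not necessarily the one the thread discusses. The names `Entry`, `Replica`, and the integer `version` are hypothetical and exist only for illustration. Physically removing a key lets a stale replica bring it back on the next sync, whereas writing a tombstone (a versioned "deleted" marker) lets the delete win the merge.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    value: Optional[str]
    version: int
    deleted: bool = False  # True marks a tombstone


class Replica:
    """Hypothetical in-memory replica with last-writer-wins merging."""

    def __init__(self) -> None:
        self.data: Dict[str, Entry] = {}

    def _apply(self, key: str, entry: Entry) -> None:
        # Keep the incoming entry only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or entry.version > current.version:
            self.data[key] = entry

    def put(self, key: str, value: str, version: int) -> None:
        self._apply(key, Entry(value, version))

    def delete(self, key: str, version: int) -> None:
        # Record the delete as a versioned tombstone instead of removing the key.
        self._apply(key, Entry(None, version, deleted=True))

    def naive_delete(self, key: str) -> None:
        # Physically drop the key; no trace of the delete remains.
        self.data.pop(key, None)

    def get(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return None if entry is None or entry.deleted else entry.value

    def merge_from(self, other: "Replica") -> None:
        for key, entry in other.data.items():
            self._apply(key, entry)


def resurrect_with_naive_delete() -> Optional[str]:
    a, b = Replica(), Replica()
    a.put("user:1", "alice", version=1)
    b.merge_from(a)           # b now holds a copy
    a.naive_delete("user:1")  # a forgets the key entirely
    a.merge_from(b)           # b never learned about the delete
    return a.get("user:1")    # the deleted value comes back


def delete_with_tombstone() -> Tuple[Optional[str], Optional[str]]:
    a, b = Replica(), Replica()
    a.put("user:1", "alice", version=1)
    b.merge_from(a)
    a.delete("user:1", version=2)
    a.merge_from(b)           # b's older entry loses to the tombstone
    b.merge_from(a)           # the tombstone propagates to b
    return a.get("user:1"), b.get("user:1")
```

The trade-off in this sketch is that tombstones take up space and have to be garbage-collected at some point, which is a separate problem the code does not address.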

665 views, 11:03

DataEng

Last year Uber put a new version of its distributed payments system into production. The team's goal was to build a reliable, fault-tolerant system for accepting payments worldwide across the company's whole range of products: UberRide, UberEats, UberHealth, UberBusiness, and so on. See the company's blog for how it turned out.

Uber Engineering Blog

Engineering Uber’s Next-Gen Payments Platform

During a September 2018 meetup, Uber's Payments Platform team discusses how this technology supports our company's growth through an active-active architecture, exactly-once payment processing, and scalability across businesses.

625 views, 08:30

DataEng

What does a developer need to know about how database storage works? Find out in Alex Petrov's talk: https://www.youtube.com/watch?v=V667vJzDvt4

YouTube

🚀 What Every Programmer Has to Know About Database Storage (Alex Petrov)

🗓️ Upcoming developer events: https://dev.events In the world of Big Data, it’s important to know how the Database Storage works in order to be able to pick a right tool right job. The talk covers evaluation techniques, to choose storage with best read, write…

777 views, 14:21

DataEng

The Lyft blog has published an article on how the company uses Apache Airflow: https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff

Medium

Running Apache Airflow At Lyft

By Tao Feng, Andrew Stahlman, and Junda Yang

1.06K views, edited 06:01

DataEng

A new blog has appeared where, according to the author, an article on distributed systems fundamentals will be published every Wednesday for a year: https://bit.ly/2ArN4fe

674 views, 09:55

DataEng

Apache Airflow has become a full-fledged (top-level) project of the Apache Software Foundation: https://blogs.apache.org/foundation/entry/the-apache-software-foundation-announces44

532 views, 13:59

DataEng

Does my Startup Data Team Need a Data Engineer?

A great post in which the author discusses the role of the data engineer in modern data organizations. The main idea is that routine ETL tasks are easily automated with services like Stitch, so the data engineer's role is shifting toward building data infrastructure, with everything that entails (reliability, data consistency, monitoring, etc.), and toward working closely with the analytics team (data scientists, data analysts). The author argues that a data engineer is a team player whose role is to support, in every way possible, the people who draw conclusions from the data.

I also liked this line: data engineers don’t provide direct business value—their value comes in making your data analysts and scientists more productive.

Must read!

Fishtown Analytics

Does my Startup Data Team Need a Data Engineer?

The role of the data engineer in a startup data team is changing rapidly. Are you thinking about it the right way?

688 views, edited 09:00

DataEng

A couple of days ago the Insight Data Engineering program held a webinar titled Transitioning to Data & DevOps Engineering. Its goal is to introduce aspiring Data/DevOps engineers to the field and help them ease into it.

Besides the webinar, you may also find their article Preparing for the Transition to Data Engineering useful.

709 views, 15:39

DataEng

The internals of PostgreSQL: https://www.interdb.jp/pg/index.html

588 views, 08:15

DataEng

Meanwhile, two new articles have been published on baseDS:

- Transparency: Illusions of a Single System (Part 1)
- Transparency: Illusions of a Single System (Part 2)

Medium

Transparency: Illusions of a Single System (Part 1)

Even though we might be new to distributed systems, by now we can see that, by definition, they involve many moving parts. Those moving…

616 views, 07:00

DataEng

Webinars on RabbitMQ vs. Kafka:

Part I
Part II

YouTube

RabbitMQ vs Kafka - Jack Vanlightly x Erlang Solutions webinar

RabbitMQ vs Kafka

Messaging is at the core of many architectures and two giants in the messaging space are RabbitMQ and Apache Kafka. In this webinar we'll take a look at RabbitMQ and Kafka within the context of real-time event-driven architectures.
In this…

5.34K views, 06:32

DataEng

A great talk on DB event streaming from QCon: https://www.infoq.com/presentations/wepay-database-streaming

InfoQ

The Whys and Hows of Database Streaming

Joy Gao talks about how database streaming is essential to WePay's infrastructure and the many functions that database streaming serves. She provides information on how the database streaming infrastructure was created & managed so that others can leverage…

516 views, 07:01

DataEng

A decent introduction to FoundationDB, Apple's distributed database: https://tech.marksblogg.com/minimalist-guide-tutorial-foundationdb.html

Marksblogg

A Minimalist Guide to FoundationDB

Benchmarks & Tips for Big Data, Hadoop, AWS, Google Cloud, PostgreSQL, Spark, Python & More...

520 views, 15:00

DataEng

The history of Apache Flink's development, on the Alibaba Tech blog: https://medium.com/@alitech_2017/a-brief-history-of-flink-tracing-the-big-data-engines-open-source-development-87464fd19e0f

Medium

A Brief History of Flink: Tracing the Big Data Engine’s Open-source Development

From version 1.1.0 to 1.6.0, Apache Flink’s relentless improvement exemplifies open-source development.

559 views, 07:00

DataEng

Forwarded from DevBrain

Friends, I have an idea and some groundwork for recording a short course on building data pipelines with Luigi, plus an introduction to DataEng. I use this tool heavily, so I have plenty to share. The question: would you buy such a course for 650 rubles?

Anonymous Poll

38%

Yes!!!

11%

Of course! Why so cheap?!

51%

No way!

281 voters, 74 views, 09:28

DataEng

Udacity has launched a Nanodegree in Data Engineering: https://www.udacity.com/course/data-engineer-nanodegree--nd027, though the price is sky-high: $999

Udacity

Data Engineering Training Course | Become a Data Engineer | Udacity