Datadog vs Graylog vs ELK
Hi all,
At my company, we use Datadog for metrics and Graylog for log management (and generally also for derived metrics). Both installations are getting older and we are thinking about upgrades.
All of our infra is on AWS across a bunch of accounts. What we need is monitoring with dashboards, metrics, and alerting (very important). We need basic stuff - CPU/mem/disk/traffic/IO/HTTP graphs. Our Graylog ingests around 5 GB of logs a day.
Now, I was reading more about all 3 apps (Datadog, Graylog, ELK) and I can't make up my mind (not enough hands-on experience) about what we should do (I've worked closely only with ES, but not for monitoring):
- use only Datadog and skip Graylog (I understand that Datadog can ingest logs as well)
- use only Graylog (but I don't see it being able to show basic OS metrics)
- use ELK with Beats for metrics
- use AWS OpenSearch as above (though I know it's a fork of an older ES version)
- stick with just CloudWatch?
Or maybe there is some other system (Grafana/Loki, Splunk ....) that you would recommend?
Most important requirements:
- metrics
- log management
- alerting (setting custom thresholds)
- the cheaper the better, but price is not the most important factor
- nice to have: good integration with AWS
- nice to have: multi-user / fine-grained ACLs
- nice to have: integration with GitHub and the Atlassian stack
- nice to have: easy deployment (Docker preferred)
Thanks in advance for any comments/recommendations :)
https://redd.it/10ntvs1
@r_devops
Simple open-source PaaS that runs on Docker or docker-compose instead of OpenShift
Hi, in our work environment we want to build a PaaS platform, but OpenShift is heavy to run and we want a lighter alternative, with a migration to a Kubernetes-based PaaS later on. Any feedback on production-ready options you have used?
https://redd.it/10nkqgu
@r_devops
Tool for app testing?!
I don't know if this is the right subreddit for this question, but I have a network with many systems that are directly unsupported by the application our organization is working on, and I want to set up an in-network server for testing the application.
I've heard about Citrix and Winflactor for my use case, but both cost a ton, and Citrix also has a complicated learning curve.
I'm looking for an easy-to-use solution, preferably open-source software, that works on Windows and macOS.
https://redd.it/10mq20v
@r_devops
How about a tool to convert from .tf to a PNG/design?
I like Terraform, but I always want to know what my Terraform code looks like as a design graph.
I want to know how my Terraform code is represented in the cloud service:
How resources are grouped under the network
How resources are grouped under a region
How resources are grouped under a resource group
What the resources cost
Verify that our initial design is the same as the design the Terraform code generates
Etc.
If someone makes a tool for that, will you use it and pay for it?
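For the diagram part, Terraform itself already gets partway there with the built-in `terraform graph` command (a real command; the `dot` step assumes Graphviz is installed):

```shell
# Render the dependency graph of the current configuration as PNG
terraform graph | dot -Tpng > graph.png

# Or as SVG for something zoomable
terraform graph | dot -Tsvg > graph.svg
```

It shows resource dependencies rather than network/region grouping or cost, so a tool like the one described here would still add value on top of it.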
https://redd.it/10o16a5
@r_devops
(GITOPS) Progressive Delivery Tools and Rollbacks to Git ?
Hello,
For GitOps - Git is the source of truth. Let's imagine a situation when deployment fails and the application is rolled back to the previous configuration and version.
Should state in Git reflect that ?
What do we do if we want to revert something in Git? We create a revert commit. Then the state in Git reflects the reality that this change did not work and thus was reverted.
Are any of the GitOps tools aware of that?
An ideal GitOps approach pipeline would be:
- I commit a change to repository (change of state in Git)
- Tool picks up the change and compares it with the state on the cluster
- Tool applies the state on the cluster - but the metrics say the change causes a failed state
- Tool creates a rollback to the previous version
- Tool notifies something that was tracking Git that this change does not work
- That something else prepares a revert PR in Git with an explanation that this version failed due to X
- It's up to a human to merge the PR, thus restoring the Git state to again reflect the truth
Thoughts? Ideas?
Seems like we have a gap in currently available tooling.
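The last two steps of the pipeline above can be sketched with plain git plus the GitHub CLI (a hedged sketch: the branch naming, commit SHA variable, and PR text are placeholders, and `gh pr create` assumes an authenticated GitHub CLI):

```shell
# After the rollout tool detects a failed deploy of commit $BAD_SHA:
git checkout -b "revert-${BAD_SHA}"
git revert --no-edit "${BAD_SHA}"       # revert commit; history keeps both states
git push origin "revert-${BAD_SHA}"

# Open a PR for a human to approve, restoring Git as the source of truth
gh pr create \
  --title "Revert ${BAD_SHA}: failed canary analysis" \
  --body "Automated revert: deployment failed metric checks (reason: X)."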
https://redd.it/10o4ux0
@r_devops
How to automate baremetal migration
My team wants me to set up automation for an application that runs on bare-metal/physical servers, since this migration happens every year.
Below are the steps I need to automate. How would you do it? What tools would you use?
1. Procure new baremetals servers
2. Get IPs allocated
3. Get VIPs from network team
4. Open firewall rules - ticket to network team
5. Configure tomcat and install application
6. Configure database
7. Setup monitoring
8. Configure GSLBs to point to the new load balancers
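Steps 5-7 are the classic configuration-management part; a minimal Ansible sketch (the host group, package names, and file paths are placeholders - steps 1-4 usually stay as APIs or tickets to other teams):

```yaml
# site.yml - hypothetical playbook for the app-configuration steps
- hosts: new_baremetals
  become: true
  tasks:
    - name: Install Tomcat
      ansible.builtin.package:
        name: tomcat
        state: present
    - name: Deploy application WAR
      ansible.builtin.copy:
        src: files/app.war
        dest: /var/lib/tomcat/webapps/app.war
    - name: Install monitoring agent
      ansible.builtin.package:
        name: monitoring-agent   # placeholder for your agent package
        state: present
```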
https://redd.it/10o3al5
@r_devops
Which CD solution would you use - if you had to start fresh?
If you were tasked to build a new K8s environment from scratch, what would you use for CD?
Considerations:
- Minimal set-up time
- Easy rollback
- Cloud agnostic
- Canary deployments
This is only part of the picture of course - if you chose one of these CDs, can you share what the rest of your set-up looks like?
https://redd.it/10o6i2m
@r_devops
What's the best practice for using a package your distro version doesn't support?
I am on Ubuntu 22.04 (Pop!_OS), which still doesn't support MongoDB 6.0. Some people have suggested tinkering with the repo .list, but it seems kind of off.
If I were in an organization with a tighter security protocol, how would I develop locally with MongoDB? I thought about running MongoDB in Docker, but I wanted to hear your thoughts.
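Running it in Docker is a common way to sidestep the distro repo entirely; a minimal sketch using the official image (the container name and volume name are my own):

```shell
# Official mongo image, pinned to 6.0, data persisted in a named volume
docker run -d \
  --name mongo6 \
  -p 27017:27017 \
  -v mongo6-data:/data/db \
  mongo:6.0

# Connect with mongosh from inside the container
docker exec -it mongo6 mongosh
```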
https://redd.it/10o995h
@r_devops
Am I missing something? (argo cd and helm in AWS)
My goal is simply to deploy helm charts for our applications via argo cd, but it seems harder than it should be. I’m not sure if I’m missing something but our environment can’t be uncommon.
We are using EKS and we have working helm releases - I was exploring simply moving from native helm to Argo applications. Our helm charts are stored via OCI in ECR.
The first thing I ran into is that there is no native integration from Argo to private ECR over OCI to pull charts. Several people have workarounds or cron jobs to fetch ECR tokens, but I'm not really looking to add hacks just to use Argo.
The second option was to just make my charts public and apply the values file from the git repo where our apps are. Immediately found that helm repos and git sources aren’t meant to be mixed by Argo. They’ve very very recently added support for this but it’s basically still in beta.
So I’m left wondering.. what am I missing here? I understand that these things are being addressed and there are ways to make it happen but how is everyone else doing this? How are you applying helm charts with private values files with Argo? Is everyone just using artifactory or harbor and I’m in the minority?
I get the sense Argo was made for kustomize and helm support was bolted on after. Which makes sense.. I guess helm isn’t really “gitops”.
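For reference, the recently added multi-source feature mentioned above (Argo CD 2.6+, still early) looks roughly like this - a hedged sketch, with the registry, chart, and repo names as placeholders:

```yaml
# Application spec fragment: Helm chart from one source, values from Git
spec:
  sources:
    - repoURL: public.ecr.aws/my-org       # OCI Helm registry (placeholder)
      chart: my-app
      targetRevision: 1.2.3
      helm:
        valueFiles:
          - $values/apps/my-app/values.yaml
    - repoURL: https://github.com/my-org/deploy.git
      targetRevision: main
      ref: values                          # referenced above as $values
```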
https://redd.it/10o97jo
@r_devops
Microservices Authentication: SAML and JWT
I have the following problem: I want to create an authentication concept for a microservices environment. External requests by users go through an API gateway. User authentication and transfer of user context inside the platform should be done via JWTs.
A user should be able to authenticate to the platform via SAML. How could this be enabled?
I am aware that exchanging a SAML token to a JWT is not possible or very difficult. Would it be an option not to return a JWT to the user, but to generate it on the gateway after successful authentication and attach it to the user request?
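The gateway-minting approach is a common pattern: the gateway validates the SAML assertion, then issues its own short-lived JWT for internal calls. A minimal stdlib-only HS256 sketch (the claim names, secret, and lifetime are illustrative; in practice you would use a vetted JWT library, and likely RS256 so downstream services only need the public key):

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def mint_internal_jwt(secret: bytes, subject: str, ttl: int = 300) -> str:
    """Issue a short-lived HS256 JWT after the SAML assertion was validated."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    now = int(time.time())
    payload = b64url(json.dumps({
        "sub": subject,        # user identity taken from the SAML NameID
        "iss": "api-gateway",  # the gateway is the internal token issuer
        "iat": now,
        "exp": now + ttl,
    }).encode())
    signing_input = header + b"." + payload
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

token = mint_internal_jwt(b"shared-secret", "alice@example.com")
print(token.count("."))  # 2 - header.payload.signature
```

The gateway attaches this token to the forwarded request (e.g. in the Authorization header), so internal services never see SAML at all.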
https://redd.it/10o8yzd
@r_devops
Jenkins: using a variable in a withCredentials block?
Hi guys,
I can't figure out how to use a variable for the credentialsId in a withCredentials block. I want to use the same Jenkinsfile for all branches with different credentials, so I need this.

withCredentials([usernamePassword(credentialsId: 'GITHUBCREDENTIALS', passwordVariable: 'GIT_PASS', usernameVariable: 'GIT_USER')])

I have tried these versions:
- '$GITHUBCREDENTIALS'
- '${GITHUBCREDENTIALS}'
- '"${GITHUBCREDENTIALS}"'
- '"'${GITHUBCREDENTIALS}'"'
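Worth noting: in Groovy, single-quoted strings are never interpolated, which is why variants like '$GITHUBCREDENTIALS' pass that literal text as the credentials ID. Passing the variable itself (or a double-quoted GString) works - a sketch, where the branch names and credential IDs are my own illustrations:

```groovy
// Pick a credentials ID per branch; any Groovy expression works here
def credsId = (env.BRANCH_NAME == 'main') ? 'github-creds-prod' : 'github-creds-dev'

withCredentials([usernamePassword(
        credentialsId: credsId,                  // plain variable works
        // credentialsId: "${GITHUBCREDENTIALS}", // double quotes also interpolate
        passwordVariable: 'GIT_PASS',
        usernameVariable: 'GIT_USER')]) {
    sh 'git push https://$GIT_USER:$GIT_PASS@github.com/org/repo.git'
}
```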
https://redd.it/10ob2el
@r_devops
Is it possible to share the checkout and setup result for next jobs?
I'm fairly new to GitHub Actions and started with this workflow
name: QA on pull request
on: pull_request
jobs:
  run-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Setup Go
        uses: actions/setup-go@v3
        with:
          go-version: 1.19
      - name: Run tests
        run: make test
  build-application:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Setup Go
        uses: actions/setup-go@v3
        with:
          go-version: 1.19
      - name: Build application
        run: make build
I want to run both jobs in parallel so the build job doesn't have to wait for the tests to finish. But as you can see both of them have to checkout the repository and have to setup Go.
Is it possible to share this step or even share the result? This is my pseudo solution
name: QA on pull request
on: pull_request
jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Setup Go
        uses: actions/setup-go@v3
        with:
          go-version: 1.19
      # share all the data from here
  run-tests:
    runs-on: ubuntu-latest
    steps:
      - name: Import data from setup job
        # Maybe as artifact?
      - name: Run tests
        run: make test
  build-application:
    runs-on: ubuntu-latest
    steps:
      - name: Import data from setup job
        # Maybe as artifact?
      - name: Build application
        run: make build
If this is not possible, can I extract the duplicate logic into a "function" I can call twice so I don't have to write the logic in every job?
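Each job runs on a fresh runner, so the checked-out workspace itself can't be shared directly, but the duplicated step logic can become a local composite action (the "function" idea), and setup-go's built-in cache avoids repeated downloads. A sketch, where the action path and name are my own choices:

```yaml
# .github/actions/setup-go-env/action.yml (hypothetical path)
name: Set up Go toolchain
runs:
  using: composite
  steps:
    - uses: actions/setup-go@v3
      with:
        go-version: 1.19
        cache: true   # reuses the Go module/build cache between runs

# In the workflow, each job still checks out first (a local action can
# only be referenced after the repo is present on the runner):
#
#   steps:
#     - uses: actions/checkout@v3
#       with:
#         fetch-depth: 0
#     - uses: ./.github/actions/setup-go-env
#     - run: make test
```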
https://redd.it/10o4rnk
@r_devops
Looking for new CI/CD tools at new company
I just onboarded at a new company as a DevOps engineer. They need to build everything from the beginning, and I am the only DevOps engineer at the moment, so I can choose which tools to use and write a proposal.
Current tools: Jenkins, Jira, GitHub Free; all infrastructure is on AWS (most services are on K8s); enterprise stuff is on Azure (Office 365, Power BI, SharePoint...).
So I'm planning to build with one of these 2 options:
1. GitLab Premium for CI, ArgoCD for CD.
2. Azure DevOps: Azure Repos to replace GitHub, Azure Boards to replace Jira, Azure Pipelines for CI, and ArgoCD for CD.
I am leaning more towards option 2, because Azure DevOps is much cheaper, and when we replace Jira with Azure Boards, we can cancel Jira. Also everything will be in one place - no need to install apps to integrate Jira and GitLab just to build charts, dashboards, etc. for reporting.
I don't have much experience with Azure DevOps, but the feedback I've seen from the community is rather positive. The only thing that concerns me is why Azure DevOps is so much cheaper than GitLab: $19 vs $6 per user + $2 per GB. I don't know if there is any catch.
Any advice would be greatly appreciated.
https://redd.it/10lwr7u
@r_devops
Cloud vs DevOps - Career path advice
Hey guys,
I got inspired lately by a post about the differences between a Cloud Engineer and DevOps. As someone with mostly a traditional Linux sysadmin background, I have found myself not enjoying my current role as a DevOps Engineer so much: it is focused more on the developer side, and I'm currently working on a project from a telecommunications company whose weird ways I don't like... or even worse, the very buggy products we have to deal with.
That said, I have told my manager that I want a career path focused mostly on Cloud and tools like Terraform and Ansible - something like Automation Engineer or Cloud Engineer.
Do you think it's a good choice for someone who doesn't enjoy development much, is all in for remote working, and also hates the buggy products from telecommunications companies?
When I first started working as DevOps I felt it would be interesting and closer to sysadmin stuff, but I feel disappointed now.
Luckily my company is always hunting for new clients and projects, so I have room to push my manager, even to switch teams and clients, so I can work on stuff that I find beneficial for me.
But what's your opinion? What should I aim to work on to avoid bad situations at work and achieve the career life I want?
Also, I am someone who doesn't like abstract or mathematical thinking and development, so... hehe
Do you also think greenfield projects are better for learning?
So many questions and discussion topics in my mind...
https://redd.it/10oi566
@r_devops
Lightweight logs collection and discord notifications
Central is a highly efficient, lightweight application that facilitates the collection of logs from various sources, as well as the monitoring of their health status. Utilizing bottle.py and gevent technologies, Central is designed to offer a seamless user experience while maintaining high performance standards.
Central github
https://redd.it/10ol2q9
@r_devops
How to keep in mind all the directories
I'm struggling to remember all the directories of tools like Tomcat, Nagios, Apache, Jenkins, and others. It's hard to recall them instantly, so I need to check my notes when working. If they ask in an interview, I might fail to answer.
I have an interview in a few days.
https://redd.it/10ot0tt
@r_devops
Looking for platforms with challenges or 'realistic' problems
Hello, I'm a noob at DevOps but I have some experience with programming/networking/virtualization and many other things.
I'm looking for webpages or resources with challenges, like HackTheBox (for example), to practice with.
Is there any resource to learn like that?
https://redd.it/10ovrkr
@r_devops
Apache Superset and Prometheus
Would you like to be able to query Prometheus data with Apache Superset?
If Yes, what use cases do you have in mind?
https://redd.it/10ox71x
@r_devops
Does it make sense to provide an SLA for a microservices based Saas service?
This is based on this post where I asked about calculating the SLA based on cloud services used.
So basically, if I have a server running on an EC2 instance which connects to an RDS database to serve requests, the maximum SLA I can achieve is (SLA of EC2 x SLA of RDS). However, this is the SLA for the cloud components, not taking into account application failures.
So I set off to calculate the SLA for our platform, which is microservices based.
Meaning that every user request goes through a long sequence of cloud services, like WAF, load balancer, EC2 instance, RDS instance, Redis, etc.
These are all in the critical path, so the SLA just for these services becomes something like (99.99%)^number_of_services. The number_of_services is greater than 10, and what I end up with is an SLA of about 98.8%. And this is before accounting for our own application error budget.
Clearly 98.8% is not an SLA level that we want to advertise to customers.
So I am wondering, does it make any sense to calculate the SLA for a microservices based system?
Or should we ignore the SLA of underlying infrastructure and just account for our own application availability?
EDIT:
I should add that the 99.99 SLA value is the one advertised by the cloud provider. This is not an SLA we calculated internally; it's the advertised SLA from the provider.
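As a sanity check on the compounding arithmetic, a minimal sketch (the 12-service count is my own illustrative number; the post only says it is greater than 10):

```python
# Composite availability of N serial dependencies, each with SLA s:
# total = s ** N, since every service must be up for a request to succeed.
def composite_sla(per_service_sla: float, n_services: int) -> float:
    return per_service_sla ** n_services

# e.g. twelve 99.99%-SLA services in the critical path:
total = composite_sla(0.9999, 12)
print(f"{total:.4%}")  # roughly 99.88%
```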
https://redd.it/10owws0
@r_devops
When devops (as a practice) starts to fall apart
I've just realized the moment it happens. It's the moment when a member of the team can no longer overhaul the infra 'because it's better this way', and instead other people tell them that 'it will cause too many changes for other team members'.
Basically, it's the start of dying and ossification. A specific toolstack with specific practices was amazing 3 years ago, is okayish now, and will of course be obsolete in 5 to 7 years.
The more time passes, the more 'existing stability' becomes a hard stop for any significant overhaul. Only small incremental changes, with legacy (in humans' heads!) for years. The infra gets to the point where switching to a new tech is a revolution, and it's easier to start from scratch than to evolve the non-evolvable, ossified 'this'.
I've just hit this situation, where the infra clearly needs a shift from one paradigm to another, and it was struck down only because 'too many people would need to readapt to a different approach and we can't afford it'.
Kinda sad...
https://redd.it/10ozhhx
@r_devops
I've just realized the moment it happens. It's the moment, when a member of a team no longer can overhaul infra 'because it's better this way', and instead other people told him/her that 'it will cause too much of changes for other team members'.
Basically, it's a start of dying and ossification. Specific toolstack with specific practices, was amazing 3 years ago, okayish now, and, but of course, going to be obsolete in 5 to 7 years.
The more time passes, the more 'existing stability' become a hard stop for any significant overhaul. Only small incremental changes with legacy (in humans heads!) for years. Infra is getting to the point when switching to a new tech is a revolution, and it's easier to do from scratch than to evolve non-evolvable ossified 'this'.
I've just got this situation, when infra is clearly need shift from one paradigm to another, and it was struck down only because of the 'too much people need to readopt to a different approach and we can't afford it'.
Kinda sad...
https://redd.it/10ozhhx
@r_devops
Reddit
When devops (as a practice) start to fall apart
Posted in the devops community.
Uptime, status pages, and why transparency is often lost
After seeing a lot of comments about the recent Slack outages, I thought I'd write up my thoughts on why status pages so often become a battleground for transparency, based on my experience working at companies that went through similar journeys.
I'd be super interested in other perspectives, especially if you've encountered non-obvious pressures that work against efforts to be fully transparent when it comes to public incident comms.
The post is here: https://blog.lawrencejones.dev/status-pages/
https://redd.it/10p06pt
@r_devops