Using a work-provided sandbox for learning? At odds here.
I have access to employer-provided Azure and AWS sandboxes. As a total newb and junior, I need to upskill, and I can see where I'm lacking: AWS, Azure, and cloud-based skills generally, which my employer knows.
I want to start using the sandbox to deploy sample websites, learn PKI, practise certificate installs, buy domains, and so on. My only concerns with that are: 1. Buying things (domains, etc.) is not cheap. 2. I was told it's for learning purposes and to kill off anything I won't be using (I was also planning to self-learn Terraform this way, to help destroy any infrastructure I create).
Is there anything inherently bad about learning this way? For one, I would not be establishing a personal website or the like, but I would like to learn how to deploy one, since this account is being paid for. Or would I be better off mocking something we have in our dev or test environment in my own sandbox and learning from that?
I've never really had one, so I'm seeking some input on that.
https://redd.it/1eqpfxy
@r_devops
Created a Terraform config for faster Docker builds using a remote BuildKit instance
Hey folks, I know most of you use Docker in your CI/CD pipelines. Slow Docker builds are so annoying and frustrating—we’ve all been there!
I created this open-source repo, https://github.com/useblacksmith/remote-buildkit-terraform, which contains a Terraform config to quickly spin up and configure a remote BuildKit instance in AWS that caches Docker layers and substantially speeds up Docker builds.
It is not perfect and wouldn’t work for large engineering teams, but it could really help many folks here.
Feel free to use it and let me know what you think.
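For readers who want to try a remote builder from CI, Buildx can be pointed at an existing BuildKit endpoint. A rough GitHub Actions sketch, where the endpoint address is a placeholder and mTLS certificates for the remote driver are omitted for brevity:

```yaml
# Hypothetical CI job using a remote BuildKit instance via the Buildx remote driver
name: build
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
        with:
          driver: remote
          endpoint: tcp://my-buildkit.internal:9999   # placeholder address of the BuildKit instance
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false   # layer caching happens on the remote instance
```

The win here is that the remote instance keeps a warm layer cache across CI runs, which ephemeral runners normally throw away.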
https://redd.it/1eqr954
@r_devops
Do we have a good use case for k8s Jobs?
Hello,
we are looking at optimising our Kubernetes workloads. The clusters are hosted on AWS EKS.
For reference, a small overview of how a typical Java/Python service works in our cluster:
We use AWS Step Functions to create a message in SQS, and our pods constantly poll their queues. If there is a new message, the pod performs the task. We scale the pods based on queue size to handle higher traffic; for the HPA we use the Zalando adapter for the metrics server (zalando-incubator/kube-metrics-adapter on GitHub).
So far this works quite well. However, most of our services are not triggered often, which means we have a lot of pods just running without doing anything.
To make better use of our resources, we thought about migrating some of these services from long-running pods to Jobs: if a new message is sent to a queue, it triggers a Kubernetes Job (it looks like KEDA could be used for this), the service performs its task, and the Job terminates.
Would this be a good use case for Kubernetes Jobs, or would you recommend looking at other approaches?
Thanks!
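For the KEDA route mentioned above, a ScaledJob with the SQS scaler is roughly what this looks like. A sketch with placeholder names, image, and queue URL:

```yaml
# Hypothetical KEDA ScaledJob: spawn a Job per batch of SQS messages, scale to zero when idle
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: worker-scaledjob            # placeholder name
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: worker
            image: registry.example.com/worker:latest   # placeholder image
        restartPolicy: Never
  pollingInterval: 30               # check the queue every 30s
  maxReplicaCount: 10
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue   # placeholder
        queueLength: "5"            # target messages per Job
        awsRegion: eu-west-1
```

Unlike the HPA setup, nothing runs between messages, which addresses the idle-pods cost directly.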
https://redd.it/1eqnxup
@r_devops
Should we CI/CD to production?
Yesterday, my colleague told me that he didn't think implementing CI/CD for the production environment was a good idea, since it could accidentally break something and get out of control. He suggested that we deploy to production manually. What do you guys think about it? Please let me know.
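A common middle ground between full automation and manual deploys is an automated pipeline with a human approval gate. In GitHub Actions, for example, a protected environment does this; a sketch, with a placeholder deploy step (the required reviewers are configured in the repo's environment settings, not in the workflow file):

```yaml
# Hypothetical gated production deploy
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy-prod:
    runs-on: ubuntu-latest
    environment: production        # with required reviewers set, this job pauses for approval
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh   # placeholder: the same scripted deploy every time
```

You keep the repeatability of automation while a person still pulls the trigger.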
https://redd.it/1equmsf
@r_devops
argc - Top-tier utility/framework for creating shell scripts
https://github.com/sigoden/argc
I’m not the author. Whoever it is, they are a bloody legend!
Figured I would share it as it deserves way more love.
https://redd.it/1eqvgzw
@r_devops
Immutable VM image bakery companies?
What companies create hardened, immutable VM images? For containers/Docker images, Chainguard seems to be the front runner. Do any companies focus on VM images?
https://redd.it/1eqv2sm
@r_devops
Attempting a Website Builder
Hey everyone. I'm attempting to build a website builder (targeting low-traffic sites).
My plan was to deploy a single VM initially and run multiple containers on it (backend, frontend, reverse proxy, certbot) for the app/builder, and have a service like Vercel/Netlify handle all the domains and deployment of users' websites.
But I had the bright idea: what if I have a go at it myself and learn more DevOps on the way? What do I need to know to build the DevOps side of a website builder?
I thought at first: should I run everything on a single VM to reduce costs initially and scale vertically, and worry about scaling horizontally across multiple VMs later? (I know it's a single point of failure.)
Am I crazy for even thinking a website builder can operate without Kubernetes?
Currently I have a CI/CD pipeline, with infrastructure managed by Terraform, and Ansible configuring my VM and pulling and running my Docker images.
Any direction or thoughts would help. I am fairly new to DevOps, so sorry if my explanations aren't clear.
Many thanks.
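For the single-VM starting point described above, a minimal Compose sketch with Traefik terminating TLS could look like this. Image names, domains, and the email are placeholders; Traefik's built-in ACME support can stand in for a separate certbot container:

```yaml
# Hypothetical single-VM stack: one reverse proxy, app containers behind it
services:
  proxy:
    image: traefik:v3.1
    command:
      - --providers.docker=true
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.tlschallenge=true
      - --certificatesresolvers.le.acme.email=ops@example.com     # placeholder
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
    ports: ["80:80", "443:443"]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt
  backend:
    image: registry.example.com/builder-backend:latest            # placeholder
    labels:
      - traefik.http.routers.api.rule=Host(`api.example.com`)
      - traefik.http.routers.api.tls.certresolver=le
  frontend:
    image: registry.example.com/builder-frontend:latest           # placeholder
    labels:
      - traefik.http.routers.app.rule=Host(`app.example.com`)
      - traefik.http.routers.app.tls.certresolver=le
volumes:
  letsencrypt:
```

Adding a new container only needs labels, which keeps the single VM manageable until horizontal scaling is actually a problem.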
https://redd.it/1eqyhw6
@r_devops
ZFS or Ceph
Any use in learning ZFS or Ceph if I want to switch to a DevOps job?
https://redd.it/1er1wzs
@r_devops
See the cost of your Terraform in IntelliJ IDEs, as you develop it
Hi, my name is Owen and I recently started working at Infracost (YC W21 batch) (https://infracost.io). Infracost shows engineers how much their code changes will cost on the cloud before it gets deployed. For example, when an engineer changes a cloud resource (like an AWS virtual machine), Infracost posts a comment in CI/CD telling them "This change is going to increase your costs by 25% next month from $500/m to $625/m".
Previously, I was one of the founders of tfsec, the code security scanner; I quickly realised that identifying issues in your code (especially infrastructure code, i.e. Terraform) as soon as possible was the best defence. A lot of the principles of code scanning for security misconfigurations translate well to identifying cost impact. Many times, people are surprised by how cloud resources are priced and how expensive they can be. It is also really unfair that engineers are never given a ‘checkout screen’ when buying infrastructure, and then are blamed for breaking cloud budgets.
I believe engineers should have access to key information about cloud costs at the time of writing the code. So, I spent some time and built an Infracost plugin for the IntelliJ family of IDEs (https://plugins.jetbrains.com/plugin/24761-infracost).
With this plugin installed, as you develop your Terraform code, you will get the cost impact of your current project, and quickly see where the expensive resources are hiding in your code (just hit save & it will recalculate). Two main use cases I’m thinking of:
As you change resources, you can see the cost impact. For example, I increased the instance size from my dev environment to prod to handle prod-sized workloads, and I can see the increased cost.
Comparing costs: I can copy and paste blocks of code and see the cost impact of different configuration options, like removing multi-AZ options from test environments. I can see immediately that I'd save a few thousand dollars per year that way.
You can still use Infracost in GitHub/GitLab to automate the cost analysis in CI/CD, and check for best practices, and the IDE tools will help you spot the issues sooner.
I’d love to get your feedback on this. I want to know if it is helpful, what other cool features we can add, and how it can be improved. Also if you spot any issues or bugs, let me know!
Here is how to install it: https://plugins.jetbrains.com/plugin/24761-infracost
I've done a demo video to get you started too - https://www.youtube.com/watch?v=kgfkdmUNzEo
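For those who prefer the CI/CD side, a rough GitHub Actions sketch of wiring Infracost into pull requests (action versions and file paths here are assumptions, not copied from official docs):

```yaml
# Hypothetical PR cost-check job
on: [pull_request]
jobs:
  infracost:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - run: infracost breakdown --path=. --format=json --out-file=/tmp/infracost.json
      - run: >
          infracost comment github
          --path=/tmp/infracost.json
          --repo=$GITHUB_REPOSITORY
          --pull-request=${{ github.event.pull_request.number }}
          --github-token=${{ secrets.GITHUB_TOKEN }}
```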
https://redd.it/1er7966
@r_devops
Take control over GitHub repositories through leaked secrets in artifacts
New research shows how organizations tend to embed secrets in GitHub Actions workflow artifacts, mainly GitHub tokens. While the GITHUB_TOKEN is invalidated as soon as the job is complete, it's still possible to watch for the artifact upload and use the token to push code to the repository before the job is done.
The issue, dubbed ArtiPACKED, was found in highly popular open-source projects owned by Google, Microsoft, AWS, Red Hat, Canonical (Ubuntu), OWASP, and others.
https://unit42.paloaltonetworks.com/github-repo-artifacts-leak-tokens/
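One mitigation is to scope the workflow token to read-only and avoid uploading the whole workspace, which contains `.git` and, by default, the persisted token. A sketch of defensive defaults (the build command and paths are placeholders):

```yaml
# Hypothetical workflow with a read-only token and a narrow artifact path
permissions:
  contents: read                     # read-only GITHUB_TOKEN: useless for pushing even if leaked
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false # don't leave the token in .git/config
      - run: make build              # placeholder build step
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/                # upload only build output, never the full workspace
```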
https://redd.it/1er8x0j
@r_devops
I am building a new CI tool. What should I keep in mind?
If I were to build a new CI tool, what are some things I should do that would give me a competitive edge over the others?
https://redd.it/1er9cwm
@r_devops
Needing to run 4 web applications, each requiring only 0.25 vCPU and 500 MB RAM: what's the most economical way on AWS?
I'm looking into various options to run 4 web applications, each requiring only 0.25 vCPU and 500 MB RAM (or even less). Traffic is fairly low, less than 1k active users a month. Each application is merely an SPA plus a Node backend bundled with it. These applications also update very frequently (once or twice a day), so I need old versions swapped out automatically, from code to a running application, without downtime and without supervision.
Sure, I could set up an EKS cluster running solely on spot nodes, with multiple replicas to ensure a spot termination interrupt doesn't create downtime. But even that would cost me roughly $200 a month (guesstimate). Slap in Argo CD, Image Updater, and a build pipeline, and everything is handled for me without supervision.
Or I could spin up an EC2 instance and run them all on it, but these applications update once or twice a day, and I need them deployed as soon as code is checked in to the repository, automatically. I don't feel like fiddling with webhooks, SNS, and Lambda just to get that to work.
Then I saw AWS Amplify. It can track code and build and deploy as soon as code is checked in! But damn, it's buggy; I could not get those applications to work 100% on Amplify, for weird reasons I could not see behind the scenes.
Then I saw ECS with Fargate. It seems promising, but whether I can automate builds and deploys from code to a running container is still questionable, and I'm not sure there's a cost advantage compared to running a full EKS cluster on spot instances only.
I looked at other providers, like DigitalOcean and Vultr. They offer a managed Kubernetes control plane that costs $0, but damn, their container registries cost a lot more than AWS ECR and have no lifecycle policy to automatically remove old images, which brings the cost very close to doing the same on AWS.
Any idea how you would deploy these applications?
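For the EC2 option, one low-effort way to get "deploy on image push" without webhooks, SNS, or Lambda is a registry poller such as Watchtower. A Compose sketch (image names are placeholders; note the swap involves a brief container restart rather than a strictly zero-downtime rollover):

```yaml
# Hypothetical single-EC2 setup: Watchtower polls the registry and swaps in new images
services:
  app1:
    image: registry.example.com/app1:latest   # placeholder; repeat for app2..app4
    restart: unless-stopped
  watchtower:
    image: containrrr/watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    command: --interval 300                   # check for new image tags every 5 minutes
```

The CI pipeline's only job is then to push a fresh image; the instance updates itself.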
https://redd.it/1erbi8r
@r_devops
Traefik global redirect from www to non-www domain
I want to redirect all my containers/websites from https://www.mywebsite.com to https://mywebsite.com. HTTP-to-HTTPS redirect I already have, and I have set up a CNAME DNS record pointing www.mywebsite.com at my server. I had a discussion with ChatGPT, but what it gave me doesn't work; it just loads https://www.mywebsite.com without an SSL certificate.
Here is my Traefik dynamic.yml configuration. What is missing to make it work? I want to apply this redirect globally, in static or dynamic configuration, without editing labels for each container.
This does redirect, but the www domain has no HTTPS certificate.
# dynamic configuration
http:
  middlewares:
    redirect-to-non-www:
      redirectRegex:
        regex: "^https?://www\\.(.*)"
        replacement: "https://$1"
        permanent: true
    secureHeaders:
      headers:
        sslRedirect: true
        forceSTSHeader: true
        stsIncludeSubdomains: true
        stsPreload: true
        stsSeconds: 31536000
    user-auth:
      basicAuth:
        users:
          - '{{ env "TRAEFIK_AUTH" }}'
  routers:
    default-router:
      entryPoints:
        - web
        - websecure
      rule: "HostRegexp(`{host:.+}`)"
      middlewares:
        - redirect-to-non-www
        - secureHeaders
        - user-auth
      service: noop-service
      priority: 1
  services:
    noop-service:
      loadBalancer:
        servers:
          - url: "https://0.0.0.0"
tls:
  options:
    default:
      cipherSuites:
        - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
        - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
        - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
        - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
      minVersion: VersionTLS12
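One likely reason the www host serves no certificate: nothing in this config asks a certificate resolver for the www name, and an ACME resolver cannot infer concrete hostnames from a catch-all HostRegexp rule. A sketch of a dedicated www router that requests the name explicitly (the resolver name `letsencrypt` and the domain are assumptions; the resolver itself must exist in the static configuration):

```yaml
# Hypothetical addition under http.routers in the dynamic configuration
www-redirect:
  entryPoints:
    - websecure
  rule: "Host(`www.mywebsite.com`)"   # explicit host, so ACME knows what to issue
  middlewares:
    - redirect-to-non-www
  service: noop-service
  tls:
    certResolver: letsencrypt         # assumed resolver name from static config
    domains:
      - main: mywebsite.com
        sans:
          - www.mywebsite.com         # the www name must be on the issued certificate
```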
https://redd.it/1ercmvj
@r_devops
Should I leave?
Hey all, I'm struggling with what to do about my current role.
My main issue: around a year ago, a lot of the stuff I would have been interested in was abstracted away to managed vendors, from the management of our environments to the management of developer machines.
Anything network-related is handled by either an internal network team or, again, our managed vendor.
As such, there's actually not much I have direct responsibility over in any meaningful capacity.
I can feel my skills atrophying, and it feels like we're secretaries for these other teams, there to tell them something is wrong. It really feels like a glorified support role they slapped the name "DevOps engineer" on.
We are barely involved in the development process for any new applications and don't have many opportunities to practise anything.
I've been trying to learn in my own time, but it's hard when you can't use the skills in the workplace.
As someone whose first job out of uni this is, three years into the role: in my scenario, what would you do?
https://redd.it/1erf1hm
@r_devops
I built a POC for a real-time log monitoring solution, orchestrated as a distributed system
A proof-of-concept log monitoring solution built with a microservices architecture and containerization, designed to capture logs from a live application acting as the log simulator. The solution delivers actionable insights through dashboards, counters, and detailed metrics based on the generated logs. Think of it as a very lightweight internal tool for monitoring logs in real time. All the core infrastructure (e.g., ECS, ECR, S3, Lambda, CloudWatch, subnets, VPCs, etc.) is deployed on AWS via Terraform.
Feel free to take a look and give some feedback: https://github.com/akkik04/Trace
https://redd.it/1ergpf0
@r_devops
API Observability Guide: Enhancing Reliability & Performance
One of these guest blogs did a pretty good job covering API observability: what it is, its pillars, its components, and how to implement it. There are also a few advanced techniques, and I thought it might be good to share it here as an educational resource.
Any additional techniques we may have missed are welcome, but no pressure.
https://www.getambassador.io/blog/api-observability-enhancing-reliability-performance
https://redd.it/1erghrs
@r_devops
Why is this happening?
I suddenly started facing this problem when pressing Run Java on my Spring Boot app. If any of you beautiful souls have faced it before, how did you work around it? I have a deadline and need to fix this quickly, sorry.
The problem:
Failed to refresh live data from process service:jmx:rmi:///jndi/rmi://127.0.0.1:45556/jmxrmi after retries: 10
Source: Spring Boot Tools
https://redd.it/1erj3j1
@r_devops
DevOps lessons from building a global monitoring platform
Ever start a side project that spirals out of control? That's the story of my last year building UptimeCard, and I thought I'd share some DevOps war stories with you all.
It began innocently enough - just a simple uptime monitor. Fast forward, and I'm juggling a platform that's analyzing tech stacks for thousands of websites globally.
The first reality check hit when my cute little DigitalOcean setup choked at around 1000 monitored sites. Suddenly, I'm deep-diving into AWS documentation, trying to figure out how to scale this thing without breaking the bank. EC2, Lambda, DynamoDB - my new best friends and worst nightmares.
But here's the kicker - monitoring globally means dealing with, well, the globe. I naively thought I could run everything from a single region. You can't.
Then came the data deluge. Turns out, collecting and processing data from thousands of sites every minute is like drinking from a fire hose. I cobbled together a pipeline with Kinesis, and it's holding... for now.
Oh, and the irony of needing rock-solid monitoring for a monitoring service? Not lost on me. I've got CloudWatch alerts that would wake the dead. Because nothing says "professional" like your uptime monitor going down.
Infrastructure management became my nemesis. Started with manual setups (I know, I know), and quickly drowned in config hell. Terraform saved my sanity, but the migration was... let's call it character-building.
Security? A constant paranoia. When you're handling data from thousands of websites, every shadow looks like a potential breach. I'm now on a first-name basis with AWS's IAM documentation.
And let's not forget the cloud bill. I'm now a reluctant expert in auto-scaling groups and spot instances.
UptimeCard's at v1.0 now (https://uptimecard.com if you're curious), but it feels like I've aged a decade getting here. I'm sure there's still a ton to optimize.
So, what hard-learned lessons have you picked up from similar projects? Any tips for a battle-worn developer still figuring out this DevOps game?
I'm also toying with the idea of open-sourcing some of our DevOps scripts. Feels like it's time to give back to the community that's saved my bacon more times than I can count.
https://redd.it/1erjp83
@r_devops
Need Suggestions for Reducing Downtime During EKS Deployments
Hello everyone,
I could use some help or suggestions with a deployment issue we're facing.
Currently we deploy to EKS, use Atlas MongoDB, and store some documents in S3. The problem: every production deploy means taking the system offline, backing up S3 (which takes about an hour due to the sheer number of files, even though the total size is small), backing up the database, then deploying and running the migration.
Does anyone have ideas on how we can reduce or eliminate this downtime?
https://redd.it/1erjuji
@r_devops
Resources to learn DevOps Project
Hi all,
Hoping you wonderful people can help.
I'm a project manager that moved into product management.
At present, I am product owner for Dynamics 365. One of our core issues has been a single-branch strategy. I'm currently moving us fully onto Azure DevOps so we can automate testing and fix the branching strategy, allowing us to be more agile.
One area I need help with is understanding how to use Azure Boards and the Delivery Plans feature in Azure DevOps.
Does anyone know any good, free content for me and my BAs to learn this?
https://redd.it/1erixho
@r_devops
What do you monitor on your servers?
We've been developing the BlueWave Uptime Manager for the past 5 months with a team of 7 developers and 3 contributors. As we move towards expanding from basic uptime tracking to a comprehensive monitoring solution, we're interested in getting insights from the community.
For those of you managing server infrastructure,
What are the key assets you monitor beyond the basics like CPU, RAM, and disk usage?
Do you also keep tabs on network performance, processes, services, or other metrics?
Additionally, we're debating whether to build a custom monitoring agent or leverage existing solutions like OpenTelemetry or Fluentd.
What’s your take—would you trust a simple, bespoke agent, or would you feel more secure with a well-established solution?
Lastly, what’s your preference for data collection—do you prefer an agent that pulls data or one that pushes it to the monitoring system?
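For context on how small a push-style agent can start: a bespoke collector really only needs a few OS-level readings per interval. A minimal sketch using just the Python standard library (the collector endpoint and metric names are invented for illustration, not part of BlueWave):

```python
import os
import shutil
import time

def collect_metrics(path="/"):
    """Take one snapshot of basic host metrics as a flat dict."""
    disk = shutil.disk_usage(path)
    metrics = {
        "timestamp": time.time(),
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_used_pct": 100.0 * disk.used / disk.total,
    }
    # Load average is POSIX-only; skip it gracefully elsewhere.
    if hasattr(os, "getloadavg"):
        metrics["load_1m"], metrics["load_5m"], metrics["load_15m"] = os.getloadavg()
    return metrics

# A push agent would POST this on a timer to a hypothetical ingest URL, e.g.:
# while True:
#     requests.post("https://collector.example/ingest", json=collect_metrics())
#     time.sleep(60)
```

The push model keeps the agent firewall-friendly (outbound only); a pull model like Prometheus instead exposes these values on an HTTP endpoint for the server to scrape.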
https://redd.it/1erkhef
@r_devops