Trouble Launching Docker on Windows
So I have been trying to get into DevOps over the past two years. I used to have a laptop that ran Docker Desktop perfectly, allowing me to mess with containers and run Kubernetes using Minikube.
​
Then I lost the laptop and bought a desktop, which has refused to run the Docker engine entirely. I have tried a number of options, including running clusters with Hyper-V as the driver, to no avail. The desktop runs on legacy BIOS, but I was told this should not be a problem. After a little troubleshooting I realized that Docker Desktop fails to install dockerd.exe on my system, so the Engine cannot start, and neither can the daemon (am I even getting the terms right?), so it looks like I'll have to build from source. I am told, though, that this is complicated and I may still end up with issues even then.
​
It has been a seven-month journey of trial-and-error troubleshooting and I am just about to give up on this. Have any of you ever faced this? Does anyone know a workaround?
My computer's specs are:
HP ProDesk 600 G1 SFF (2014)
Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz 3.20 GHz
Installed RAM: 16.0 GB
BIOS mode: Legacy
Virtualization: Enabled
Hyper-V: Running
​
I have tried Docker on WSL, kind, Minikube and a few other steps from the Docker documentation and Stack Overflow, but haven't had any success. When I have to, I spin up a cloud instance, which would be expensive for everyday practice. BTW, I'd love to sign up for the CKA and CKAD exams later this year, if anyone is wondering.
https://redd.it/10kr6ll
@r_devops
Alternative to Atlassian Jira and Confluence
Dear all,
Can you recommend a viable alternative to Jira and Confluence? Costs are rising everywhere and I was asked to look into cheaper alternatives. Any thoughts?
Context:
Engineering org of about 250 people
Current use of Jira is pretty standard; Confluence is mainly for documentation (private, and for emerging concepts which have not yet made it into the 'official' documentation) and for exchanging information and thoughts. Users are mainly software architects, enterprise architects, devs, QA, etc.
Thanks
https://redd.it/10ksowi
@r_devops
Reliable managed CI with SLA?
I'm fed up with my GitHub Actions runs hanging in the Queued state forever. It is affecting the business and our output. Any experience with other providers? Maybe even something with an SLA, so at least we are compensated for their downtime. All I'm looking for is a tool that runs CI immediately, every time, as soon as someone commits.
https://redd.it/10kt02n
@r_devops
Tool to use local IDE for remote development
Hi everyone,
I plan to move my development environment onto a server and develop remotely.
For a first test I used VS Code with the remote connection extension, and it works well.
But I want to use a JetBrains IDE, and Gateway doesn't seem ready... It crashes all the time and takes the server down with it.
So I'm looking for a tool that creates an SSH tunnel so I can use JetBrains on a 'local' folder.
For the moment I'm testing mutagen.io; it looks good but is not perfectly stable.
Do you know a solution/tool that suits my case?
Thanks a lot!
(Maybe this post belongs in a dev sub rather than devops, idk, sorry.)
https://redd.it/10ktydo
@r_devops
From Operations to Dev Ops
Hello all, I would like a bit of advice from the DevOps folks in here. I've been on the operations side of things for 14-odd years (mostly incident, change and release management), but I want to move to DevOps. I quit my job 6 months ago, and I am now ready to start looking for a job. Can you give me some tips on what kind of skills I should pick up?
​
Just a little background: my main skills are troubleshooting and resolving issues in production AWS and on-premise environments. I have the AWS Solutions Architect - Associate certification, some high-level knowledge of Kubernetes, and experience with Splunk/Datadog. Zero programming languages. Job hunting has been a huge challenge for me since I quit as well.
​
Hope you can give me advice and thank you very much in advance!
https://redd.it/10kuiw5
@r_devops
How do you add a migration script that should only run once?
I am thinking I just have to create an .sh file, put it inside the repo, mount it into the container by mapping the repo folder onto a folder in the docker image, put
ENTRYPOINT /my-script.sh;
in the docker-compose file, and then remove the .sh file and the ENTRYPOINT line afterwards, but I am wondering if there's a better way.
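A common alternative to the remove-the-entrypoint dance is a run-once guard: the entrypoint always runs, but a marker file on a persistent volume records whether the migration has already been applied. A minimal sketch of the pattern (the marker path and the migration callable are illustrative, not from the post):

```python
# Run-once guard: apply the migration only if a marker file is absent,
# then record completion. In a container, keep the marker on a mounted
# volume so it survives restarts; all paths here are illustrative.
from pathlib import Path


def run_migration_once(migrate, marker: Path) -> str:
    """Call `migrate` the first time; later calls are no-ops."""
    if marker.exists():
        return "skipped"  # already applied during an earlier start
    migrate()             # e.g. subprocess.run(["/my-script.sh"], check=True)
    marker.touch()        # persist the "done" flag next to the data
    return "ran"
```

With this inside the entrypoint, the script can stay in the image permanently and the compose file never needs editing; restarting the container is safe because the guard short-circuits.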
https://redd.it/10kod31
@r_devops
Advice for a beginner
Hi all.
So I am just learning web development and can't decide what to go for, frontend, backend or fullstack, as it defines what I should learn from now on.
I personally think backend is more interesting for me, but when I look into it, it's so much more complicated than frontend, in the sense that there are too many options. I don't even know which framework/language is better to learn, and learning a new one takes me a long time. Maybe I'd go for Ruby with RoR, or think of the future and learn something like Rust, which I could use for desktop development as well, while there are many other options (I guess JS isn't the best for backend?). I am lost.
With frontend, at least it's clear what to learn and where to start. So basically my question is: is it possible to change specialization from frontend to backend after some time of work? Is it easy? Or would employers not count my frontend experience as experience for that? I hope that after 6-12 months I might figure out for myself what is better to learn for backend, etc.
If you have any other advice, please share it. You might help me figure out this path.
Share your experience if you want as well :)
https://redd.it/10kx1a2
@r_devops
Is Build Systems DevOps or just Build Systems Engineering?
My take: building deb/rpm packages and container image artifacts, plus license and key management, automated across the sub-components of a large multi-repo C or Java project, with build tools like make or Maven executed and orchestrated from pipelines, and with the repos' build order determining where they become dependencies, would suggest it falls under DevOps. Especially with pipelines being core to supporting the complexity of compiling and integrating projects and distributing those builds to the standard dev, test and prod spaces for an efficient CD work environment.
It's something I haven't come across before, and I could not find more than the typical talk of Maven or other build tools. Here I'm asking about all the work around managing those build tools and their interoperability in an automated pipeline/cron job of scripts that encode the business logic for building such large multi-repository projects.
I know "platform development" is also being thrown around. I'm guessing that would be one level before the builds/distribution: writing code (not TF or infra) for what needs to be compiled, to provide developers a foundation to do their work.
Note: this does not take k8s and deployments into consideration. This is intended to be a dialog on building large projects and where build systems (especially C binary builds) overlap with DevOps.
https://redd.it/10kwwe7
@r_devops
Which DevOps Tool Am AI? Game
I just stumbled upon this super fun and addictive game that portrays DevOps Tools as humans.
Check it out here >> https://www.gofirefly.io/blog/devops-ai-challenge
https://redd.it/10ky11g
@r_devops
What is the correct way to run checks with GitHub actions?
Greetings,
I am using development -> staging -> main branches. main is the most stable release branch, staging is for testing deployments, and development is the actual development branch. Feature and bug-fix branches originate from, and are merged into, development. Pushes to the staging and main branches trigger release and deployment workflows.
The end product is a docker image, and different stages (like test, staging, production) are built with different environment variables.
The main "check" mechanism is like this: when a PR targets the development branch, linting checks are run, a test docker image is built, and some tests are run. So, if a PR passes these checks, it is considered safe.
However, sometimes new environment variables are introduced (or something like that), and even if the build succeeds for the development branch, it fails on the staging or main branches. These failures occur when the actual release is done; I want to catch those issues sooner.
I can think of two different ways:
1. Running checks for all environments (test, staging, production) on the PR that targets the development branch. This ensures the change is safe for all stages, but it comes at a cost: checks take a long time.
2. Setting the staging and main branches as protected too, allowing them to be updated only by PRs (e.g. a PR that merges development into staging), and running the checks for that individual stage on that PR. This ensures there won't be any build failures; however, when a check fails it will require going back and creating a new PR targeting development to fix it. It doesn't sound like the best way.
How do you (or would you) handle this kind of scenario?
https://redd.it/10l25af
@r_devops
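For option 1, the extra builds can at least run in parallel with a matrix, so the wall-clock cost is roughly one build rather than three. A hedged sketch of such a PR workflow (the job layout and the build-arg wiring are assumptions, not from the post):

```yaml
# Hypothetical PR workflow: build the image once per stage so that
# stage-specific env/build-arg problems surface on the development PR.
name: pr-checks
on:
  pull_request:
    branches: [development]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false          # report every failing stage, not just the first
      matrix:
        stage: [test, staging, production]
    steps:
      - uses: actions/checkout@v4
      - name: Build image for ${{ matrix.stage }}
        run: docker build --build-arg STAGE=${{ matrix.stage }} .
```

The matrix jobs run concurrently on separate runners, so the PR feedback time is dominated by the slowest single build rather than the sum of all three.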
CloudNativeSecurityCon North America 2023
I am excited to be speaking at the first CloudNativeSecurityCon North America, happening in Seattle from February 1-2. I will be talking about Demystifying Zero Trust for Cloud Native Technologies and would love to chat with you about your experiences if you are interested!
Come talk to me about Zero Trust while you are onsite or think about registering if you haven’t already!
https://events.linuxfoundation.org/cloudnativesecuritycon-north-america/register/
https://redd.it/10l3zdr
@r_devops
how to scrape data from sonarqube
I'm a student intern. I have been assigned a job to automate the extraction of data like bugs, vulnerabilities, security hotspots, etc. from SonarQube and store it in tabular form.
How do I do it?
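One common route is SonarQube's Web API: GET /api/issues/search returns issues as JSON, which can be flattened into CSV rows. A sketch below; the server URL, token, and project key are placeholders, and the exact parameters and pagination should be checked against your server's built-in Web API docs.

```python
# Sketch: pull bug/vulnerability issues from the SonarQube Web API and
# flatten them into CSV rows. Endpoint and field names follow the
# /api/issues/search response shape; verify against your server's docs.
import base64
import csv
import json
import urllib.request


def issues_to_rows(payload: dict) -> list:
    """Flatten an /api/issues/search JSON payload into CSV-ready rows."""
    fields = ("key", "type", "severity", "component", "message")
    rows = [list(fields)]
    for issue in payload.get("issues", []):
        rows.append([issue.get(f, "") for f in fields])
    return rows


def fetch_issues(base_url: str, token: str, project: str) -> dict:
    # SonarQube accepts the token as the basic-auth username, empty password
    url = (f"{base_url}/api/issues/search"
           f"?componentKeys={project}&types=BUG,VULNERABILITY&ps=500")
    req = urllib.request.Request(url)
    auth = base64.b64encode(f"{token}:".encode()).decode()
    req.add_header("Authorization", f"Basic {auth}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def write_csv(rows, path="sonar_issues.csv"):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

Note that on recent SonarQube versions security hotspots live under a separate endpoint (api/hotspots/search), so they would need a second, similar fetch; a cron job or CI step calling write_csv(issues_to_rows(fetch_issues(...))) covers the "automate" part.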
https://redd.it/10kq35l
@r_devops
lambda API deployment
How do we create API documentation for a serverless Lambda API deployment?
We create Swagger API documentation for non-serverless deployments. Is there anything similar?
https://redd.it/10knwd1
@r_devops
Recommendations on testing with M1/M2 chips on Mac Arch?
TLDR: How do you test out the new Mac Arch on your CI/CD pipelines?
Hello everyone. I'm a lead where I work, and we support a huge number of scripts/architectures/etc. We have repos of tools that internal developers can use to simplify their lives. It lets them get set up with a dev env from day 1; the installation takes 2 minutes to complete.
We use MacBook Pros for all of our development machines. These were standardized and mostly ran on Intel... until recently. With the introduction of the M1/M2 chips, certain brew installations have failed. We have been playing whack-a-mole with each of the new updates that come out, and we are starting to feel the pain.
Is there a service or a way we can test these scripts within a CI/CD architecture? We use CircleCI ATM, but I'm open to just about anything.
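One option worth checking is CircleCI's Apple-silicon macOS executors, which would let the same setup scripts run on M1/M2 hardware inside the pipeline. A sketch of a config; the xcode version, resource class name, and script path are assumptions to be verified against CircleCI's current docs:

```yaml
# Hypothetical CircleCI job that runs the internal setup script on an
# Apple-silicon macOS runner; verify executor names against current docs.
version: 2.1
jobs:
  dev-env-install-m1:
    macos:
      xcode: "14.2.0"                      # check available images
    resource_class: macos.m1.medium.gen1   # M1 resource class (plan-dependent)
    steps:
      - checkout
      - run: ./install-dev-env.sh          # placeholder for the internal installer
workflows:
  verify-setup:
    jobs:
      - dev-env-install-m1
```

Running the installer end to end on every commit would catch the brew breakages before they reach developer laptops.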
https://redd.it/10l8fc0
@r_devops
Help deciding between two job offers.
I've been offered a DevOps engineer role from two different companies, both big enterprises. I've got experience with AWS and Azure.
Job 1:
- 100% on prem. No public cloud.
- Openstack, Elk, Jenkins, ansible, Python, Grafana, Prometheus
- Pay £60k + 5% bonus
- 1 or 2 days a week in the office.
Job 2:
- Hybrid cloud, on prem and Azure.
- Terraform, Python, Jenkins, Ansible, GitHub Actions, Grafana, Prometheus.
- Pay £63k + 20% bonus.
- 2 days a week in office mandatory. Slightly further commute.
Weirdly, despite job 2 having a better tech stack, I'm leaning more towards job 1. Fully on-prem has me curious but it might take me out of the loop with public cloud. Job 1 is more network focused, there's quite a bit of internal legacy stuff it seems. Job 2 sits in the machine learning/data science division and has more exciting tech I suppose. I'm not a fan of Azure though.
Thoughts?
Edit: ideally I'd like to work on AWS as that's the most popular. Will taking an Azure gig make it harder for me to transition back to AWS?
https://redd.it/10la3dv
@r_devops
Removing secondary disk
I'm using this video (https://www.youtube.com/watch?v=J4NCvIMuzVE) as a guide to start building out Windows VMs, and I'm running into some questions. I'm trying to use the template located here: azure-quickstart-templates/quickstarts/microsoft.compute/vm-simple-windows at master · Azure/azure-quickstart-templates (github.com).
My issue is that the template creates, but does not mount, a second disk. I want to remove that second disk and keep only the OS disk for testing. I found the section under variables, but removing it blows up the pipeline. What's the secret sauce to removing this from the JSON file? It's the "dataDisks" section:
"osDisk": {
    "createOption": "FromImage",
    "managedDisk": { "storageAccountType": "StandardSSD_LRS" }
},
"dataDisks": [
    { "diskSizeGB": 1023, "lun": 0, "createOption": "Empty" }
]
},
"networkProfile": {
    "networkInterfaces": [
        { "id": "[resourceId('Microsoft.Network/networkInterfaces', variables('nicName'))]" }
    ]
},
https://redd.it/10l7rq6
@r_devops
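For reference, trimming that snippet down to the OS disk only would leave the storage section like this; any variables or parameters the template defines for the data disk must be deleted along with it, which is the usual cause of the pipeline blowing up. A sketch built only from the snippet above, not verified against the current template:

```json
"storageProfile": {
    "osDisk": {
        "createOption": "FromImage",
        "managedDisk": { "storageAccountType": "StandardSSD_LRS" }
    }
},
"networkProfile": {
    "networkInterfaces": [
        { "id": "[resourceId('Microsoft.Network/networkInterfaces', variables('nicName'))]" }
    ]
}
```

The key detail is removing the entire "dataDisks" array, including its enclosing brackets and the comma that separated it from "osDisk", so the remaining JSON stays well-formed.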
what the tech they use to build vocalcom remise?
Hi guys, I'm working on running Vocalcom software on our server, but I need to know more about it to choose the right hardware (CPU, RAM, drive). As you know, you can guess the CPU and RAM requirements based on the tech stack used to build the software, so if you have any information about this software, please share it with me. Thanks a lot! https://www.capterra.com/p/240468/Vocalcom-Hermes360/#pricing
https://redd.it/10kvuw4
@r_devops
How can I speed up this data scrubber?
Alright,
I'm a jr devops engineer and I have been asked to write a data scrubber to remove sensitive information from every instance of the db besides prod. There is a Kubernetes job that compresses the contents of the prod db into a single
The job is written in bash so I stuck with a bash/sql (specifically MySQL) combo to accomplish the scrubbing. I got a working model, however, the database is large enough that it will take 20+ hours to run across all the tables. This is partially due to the fact that I have to process the data in batches to prevent consuming absurd amounts of memory in the cloud. Here are the basics of how it works.
​
​
1. The scrubbing starts after the database has been dumped using the
2. Spin up a local mysql instance in a separate k8s container and start processing the data table by table, file by file.
3. Create the schema for the target table using the provided schema file.
4. Write the target file's contents to the table and begin the scrubbing logic. I am using MD5 to hash the value, replacing all the characters to be integers or letters to match the column type and slicing the hashed value to meet the length constraints. Dates are being offset by a random number of days determined by the first 4 integers of the hashed date. (see code below)
5. Use the
6. Compress the directory with the scrubbed files/files that don't need to be scrubbed and push the
Here is the code but the field and table names are changed to something generic:
I know this is a lot so thanks if you made it this far. I'm just digging
Alright,
I'm a jr devops engineer and I have been asked to write a data scrubber to remove sensitive information from every instance of the db besides prod. There is a Kubernetes job that compresses the contents of the prod db into a single
.tar file using mydumper. The engineers/dev & sandbox environments pull this file and decompress it to update their db's. Naturally, I decided to build scrubbing logic in this job. The job is written in bash so I stuck with a bash/sql (specifically MySQL) combo to accomplish the scrubbing. I got a working model, however, the database is large enough that it will take 20+ hours to run across all the tables. This is partially due to the fact that I have to process the data in batches to prevent consuming absurd amounts of memory in the cloud. Here are the basics of how it works.
​
​
1. The scrubbing starts after the database has been dumped using the
mydumper tool. The output is anywhere between 1 .sql file to 200 .sql files, depending on table size.2. Spin up a local mysql instance in a separate k8s container and start processing the data table by table, file by file.
3. Create the schema for the target table using the provided schema file.
4. Write the target file's contents to the table and begin the scrubbing logic. I am using MD5 to hash the value, replacing all the characters to be integers or letters to match the column type and slicing the hashed value to meet the length constraints. Dates are being offset by a random number of days determined by the first 4 integers of the hashed date. (see code below)
5. Use the
mydumper tool to dupm the scrubbed table and overwrite the original .sql file with the scrubbed file.6. Compress the directory with the scrubbed files/files that don't need to be scrubbed and push the
.tar file to a bucket.Here is the code but the field and table names are changed to something generic:
echo 'scrubbing table'
mysql -D prod -u root -S /mysql-socket/mysql.sock < proddump/prod.table-schema.sql
for filename in proddumpnohj/*table*.sql; do
if [[ $filename != *schema*.sql ]]; then
echo "writing $filename to database"
mysql -D prod -u root -S /mysql-socket/mysql.sock < $filename
echo "scrubbing table $filename"
mysql -D prod -u root -S /mysql-socket/mysql.sock << "END"
SET FOREIGN_KEY_CHECKS=0;
UPDATE table SET string_value = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(SUBSTRING(MD5(string_value), 1, 20), 0, 'g'), 1, 'h'), 2, 'i'), 3, 'j'), 4, 'k'), 5, 'l'), 6, 'm'), 7, 'n'), 8, 'o'), 9, 'p'),
int_value = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(SUBSTRING(MD5(int_value), 1, 10), 'a', '1'), 'b', '2'), 'c', '3'), 'd', '4'), 'e', '5'), 'f', '6'),
date_value = DATE_ADD(appointment_date, INTERVAL REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(SUBSTRING(MD5(IFNULL(IFNULL(patient_id, date_value), 'NULL NULL')), 1, 4), 'a', '1'), 'b', '2'), 'c', '3'), 'd', '4'), 'e', '5'), 'f', '6') DAY_HOUR ),
email_and_phone_values = IF (
sent_to LIKE '%@%',
CONCAT(SUBSTRING(MD5(email_and_phone_values), 1, 20), '@scrubbed.com'),
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(SUBSTRING(MD5(email_and_phone_values), 1, 10), 'a', '1'), 'b', '2'), 'c', '3'), 'd', '4'), 'e', '5'), 'f', '6')
);
END
echo "dumping table"
mydumper -u root -r 250000 -S /mysql-socket/mysql.sock -B prod -t 8 -o /mnt/ssd_storage/scrubbedproddump -l 7200 -T 'table'
echo "deleting from table"
mysql -D prod -u root -S /mysql-socket/mysql.sock << "END"
TRUNCATE table;
END
echo "copying file to output"
ls /mnt/ssd_storage/scrubbedproddump
if test -f /mnt/ssd_storage/scrubbedproddump/prod.table.sql
then
cp /mnt/ssd_storage/scrubbedproddump/prod.table.sql /mnt/ssd_storage/$filename
else
cp /mnt/ssd_storage/scrubbedproddump/prod.table.00000.sql /mnt/ssd_storage/$filename
fi
rm -r /mnt/ssd_storage/scrubbedproddump
fi
done
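For reference, the step-4 scrubbing rules can also be sketched as standalone shell functions. This is a hypothetical sketch mirroring the nested REPLACE() chains in the SQL above, not part of the original script; it assumes `md5sum` and GNU `date` are available:

```shell
# Hypothetical sketch of the scrubbing rules, mirroring the SQL above.

# String columns: first 20 hex chars of the MD5, digits 0-9 mapped to g-p.
scrub_string() { printf '%s' "$1" | md5sum | cut -c1-20 | tr '0-9' 'g-p'; }

# Integer columns: first 10 hex chars of the MD5, letters a-f mapped to 1-6.
scrub_int() { printf '%s' "$1" | md5sum | cut -c1-10 | tr 'a-f' '1-6'; }

# Date columns: offset by a day count derived from the first 4 hex chars
# of the hashed date (GNU date syntax).
scrub_date() {
    offset=$(printf '%s' "$1" | md5sum | cut -c1-4 | tr 'a-f' '1-6')
    date -d "$1 + $offset days" +%F
}
```

Because MD5 is deterministic, the same input always scrubs to the same output, so values that matched across tables before scrubbing still match afterwards.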
I know this is a lot so thanks if you made it this far. I'm just digging to see if there are obvious ways to speed this up in a major way. I know there are some smaller things, like telling mydumper to overwrite the original file with its output instead of creating another directory, writing to that, and copying the output file over the original. I'd love any advice.
https://redd.it/10lia5d
@r_devops
Network State stack hackathon - up to $50K in prizes
Hi everyone, I'm organizing a hackathon that combines on-chain gaming and a social network (web3). If anyone wants to take part as a developer or anything else and get the chance to earn prizes, here is the link: https://ntwstate.org/ The prizes are up to $50K.
https://redd.it/10k6c4f
@r_devops
Why cloud is cheaper than on-prem
In one of my other posts I mentioned that cloud is cheaper than on-prem, then got downvoted like I was somehow wrong. I felt they were just trying to win an argument. Let me explain why cloud is in fact more cost-effective for most businesses. If you have a reasonable counter-argument please explain, as I'm always willing to learn.
1. Autoscaling features automatically scale your infrastructure up or down in response to the ebbs and flows of traffic, which is super useful for saving money.
2. You don't have to buy expensive servers and mass storage devices.
3. You have high availability, which ensures uptime and means you're not losing money due to downtime. Your applications stay available even during server crashes and failures, as the cloud infrastructure can automatically redirect traffic from failed instances to healthy ones.
4. You get the powerful big-data visibility/metrics/analytics features the cloud offers.
5. You don't have to worry as much about networking and security, as they're managed and baked into the cloud provider.
6. You don't have to worry about power costs or consumption.
7. You don't have to worry about things physically breaking or have to hire computer technicians to maintain hardware.
8. You don't have to worry about fires and other natural disasters.
9. You have servers located around the world for optimal load balancing.
10. You don't have to replace everything every 3-5 years to avoid failures, or keep it all patched.
Also, side note: Netflix is one of the biggest tech companies (FAANG) and runs on AWS, as do many others. Also, mods, I encourage you to enforce your own rule No. 7.
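To make point 1 concrete, here is a back-of-envelope comparison. All numbers (instance counts, hourly rate) are invented for illustration, not real pricing: if traffic peaks at 20 instances but averages 6, capacity sized for peak pays for the gap all month.

```python
# Illustrative only: always-on capacity sized for peak vs. autoscaled average.
peak_instances = 20      # capacity you'd have to provision to survive peak traffic
avg_instances = 6        # what autoscaling actually runs on average
hourly_rate = 0.10       # assumed $/instance-hour
hours_per_month = 730

always_on = peak_instances * hourly_rate * hours_per_month   # 1460.0
autoscaled = avg_instances * hourly_rate * hours_per_month   # ~438.0
print(f"always-on: ${always_on:.0f}/mo vs autoscaled: ${autoscaled:.0f}/mo")
```

Under these made-up numbers, paying only for average load is roughly a third of the cost of provisioning for peak.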
https://redd.it/10kp9dl
@r_devops