Reddit DevOps
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
Issues with TFS agent version on Android Builds?

Hey guys, so the dev team at our company started reporting Android build failures for our Android apps. It looks like something changed sometime in the last week (the last successful Android build was 5/3; however, we only have 2 Android apps and we don't touch them daily).

Wondering if anyone has seen this or gotten past it without just upgrading agent versions. I'm curious what happened that would force something like this, and I hope this info can help others.


Required version as of some time in the last week: 2.182.1

Our current agent versions: 2.173

https://redd.it/n96yra
@r_devops
DoorDash Custom Canary Kubernetes Controller

Hey folks, I thought you might be interested in the linked blog post on how we at DoorDash built a custom Kubernetes canary controller on top of Argo Rollouts. Let us know your comments and feedback!
https://doordash.engineering/2021/04/14/gradual-code-releases-using-an-in-house-kubernetes-canary-controller/

https://redd.it/n8zo60
@r_devops
Making the case for a new quality dimension for K8s apps

One of my mentors once said "Don't optimize before it works". I think we could make a case that cloud native applications do already work. Hence, the next logical step is to think about running our apps with the optimal resource configuration. My team and I make the case that resource efficiency should be a key dimension for cloud native applications, and I'd like to invite you to join our discussion next Thursday about the best ways to integrate efficiency into our daily work. Find more details here: https://www.stormforge.io/event/crossing-kubernetes-performance-chasm/?utm_medium=social&utm_source=Reddit&utm_campaign=crossingthechasm

https://redd.it/n7r4fb
@r_devops
Ad hoc jobs question

Hello everyone,

I was hoping to ask the experienced DevOps engineers here for some help with the concept of ad-hoc jobs. I have read that "AWS CodePipeline and GitHub Actions do not cater for ad hoc jobs. AWS CodePipeline needs a trigger, and then runs a static pipeline. GitHub Actions is listening to git events."

Our team is leaning towards GitHub Actions, and I am trying to determine whether its not catering for ad-hoc jobs is something we should seriously consider. However, I am not sure I clearly understand the concept. Could someone clarify what these ad-hoc jobs are and what they actually do? Is it a big downside that GitHub Actions does not cater for them? Any help will be much appreciated.

https://redd.it/n7q44g
@r_devops
Slack ChatOps for releases

Looking for solutions.

Problem:

Inherited project; the release process is semi-manual and cannot be completed without heavy engineer involvement.


Proposition:

In order to win back time for engineers to refactor the release process into something a little more up to date, we should create a Slack chatbot that can send API requests to our build and release automation services.


I have done some research and the most accessible solution appears to be errbot.io, which supports a sort of ACL that would be perfect for the gated release process our stakeholders require.

The solution needs to be fairly lightweight and written in an accessible language that won't require much upskilling from our engineering team should they need to support it.

I am about to enter the rabbit hole on this topic; I have no idea what the infrastructure will look like yet. Hopefully it can all be run from a Lambda on AWS.
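To make the idea concrete, here's a rough sketch of what an errbot command might look like (the endpoint URL and payload are made up; the actual gating would live in errbot's ACCESS_CONTROLS config, not in the plugin):

import requests
from errbot import BotPlugin, botcmd

class Release(BotPlugin):
    """ChatOps commands for the release process."""

    @botcmd
    def release(self, msg, args):
        """Trigger a release, e.g. !release myservice 1.4.2"""
        service, version = args.split()
        resp = requests.post(
            "https://ci.internal.example.com/api/releases",  # hypothetical endpoint
            json={"service": service, "version": version,
                  "requested_by": str(msg.frm)},
            timeout=30,
        )
        resp.raise_for_status()
        return f"Release {version} of {service} queued."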


Any tips, ideas, warnings, or references would be greatly appreciated at this point. Keen to hear what you've got for me, r/devops!

https://redd.it/n7lv7w
@r_devops
New to cloud CI infrastructure (Bitbucket Pipelines in my case). What is the proper way to make a release?

I do traditional desktop software development with slow point releases, e.g. myapp_1.2.1.tar.gz. So no CD, I just build, compress, and upload to the Downloads section of my repo, from where people can manually obtain my release packages. I recently started using Bitbucket.

I already set up Bitbucket Pipelines to launch a build whenever there's a commit. I'd like to start using it to create the actual release package: so manually click something to initiate my intention to make a 1.2.1 release from a commit, compile the app, run tests, update a header file with the string "1.2.1", create the archive myapp_1.2.1.tar.gz, and upload that.

I read the docs and learned how to do these steps, but I can't tell how I'm supposed to send the desired version name to the pipeline.

Based on what I read (their docs aren't the best, btw), I see two ways:

* Pipelines can be triggered by a pushed git tag. So instead of making a release from the web UI, I could open a git terminal and create and push the tag "release-1.2.1". This triggers a pipeline configured to run on "release-*" tags, and anything in the pipeline can then extract "1.2.1" from the value of the $BITBUCKET_TAG envvar (see the sketch after this list). This seems very unnatural to me; it inverts the flow I'm used to.

* I could abuse repo variables: keep a CURRENT_RELEASE_VERSION=1.2.1 variable which I update before making a release. I don't like this because I could forget to update the variable, or click the Run Pipeline button by mistake when I'm not trying to make a release. Because of that, silly stuff can happen, like overwriting past versions, overwriting git tags, etc. Also, this doesn't communicate the explicit intent of making a release.
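For illustration, the tag-based flow from the first bullet could boil down to a small script run in the pipeline step (the file and path names here are placeholders):

import os
import re
import tarfile

# Bitbucket exposes the triggering tag as $BITBUCKET_TAG, e.g. "release-1.2.1".
tag = os.environ["BITBUCKET_TAG"]
match = re.fullmatch(r"release-(\d+\.\d+\.\d+)", tag)
if not match:
    raise SystemExit(f"unexpected tag format: {tag}")
version = match.group(1)

# Stamp the version into the header the build consumes (placeholder path).
with open("src/version.h", "w") as f:
    f.write(f'#define MYAPP_VERSION "{version}"\n')

# ... compile and run tests here ...

# Package the build output (placeholder directory).
with tarfile.open(f"myapp_{version}.tar.gz", "w:gz") as tar:
    tar.add("build", arcname=f"myapp_{version}")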

It feels like there's a clean 3rd method I'm missing.

Here's a skeleton of the pipeline I had in mind before I started: https://paste.ubuntu.com/p/zmn8ffkkJ6/

https://redd.it/n9k1u8
@r_devops
Production Ready DevOps Books

Hi there,

I'm writing this post to ask for any books that discuss production-ready DevOps techniques/designs/prototypes. I'm moving our applications to Docker and I think it's ready to go to production. I also built a Jenkins pipeline that does the build and deployment to my Swarm cluster. It's running perfectly. But I'm interested in reading more about what others are building and using in their environments: best practices, security to-dos, and more. I know it's a big topic, but there must be a book that discusses this.
My stack is: Java Spring Boot, Jenkins, Docker Swarm, HAProxy.

Thank you guys for the help.

https://redd.it/n7hfvt
@r_devops
Data redundancy: where to back stuff up to?

Hi all.

I have an app in Google Cloud platform. I have the following data:

* Bucketed misc files in storage (~1 GB)
* Bucketed secondary files (~1 TB and growing). If we lost these it wouldn't be the end of the world, but it's not ideal.
* Database (~1 GB)

What is the best way of keeping all that safe? I have the regular 7-day backups on the database.

I am most concerned about a scenario where we are held ransom, or we lose access to our account.

I would ideally like to store this data offsite, or perhaps in another cloud provider? What do people recommend?

Edit: I came across rsync.net. Seems like something that could be useful as a simple solution?

https://redd.it/n7gcku
@r_devops
How are you measuring DevOps performance?

Hi r/devops,


Many people here are familiar with the four key metrics identified by DORA for measuring the performance of software development teams: lead time, deployment frequency, change failure rate, and mean time to recover.

I'm curious to know some of the different ways in which you are measuring these metrics. Are there any well-known tools/approaches that make this easy, or are you building internal applications to measure this stuff? E.g. incrementing a counter in a data store after every successful deployment and pulling this data into a nice dashboard.
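To make that example concrete, a minimal sketch of the deployment-frequency piece using a Prometheus Pushgateway might look like this (the gateway address, job, and metric names are made up):

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
last_deploy = Gauge(
    "last_deployment_timestamp_seconds",
    "Unix time of the most recent successful deployment",
    registry=registry,
)

# Run as the final step of a successful deploy. Deployment frequency
# then falls out of PromQL, e.g. changes(last_deployment_timestamp_seconds[7d]).
last_deploy.set_to_current_time()
push_to_gateway("pushgateway.internal:9091", job="deploy-tracker", registry=registry)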

I apologize if this is a simple question; I'm just curious to see how others are measuring the impact of a DevOps culture.

https://redd.it/n9o76m
@r_devops
How to persist volumes/filesystems in a Packer EBS AMI for use in newly created EC2 instances?



I'm trying to build an AWS AMI that has all my filesystems set up as I'd expect, i.e. /var, /var/log, /tmp, etc. I am attempting to achieve this using Packer in conjunction with the Ansible provisioner.

Here is my HCL2 build file:

source "amazon-ebs" "example" {
ami_name = "test_ami ${local.timestamp}"
ami_description = "test ami with predefined filesystems ${local.timestamp}"
instance_type = "t2.micro"
region = "eu-west-2"
source_ami_filter {
filters = {
name = "amzn2-ami-hvm-2.0.*-gp2"
root-device-type = "ebs"
virtualization-type = "hvm"
architecture = "x86_64"
}
most_recent = true
owners = ["amazon"]
}
# EBS for root volume
launch_block_device_mappings {
device_name = "/dev/xvda"
volume_size = 10
volume_type = "gp2"
delete_on_termination = true
}
# EBS for data volume
launch_block_device_mappings {
device_name = "/dev/sdb"
volume_size = 5
volume_type = "gp2"
delete_on_termination = true
}
ssh_username = "ec2-user"
}

I then have Ansible provisioners to set up my physical volumes, volume groups, and logical volumes, along with some XFS filesystems. This all works fine during the Packer AMI build. Using PACKER_LOG=1 packer build . I can verify that the plays in my Ansible playbook succeed.

Once the AMI is created, I have built an EC2 instance off of it, but all the work the Ansible playbook did in setting up the aforementioned volumes and filesystems has disappeared. For example, /dev/sdb1 doesn't exist when I run blkid or fdisk -l. My /etc/fstab file has also disappeared.

I was under the impression that although I've selected delete_on_termination under launch_block_device_mappings, the snapshots created during the AMI build would be applied to any EC2 instances built from the AMI, and therefore my physical volumes and filesystems would be intact.

Am I misunderstanding this? If so, can anybody clarify where I'm going wrong?

https://redd.it/n9rtll
@r_devops
Delete CloudFormation Stack Including S3 Objects

I needed to create and tear down development environments. Deleting a CloudFormation stack has an issue with S3 objects: an S3 bucket cannot be deleted while it still contains objects (to the best of my understanding). So I wrote a script which does the following:

1. Removes deletion protection from DB instances belonging to the stack
2. Deletes S3 objects, including versions (10 in parallel), in buckets belonging to the stack
3. Issues the delete-stack command after the above is finished

The script is at https://github.com/ngs-lang/nsd/blob/master/aws/cloudformation/delete-stack.ngs

It is written in Next Generation Shell.
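For anyone who would rather not pick up NGS, a rough boto3 sketch of steps 2 and 3 (minus the parallelism and the deletion-protection step, with a placeholder stack name) could look like:

import boto3

STACK = "dev-environment"  # placeholder

cfn = boto3.client("cloudformation")
s3 = boto3.resource("s3")

# Step 2: empty every S3 bucket owned by the stack, versions included.
for res in cfn.describe_stack_resources(StackName=STACK)["StackResources"]:
    if res["ResourceType"] == "AWS::S3::Bucket":
        bucket = s3.Bucket(res["PhysicalResourceId"])
        bucket.object_versions.delete()  # removes all versions and delete markers

# Step 3: issue the stack delete once the buckets are empty.
cfn.delete_stack(StackName=STACK)
cfn.get_waiter("stack_delete_complete").wait(StackName=STACK)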

Hope that helps!

https://redd.it/n9sf9o
@r_devops
Spacelift Feature Reveal: Local Preview

We've been asked multiple times, here on Reddit and elsewhere, to implement local preview. Creating small commits all the time just to see whether what you're writing will execute properly is tedious! So is setting up all the necessary access and environment variables locally.

We’re glad to let you know this is now available!

From now on, by turning on `Enable local preview` on a Stack, you can preview runs based on the changes in your local directory: just run `spacectl stack --id <stack-name> local-preview` and the output is streamed right into your terminal!

Here’s a demo of it:

Spacelift Local Preview - asciinema

To find out more about Spacelift, check out https://spacelift.io

https://redd.it/n9zl4x
@r_devops
Apache Atlas configuration: Cassandra backend connection [help]

Hi,

For a future PoC I need to deploy an Apache Atlas 2.1 stack, but I can't find the parameter for the Cassandra backend connection.

If anyone has a link, or has already done an implementation with password authentication, please share. Or point me to another subreddit where someone might have an answer.

This is my current config file, in case it helps.


atlas.graph.storage.backend=cql
atlas.graph.storage.hostname=cassandra
atlas.graph.storage.cassandra.keyspace=JanusGraph

atlas.graph.storage.clustername=cassandra
atlas.graph.storage.port=9042

atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.CassandraBasedAuditRepository
atlas.EntityAuditRepository.keyspace=atlas_audit
atlas.EntityAuditRepository.replicationFactor=1

atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=zookeeper:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

atlas.graph.index.search.max-result-set-size=150

atlas.notification.embedded=false
atlas.data=${sys:atlas.home}/data/kafka

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000

atlas.enableTLS=false

atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

atlas.authentication.method.ldap.type=none

atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties


atlas.rest.address=https://localhost:21000

atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=atlas-zookeeper:2181

atlas.server.ha.enabled=false
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

atlas.metric.query.cache.ttlInSecs=900

######### Gremlin Search Configuration #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false

Thanks for any help.

https://redd.it/n9zi16
@r_devops
Interview for new job.

I need some help preparing for an interview for an Infrastructure and DevOps Engineer position.

I have completed a personal project using AWS: S3 buckets, GitHub, Terraform, Route 53, Python, and YAML.

Previous work experience: 1st/2nd line engineer, mainly supporting Azure, Windows Servers, and standard day-to-day support.

University experience: databases (full UML design), Java.

Any advice?


>Requirements
>
>Ensure smooth operation of our CI/CD pipelines.
>Ensure the security of our cloud infrastructure and internal company communications.
>Support and effectively communicate with other teams (QA, Software Engineering, Product Development, etc.)
>Optimize the cost of the cloud-based infrastructure.
>Be a driver for innovation & change, staying up to date with new AWS announcements/new releases.
>Deliver and maintain best practice, utilising modern security & testing standards.
>Report the progress of work to EVP.
>
>WE NEED
>
>Must-have:
>
>Interpersonal Skills: effective communication both within the department and with external suppliers.
>Amazon Web Services: EC2, VPC (and/or equivalent TCP networking skills).
>Networking & Security: good understanding of TCP networking, firewalls, and basic routing.
>Linux/Unix: Linux/Unix systems admin skills.
>Windows: basic exposure to Windows Servers, and willingness to learn.
>Configuration Management: Puppet/Ansible or transferable skills in any similar systems.
>Programming/Scripting: basic Python; Bash shell (or a flavour thereof).
>Source control: Git.
>
>Nice-to-have/willing to learn:
>
>Amazon Web Services
>Other: experience with Ubuntu and/or CentOS; Docker; Elasticsearch/Logstash/Kibana; monitoring with Nagios, Munin & New Relic (or similar systems); DDoS protection tools (Cloudflare).

https://redd.it/n9wda1
@r_devops
Introducing db-auth-gateway, our Database Authentication Proxy (Blog Post)

We recently wrote an in-house fork/re-implementation of cloudsql-proxy, and I thought it would be interesting to talk about why and how we did it. Hope you enjoy it; we'd love your feedback!

Journey of a Cloud SQL Packet:

https://medium.com/kloeckner-i/journey-of-a-cloud-sql-packet-26b546db43e9

https://redd.it/n9u8u1
@r_devops
I just had to ask. I have done what the web says to do but I am still getting an HTTP 403 error in my Jenkins configuration

I have followed what people said they did: I used an API token generated from my user page in Jenkins as a shared secret between it and the git repo, but the error keeps showing up. I also chose the GitHub hook trigger. I am an absolute beginner and this is my first trial. What could be happening?

https://redd.it/n9wa9b
@r_devops
Pulumi in Go feels like trying to shove TypeScript into a Go-shaped box. Am I the only one who feels that way?

Hello!

I recently tried my hand at Pulumi after working with Terraform.

I'm happy with Terraform, but I wanted to see what all the fuss was about. Being a fan of Go, I tried to create an EKS cluster with Pulumi in Go, and oh my god, it feels so wrong.

It feels like trying to shove TypeScript into a Go-shaped box, and I hate it. Am I the only one who feels this way? Is it better in other languages?

I'm going to stick with Terraform for now.

https://redd.it/n9qipq
@r_devops
Deploying code

Hi. I'm no DevOps engineer (well, I am now, I guess), but I do have a two-man project and I'm the one doing the infra. As of now, all the automation works pretty well, apart from the actual code deployments.

What I'm doing currently: I build on GitLab (using my own runner, as I've already run out of quota), package an RPM, publish it to GitLab's generic package registry, and curl an endpoint on AWX to download it to my RPM repo.

What I was planning to do next was to curl another endpoint (or, instead of a play, trigger a workflow in the previous step), somehow wait until the RPM download finishes, and then comes the tricky part: elegantly get AppRole credentials from HashiCorp Vault (I failed miserably) and update the RPM on the box (easy).
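For reference, the AppRole part I failed at should, as far as I can tell, reduce to something like this with hvac (the env-var delivery and the secret path are just how I'd wire it, not gospel):

import os
import hvac

client = hvac.Client(url=os.environ["VAULT_ADDR"])

# Exchange the AppRole credentials for a short-lived Vault token.
client.auth.approle.login(
    role_id=os.environ["VAULT_ROLE_ID"],
    secret_id=os.environ["VAULT_SECRET_ID"],
)

# Read a secret for the deploy step (KV v2 engine; path is hypothetical).
secret = client.secrets.kv.v2.read_secret_version(path="deploy/rpm-repo")
print(secret["data"]["data"])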

The thing is, I'm currently running AWX in Docker. I don't have anything against Docker, but I'd much rather run AWX outside of it. That's kinda not doable, though, as Red Hat is pushing it towards k8s, and I'm not running k8s. I don't have the budget for Tower either.

So, finally, the question: what should I replace AWX with? I'm fine with Ansible, and I'd like to avoid shell scripts. I'm planning to look into Ansible Semaphore; is it still as good as people on Reddit said three years ago? (This should probably go to r/ansible.) I've looked at Terraform, but that seems like something a bit different, since I'm not creating VMs at will. Everything seems to be either configuration management or infrastructure management; I'm missing some tools in between. What would you suggest I look at?

Edit: Now I'm thinking, would Nomad be a good fit? Since I'm already on the HashiCorp stack, why not?

https://redd.it/n9neoe
@r_devops
Single dashboard for monitoring/APM

Hey guys, wanted to see if you have any opinions about having a single dashboard to visualize your application. In the past, I used stuff like Cacti and New Relic. Of course, some of those features at my current job are covered by Prometheus and Grafana, but that doesn't give me the tracing of New Relic, and managing the infrastructure for it is too much of a burden. I actually thought New Relic was much easier: a nice and simple UI for my app and my infrastructure. But it's not available at my current job.

I was thinking of leveraging AWS and X-Ray, but AWS logging me out all the time and forcing me to log in and refresh every screen is less than ideal. Plus, I have three login actions: username, captcha, and Google Authenticator. Then I need to go around each screen and refresh. It doesn't provide that single screen everybody can look at to see how the application is doing.

https://redd.it/n9ndr4
@r_devops
Best way to provision email addresses for developer / business services?

What are ways that your team has divided up email accounts that manage things like

* GitHub Organizations
* Twilio / Send Grid
* First Azure account, etc

We are setting up a new domain...

At previous shops I've been with, they'd have an address like "[email protected]" to register billable accounts related to development.

I was pretty much going to do it that way, but I had the idea to set up separate domain emails for "business" and "developer", to separate access to dev-critical accounts from root emails for stuff like QuickBooks or Trello.

What are y’all’s thoughts?

https://redd.it/n9k1lc
@r_devops