Reddit DevOps
267 subscribers
30.9K links
Reddit DevOps. #devops
Thanks @reddit2telegram and @r_channels
How are you guys using machine learning models inside services and pipelines?

I'm curious how machine learning models are deployed in practice. For example, are you deploying them on a GPU-enabled instance and keeping the GPU up and running 24/7, or are you using serverless, trigger-based services that do the job for you?


In our case, we need to run a service once every 24 hours. What is the best way to do this if our models require a GPU? It's a 3-step process, with different models used in each step.
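To make the once-a-day case concrete, here is a minimal sketch of a run-to-completion job: a scheduler starts it, it chains the three steps, and it exits, so a GPU instance would only need to exist while the job runs. The stage functions are hypothetical placeholders (plain arithmetic instead of model inference), not anyone's actual setup.

```python
from typing import Callable, List

Batch = List[float]


# Hypothetical stages: a real job would load a model and run inference here.
def stage_one(batch: Batch) -> Batch:
    return [x * 2 for x in batch]


def stage_two(batch: Batch) -> Batch:
    return [x + 1 for x in batch]


def stage_three(batch: Batch) -> Batch:
    return [round(x, 2) for x in batch]


def run_pipeline(batch: Batch, stages: List[Callable[[Batch], Batch]]) -> Batch:
    """Feed the output of each stage into the next, then return the final result."""
    for stage in stages:
        batch = stage(batch)
    return batch


if __name__ == "__main__":
    # The process ends when the last stage finishes, so nothing idles afterwards.
    print(run_pipeline([0.5, 1.5], [stage_one, stage_two, stage_three]))
```

Whether this runs on a scheduled instance, a batch service, or a container job is a separate choice; the point is only that the job is one-shot rather than a long-lived server.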


What would be the best way to do this?


Thanks!

https://redd.it/fx2d1n
@r_devops
Looking for a tool like PagerDuty

I'm looking for something like PagerDuty for the company I work for. We want to improve the experience for customers of our managed services offering.
I'm not sure PagerDuty is even what I'm looking for, so I thought I'd ask here for suggestions.
Our requirements are quite simple. We already have a ticketing system and monitoring; all we need is a tool that gets hold of a person who can look at an issue when it comes up and keeps annoying that person until they acknowledge they've been reached. It would be cool if it had an API too.
The place I work for provides managed services for cloud services and variations of Kubernetes offerings.

What I basically want to get out of this is:

* Improved customer experience.
* Some visibility for customers into whether we are meeting our SLA.
* Some cool reporting functionality.
* The ability to resolve issues quickly and smoothly, both for the team responding to the issue and for the customer the resource belongs to.

Would love to hear your suggestions.

What I've looked at so far:

* PagerDuty
* Opsgenie
* UptimeRobot
* Site24x7
* Zenduty
* VictorOps

Thanks for your suggestions and time

https://redd.it/fx6576
@r_devops
Unable to pull an image from a private Docker repository

"Back-off pulling image "image name""

I am getting the above error in Kubernetes. It is unable to pull the image from a private Docker repository.

Pulling from a public repository works. Can anybody please help with this issue?
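One common cause of this symptom (not necessarily the one here) is that the pod has no registry credentials: Kubernetes expects a Secret of type `kubernetes.io/dockerconfigjson`, referenced from the pod spec via `imagePullSecrets`. The sketch below only builds that Secret manifest as a Python dict; the secret name, registry host, and credentials are hypothetical.

```python
import base64
import json


def docker_config_json(registry: str, username: str, password: str) -> str:
    """Build the Docker config document that holds credentials for one registry."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    }
    return json.dumps(config)


def pull_secret_manifest(name: str, registry: str, username: str, password: str) -> dict:
    """Wrap the Docker config in a Secret manifest; Secret data values are base64."""
    payload = docker_config_json(registry, username, password)
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": base64.b64encode(payload.encode()).decode()},
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    manifest = pull_secret_manifest("regcred", "registry.example.com", "user", "pass")
    print(json.dumps(manifest, indent=2))
```

The pod (or its service account) would then list the secret name under `imagePullSecrets` so the kubelet can authenticate when pulling.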

https://redd.it/fx570c
@r_devops
Project management, sprint planning, and code tracking software?

I'm starting a new gig at a startup with a few developers and a couple of designers, along with overseas developers. We are looking for a tool we can all use that offers some project management features, sprint planning with subtasks and a backlog, and issue tracking linked to code commits. We are aware of some pros and cons of tools such as GitLab, Jira, and Trello. I'm also aware of PM tools such as Monday.com, Basecamp, and Asana.

Does anyone have recommendations for a common tool for around 10 users that can do what we are looking for?

https://redd.it/fwzmlp
@r_devops
Unable to create Dockerfile on Windows laptop

When I run "docker image build -t web1 ." in my command prompt, I get this error:

"The command '/bin/sh -c mkdir /app' returned a non-zero code: 4294967295: failed to shutdown container: container 985630732d2ac2c1d0fd635071eb1723bc998baaa0a8414c21ab0af9bc07733d encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110): subsequent terminate failed container 985630732d2ac2c1d0fd635071eb1723bc998baaa0a8414c21ab0af9bc07733d encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110)".

I have done a lot of googling but made no progress. Docker is running in Windows containers mode since I am on Windows, and I have set "experimental": true and enabled experimental features in the command line. Still no success.

https://redd.it/fwxyyt
@r_devops
Terraform: Storing State in S3

In this article I'm going over the why and how of storing state in S3 and using DynamoDB to provide state locking: [https://www.thecloud.coach/terraform/storing-state-in-s3/](https://www.thecloud.coach/terraform/storing-state-in-s3/)

I've actually worked with organisations in the past who stored their state in Git. It was a mess, and instead of working with Terraform's built-in state management functionality they designed a contrived set of processes for not stepping on each other's toes. Yikes!

With this article I wanted to demonstrate why it's important to use something like S3 and why it's equally important to use locking too. Both are critical to protecting your state's life-cycle and the integrity of your infrastructure.

I use two examples as a "proof of concept": the first does not use S3, and two people's work collides and causes a nightmare; the second follows the same lines but uses the better approach, demonstrating how it's superior.
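The collision in the first example comes down to two writers touching the same state at once. As a rough illustration of what a lock buys you, here is a toy in-memory lock table. It is only a stand-in for the idea of one lock entry per state path, not Terraform's or DynamoDB's actual implementation, and every name in it is hypothetical.

```python
import threading
from typing import Dict, List


class LockTable:
    """In-memory stand-in for a lock table: at most one owner per state path."""

    def __init__(self) -> None:
        self._locks: Dict[str, str] = {}
        self._guard = threading.Lock()

    def acquire(self, state_path: str, owner: str) -> bool:
        # Succeeds only if nobody currently holds the lock for this path.
        with self._guard:
            if state_path in self._locks:
                return False
            self._locks[state_path] = owner
            return True

    def release(self, state_path: str, owner: str) -> None:
        # Only the current owner may release the lock.
        with self._guard:
            if self._locks.get(state_path) == owner:
                del self._locks[state_path]


def apply_change(table: LockTable, state: List[str], state_path: str,
                 owner: str, resource: str) -> bool:
    """Modify the shared state only while holding the lock; refuse otherwise."""
    if not table.acquire(state_path, owner):
        return False
    try:
        state.append(resource)
        return True
    finally:
        table.release(state_path, owner)
```

With this in place, a second person who tries to change the state while the first still holds the lock is turned away instead of silently overwriting it.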

Has anyone else found state being stored in a Git repo in their organisation?

https://redd.it/fwwwib
@r_devops
Tomcat web manager with HTTPS

Has anyone ever configured the Tomcat web manager to use HTTPS instead of HTTP? There doesn’t seem to be much, if anything, online about this subject, and the Tomcat documentation doesn’t address it at all. Any tips or sources would be greatly appreciated.

What I’ve tried:
I’ve configured manager.xml and tomcat-users.xml, and I have it working via the CLI over HTTP. The web GUI hasn’t worked for me yet, even after allowing my desktop IP via the “allow” setting in manager.xml and configuring the “manager-gui” role and an accompanying user with manager-gui permission.

I’ve also tried a few other things involving HTTPS, but the results were inconsistent, and I was never actually able to reach the “/manager/list” context via the CLI to see the app statuses. This mostly consisted of using curl to hit the server’s registered DNS entry, which matches the hostname on the SSL cert for this Tomcat server, on the SSL port configured for the server. I received the “403 access denied” page you would get in the web GUI if you navigated to the “/manager/html” context without GUI access configured properly.
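One thing that may be worth checking: in Tomcat 7 and later, the script-facing manager commands live under “/manager/text” and require the “manager-script” role rather than “manager-gui”. The sketch below only builds an HTTPS request for the text-mode list command with HTTP Basic auth; the host, port, and credentials are hypothetical, and it assumes the manager app is reachable on the SSL connector.

```python
import base64
import urllib.request


def manager_list_request(host: str, port: int, username: str,
                         password: str) -> urllib.request.Request:
    """Build (but do not send) a request for the manager's text-mode app list."""
    url = f"https://{host}:{port}/manager/text/list"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical values; the user would need the manager-script role.
    req = manager_list_request("tomcat.example.com", 8443, "deployer", "secret")
    print(req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

If a request like this still returns 403, the remote address restriction on the manager context is the next suspect, since it applies regardless of role.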

https://redd.it/fy2aau
@r_devops
Employment Prospects

Hello everyone! I work in Network Operations and have been working in Information Technology off and on for around 8 years. I would consider myself more of a Systems Technician (somewhere between advanced helpdesk and Systems Administrator), but I am at a standstill in terms of employment prospects in my region. I live in North Louisiana (the ArkLaTex area) and will [potentially] be moving to East Texas this fall. I was wondering if anyone either (1) lives around that area and/or (2) knows of remote work that is hiring?

https://redd.it/fy1gr6
@r_devops
Chocolatey hangs on installation

I have built a CLI application which I have been trying to deploy with Chocolatey. When I test the installation using

**choco install trolltower -dvy -s .**

I get the results below, but it hangs forever:

**Running Start-ChocolateyProcessAsAdmin -validExitCodes '0' -workingDirectory 'C:\\Users\\Leon Mwandiringa\\AppData\\Local\\Temp\\chocolatey\\trolltower\\1.0.0' -statements ' ' -exeToRun 'C:\\Users\\Leon Mwandiringa\\AppData\\Local\\Temp\\chocolatey\\trolltower\\1.0.0\\trolltower-x64-1\_0\_0.exe'**

**Unable to use current location for Working Directory. Using Cache Location instead.**

**Test-ProcessAdminRights: returning True**

**Elevating permissions and running \["C:\\Users\\Leon Mwandiringa\\AppData\\Local\\Temp\\chocolatey\\trolltower\\1.0.0\\trolltower-x64-1\_0\_0.exe" \]. This may take a while, depending on the statements.**

https://redd.it/fxznqv
@r_devops
New Google Publication - Building Secure and Reliable Systems

Google has released a new book in its SRE lineup, available for free online, Building Secure and Reliable Systems.

[https://landing.google.com/sre/books/](https://landing.google.com/sre/books/)

I hope to start reading this weekend.

https://redd.it/fxusw9
@r_devops
Building Secure and Reliable Systems - free digital book

I found this useful: the latest version of Building Secure and Reliable Systems, free in digital formats. Thought others might too.

[https://landing.google.com/sre/books/](https://landing.google.com/sre/books/)

https://redd.it/fxsgr6
@r_devops
Quay.io -> Github Actions Dispatch

I don't know if anyone else has struggled with integrating Quay's image build power with the versatility of GitHub Actions, but here's the solution in Go that I came up with. I would love to hear some feedback, or whether it could be useful for your Docker/Kubernetes pipelines.

[Quay Github Actions Dispatch](https://github.com/lewagon/quay-github-actions-dispatch)

https://redd.it/fxxvct
@r_devops
Any Best Practices for deployment and delivery of applications?

Do you guys have any books, blogs or articles that describe best practices for the deployment, delivery or distribution of applications with regard to CI/CD pipelines? Thanks in advance!

https://redd.it/fxvjh6
@r_devops
Connect Jenkins for SCMs with (Bitbucket/GitHub/GitLab/Azure Repos)

When designing a CI/CD pipeline for any application, one of the first steps is to use an SCM (Source Code Management) system to pull the code for build, test, and deploy, but the configuration for the remote repositories sometimes changes depending on the plugin or options. I’ve used the following Git hosting services in many projects: Bitbucket, GitHub, GitLab and Azure Repos. Check my post for more information.

[https://medium.com/@ricardoupiicsa02/how-to-connect-jenkins-for-scm-with-bitbucket-github-gitlab-azure-repos-e115f1ca897f](https://medium.com/@ricardoupiicsa02/how-to-connect-jenkins-for-scm-with-bitbucket-github-gitlab-azure-repos-e115f1ca897f)

https://redd.it/fxva6a
@r_devops
Fluentd: Trying to flatten json field

Hey Guys,

My Docker container writes JSON to stdout, so the log key in the Fluentd output ends up holding a nested JSON string.

I'm trying to flatten the log key's value.
Example:
`{"timestamp":"utc format",`

`"log":"{\"docker\":\"output\",\"in\":\"json\"}",`

`"fluentd_tag":"some_tag"}`

Expected output:

`{"timestamp":"utc format",`

`"docker":"output",`

`"in":"json",`

`"fluentd_tag":"some_tag"}`

I tried using the record\_transformer plugin to remove the "log" key and promote its value to the root of the record, but the value gets deleted along with the key.
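To pin down the transformation being asked for, here is a small Python sketch of it. This is not Fluentd configuration, only an illustration of the intended input and output; the function name is hypothetical. The key step is to parse the string first and merge the result into the record, rather than just deleting the key.

```python
import json
from typing import Any, Dict


def flatten_log_field(record: Dict[str, Any], key: str = "log") -> Dict[str, Any]:
    """Parse the JSON string stored under `key` and merge its fields into the root."""
    if key not in record:
        return dict(record)

    flattened = {k: v for k, v in record.items() if k != key}
    raw = record[key]
    try:
        parsed = json.loads(raw) if isinstance(raw, str) else None
    except json.JSONDecodeError:
        parsed = None

    if isinstance(parsed, dict):
        # The nested fields become top-level fields of the record.
        flattened.update(parsed)
    else:
        # Not a JSON object: keep the original value untouched.
        flattened[key] = raw
    return flattened


if __name__ == "__main__":
    record = {
        "timestamp": "utc format",
        "log": '{"docker":"output","in":"json"}',
        "fluentd_tag": "some_tag",
    }
    print(flatten_log_field(record))
```

Lines that are not valid JSON objects pass through unchanged, which matters when a container mixes plain-text and JSON output.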

Any suggestions would be great.

https://redd.it/fxt4uk
@r_devops
Google is giving away 3 SRE books for free online

Here is the direct [link](https://landing.google.com/sre/books/) where you can read these books for free online.

Disclaimer: I don't work for Google and am not related to the company in any way. I am also not an affiliate.

https://redd.it/fxnafb
@r_devops
DevOps Fundamentals

Hi guys!

OK, I know that DevOps is a set of practices that combines Dev and Ops.

But.

I'm now a junior "DevOps" at a company. I use Linux, Ansible, AWS, scripting in Bash and Python, Docker, networking, pods, etc. Everything is cool, and I love my work.

My question: could it be a disadvantage that I didn't go to university? I am thinking of theoretical knowledge: system design, etc.

Are there any online resources where I can learn strong fundamentals? I know Udemy, Codecademy, etc. I mean university lectures.

Thank you very much for your answers.

https://redd.it/fxoy5x
@r_devops
[Discussion] How to perform Log file analysis using Python?

My employer wants me to build an independent solution (AI or data science based) that can analyse any user- or system-written log files to retrieve insights and make predictions, such as:

When will the system require a reboot?

What caused that high CPU or memory utilisation?

Or any other insights that could be displayed or be helpful to the firm.


For the prediction accuracy part, they want the predictions to be accurate at least 30% of the time.
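A possible first step, before any prediction, is simply turning raw lines into counts and looking for hours that stand out. The sketch below assumes a hypothetical line format of `YYYY-MM-DD HH:MM:SS LEVEL message`; real logs will need their own pattern, and the z-score threshold is an arbitrary starting point, not a recommendation.

```python
import re
from collections import Counter
from statistics import mean, pstdev
from typing import Iterable, List

# Assumed format: "2020-04-08 12:00:01 ERROR disk full" (hypothetical).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} (\w+) (.*)$")


def errors_per_hour(lines: Iterable[str]) -> Counter:
    """Count ERROR lines per hour; lines that do not match the pattern are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) == "ERROR":
            counts[match.group(1)] += 1
    return counts


def unusual_hours(counts: Counter, threshold: float = 2.0) -> List[str]:
    """Return hours whose error count sits more than `threshold` std devs above the mean."""
    if len(counts) < 2:
        return []
    values = list(counts.values())
    average, spread = mean(values), pstdev(values)
    if spread == 0:
        return []
    return sorted(h for h, n in counts.items() if (n - average) / spread > threshold)
```

Note that hours with zero errors never appear in the counter, so the baseline here is "hours that had at least one error". Features like these per-hour counts are also what a later predictive model would typically be trained on.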


I'm completely clueless about where and how to start. I did some Google searching, but it didn't help.




Any path guidance would be highly appreciated.


Thanks

https://redd.it/fxmvty
@r_devops
AWS Code* tool chain

Hey all, I'm spinning up an application running on AWS for a project I'm working on at home. The architecture will leverage their serverless products. I thought this would be an excellent time to dig into CodeBuild, CodePipeline, CloudFormation and CodeStar.

Now, I've been a developer, architect and tech coach for 25 years. That's not to say I'm any kind of awesome. That's only to say that according to my CV, I shouldn't be an idiot. Yet, after a week of pounding my head on my keyboard to get a simple React, S3-hosted webapp calling API Gateway and a Spring Boot lambda in the backend to build, test, and deploy using AWS's toolchain, I can only arrive at one conclusion: I'm an idiot.

Are these tools as obtuse as I think they are? Is this easier than I'm making it out to be?

I'm considering wiping the slate clean and just hosting a Jenkins container on a laptop in the closet under my stairs. I know Jenkins is a dusty, barnacle-encrusted CI machine, but it's what I know and I can get it up and running in under an hour. At this point, I'm looking for the shortest line between two points.

What is y'all's collective experience with the AWS Code* tool chain? Better than my experience?

https://redd.it/fyi3gh
@r_devops
PR based deploys?

Hey everybody,

I recently joined a firm where the SDLC includes continuous deployment from pull requests, rather than from master. It seems to work for them, but I don't understand the thinking behind this. I've been away from programming for about 5 years. Is this a new idea, unique to my firm, or something else that I've just never run across?

To be clear, after a PR builds, tests pass and there is a review/approval, the artifact is deployed to prod. We don't build from master at all.
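Written out as code, the gate described above is roughly the following. This is a sketch with hypothetical names, not the firm's actual tooling: the artifact built from the PR is promoted to prod only when all three conditions hold.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    build_passed: bool
    tests_passed: bool
    approved: bool


def ready_to_deploy(pr: PullRequest) -> bool:
    """The PR's artifact goes to prod only if it built, passed tests, and was approved."""
    return pr.build_passed and pr.tests_passed and pr.approved


if __name__ == "__main__":
    print(ready_to_deploy(PullRequest(build_passed=True, tests_passed=True, approved=False)))
```

The difference from a master-based flow is which artifact gets shipped: here it is the one produced by the PR build itself, not a rebuild after merge.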

https://redd.it/fxmunl
@r_devops