How do you deal with developers asking for production DB access?
In most of my positions, developers have asked me for (read-only) access to production databases.
Because that could have unwanted consequences such as DoS or data leaks, I normally have to build custom tools that give them some sort of access, either to anonymized data or to running EXPLAIN on their queries.
What is your experience and what tools have you used?
Thanks!
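Below is one way such a tool could gate what developers submit. It is a minimal Python sketch that assumes a PostgreSQL-style `EXPLAIN`, and `build_explain` is a hypothetical helper, not an existing tool. Plain `EXPLAIN` (without `ANALYZE`) plans a statement without executing it, so the string check is a convenience filter rather than a security boundary. The tool should still connect with a read-only role and a `statement_timeout`.

```python
import re

# Only a single SELECT (or WITH ... SELECT) statement is let through.
_ALLOWED_START = re.compile(r"^\s*(select|with)\b", re.IGNORECASE)


def build_explain(query: str) -> str:
    """Return an EXPLAIN statement for a developer-supplied query.

    Hypothetical helper: run the result with a read-only database role,
    because a string check alone is not a security boundary.
    """
    stripped = query.strip().rstrip(";").strip()
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    if not _ALLOWED_START.match(stripped):
        raise ValueError("only SELECT queries can be explained")
    # Plain EXPLAIN (no ANALYZE) asks for the plan without running the query.
    return f"EXPLAIN {stripped}"
```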
https://redd.it/11fmo4l
@r_devops
Posted by u/Potential_Guest4936 - No votes and 4 comments
From EKS to ECS + MongoDB
I am currently using EKS, where I deployed my backend app along with MongoDB, which stores its data on an EBS volume. My question is: if I move to ECS, how should I deploy MongoDB, and how do I mount the EBS volume?
Thanks
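The sketch below shows one possible shape for this. It assumes the EC2 launch type with the REX-Ray EBS Docker volume plugin (`rexray/ebs`) installed on the container instances. The task definition declares a `dockerVolumeConfiguration` volume and mounts it at MongoDB's data directory. The family, image tag, memory and size values are placeholders. An EBS volume lives in a single Availability Zone, so the task also needs to be placed in that zone. Docker volume plugins are not available on Fargate.

```python
import json


def mongo_task_definition(volume_size_gib: int = 20) -> dict:
    """Build an ECS task definition that runs MongoDB on an EBS-backed Docker volume.

    Assumes the EC2 launch type with the REX-Ray EBS plugin ("rexray/ebs")
    installed on every container instance; names and sizes are placeholders.
    """
    volume_name = "mongo-data"  # hypothetical volume name
    return {
        "family": "mongodb",  # hypothetical task family
        "requiresCompatibilities": ["EC2"],  # Docker volume plugins need EC2 instances
        "containerDefinitions": [
            {
                "name": "mongodb",
                "image": "mongo:6",
                "memory": 1024,
                "essential": True,
                "portMappings": [{"containerPort": 27017}],
                # MongoDB stores its data files under /data/db by default.
                "mountPoints": [
                    {"sourceVolume": volume_name, "containerPath": "/data/db"}
                ],
            }
        ],
        "volumes": [
            {
                "name": volume_name,
                "dockerVolumeConfiguration": {
                    "scope": "shared",      # volume outlives individual tasks
                    "autoprovision": True,  # create the EBS volume if it is missing
                    "driver": "rexray/ebs",
                    "driverOpts": {"volumetype": "gp2", "size": str(volume_size_gib)},
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(mongo_task_definition(), indent=2))
```

You could save the printed JSON to a file and register it with `aws ecs register-task-definition --cli-input-json file://mongodb-task.json`.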
https://redd.it/11f24xn
@r_devops
Posted by u/Legitimate-Carry7285 - 1 vote and 4 comments
Elastic stack, ELK: Logs drop issue
We have an on-prem ELK stack, and its primary use is collecting pod logs. Our K8s cluster runs 100+ microservices.
Filebeat > Logstash (1 instance) > ES cluster
Issue: logs are dropped inconsistently (hard to validate).
1. Can we have one input source and multiple pipelines? We started seeing this issue after some recent changes to the pipeline, so I'm thinking of creating a new pipeline for that requirement.
2. How can I find the root cause? What would your approach be?
Below is a sample pipeline, which has around 15 to 20 output conditions.
INPUT
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "\[AUDIT\] %{GREEDYDATA:message}" }
    overwrite => ["message"]
    add_tag => ["audit"]
  }
  grok {
    match => { "message" => "\[ERROR\]" }
    add_tag => ["error"]
  }
}
OUTPUT
output {
  if "audit" in [tags] {
    elasticsearch {
      hosts => ["comma-separated ES nodes"]
      index => "audit-%{+YYYY.MM.dd}"
    }
  }
  else if [kubernetes][container][name] == "containername" {
    elasticsearch {
      hosts => ["comma-separated ES nodes"]
      index => "containername-%{+YYYY.MM.dd}"
    }
  }
  # ... (15 to 20 conditions in total)
}
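One hedged starting point for question 2: with an `if / else if` output chain and no final `else`, Logstash silently discards any event that matches none of the conditions. The Logstash monitoring API (`GET /_node/stats/pipelines`, port 9600 by default) reports per-plugin event counters. Comparing the events that left the filters with the events that entered the outputs therefore estimates how many were never routed. The sketch assumes the pipeline id `main` and the counter layout shown, so verify both against your Logstash version. Documents that Elasticsearch rejects (for example, mapping conflicts between services) are a separate cause and will not show up in this difference.

```python
import json
from urllib.request import urlopen


def fetch_pipeline_stats(host: str = "http://localhost:9600") -> dict:
    """Read per-pipeline counters from the Logstash monitoring API."""
    with urlopen(f"{host}/_node/stats/pipelines") as resp:
        return json.load(resp)


def unrouted_events(stats: dict, pipeline: str = "main") -> int:
    """Estimate events that passed the filters but entered no output.

    Assumes an if / else-if output chain, so each event reaches at most one
    output; the difference then counts events that matched no branch.
    """
    p = stats["pipelines"][pipeline]
    filtered = p["events"]["filtered"]
    routed = sum(out["events"]["in"] for out in p["plugins"]["outputs"])
    return filtered - routed
```

The counters are cumulative. Sample them twice a few minutes apart and compare the deltas, since a small gap is expected from in-flight events. You could also add a final `else` branch that writes to a catch-all index, which would make unrouted events visible directly.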
https://redd.it/11f80y8
@r_devops
Posted by u/thenoob_withcamera - 1 vote and no comments
Trunk based development deployment strategies
Trunk-based development is the standard branching model today, but it confuses me.
I'd like to learn how you deploy to different environments when all code is merged to the trunk.
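A common answer is to build one immutable artifact per trunk commit and promote that same artifact through the environments. Environment differences are supplied as configuration at deploy time, and feature flags hide unfinished work. Below is a toy Python model of that promotion rule. The environment names and the image reference are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical promotion order; every environment deploys the same artifact.
ENVIRONMENTS = ["dev", "staging", "production"]


@dataclass
class Release:
    image: str  # immutable reference built once from a trunk commit, e.g. a SHA tag
    deployed_to: list[str] = field(default_factory=list)

    def promote(self, env: str) -> None:
        """Deploy this exact artifact to `env`, in order, never rebuilding it."""
        idx = ENVIRONMENTS.index(env)
        previous = ENVIRONMENTS[idx - 1] if idx > 0 else None
        if previous is not None and previous not in self.deployed_to:
            raise RuntimeError(f"{self.image} must reach {previous} before {env}")
        self.deployed_to.append(env)


release = Release("registry.example.com/app:3f2c1ab")
for env in ENVIRONMENTS:
    release.promote(env)
print(release.deployed_to)  # ['dev', 'staging', 'production']
```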
https://redd.it/11f76zz
@r_devops
Posted by u/ThankfulRobber - 1 vote and 1 comment
Gitlab CI Service Not Reachable
Can anyone help me with my CI troubles? I have a project that sets up a Flask webserver via Docker. In CI, I'm trying to use that image as a service to run integration tests on it, such as checking how it handles bad or messy inputs. However, when I use the Docker image to start a service with an alias, pytest can't access it from the CI job.
The Dockerfile for the test image is [here](https://gitlab.com/kitchen-server/kitchen-server/-/blob/ci-cd/test/assets/Dockerfile#L24-26). The highlighted lines (everything below `ENTRYPOINT`) are the only differences between the base and test image.
An example failing job is [here](https://gitlab.com/kitchen-server/kitchen-server/-/jobs/3848991470). (The Dockerfile exposes port 8080, and that's what the underlying program binds to, which is why I'm using that port in CI. Is that incorrect?)
ERROR test/integration_tests.py - requests.exceptions.ConnectionError: HTTPConnectionPool(host='test-service', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f74768f9f50>: Failed to establish a new connection: [Errno -2] Name or service not known'))
The CI file is [here](https://gitlab.com/kitchen-server/kitchen-server/-/blob/ci-cd/ci/branch.gitlab-ci.yml) (ignore that the build job is commented out; I just didn't want to burn CI minutes running it while troubleshooting the test stage). On L24-26, I define a test image tag, a service alias, and a service URL based on the alias. Then, in the failing job, I define a service that uses the test image tag as its name and the alias from the alias variable. The script then runs a couple of pip installs before calling pytest on a specific file.
variables:
  LATEST_TAG: $CI_REGISTRY_IMAGE:latest
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
  TEST_TAG: $IMAGE_TAG-test
  TEST_SERVICE_ALIAS: test-service
  TEST_URL: https://$TEST_SERVICE_ALIAS:8080
[...]
integration-tests:
  stage: test
  image: python:3.11.2-slim
  services:
    - name: $TEST_TAG
      alias: $TEST_SERVICE_ALIAS
  script:
    - pip install -r pie_chart/requirements.txt
    - pip install -r test/assets/requirements.txt
    - echo "$TEST_SERVICE_ALIAS"
    - echo "$TEST_URL"
    - pytest test/integration_tests.py
The error comes from [this part](https://gitlab.com/kitchen-server/kitchen-server/-/blob/ci-cd/test/integration_tests.py#L29-31) of the program, specifically the `requests.get()` call. The URL seems to be read correctly: it is printed to stdout exactly as I defined it in the CI file, as shown in the CI job linked above.
import os
import requests

test_url: str = os.environ.get('TEST_URL')
print(f"\n == Test URL: {test_url} ==\n")
response: requests.Response = requests.get(test_url)
I've tried:
* A ton of different URLs
* All combinations of ["" | "http://" | "https://"] + ["test-service" | "localhost" | "127.0.0.1"] + ["" | ":80" | ":8080"]
* Defining the service in the job
* Defining the service locally
* `FF_NETWORK_PER_BUILD: "true"`
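One thing worth ruling out: `Name or service not known` can also appear when the service container has exited or has not started yet. Also, the request here runs at import time, so it fails during pytest collection. A small readiness check run before the tests can separate a DNS or startup problem from an application problem. The following is a hedged, stdlib-only sketch, and `wait_for_service` is a hypothetical helper. In recent GitLab versions, setting `CI_DEBUG_SERVICES: "true"` also copies the service container's logs into the job log.

```python
import socket
import time
from urllib.parse import urlparse


def wait_for_service(url: str, timeout: float = 60.0, interval: float = 2.0) -> None:
    """Block until the host and port in `url` accept a TCP connection.

    Raises TimeoutError carrying the last error, e.g. a DNS failure
    ("Name or service not known") or a refused connection.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return  # something is listening; the app itself may still be starting
        except OSError as exc:  # socket.gaierror (DNS) is a subclass of OSError
            last_error = exc
            time.sleep(interval)
    raise TimeoutError(f"{host}:{port} not reachable after {timeout}s: {last_error}")
```

You could call it at the top of the test module with `os.environ['TEST_URL']` before the `requests.get()` call. If it times out with a DNS error, the problem lies in the service setup rather than in the Flask app.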
https://redd.it/11f3yqa
@r_devops
Https listener rule weird behavior
Hello, everyone. I have a problem with a listener rule. As seen in the picture, rule 1 (https://okynepal.com/login) is overridden by the default (last) rule. When I request https://okynepal.com/login, it should forward the request to cms-tg, but it forwards it to adminer-8080-tg instead. When I remove the default (last) rule, it works properly. What could be the problem?
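Below is a toy model of how an Application Load Balancer evaluates listener rules, which may help when reasoning about the screenshot. Rules are checked from the lowest priority number upward, the first match wins, and the default action applies only when nothing matches. A path-pattern condition is matched against the path alone (`/login`), so a condition containing the full `https://okynepal.com/login` URL would never match. The `Rule` type and the target-group names are illustrative, and real ALB matching supports more condition types.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Optional
from urllib.parse import urlsplit


class Rule(NamedTuple):
    priority: int
    host: Optional[str]  # host-header pattern; None means "any host"
    path: str            # path-pattern, matched against the URL path only
    target_group: str


def route(url: str, rules: List[Rule], default_target: str) -> str:
    """Return the target group of the first matching rule by ascending priority."""
    parts = urlsplit(url)
    for rule in sorted(rules, key=lambda r: r.priority):
        # Host matching is case-insensitive; urlsplit already lowercases hostname.
        if rule.host is not None and not fnmatchcase(parts.hostname or "", rule.host.lower()):
            continue
        # Path matching is case-sensitive and ignores scheme, host and query string.
        if fnmatchcase(parts.path or "/", rule.path):
            return rule.target_group
    return default_target


rules = [
    Rule(1, "okynepal.com", "/login", "cms-tg"),
    Rule(2, None, "/*", "adminer-8080-tg"),
]
print(route("https://okynepal.com/login", rules, "default-tg"))  # cms-tg
```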
https://redd.it/11fu154
@r_devops
Posted by u/PineappleInformal106 - No votes and no comments
What do you suggest as a distro to learn devops in virtualbox?
Hi devops enthusiasts.
I'm a beginner with 1 year of backend development experience, and I want to teach myself DevOps tools and technologies and get a job. I've used Ubuntu as my personal OS for about 5-6 years.
What would you suggest as a lightweight, preferably graphical OS to install in VirtualBox? My learning path also includes LPIC-1, LPIC-2, and networking; the rest is mainly DevOps tools.
https://redd.it/11f8ycy
@r_devops
Posted by u/mrafee113 - No votes and 20 comments
Can someone recommend resources for learning VMware Tanzu?
I usually look up docs, courses, KodeKloud (website), or other popular sources (e.g. popular YouTube channels) to get started.
I also tried searching LinkedIn Learning and O'Reilly.
No luck, though. Maybe I'm going about it the wrong way (I'm searching for Tanzu courses; maybe I should search for another topic that uses Tanzu),
or maybe Tanzu is less popular (or more enterprise-y) than I thought.
I'm hoping for a course, but I'm OK with docs or even a book (hopefully not, but beggars can't be choosers).
https://redd.it/11f0pne
@r_devops
Posted by u/TheDigitalPhoenixX - 1 vote and 8 comments
What is the general or best way to deploy an application to staging and production?
I'll try to elaborate as much as possible:
Our app is dockerized, and we use docker-compose to deploy it (k8s isn't required right now). Everything runs on Azure: VMs, Container Registry, and Azure DevOps for build pipelines. There are two servers, one for staging and one for production, and each has a different FQDN configured in Azure Front Door.
The problem is that when we build the containers, we have to build them for either production or staging, because the two have different domain names and other config files. So I can't take full advantage of the Azure DevOps release pipeline, where we could drop the artifact into staging and then use the same containers for production.
What would be the best solution?
Should we maintain two different branches, e.g. master for staging and a release branch for production?
Or should the devs find a way to make the same containers work in both environments by reading variables at runtime?
How do organizations generally handle this: do they use the same build, maintain different pipelines/builds per environment, or is there a better solution?
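The usual approach is the second option: build one image, push it once, and have the container read environment-specific values at runtime. Below is a minimal Python sketch of the application side. The variable names `APP_BASE_URL` and `APP_ENV` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("APP_BASE_URL", "APP_ENV")  # hypothetical variable names


@dataclass(frozen=True)
class Settings:
    base_url: str     # e.g. the FQDN Azure Front Door serves for this environment
    environment: str  # "staging" or "production"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from the environment at container start, failing fast."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(base_url=env["APP_BASE_URL"].rstrip("/"), environment=env["APP_ENV"])
```

Each server then supplies its own values, for example through an `env_file:` entry in its docker-compose file, while both servers pull the same image tag from the registry. This lets the release pipeline promote the staging artifact to production unchanged.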
https://redd.it/11fwufz
@r_devops
Posted by u/defact0o - No votes and 1 comment
Using OpenTelemetry with Splunk Enterprise
Hi Everyone!
Recently we have been exploring OpenTelemetry and open-source solutions to reduce the cost of our monitoring. We are not removing our commercial solution, given its obvious benefits, but we want to introduce open source into our infrastructure gradually (and eventually transition to it).
Currently we use Splunk Enterprise (with Splunk ITSI) to ingest logs, query metrics, and create KPIs in ITSI. We want to use OpenTelemetry to ingest data from the Splunk Forwarder and push it to Loki or Prometheus, whichever is available.
Our goal is to forward data from the agent directly to OpenTelemetry instead of pushing it to the indexer, so that the flow is:
Splunk Universal (or Heavy) Forwarder -> OpenTelemetry -> Prometheus / Loki
instead of:
Splunk Universal Forwarder -> Splunk Indexer
We have managed to ingest into Loki; however, the ingested data appears to be cooked, and we are getting hex data (as shown here). Has anyone had a similar case and successfully implemented this setup?
https://redd.it/11ex85r
@r_devops
a silly little idea for a gh bot
Yesterday I had an idea to build a GitHub bot that would automatically assign a reviewer to a PR based on the diff content. I wonder whether anyone else would want this, or whether I'm just solving a nonexistent problem.
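GitHub's `CODEOWNERS` file already requests reviews based on which paths a PR changes. A bot would therefore mostly add value for rules CODEOWNERS cannot express, such as matching on the diff content itself. Below is a minimal sketch of the path-based core. The `OWNERS` mapping and the usernames are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical path pattern -> reviewers mapping, similar in spirit to CODEOWNERS.
OWNERS = {
    "infra/*": ["alice"],
    "*.tf": ["alice"],
    "api/*": ["bob"],
}


def changed_files(diff: str) -> list:
    """Return the new-side paths from a git unified diff (deleted files are skipped)."""
    return re.findall(r"^\+\+\+ b/(.+)$", diff, flags=re.MULTILINE)


def suggest_reviewers(diff: str, owners: dict = OWNERS) -> set:
    """Collect every reviewer whose pattern matches at least one changed path."""
    reviewers = set()
    for path in changed_files(diff):
        for pattern, people in owners.items():
            if fnmatchcase(path, pattern):
                reviewers.update(people)
    return reviewers
```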
https://redd.it/11ei8dc
@r_devops
Posted by u/pawellisso - No votes and no comments
Is devops really that difficult???
I've been thinking of learning DevOps from scratch, and when I started researching which tools and technologies are needed, it was OVERWHELMING:
1. There are so many software tools to play around with.
2. 50% of the terms were new to me.
3. I read that these tools have a lot of depth and are complex.
Someone help me out here!
https://redd.it/11e68hg
@r_devops
Posted by u/rohost14 - No votes and no comments
How does DevOps enable continuous integration and continuous delivery (CI/CD)?
I'm hoping for good answers from fellow members on this question, as I need them for my project. References on the topic would be a bonus.
https://redd.it/11e0egb
@r_devops
Posted by u/risabh07 - No votes and no comments
Report on AWS cost and usage
AWS generates a very nice cost and usage report. However, if your account uses many services, the report lays out the 9 highest-cost services individually and groups the rest as "Others". Is there a way to stop AWS from doing this, or is there a third-party tool that will lay out the usage and cost of each service/resource individually?
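The Cost Explorer API should not collapse services into "Others". Grouping `get_cost_and_usage` by the `SERVICE` dimension returns one group per service. Below is a minimal boto3 sketch, assuming credentials with Cost Explorer access. `fetch_service_costs` and `cost_by_service` are hypothetical helper names. Note that Cost Explorer API requests are billed per call.

```python
from collections import defaultdict


def fetch_service_costs(start: str, end: str) -> list:
    """Fetch monthly unblended cost grouped by service, following pagination."""
    import boto3  # imported lazily so the parser below works without boto3

    ce = boto3.client("ce")
    kwargs = {
        "TimePeriod": {"Start": start, "End": end},  # "YYYY-MM-DD", End exclusive
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }
    pages = []
    while True:
        page = ce.get_cost_and_usage(**kwargs)
        pages.append(page)
        token = page.get("NextPageToken")
        if not token:
            return pages
        kwargs["NextPageToken"] = token


def cost_by_service(pages: list, metric: str = "UnblendedCost") -> dict:
    """Sum each service's cost across periods and pages, highest first."""
    totals = defaultdict(float)
    for page in pages:
        for period in page["ResultsByTime"]:
            for group in period.get("Groups", []):
                totals[group["Keys"][0]] += float(group["Metrics"][metric]["Amount"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

For per-resource detail, the Cost and Usage Report delivered to S3 (with resource IDs enabled) can be queried line by line, for example with Athena.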
https://redd.it/11g1z2d
@r_devops
Posted by u/eyesniper12 - No votes and 2 comments
Atlantis on GCP multiple service accounts
I'm setting up infrastructure automation for Google Cloud with Atlantis running on GKE, and I'm a bit stumped on how to manage the service accounts. Different environments live in different projects, so my idea was to create the service accounts with Terraform (no biggie with for_each and the official module) and then inject them as K8s secrets into the Atlantis pod (and consider using Vault in the future). From there, I'm very unsure whether the workflows will be able to pick the right credentials. I'm not a fan of TF workspaces, but nothing else comes to mind. Ideally the credentials would be short-lived. How is everyone dealing with this?
https://redd.it/11g39h3
@r_devops
Posted by u/salvaged_goods - No votes and no comments
How to get a Python virtual environment mounted as a volume into Airflow Docker working with external_python_task?
# GOAL

- Have a local Python environment that I can swap out and install packages into, without needing to build a new image -> stop the running container -> start a new container

# DONE

- I use the Docker version of Airflow 2.4.1.
- I have successfully mounted the Python virtual environment into Airflow Docker as a volume, as you can see in the docker-compose.yml.
- After restarting Docker with the new YAML file, it works fine.
- I can jump into the container, manually activate the Python environment, and import and run Python libraries perfectly fine.

# CHALLENGE

- The problem comes when I try to run my test DAG with the new venv2.
- The DAG works with the original external Python environment that is installed via the Dockerfile, but the goal is to not need that, as mentioned above.
- My guess is that this error happens because the Python environment is not activated.

# Files and ERRORS

docker-compose.yml
version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-myown-image-apache/airflow:2.4.1}
  build: .
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: NOTPUBLIC
    #ORIGINAL: postgresql+psycopg2://airflow:airflow@postgres/airflow
    # For backward compatibility, with Airflow <2.3
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: NOTPUBLIC
    #ORIGINAL postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: NOTPUBLIC
    # ORIGINAL db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:1111/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.NOTPUBLIC'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
    AIRFLOW__CORE__ENABLE_XCOM_PICKLING: 'NOTPUBLIC'
    AIRFLOW__SMTP__SMTP_HOST: NOTPUBLIC
    AIRFLOW__SMTP__SMTP_PORT: 222
    AIRFLOW__SMTP__SMTP_USER: "NOTPUBLIC"
    AIRFLOW__SMTP__SMTP_PASSWORD: NOTPUBLIC
    AIRFLOW__SMTP__SMTP_MAIL_FROM: [email protected]
    AIRFLOW__WEBSERVER__BASE_URL: NOTPUBLIC
    AIRFLOW__WEBSERVER__WEB_SERVER_SSL_CERT: /opt/airflow/certs/NOTPUBLIC.pem
    AIRFLOW__WEBSERVER__WEB_SERVER_SSL_KEY: /opt/airflow/certs/NOTPUBLIC.pem
    AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG: 1
    AIRFLOW__CORE__DEFAULT_TASK_EXECUTION_TIMEOUT: 21600
    AWS_SNOWPLOW_ACCESS_KEY: NOTPUBLIC
    AWS_SNOWPLOW_SECRET_KEY: NOTPUBLIC
    AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL: 180
    #AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 600
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - routtofolder/NOTPUBLIC1:/opt/airflow/NOTPUBLIC1
    - routtofolder/NOTPUBLIC2:/opt/airflow/NOTPUBLIC2
    - /routtofolder/NOTPUBLIC3:/opt/airflow/NOTPUBLIC3
    - ./venv2:/opt/airflow/venv2  # <- THIS IS THE PROBLEMATIC PART
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

My example DAG that I want to work:
from __future__ import annotations
import logging
import sys
import tempfile
from pprint import pprint
from datetime import timedelta
import pendulum
from airflow import DAG
from airflow.decorators import task
from airflow.operators.python_operator import PythonOperator
from airflow.models import Variable
import requests
from requests.auth
# GOAL

\- Have a local Python environment that I can update and install packages into
\- without needing to build a new image -> stop the running container -> start a new container

# DONE

\- I use the Docker version of Airflow 2.4.1
\- I have successfully joined the Python virtual environment to Airflow Docker as a volume, as you can see in the docker-compose.yml
\- After restarting Docker with the new yml file, it works fine.
\- I can jump into the container, manually activate the Python environment, and import and run Python libraries perfectly fine.

# CHALLENGE

\- The problem comes when I try to run my test DAG with the new venv2
\- The DAG works with the original external Python environment that is installed via the Dockerfile, but the goal is to not need that, as mentioned above
\- My guess is that this error happens because the Python environment is not activated.
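Activating a venv mostly just prepends its `bin/` directory to `PATH` and sets `VIRTUAL_ENV` for your shell; running `<venv>/bin/python3` directly already uses that venv's `site-packages`, and that direct call is exactly what the task log's `Executing cmd:` line shows. So activation alone may not be the missing piece. A quick way to check is to ask that exact interpreter what it sees. This is a minimal sketch using only the standard library; `inspect_interpreter` is a hypothetical helper, not part of Airflow.

```python
import json
import subprocess

# Snippet executed *by the target interpreter*: it reports where that
# interpreter lives and whether each requested top-level module can be found.
_PROBE = """
import importlib.util, json, sys
mods = sys.argv[1:]
print(json.dumps({
    "executable": sys.executable,
    "prefix": sys.prefix,
    "base_prefix": sys.base_prefix,
    "version": "%d.%d.%d" % sys.version_info[:3],
    "found": {m: importlib.util.find_spec(m) is not None for m in mods},
}))
"""


def inspect_interpreter(python_path, modules=()):
    """Run python_path directly (no 'activate') and return what it reports."""
    proc = subprocess.run(
        [python_path, "-c", _PROBE, *modules],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the interpreter itself fails
    )
    info = json.loads(proc.stdout)
    # sys.prefix differs from sys.base_prefix only when running inside a venv.
    info["in_venv"] = info["prefix"] != info["base_prefix"]
    return info
```

For example, inside the container you could call `inspect_interpreter("/opt/airflow/venv2/bin/python3", ["pandas", "numpy"])`. If `found["pandas"]` is `False` there, comparing its `prefix` and `version` with what your manually activated shell reports should show which interpreter and `site-packages` each path really points to.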
# Files and ERRORS

docker-compose.yml

```yaml
version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-myown-image-apache/airflow:2.4.1}
  build: .
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: NOTPUBLIC
    #ORIGINAL: postgresql+psycopg2://airflow:airflow@postgres/airflow
    # For backward compatibility, with Airflow <2.3
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: NOTPUBLIC
    #ORIGINAL postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: NOTPUBLIC
    # ORIGINAL db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:1111/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.NOTPUBLIC'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
    AIRFLOW__CORE__ENABLE_XCOM_PICKLING: 'NOTPUBLIC'
    AIRFLOW__SMTP__SMTP_HOST: NOTPUBLIC
    AIRFLOW__SMTP__SMTP_PORT: 222
    AIRFLOW__SMTP__SMTP_USER: "NOTPUBLIC"
    AIRFLOW__SMTP__SMTP_PASSWORD: NOTPUBLIC
    AIRFLOW__SMTP__SMTP_MAIL_FROM: [email protected]
    AIRFLOW__WEBSERVER__BASE_URL: NOTPUBLIC
    AIRFLOW__WEBSERVER__WEB_SERVER_SSL_CERT: /opt/airflow/certs/NOTPUBLIC.pem
    AIRFLOW__WEBSERVER__WEB_SERVER_SSL_KEY: /opt/airflow/certs/NOTPUBLIC.pem
    AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG: 1
    AIRFLOW__CORE__DEFAULT_TASK_EXECUTION_TIMEOUT: 21600
    AWS_SNOWPLOW_ACCESS_KEY: NOTPUBLIC
    AWS_SNOWPLOW_SECRET_KEY: NOTPUBLIC
    AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL: 180
    #AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 600
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - routtofolder/NOTPUBLIC1:/opt/airflow/NOTPUBLIC1
    - routtofolder/NOTPUBLIC2:/opt/airflow/NOTPUBLIC2
    - /routtofolder/NOTPUBLIC3:/opt/airflow/NOTPUBLIC3
    - ./venv2:/opt/airflow/venv2  # <-- THIS IS THE PROBLEMATIC PART
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy
```
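One possible cause worth ruling out (an assumption; the log alone does not confirm it): a venv created on the host and bind-mounted via `./venv2:/opt/airflow/venv2` keeps pointing at the host's interpreter. Its `pyvenv.cfg` records that interpreter's directory under the `home` key, `bin/python3` is normally a symlink to it, and installed packages live under a Python-version-specific `lib/pythonX.Y/site-packages`. The Python `venv` documentation notes that venvs are not meant to be moved or copied for this reason. The helpers below (`read_pyvenv_cfg`, `venv_problems`) are hypothetical and only check whether those paths exist from wherever the script runs.

```python
import os
from pathlib import Path


def read_pyvenv_cfg(venv_dir):
    """Parse <venv_dir>/pyvenv.cfg ('key = value' lines) into a dict."""
    cfg = {}
    text = (Path(venv_dir) / "pyvenv.cfg").read_text()
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def venv_problems(venv_dir):
    """Return human-readable problems; an empty list means none were found."""
    venv = Path(venv_dir)
    if not (venv / "pyvenv.cfg").is_file():
        return [f"{venv}/pyvenv.cfg is missing: not a venv?"]
    problems = []
    home = read_pyvenv_cfg(venv).get("home")
    # 'home' is the directory of the interpreter that created the venv; a venv
    # built on the host may point at a path that does not exist in the container.
    if not home or not Path(home).is_dir():
        problems.append(f"home = {home!r} does not exist here")
    python = venv / "bin" / "python3"
    # os.path.exists follows symlinks, so a dangling bin/python3 link is caught.
    if not os.path.exists(python):
        problems.append(f"{python} is missing or a dangling symlink")
    return problems
```

Running `venv_problems("/opt/airflow/venv2")` inside the container and on the host, and comparing the result with the `version` reported by the interpreter itself, should show whether the mounted venv still matches the container's Python.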
My example DAG that I want to work:

```python
from __future__ import annotations
import logging
import sys
import tempfile
from pprint import pprint
from datetime import timedelta
import pendulum
from airflow import DAG
from airflow.decorators import task
from airflow.operators.python import PythonOperator
from airflow.models import Variable
import requests
from requests.auth import HTTPBasicAuth

my_default_args = {
    'owner': 'Anonymus',
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
}

with DAG(
    dag_id='test_connected_env',
    schedule='10 10 * * *',
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    #execution_timeout=timedelta(seconds=60),
    default_args=my_default_args,
    tags=['sample_tag', 'sample_tag2'],
) as dag:
    #@task.external_python(task_id="test_external_python_venv_task", python=os.fspath(sys.executable))  # ORIGINAL
    #@task.external_python(task_id="test_connected_env_task", python='/opt/airflow/venv1/bin/python3')  ### installed via pip via Dockerfile, this works perfectly fine
    @task.external_python(task_id="test_connected_env_task", python='/opt/airflow/venv2/bin/python3')
    def go():  # this could be any function name
        # import packages here: this body runs in the external interpreter
        print("My Start")
        # if you want to test the error
        # print(1+"Airflow")
        import pandas as pd
        print(pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}))
        import numpy as np
        print(np.array([1, 2, 3]))
        return print('my end')

    external_python_task = go()
```
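The task log shows how the decorated task actually runs: Airflow writes a `script.py` plus `script.in`/`script.out` files to a temp directory and executes them with the interpreter passed as `python=`. The following is a much-simplified stand-in for that flow, not Airflow's real code: `run_in_interpreter` is hypothetical, and it passes results through JSON instead of Airflow's own serialization. It illustrates why `import pandas` has to sit inside `go()`, and why pandas must be installed for that specific interpreter rather than for the one Airflow itself runs under.

```python
import json
import subprocess
import tempfile
from pathlib import Path

# Appended to the function source: call it and store its JSON result.
_FOOTER = """
import json, sys
_result = {name}()
with open(sys.argv[1], "w") as _f:
    json.dump(_result, _f)
"""


def run_in_interpreter(python_path, func_source, func_name):
    """Write func_source to a temp script, run it with python_path, return the result.

    Only the text of the function travels to the other interpreter, so every
    import it needs must be inside the function and installed *there*.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "script.py"
        out = Path(tmp) / "script.out"
        script.write_text(func_source + _FOOTER.format(name=func_name))
        # check=True turns a non-zero exit (e.g. ModuleNotFoundError) into
        # subprocess.CalledProcessError, the same exception type seen in the task log.
        subprocess.run([python_path, str(script), str(out)], check=True)
        return json.loads(out.read_text())
```

Calling it with a function source that does `import pandas` and the venv2 interpreter path would be expected to fail the same way the task did whenever pandas is not visible to that interpreter.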
ERROR that I get:
*** Reading local file: /opt/airflow/logs/dag_id=test_connected_env/run_id=manual__2023-03-02T14:15:16.674123+00:00/task_id=test_connected_env_task/attempt=1.log
[2023-03-02, 14:15:18 GMT] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: test_connected_env.test_connected_env_task manual__2023-03-02T14:15:16.674123+00:00 [queued]>
[2023-03-02, 14:15:18 GMT] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: test_connected_env.test_connected_env_task manual__2023-03-02T14:15:16.674123+00:00 [queued]>
[2023-03-02, 14:15:18 GMT] {taskinstance.py:1362} INFO -
--------------------------------------------------------------------------------
[2023-03-02, 14:15:18 GMT] {taskinstance.py:1363} INFO - Starting attempt 1 of 1
[2023-03-02, 14:15:18 GMT] {taskinstance.py:1364} INFO -
--------------------------------------------------------------------------------
[2023-03-02, 14:15:18 GMT] {taskinstance.py:1383} INFO - Executing <Task(_PythonExternalDecoratedOperator): test_connected_env_task> on 2023-03-02 14:15:16.674123+00:00
[2023-03-02, 14:15:18 GMT] {standard_task_runner.py:54} INFO - Started process 15812 to run task
[2023-03-02, 14:15:18 GMT] {standard_task_runner.py:82} INFO - Running: ['airflow', 'tasks', 'run', 'test_connected_env', 'test_connected_env_task', 'manual__2023-03-02T14:15:16.674123+00:00', '--job-id', '142443', '--raw', '--subdir', 'DAGS_FOLDER/test_connected_env_task.py', '--cfg-path', '/tmp/tmp1t0wy5hy']
[2023-03-02, 14:15:18 GMT] {standard_task_runner.py:83} INFO - Job 142443: Subtask test_connected_env_task
[2023-03-02, 14:15:18 GMT] {dagbag.py:525} INFO - Filling up the DagBag from /opt/airflow/dags/test_connected_env_task.py
[2023-03-02, 14:15:18 GMT] {task_command.py:384} INFO - Running <TaskInstance: test_connected_env.test_connected_env_task manual__2023-03-02T14:15:16.674123+00:00 [running]> on host 0ad620763627
[2023-03-02, 14:15:18 GMT] {taskinstance.py:1590} INFO - Exporting the following env vars:
[email protected]
AIRFLOW_CTX_DAG_OWNER=Anonymus
AIRFLOW_CTX_DAG_ID=test_connected_env
AIRFLOW_CTX_TASK_ID=test_connected_env_task
AIRFLOW_CTX_EXECUTION_DATE=2023-03-02T14:15:16.674123+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-03-02T14:15:16.674123+00:00
[2023-03-02, 14:15:18 GMT] {python.py:725} WARNING - When checking for Airflow installed in venv got Command '['/opt/airflow/venv2/bin/python3', '-c', 'from airflow import version; print(version.version)']' returned non-zero exit status 1.
[2023-03-02, 14:15:18 GMT] {python.py:726} WARNING - This means that Airflow is not properly installed by /opt/airflow/venv2/bin/python3. Airflow context keys will not be available. Please Install Airflow 2.4.1 in your environment to access them.
[2023-03-02, 14:15:18 GMT] {process_utils.py:179} INFO - Executing cmd: /opt/airflow/venv2/bin/python3 /tmp/tmdqmf6q9rg/script.py /tmp/tmdqmf6q9rg/script.in /tmp/tmdqmf6q9rg/script.out /tmp/tmdqmf6q9rg/string_args.txt
[2023-03-02, 14:15:18 GMT] {process_utils.py:183} INFO - Output:
[2023-03-02, 14:15:18 GMT] {process_utils.py:187} INFO - My Start
[2023-03-02, 14:15:18 GMT] {process_utils.py:187} INFO - Traceback (most recent call last):
[2023-03-02, 14:15:18 GMT] {process_utils.py:187} INFO - File "/tmp/tmdqmf6q9rg/script.py", line 38, in <module>
[2023-03-02, 14:15:18 GMT] {process_utils.py:187} INFO - res = go(*arg_dict["args"], **arg_dict["kwargs"])
[2023-03-02, 14:15:18 GMT] {process_utils.py:187} INFO - File "/tmp/tmdqmf6q9rg/script.py", line 30, in go
[2023-03-02, 14:15:18 GMT] {process_utils.py:187} INFO - import pandas as pd
[2023-03-02, 14:15:18 GMT] {process_utils.py:187} INFO - ModuleNotFoundError: No module named 'pandas'
[2023-03-02, 14:15:18 GMT] {taskinstance.py:1851} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/decorators/base.py", line 188, in execute
return_value = super().execute(context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 370, in execute
return super().execute(context=serializable_context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 175, in execute
return_value = self.execute_callable()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 678, in execute_callable
return self._execute_python_callable_in_subprocess(python_path, tmp_path)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 426, in _execute_python_callable_in_subprocess
execute_in_subprocess(
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/process_utils.py", line 168, in execute_in_subprocess
execute_in_subprocess_with_kwargs(cmd, cwd=cwd)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/process_utils.py", line 191, in execute_in_subprocess_with_kwargs
raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['/opt/airflow/venv2/bin/python3', '/tmp/tmdqmf6q9rg/script.py', '/tmp/tmdqmf6q9rg/script.in', '/tmp/tmdqmf6q9rg/script.out', '/tmp/tmdqmf6q9rg/string_args.txt']' returned non-zero exit status 1.
[2023-03-02, 14:15:18 GMT] {taskinstance.py:1401} INFO - Marking task as FAILED. dag_id=test_connected_env, task_id=test_connected_env_task, execution_date=20230302T141516, start_date=20230302T141518, end_date=20230302T141518
[2023-03-02, 14:15:18 GMT] {warnings.py:109} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/utils/email.py:120: RemovedInAirflow3Warning: Fetching SMTP credentials from configuration variables will be deprecated in a future release. Please set credentials using a connection instead.
send_mime_email(e_from=mail_from, e_to=recipients, mime_msg=msg, conn_id=conn_id, dryrun=dryrun)
[2023-03-02, 14:15:18 GMT] {email.py:229} INFO - Email alerting: attempt 1
[2023-03-02, 14:15:18 GMT] {email.py:241} INFO - Sent an alert email to ['[email protected]']
[2023-03-02, 14:15:18 GMT] {standard_task_runner.py:102} ERROR - Failed to execute job NONPUBLIC for task test_connected_env_task (Command '['/opt/airflow/venv2/bin/python3', '/tmp/tmdqmf6q9rg/script.py', '/tmp/tmdqmf6q9rg/script.in', '/tmp/tmdqmf6q9rg/script.out', '/tmp/tmdqmf6q9rg/string_args.txt']' returned non-zero exit status 1.; 15812)
[2023-03-02, 14:15:18 GMT] {local_task_job.py:164} INFO - Task exited with return code 1
[2023-03-02, 14:15:18 GMT] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
https://redd.it/11g5j1e
@r_devops
Reddit
r/devops on Reddit: How to get a joined volume Python Virtual environment to Airflow Docker working with external_python_task?
Posted by u/glassAlloy - No votes and no comments
Devops age restrictions
Hello guys, do employers look for employees in their 20s and 30s to work in DevOps, or can you move up into DevOps in your 40s and 50s? What has your experience been like?
https://redd.it/11g5ng9
@r_devops
Reddit
r/devops on Reddit: Devops age restrictions
Posted by u/Titanguru7 - No votes and 8 comments