
HiveOperator always succeeds even when the connection fails #31111

Open
1 of 2 tasks
kazuki-ma opened this issue May 6, 2023 · 5 comments
Comments

@kazuki-ma

Apache Airflow version

2.6.0

What happened

It looks like the HiveOperator always succeeds when the connection is refused (host error and/or credentials error).

What you think should happen instead

The task should be marked as failed (so that retries etc. are triggered).

How to reproduce

Set up a clean Airflow instance with the Hive provider, and create some DAGs/tasks that use the HiveOperator.

The default connection configuration is invalid, so the task should fail.

But it succeeds. A minimal DAG that reproduces this is sketched below.
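A minimal sketch, based on the dag_id, task_id, and query visible in the logs below; the schedule and start_date are assumptions:

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator

with DAG(
    dag_id="00_airflow_heart_beat",
    start_date=datetime(2023, 5, 1),  # assumed; any past date works
    schedule="*/1 * * * *",           # assumed: run every minute as a heartbeat
    catchup=False,
):
    # Uses the default hive_cli_default connection, which is invalid out of
    # the box, so this task should fail -- instead it is marked SUCCESS.
    HiveOperator(
        task_id="hive.heartbeat",
        hql='select "Hello world!", version()',
    )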

Operating System

Linux

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

The following Dockerfile:

FROM apache/airflow:2.6.0-python3.7
USER root
RUN apt-get update && apt-get install --yes python3-dev build-essential libsasl2-dev procps wget java-common graphviz libmariadb-dev

USER airflow
RUN <<__EOF
pip install JPype1 \
    jaydebeapi \
    airflow-exporter \
    airflow-clickhouse-plugin \
    apache-airflow-providers-apache-hive \
    apache-airflow-providers-jdbc \
    hmsclient \
    pandas \
    pyhive[hive] \
    thrift \
    mysqlclient
__EOF

RUN <<__EOF
mkdir -p /opt/airflow/jdk
curl -L https://cdn.azul.com/zulu-embedded/bin/zulu8.70.0.23-ca-jdk8.0.372-linux_aarch64.tar.gz | tar -xzvf - -C /opt/airflow/jdk --strip-components=1
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz -P /opt/airflow/
wget https://dlcdn.apache.org/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz          -P /opt/airflow/
tar -xzvf /opt/airflow/hadoop-3.1.2.tar.gz -C /opt/airflow
tar -xzvf /opt/airflow/apache-hive-3.1.2-bin.tar.gz -C /opt/airflow
rm /opt/airflow/hadoop-3.1.2.tar.gz /opt/airflow/apache-hive-3.1.2-bin.tar.gz
__EOF

COPY container/simplelogger.properties /opt/airflow/simplelogger.properties
COPY container/airflow.cfg /opt/airflow/airflow.cfg
COPY container/webserver_config.py /opt/airflow/webserver_config.py
COPY dags /opt/airflow/dags
COPY container/java.security /opt/airflow/jdk/jre/lib/security/java.security

ENV JAVA_HOME=/opt/airflow/jdk
ENV HADOOP_HOME=/opt/airflow/hadoop-3.1.2
ENV HIVE_HOME=/opt/airflow/apache-hive-3.1.2-bin
ENV PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$PATH
ENV CLASSPATH=/opt/airflow

Anything else

Logs.

00856f78ce9e
*** Reading local file: /opt/airflow/logs/dag_id=00_airflow_heart_beat/run_id=scheduled__2023-05-06T16:27:00+00:00/task_id=hive.heartbeat/attempt=1.log
[2023-05-06, 16:28:17 UTC] {taskinstance.py:1090} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: 00_airflow_heart_beat.hive.heartbeat scheduled__2023-05-06T16:27:00+00:00 [queued]>
[2023-05-06, 16:28:17 UTC] {taskinstance.py:1090} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: 00_airflow_heart_beat.hive.heartbeat scheduled__2023-05-06T16:27:00+00:00 [queued]>
[2023-05-06, 16:28:17 UTC] {taskinstance.py:1288} INFO - 
--------------------------------------------------------------------------------
[2023-05-06, 16:28:17 UTC] {taskinstance.py:1289} INFO - Starting attempt 1 of 1
[2023-05-06, 16:28:17 UTC] {taskinstance.py:1290} INFO - 
--------------------------------------------------------------------------------
[2023-05-06, 16:28:17 UTC] {taskinstance.py:1309} INFO - Executing <Task(HiveOperator): hive.heartbeat> on 2023-05-06 16:27:00+00:00
[2023-05-06, 16:28:17 UTC] {standard_task_runner.py:55} INFO - Started process 126 to run task
[2023-05-06, 16:28:17 UTC] {standard_task_runner.py:82} INFO - Running: ['airflow', 'tasks', 'run', '00_airflow_heart_beat', 'hive.heartbeat', 'scheduled__2023-05-06T16:27:00+00:00', '--job-id', '25', '--raw', '--subdir', 'DAGS_FOLDER/heartbeat.py', '--cfg-path', '/tmp/tmpeadnozw7']
[2023-05-06, 16:28:17 UTC] {standard_task_runner.py:83} INFO - Job 25: Subtask hive.heartbeat
[2023-05-06, 16:28:17 UTC] {task_command.py:389} INFO - Running <TaskInstance: 00_airflow_heart_beat.hive.heartbeat scheduled__2023-05-06T16:27:00+00:00 [running]> on host 00856f78ce9e
[2023-05-06, 16:28:17 UTC] {taskinstance.py:1518} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=00_airflow_heart_beat
AIRFLOW_CTX_TASK_ID=hive.heartbeat
AIRFLOW_CTX_EXECUTION_DATE=2023-05-06T16:27:00+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-05-06T16:27:00+00:00
[2023-05-06, 16:28:17 UTC] {hive.py:144} INFO - Executing: 
-- noinspection SqlResolveForFile
select "Hello world!", version()
[2023-05-06, 16:28:17 UTC] {base.py:73} INFO - Using connection ID 'hive_cli_default' for task execution.
[2023-05-06, 16:28:17 UTC] {hive.py:162} INFO - Passing HiveConf: {'airflow.ctx.dag_owner': 'airflow', 'airflow.ctx.dag_id': '00_airflow_heart_beat', 'airflow.ctx.task_id': 'hive.heartbeat', 'airflow.ctx.execution_date': '2023-05-06T16:27:00+00:00', 'airflow.ctx.try_number': '1', 'airflow.ctx.dag_run_id': 'scheduled__2023-05-06T16:27:00+00:00'}
[2023-05-06, 16:28:17 UTC] {hive.py:276} INFO - beeline -u "server" -n WRONG_USER_NAME -p *** -hiveconf airflow.ctx.dag_id=00_airflow_heart_beat -hiveconf airflow.ctx.task_id=hive.heartbeat -hiveconf airflow.ctx.execution_date=2023-05-06T16:27:00+00:00 -hiveconf airflow.ctx.try_number=1 -hiveconf airflow.ctx.dag_run_id=scheduled__2023-05-06T16:27:00+00:00 -hiveconf airflow.ctx.dag_owner=airflow -hiveconf airflow.ctx.dag_email= -hiveconf mapred.job.name=Airflow HiveOperator task for 00856f78ce9e.00_airflow_heart_beat.hive.heartbeat.2023-05-06T16:27:00+00:00 -f /tmp/airflow_hiveop_2wkewnyk/tmp8fyi6l5b
[2023-05-06, 16:28:18 UTC] {hive.py:288} INFO - SLF4J: Class path contains multiple SLF4J bindings.
[2023-05-06, 16:28:18 UTC] {hive.py:288} INFO - SLF4J: Found binding in [jar:file:/opt/airflow/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2023-05-06, 16:28:18 UTC] {hive.py:288} INFO - SLF4J: Found binding in [jar:file:/opt/airflow/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2023-05-06, 16:28:18 UTC] {hive.py:288} INFO - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[2023-05-06, 16:28:18 UTC] {hive.py:288} INFO - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
[2023-05-06, 16:28:19 UTC] {hive.py:288} INFO - Connecting to xxxxx
[2023-05-06, 16:28:19 UTC] {hive.py:288} INFO - 23/05/06 16:28:19 [main]: WARN jdbc.HiveConnection: Failed to connect xxxxx
[2023-05-06, 16:28:19 UTC] {hive.py:288} INFO - Unknown HS2 problem when communicating with Thrift server.
[2023-05-06, 16:28:19 UTC] {hive.py:288} INFO - Error: Could not open client transport with JDBC Uri: xxxx Peer indicated failure: PLAIN auth failed: javax.security.sasl.AuthenticationException: Error validating LDAP user [Caused by javax.naming.AuthenticationException: [LDAP: error code 49 - Invalid Credentials]] (state=08S01,code=0)
[2023-05-06, 16:28:19 UTC] {taskinstance.py:1332} INFO - Marking task as SUCCESS. dag_id=00_airflow_heart_beat, task_id=hive.heartbeat, execution_date=20230506T162700, start_date=20230506T162817, end_date=20230506T162819
[2023-05-06, 16:28:19 UTC] {local_task_job.py:212} INFO - Task exited with return code 0
[2023-05-06, 16:28:19 UTC] {taskinstance.py:2596} INFO - 0 downstream tasks scheduled from follow-on schedule check
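Note the sequence at the end: beeline prints "Error: Could not open client transport ..." and the task is then marked SUCCESS with return code 0, which suggests the CLI invocation's exit status alone is being trusted. A rough sketch of the kind of guard that would catch this case (plain subprocess, not the actual HiveCliHook code; the error marker string is taken from the log above):

import subprocess

def run_beeline(cmd):
    """Run a beeline command and fail loudly on connection errors (illustrative only)."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = []
    for line in proc.stdout:
        print(line, end="")  # stream output to the task log as it arrives
        lines.append(line)
    proc.wait()
    output = "".join(lines)
    # Check for the error marker in the output as well as the exit code,
    # since the log above shows the task exiting 0 after a failed connection.
    if proc.returncode != 0 or "Error: Could not open client transport" in output:
        raise RuntimeError(f"beeline failed (rc={proc.returncode}):\n{output}")
    return output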

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

@kazuki-ma kazuki-ma added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels May 6, 2023
@boring-cyborg

boring-cyborg bot commented May 6, 2023

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@potiuk potiuk added good first issue and removed needs-triage label for new issues that we didn't triage yet labels May 6, 2023
@potiuk
Member

potiuk commented May 6, 2023

Yes - looks like that's the case. Marked it as a good first issue; you might want to try to fix it yourself and submit a PR (it does not seem to be super difficult), otherwise it will wait for someone to pick it up.

@phanikumv phanikumv self-assigned this May 8, 2023
@phanikumv
Contributor

I can take it up @potiuk, assigned it to myself.

@kazuki-ma
Author

Thank you. Sorry, I'm not familiar with the HiveOperator internals. I'd be glad if you could fix it.

@eladkal
Contributor

eladkal commented Oct 31, 2023

@phanikumv are you still working on this issue?

Projects: None yet
Development: No branches or pull requests

5 participants