Switch all airflow logging to structlog#52651

Merged
kaxil merged 5 commits into main from structlog-in-logging-mixin on Sep 9, 2025

Conversation

@ashb ashb commented Jul 1, 2025

This change continues the work that was started in Airflow 3.0 and AIP-72 to
use Structlog in place of stdlib logging in Airflow.

The primary change here is to make LoggingMixin return a customized
structlog logger. It is customized to maintain surface compatibility with
logging.Logger ("surface" meaning that things like handlers and filters aren't
preserved), and all logging done via stdlib is then captured and redirected
through Structlog processors.
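
As a rough illustration of the capture/redirect idea only (not the code in this PR), a stdlib handler can turn each LogRecord into an event dict and pass it through a chain of processors. The names `ForwardToProcessors` and `add_component` are hypothetical, and the one-argument processor signature is a simplification of structlog's real `(logger, method_name, event_dict)` processors.

```python
import json
import logging


class ForwardToProcessors(logging.Handler):
    """Hypothetical handler: converts stdlib LogRecords into event dicts."""

    def __init__(self, processors, sink):
        super().__init__()
        self._processors = processors  # callables taking and returning a dict
        self._sink = sink  # callable that receives the rendered line

    def emit(self, record):
        # Build a structured event from the stdlib record.
        event = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "logger": record.name,
        }
        # Run the event through each processor in order.
        for processor in self._processors:
            event = processor(event)
        # Render as JSON, the last step of this simplified chain.
        self._sink(json.dumps(event))


def add_component(event):
    # Example processor: attach an extra key/value to every event.
    event["component"] = "scheduler"
    return event


lines = []
stdlib_logger = logging.getLogger("example.stdlib")
stdlib_logger.setLevel(logging.INFO)
stdlib_logger.propagate = False  # keep the example output self-contained
stdlib_logger.addHandler(ForwardToProcessors([add_component], lines.append))
stdlib_logger.info("connected to %s", "db")
```

Code that keeps calling `logging.getLogger(...)` is unchanged in this sketch; only the handler decides how the record is rendered.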

This is the first step in allowing all Airflow components to be able to
produce JSON logs natively.

Things of note in this implementation:

  • We have a customized structlog filtering logger that has a "per-logger-tree"
    level concept.

    This is implemented using a prefix trie to look up the logging level, so
    that the level for child loggers can be found efficiently when it is
    configured only at the parent level.

  • We have a custom PercentFormatRender class that renders non-JSON logs and
    understands stdlib-style format strings, meaning users' custom logging
    config will be respected again for the daemon components.

    (Note though: this won't help with Task logs, as those are always JSON and
    the UI does the rendering of those.)

  • There is no longer a need for separate log formats for colored and plain
    output -- using color format specifiers (%(blue)s, %(log_level)s etc) when
    colors are disabled/not available will output nothing in their place.

  • Introduce a mechanism for users to easily set the log level for loggers --
    for instance if you are debugging the scheduler, it would be nice to be able
    to set the airflow.jobs.scheduler_job_runner logger to DEBUG while keeping
    everything else at INFO. The change to allow this is not in this PR, but once
    this lands it will be a simpler follow-on PR.
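
The color-specifier behavior in the list above can be sketched as follows. This is a minimal illustration, assuming a renderer that fills color fields with ANSI codes when colors are enabled and with empty strings otherwise; `render`, `ANSI_CODES` and the format string are hypothetical and are not the actual PercentFormatRender implementation.

```python
from __future__ import annotations

# Hypothetical color table; a real renderer would support more names.
ANSI_CODES = {"blue": "\x1b[34m", "red": "\x1b[31m", "reset": "\x1b[0m"}


def render(fmt: str, event: dict[str, object], *, colors: bool) -> str:
    """Render an event with a stdlib %-style format string."""
    # When colors are off, every color specifier expands to an empty string,
    # so one format string serves both colored and plain output.
    color_fields = {name: (code if colors else "") for name, code in ANSI_CODES.items()}
    return fmt % {**color_fields, **event}


FORMAT = "%(blue)s[%(asctime)s]%(reset)s %(levelname)s - %(message)s"
event = {
    "asctime": "2025-09-09 12:00:00",
    "levelname": "INFO",
    "message": "Starting the scheduler",
}

plain = render(FORMAT, event, colors=False)
colored = render(FORMAT, event, colors=True)
```

With colors disabled, `plain` contains no escape sequences, so a single format string is enough for both cases.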

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Aug 17, 2025
@ashb ashb removed the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Aug 17, 2025
@ashb ashb force-pushed the structlog-in-logging-mixin branch 3 times, most recently from d27ea4f to e0e9a7d Compare September 1, 2025 20:44
kaxil pushed a commit to astronomer/airflow that referenced this pull request Sep 9, 2025
This change continues the work that was started in Airflow 3.0 and AIP-72 to
use Structlog in place of stdlib logging in Airflow.

The primary change here is to make LoggingMixin return a customized
structlog logger. It is customized to maintain surface compatibility with
logging.Logger ("surface" meaning that things like handlers and filters aren't
preserved), and all logging done via stdlib is then captured and redirected
through Structlog processors.

This is the first step in allowing all Airflow components to be able to
produce JSON logs natively.

Things of note in this implementation:

- We have a customized structlog filtering logger that has a "per-logger-tree"
  level concept.

  This is implemented using a prefix trie[1] to look up the logging level, so
  that the level for child loggers can be found efficiently when it is
  configured only at the parent level.

- We have a custom `PercentFormatRender` class that renders non-JSON logs and
  understands stdlib-style format strings, meaning users' custom logging
  config will be respected again for the daemon components.

  (Note though: this won't help with Task logs, as those are always JSON and
  the UI does the rendering of those.)

- There is no longer a need for a different log format for colored and plain -- using
  color format specifiers (`%(blue)s`, `%(log_level)s` etc) when colors are
  disabled/not available will output nothing in their place.

- Introduce a mechanism for users to easily set the log level for loggers --
  for instance if you are debugging the scheduler, it would be nice to be able
  to set the `airflow.jobs.scheduler_job_runner` logger to DEBUG while keeping
  everything else at INFO.

[1]: https://en.wikipedia.org/wiki/Trie
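
To make the "per-logger-tree" level lookup concrete, here is a minimal sketch of a prefix trie keyed on the dotted parts of a logger name. `LevelTrie`, `set_level` and `level_for` are hypothetical names chosen for illustration; the actual class in this change may be structured differently.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field


@dataclass
class _Node:
    level: int | None = None  # None means "inherit from the nearest ancestor"
    children: dict[str, _Node] = field(default_factory=dict)


class LevelTrie:
    """Hypothetical store of log levels, one trie edge per dotted name part."""

    def __init__(self, default: int = logging.INFO) -> None:
        self._root = _Node(level=default)

    def set_level(self, name: str, level: int) -> None:
        node = self._root
        for part in name.split("."):
            node = node.children.setdefault(part, _Node())
        node.level = level

    def level_for(self, name: str) -> int:
        # Walk down the trie, remembering the deepest explicitly set level.
        node = self._root
        level = node.level
        for part in name.split("."):
            node = node.children.get(part)
            if node is None:
                break
            if node.level is not None:
                level = node.level
        return level


levels = LevelTrie(default=logging.INFO)
levels.set_level("airflow.jobs.scheduler_job_runner", logging.DEBUG)
```

A lookup costs one dictionary access per name part, and a child such as `airflow.jobs.scheduler_job_runner.child` picks up DEBUG without being configured itself, while `airflow.jobs` stays at the default.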


The reason for not using caplog has gone away with the switch to structlog,
and we already override the builtin `caplog` fixture to use our structlog
version.
ashb added a commit to astronomer/airflow that referenced this pull request Sep 9, 2025
…ctly.

This was broken in apache#52651 with our move away from FileTaskHandler, and hidden
when running in Breeze due to the default logs folder already existing.
ashb added a commit that referenced this pull request Sep 9, 2025
…ctly. (#55431)

This was broken in #52651 with our move away from FileTaskHandler, and hidden
when running in Breeze due to the default logs folder already existing.
ashb added a commit to astronomer/airflow that referenced this pull request Sep 10, 2025
… fields

With the move to structured logging wholesale in apache#52651, we are going to start
seeing a lot more structured log key/values other than just `logger` and
`chan` -- so "Toggle Source" now just hides those specific fields.

It also changes the format/display to cope better with more than one KV being shown
in the logs.
ashb added a commit to astronomer/airflow that referenced this pull request Sep 12, 2025
…sk Logs

This was present in 2.x (and earlier) in the default config which most people
left as is. This brings the information back (though displayed in task logs in
a different format).

Additionally, although the `log_format` config is not respected for the
display, we do examine it to see if any of the other "callsite parameters" (as
structlog calls them) such as `processName` or `process` (the pid) are
present, and if they are, those will be recorded in the JSON task logs too.
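
One way to picture the format-string inspection described above is a small helper that extracts field names from a %-style format and keeps only the callsite-style ones. This is a sketch under assumptions: `callsite_fields_in` and the `CALLSITE_FIELDS` set are hypothetical, and the real code may rely on structlog's own callsite parameter handling instead.

```python
from __future__ import annotations

import re

# Matches the field name in stdlib %-style specifiers such as "%(process)d".
# Simplification: escaped "%%(" sequences are not treated specially.
_FIELD_RE = re.compile(r"%\((\w+)\)")

# Hypothetical subset of fields treated as callsite parameters.
CALLSITE_FIELDS = {"process", "processName", "thread", "threadName", "funcName"}


def callsite_fields_in(log_format: str) -> set[str]:
    """Return the callsite-style fields referenced by a log format string."""
    return set(_FIELD_RE.findall(log_format)) & CALLSITE_FIELDS


found = callsite_fields_in(
    "[%(asctime)s] %(processName)s(%(process)d) %(levelname)s - %(message)s"
)
```

Here `found` holds `processName` and `process`, which in this sketch are the extra keys that would be recorded in the JSON task logs.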

As part of this, the already-ignored (since apache#52651) colored_log_format has
been removed from the config so it doesn't show up in the docs.

The "Toggle Source" option in the front end now also hides the "loc" field
(short for location; maybe it should be localized, and likewise "source"). I
also added a unit test for "Toggle Source" in the front end, which wasn't
covered by any unit tests.

Fixes apache#54145
kaxil pushed a commit that referenced this pull request Sep 15, 2025
…sk Logs (#55581)

kaxil pushed a commit that referenced this pull request Sep 15, 2025
…sk Logs (#55581)


(cherry picked from commit 6801bca)
4 participants