Sentry 9 - fix for PagerDuty legacy integration.
sentry
pagerduty
Recently I’ve been investigating why the PagerDuty integration with Sentry 9.1.1 doesn’t work.
The same thing was happening with version 9.1.2. The problem was not visible in the UI, but this error message kept repeating in the logs:
19:13:04 [INFO] sentry.plugins.pagerduty: notification-plugin.notify-failed. (plugin=u'pagerduty' error=u'Error Communicating with PagerDuty (HTTP 400): Event object is invalid')
The absence of any incoming events on the PagerDuty side confirmed that the problem was real.
Problem investigation
It’s clear that the Sentry PagerDuty plugin uses API v1, because client.py uses https://events.pagerduty.com/generic/2010-04-15/create_event.json as the integration endpoint.
The error we get in the logs has HTTP code 400:
19:13:04 [INFO] sentry.plugins.pagerduty: notification-plugin.notify-failed. (plugin=u'pagerduty' error=u'Error Communicating with PagerDuty (HTTP 400): Event object is invalid')
Related documentation on PagerDuty:
If the event is improperly formatted, a 400 Bad Request will be returned.
A quick search on GitHub revealed other people struggling with the same issue (#356).
Unfortunately, the conversation is locked.
Immediate fix
TLDR: docker image cloudposse/sentry:9.1.3
Some more searching revealed that one fix for PagerDuty (PR 469) was merged to master but never released as a version.
It looked like the problem could be fixed by changing just two lines in the PagerDuty plugin.
I created the cloudposse/sentry repository with a Dockerfile that simply replaces the affected PagerDuty Python file.
After building and testing cloudposse/sentry:9.1.3, it turned out that events were sent to PagerDuty correctly.
Root cause
The request we send to PagerDuty is the culprit.
But what exactly is wrong with it?
I spun up a test Sentry installation with docker-compose, mounted a volume over the PagerDuty plugin for debugging, and simply logged the event sent to the PagerDuty API.
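The kind of debug logging described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `log_payload` helper and the `pagerduty.debug` logger name are hypothetical, and the sketch is written for Python 3 even though Sentry 9 runs on Python 2. Logging the type next to each value is what makes a wrongly typed field stand out.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("pagerduty.debug")


def log_payload(payload):
    """Log every field of an outgoing event together with its Python type.

    Returns the payload unchanged so the call can be dropped in right
    before the request is sent.
    """
    for key, value in sorted(payload.items()):
        logger.info("%s = %r (%s)", key, value, type(value).__name__)
    return payload
```

Calling `log_payload(event)` just before the HTTP request is enough to see whether a field such as `incident_key` is a number or a string.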
What drew my attention was 'incident_key': 1L, which is of type long (Python 3 has only int, because long and int were unified).
What six.text_type does is convert this variable to a text string (unicode on Python 2, str on Python 3).
After the fix, incident_key became a unicode value.
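To see why the type matters once the event is serialized, here is a small sketch under stated assumptions: it runs on Python 3, where `str` plays the role of `six.text_type` and a plain `int` stands in for Python 2's `1L`. The `serialize_event` helper is hypothetical and only builds the JSON body; it does not contact PagerDuty.

```python
import json


def serialize_event(incident_key, coerce=True):
    """Return the JSON body for an event containing only incident_key.

    With coerce=True the key is converted to text first, mirroring what
    six.text_type does in the fix. Hypothetical helper for illustration.
    """
    key = str(incident_key) if coerce else incident_key
    return json.dumps({"incident_key": key})


# Without coercion the key is emitted as a JSON number.
print(serialize_event(1, coerce=False))  # {"incident_key": 1}
# With coercion it is emitted as a JSON string.
print(serialize_event(1))                # {"incident_key": "1"}
```

The only difference between the two bodies is the pair of quotes around the value.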
Sending an event with that change works flawlessly.
From the PagerDuty docs we know that incident_key must be a string.
The issue occurred because Sentry was sending a JSON event with an integer incident_key.
The request is sent by session (from sentry.http import build_session), which is part of Sentry and out of scope for my investigation.
If anyone wants to follow up, I’d start here.
Conclusion
Even though the change is rather simple, it might not be easy to arrive at. Sentry is a big product with a lot of moving parts.
Debugging in docker-compose is not the easiest thing, but it helped me understand what went wrong and what exactly this fix does.
The root cause investigation was done only “for fun”. The issue was fixed before that, but it allowed me to create this post so others can follow the debugging path.
It’s also worth noting that closing GitHub issues and restricting comments does not help the community.
3h4x