Fastly UI Displaying Intermittent Errors for Streaming Logs

Incident
15 November 2023, 22:49 UTC

Status: closed
Start: 14 November 2023, 13:41 UTC
End: 15 November 2023, 22:49 UTC
Duration: 1 day 9 hours 8 minutes
Affected Components:
Observability Real-time Log Streaming
Affected Groups:
All Public Users

Identified

14 November 2023, 13:41 UTC

Fastly has identified an issue in which customers may see error messages in the Fastly UI for S3 and Kinesis endpoints indicating that a token is expired. This is isolated to the Fastly logging system, which is intermittently hitting a rate limit with the AWS Security Token Service (STS) API. This intermittent error does not appear to be causing log loss, but it results in an error message in the UI. It only affects S3 and Kinesis endpoints that use role-based authentication.

Fastly is currently working to resolve this intermittent error. All other locations and services are unaffected. 
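
For readers unfamiliar with this class of error, a common general-purpose way to cope with throttling from a token service is to retry with capped exponential backoff and jitter. The sketch below is illustrative only and is not a description of Fastly's fix; `RateLimitedError`, `call_with_backoff`, and all parameter values are hypothetical names and defaults chosen for the example.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical error raised when a token service throttles a request."""


def call_with_backoff(fetch, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fetch(), retrying throttled attempts with capped exponential backoff.

    Each wait is a random fraction ("full jitter") of the capped exponential
    delay, so that many clients do not retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimitedError:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(cap * rng())
```

The `sleep` and `rng` parameters are injectable only so that the behavior can be exercised without real waiting.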

Monitoring

15 November 2023, 00:16 UTC

Engineering has deployed a fix to mitigate rate limiting errors and has observed a gradual recovery for streaming log services. We will continue to monitor the effects of the change and will post an update once services have been fully restored.

Update

15 November 2023, 16:52 UTC

Our investigation into the previously deployed mitigation measures has verified that customers should no longer experience log loss as a result of this incident.

We investigated the continued reports of error messages in the Fastly App and identified an error in the timing of reacquiring temporary credentials. We have confirmed that the impact to streaming log services has been resolved, and we do not see log loss in connection with this error message.

We are deploying an additional fix to resolve this Fastly App UI error message for our customers. We will post an update once all remaining error messages have been fully corrected.
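
As general background on the credential timing mentioned in this update, temporary credentials are usually cached and renewed some margin before they expire, which avoids presenting an expired token and also reduces calls to the issuing service. The following is a minimal sketch under stated assumptions, not Fastly's implementation; `TemporaryCredentials`, `CredentialCache`, the `fetch` callable, and the 300-second margin are all hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class TemporaryCredentials:
    token: str
    expires_at: float  # Unix timestamp in seconds


class CredentialCache:
    """Caches temporary credentials and renews them shortly before expiry."""

    def __init__(self, fetch, refresh_margin=300.0, clock=time.time):
        self._fetch = fetch            # hypothetical callable returning TemporaryCredentials
        self._margin = refresh_margin  # renew this many seconds before expiry (assumed value)
        self._clock = clock
        self._current = None

    def get(self):
        """Return cached credentials, fetching new ones once inside the margin."""
        now = self._clock()
        if self._current is None or now >= self._current.expires_at - self._margin:
            self._current = self._fetch()
        return self._current
```

With a margin of this kind, a caller that checks credentials only when it has data to send still renews before the token lapses, provided it calls `get()` at least once within the margin window.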

Resolved

15 November 2023, 22:49 UTC

A fix was deployed, and we have observed role-based S3 and Kinesis logging endpoints returning to normal in the Fastly UI. Services that handle little to no traffic may continue to see the error until the logging system has successfully sent a batch of logs.