Provation Apex is currently experiencing technical difficulties
Incident Report for Provation
Postmortem

Postmortem: Sporadic Errors Saving Notes & Printing Issues

Incident Summary

On March 7, 2024, at 09:19 CST, Apex customers were experiencing sporadic errors when saving notes and encountering printing issues. Investigation revealed that one of the four Apex instances was not processing larger payloads successfully. All Apex instances were cleared, and the issue was resolved at 10:00 CST.

Root Cause

The root cause of the issue was a lack of available disk space on certain Apex instances.

Detailed Analysis

  1. Disk Space Shortage:
* The lack of available disk space was identified as the primary issue.
* Apex instances were unable to process larger payloads due to insufficient disk space.
* This degraded overall system performance and caused sporadic errors for users.
  2. Excessive Log Files:
* Further investigation revealed that log files were consuming a significant amount of disk space.
* These log files were not being deleted frequently enough, leading to the disk space shortage.
* Increasing Apex traffic contributed to the accumulation of log files.
  3. Log File Management:
* The team had not adjusted the log file deletion frequency to account for the increased Apex traffic.
* As a result, log files were not being purged at an acceptable rate.
* No alerting mechanism existed to warn the team that available disk space was running low.

Corrective Actions

  1. Immediate Disk Space Cleanup:
* The team performed an emergency cleanup to free up disk space on affected Apex instances.
* Old log files were removed to alleviate the shortage.
  2. Log Rotation and Deletion Strategy:
* A log rotation and deletion strategy was implemented.
* Log files are now rotated and deleted at regular intervals based on traffic patterns.
* The deletion frequency is adjusted dynamically to accommodate increased traffic.
  3. Alerting System Enhancement:
* An alerting system was set up to notify the team when disk space reaches critical levels.
* Alerts are triggered based on predefined thresholds to prevent future incidents.
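
As an illustration only, the log deletion and threshold alerting described above could be sketched in Python as follows. This is not Provation's actual tooling: the threshold values, retention periods, the `*.log` naming pattern, and all function names are hypothetical assumptions chosen for the example.

```python
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical thresholds (percent of disk used); not taken from the incident report.
WARN_PERCENT = 80.0
CRITICAL_PERCENT = 90.0
DAY_SECONDS = 86400


def disk_used_percent(path: str) -> float:
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify_usage(used_percent: float) -> str:
    """Map a usage percentage to an alert level using the predefined thresholds."""
    if used_percent >= CRITICAL_PERCENT:
        return "critical"
    if used_percent >= WARN_PERCENT:
        return "warning"
    return "ok"


def retention_seconds(level: str) -> int:
    """Shorten log retention as the disk fills up (hypothetical policy)."""
    return {"ok": 7 * DAY_SECONDS, "warning": 3 * DAY_SECONDS}.get(level, DAY_SECONDS)


def purge_old_logs(log_dir: str, max_age_seconds: float,
                   now: Optional[float] = None) -> List[str]:
    """Delete *.log files in `log_dir` older than `max_age_seconds`.

    Returns the names of the deleted files, sorted alphabetically.
    """
    now = time.time() if now is None else now
    deleted = []
    for entry in sorted(Path(log_dir).glob("*.log")):
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            deleted.append(entry.name)
    return deleted
```

A scheduled job could call `classify_usage(disk_used_percent(log_dir))`, notify the team on any level other than "ok", and pass `retention_seconds(level)` to `purge_old_logs` so that retention tightens as the disk fills.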

Preventive Measures

  1. Capacity Planning:
* Regular capacity planning exercises will be conducted to anticipate resource needs.
* Disk space requirements will be reviewed and adjusted as necessary.
  2. Automated Log Management:
* Automated log management tools will be explored to ensure timely deletion and rotation.
* Log file sizes will be monitored regularly, and retention policies adjusted accordingly.
  3. Documentation and Training:
* The log management process will be documented and team members trained on it.
* Training will ensure everyone understands the importance of disk space management.
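
One simple way to frame the capacity planning above is to estimate how many days remain before a disk fills at the current growth rate. The Python sketch below assumes daily samples of used disk space are available; the function names and the linear-growth assumption are illustrative, not part of the incident report.

```python
from typing import Sequence


def average_daily_growth(daily_used_bytes: Sequence[float]) -> float:
    """Average day-over-day change, given one used-space sample per day."""
    if len(daily_used_bytes) < 2:
        return 0.0
    return (daily_used_bytes[-1] - daily_used_bytes[0]) / (len(daily_used_bytes) - 1)


def days_until_full(free_bytes: float, daily_growth_bytes: float) -> float:
    """Days until the disk is full, assuming growth stays linear."""
    if daily_growth_bytes <= 0:
        return float("inf")  # usage is flat or shrinking
    return free_bytes / daily_growth_bytes
```

For example, samples of 40, 42, 44, and 46 GiB used on consecutive days give an average growth of 2 GiB per day, so 20 GiB of free space would last about 10 days under this assumption.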
Posted Mar 21, 2024 - 18:04 CDT

Resolved
Provation Apex has fully recovered. We apologize for the inconvenience.
Posted Mar 07, 2024 - 10:00 CST
Identified
Provation Apex is currently experiencing a partial outage, leaving it unreachable for some users. Investigation is underway. New information will be posted here as it becomes available.

Please refer to this link for Apex Desktop App offline mode instructions:

Offline Mode Instructions

Click the "Subscribe to Updates" button on this page to get email updates sent to your inbox whenever a change is made to this page.
Posted Mar 07, 2024 - 09:19 CST
This incident affected: Provation Apex.