Aethir Earth Service Guide
Aethir ensures reliable, high-performance GPU infrastructure through proactive monitoring, fast issue resolution, and clear service levels—minimizing disruptions and maximizing uptime.
Our servers come with the following specifications and limitations:
CPU: All servers use x86-based processors.
GPU: We offer a range of GPU-equipped servers. For current GPU models, please refer to the .
Infrastructure: All servers are bare-metal and run either Ubuntu 20.04 LTS or Ubuntu 22.04 LTS. Other distributions, including Debian or similar derivatives, are not supported.
Location Restrictions: We do not provide GPU resources in the following regions: China and OFAC/UN-sanctioned countries, including Cuba, Iran, North Korea, Syria, and the Crimea, Donetsk, and Luhansk regions.
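The operating system and CPU constraints above can be checked on a host before workloads are deployed. The following is a minimal sketch, not an Aethir tool: the function names are hypothetical, and the set of machine strings treated as x86 is an assumption. It parses the standard `KEY=value` layout of `/etc/os-release` and compares the result with the supported Ubuntu releases named in this guide.

```python
import platform

SUPPORTED_UBUNTU = {"20.04", "22.04"}   # LTS releases named in this guide
X86_MACHINES = {"x86_64", "amd64"}      # assumption: spellings reported for x86 hosts


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_supported(os_fields, machine):
    """Return True only for x86 hosts running Ubuntu 20.04 or 22.04."""
    return (
        os_fields.get("ID") == "ubuntu"
        and os_fields.get("VERSION_ID") in SUPPORTED_UBUNTU
        and machine.lower() in X86_MACHINES
    )


def check_local_host(path="/etc/os-release"):
    """Apply the check to the machine this script runs on."""
    with open(path) as fh:
        return is_supported(parse_os_release(fh.read()), platform.machine())
```

Calling `check_local_host()` on a provisioned server should return `True`. A Debian host or a non-x86 machine would return `False`.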
Proactive Monitoring and First-Level Support
First-Level Debugging:
Identify the root cause of issues and resolve them where possible.
Escalate internally or to relevant Cloud Hosts if necessary.
Proactive Infrastructure Monitoring:
Continuous monitoring of GPU infrastructure, including network health and server availability.
Automatic ticket creation upon detection of incidents, with immediate notifications sent to you via the communication channels listed below.
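The monitoring flow described above (detect an incident, then open a ticket automatically) can be illustrated with a short sketch. This is not Aethir's monitoring stack. The `Ticket` class, the `sweep` function, the choice of a TCP probe on port 22, and the rule of one open ticket per host are all assumptions made for illustration.

```python
import socket
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Ticket:
    """Hypothetical incident record; real ticket fields will differ."""
    host: str
    summary: str
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def tcp_probe(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def sweep(hosts, probe=tcp_probe, open_tickets=None):
    """Probe each host once and open a ticket for each newly unreachable host.

    open_tickets maps host -> Ticket and is updated in place, so a host that
    stays down across sweeps does not generate duplicate tickets.
    """
    open_tickets = {} if open_tickets is None else open_tickets
    new_tickets = []
    for host in hosts:
        if not probe(host) and host not in open_tickets:
            ticket = Ticket(host=host, summary=f"{host} unreachable")
            open_tickets[host] = ticket
            new_tickets.append(ticket)
    return new_tickets
```

In this sketch tickets are never closed automatically; closing stays a manual step once the issue is verified as resolved. Each ticket returned by `sweep` would then be forwarded over whichever notification channel is in use.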
Intermediate Support and Server Management
Server Rebooting:
Perform server reboots or rebuilds based on customer requests.
New User Setup:
Ensure that all new client information is stored securely and that users gain secure access to servers.
Verify that login issues are resolved before closing tickets.
Server Cleanup:
After the client's usage period ends, the server will be reset to its original state and prepared for the next client.
Advanced Support and Server Provisioning
GPU Burn Tests:
Run stress tests to validate GPU performance and ensure reliability under load.
Network Configuration and Firewall Setup:
Configure network policies, firewall rules, and SSH keys to ensure secure and uninterrupted access.
GPU Driver Support:
Install and maintain NVIDIA GPU drivers, ensuring compatibility and functionality with customer workloads.
Server Provisioning:
Provision bare metal or virtual servers based on project requirements.
Configure hardware to meet the performance needs of GPU-intensive applications.
Zabbix Monitoring Installation:
Install and maintain the Zabbix agent and Zabbix server for advanced infrastructure monitoring.
Operating System Installation:
Install Ubuntu 20.04 LTS or 22.04 LTS and apply all necessary security patches and updates during the initial setup. Other distributions, including Debian and similar derivatives, are not supported.
BIOS/Firmware Updates:
Ensure BIOS and firmware are always up to date to maintain performance and security.
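While a GPU burn test such as the one listed above is running, per-GPU temperature and utilization can be sampled with `nvidia-smi`, which ships with the NVIDIA driver. The sketch below is illustrative only and is not the procedure Aethir uses. The function names are hypothetical, and the 85 °C threshold is an assumed example value, not a stated service limit.

```python
import csv
import io
import subprocess

QUERY = "index,name,temperature.gpu,utilization.gpu,memory.used"


def read_gpu_stats():
    """Run nvidia-smi once and return its CSV output as text."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def to_int(cell):
    """Convert a numeric cell; non-numeric values such as [N/A] become None."""
    try:
        return int(cell)
    except ValueError:
        return None


def parse_gpu_stats(text):
    """Turn the CSV text into one dict per GPU."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != 5:
            continue  # skip blank or malformed lines
        index, name, temp, util, mem = (cell.strip() for cell in rec)
        rows.append({
            "index": int(index),
            "name": name,
            "temp_c": to_int(temp),
            "util_pct": to_int(util),
            "mem_mib": to_int(mem),
        })
    return rows


def overheated(rows, max_temp_c=85):
    """Return indices of GPUs hotter than max_temp_c (assumed threshold)."""
    return [r["index"] for r in rows
            if r["temp_c"] is not None and r["temp_c"] > max_temp_c]
```

A typical use is to call `overheated(parse_gpu_stats(read_gpu_stats()))` at intervals during the stress run and flag any GPU index it returns for closer inspection.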
Orchestration and Management Software:
Installation, configuration, and management of orchestration software such as Slurm, Kubernetes, or Nomad is not included.
These tools must be configured for each workload's requirements, so implementing them is the customer's responsibility.
Application and Tool Configuration:
The installation and configuration of development tools and libraries, such as PyTorch, TensorFlow, JAX, or any other machine learning frameworks and libraries, are out of scope.
Configuration of these tools is specific to the customer’s project needs and must be managed by the customer's technical team.
Ongoing Maintenance of Customer-Deployed/Modified Software:
Technical support and maintenance for software or services that are deployed or modified by the customer are not included.
This exclusion covers troubleshooting and assistance with issues that arise in customer-deployed software.
Ongoing Updates and Upgrades:
Updates or upgrades of the OS, drivers, or any installed software beyond the initial setup are not covered unless specifically included in a separate maintenance agreement.
Training and Usage:
Training on how to use the GPU clusters and consultation on best practices for optimizing GPU utilization are not included.
Third-party Software Licensing:
Any licensing required for operating systems or other software utilized in the GPU clusters is the responsibility of the customer.
Aethir is committed to providing exceptional support and service for your GPU infrastructure needs. This operational support plan ensures smooth operations, quick resolution of issues, and transparency in communication. We look forward to partnering with you to achieve your business goals.
To access our full support guide or explore partnership opportunities, please complete the . A member of our team will reach out to you shortly after receiving your submission.