Automating VM Shutdown on Google Cloud Vertex AI at the End of the Day

Krishna Pullakandam
3 min read · Oct 17, 2023

In cloud computing, keeping resource usage and cost under control is a critical part of managing any project. Google Cloud Vertex AI provides a range of machine learning services, including custom training, where you can train models on scalable infrastructure. That power comes at a price: every training job runs on VMs that keep billing for as long as the job is running, so jobs left running unnecessarily lead to higher costs. In this blog post, we will explore a practical way to automatically shut down Google Cloud Vertex AI VMs at the end of the day to save on cloud costs.

Why Automate VM Shutdown?
Automating the shutdown of Vertex AI VMs at the end of the day offers several benefits:

  1. Cost Savings: By shutting down resources during non-working hours, you can significantly reduce your cloud costs.
  2. Resource Optimization: It helps ensure that resources are available when needed during working hours while avoiding resource waste.
  3. Environmental Responsibility: Reducing resource consumption when not in use aligns with environmental and sustainability goals.

Automating VM Shutdown with Google Cloud:
Automating VM shutdown on Google Cloud Vertex AI involves a few key steps:

  1. Authorization and Setup: Ensure you have the necessary permissions and credentials to manage VM instances and services.
  2. Monitoring Job State: Monitor the state of your Vertex AI training jobs to identify running instances.
  3. Time-Based Scheduling: Determine the specific time to initiate the shutdown process. This can be set to align with the end of your working hours.
  4. Shut Down Instances: When the defined time is reached, shut down the running instances.
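
As an illustration of step 3, here is a minimal sketch of a time check that a shutdown script could run before touching any jobs. The function name `is_past_shutdown`, the 17:00 default, and the fixed UTC-5 example offset are assumptions made for this example, not part of any Google Cloud API. For a real region you would probably pass a `zoneinfo.ZoneInfo` instead of a fixed offset so that daylight saving time is handled.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

# Hypothetical example offset (UTC-5); replace with your own time zone.
EXAMPLE_TZ = timezone(timedelta(hours=-5))


def is_past_shutdown(
    now: datetime,
    shutdown_at: time = time(17, 0),  # assumed end of the working day
    tz: tzinfo = timezone.utc,
) -> bool:
    """Return True once the wall-clock time in `tz` has reached `shutdown_at`."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    # Convert to the working time zone, then compare wall-clock times only.
    return now.astimezone(tz).time() >= shutdown_at
```

A script could call `is_past_shutdown(datetime.now(timezone.utc), tz=EXAMPLE_TZ)` and exit early when it returns False, which keeps an accidental daytime run from cancelling anything.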

Using Python to Automate VM Shutdown:
To automate VM shutdown, you can use Python along with the Google Cloud Vertex AI Python client library (the google-cloud-aiplatform package). Here’s a code snippet illustrating the process:

from datetime import datetime, timezone

from google.cloud import aiplatform

project_id = "your-project-id"
location = "us-central1"

job_state_to_stop = aiplatform.gapic.JobState.JOB_STATE_RUNNING

# Initialize the Vertex AI job client against the regional endpoint
client = aiplatform.gapic.JobServiceClient(
    client_options={"api_endpoint": f"{location}-aiplatform.googleapis.com"}
)

# List running custom training jobs
parent = f"projects/{project_id}/locations/{location}"
jobs = client.list_custom_jobs(
    request={"parent": parent, "filter": f'state="{job_state_to_stop.name}"'}
)

# Define the time to stop jobs: 17:00 today. create_time is reported in UTC,
# so this cutoff is in UTC too; shift the hour to match your own time zone.
current_time = datetime.now(timezone.utc)
stop_time = current_time.replace(hour=17, minute=0, second=0, microsecond=0)

# Loop through running jobs and cancel those created before the stop time
for job in jobs:
    job_name = job.name
    # create_time is already a timezone-aware datetime in this client library
    created_time = job.create_time
    if created_time < stop_time:
        client.cancel_custom_job(name=job_name)

Note: Make sure to replace placeholders such as “your-project-id” with your project details when implementing the automation in your environment.

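This code lists the running training jobs, compares each job's creation time to the defined stop time, and cancels the jobs created before it. Cancelling a job releases the VMs that back it, which is what stops the billing. The script does not wait for the stop time by itself, so run it from a scheduler (for example cron or Cloud Scheduler) at the end of the working day.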
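
The selection rule can also be pulled out into a small function that is easy to test without calling Google Cloud at all. The sketch below assumes a hypothetical `JobInfo` stand-in that carries only the two fields the comparison needs; in a real script the job objects returned by the client library would take its place.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass
class JobInfo:
    """Hypothetical stand-in for a Vertex AI job: only the fields used here."""
    name: str
    create_time: datetime  # expected to be timezone-aware


def select_jobs_to_cancel(jobs: Iterable[JobInfo], stop_time: datetime) -> List[str]:
    """Return the names of jobs created strictly before `stop_time`."""
    return [job.name for job in jobs if job.create_time < stop_time]


# Example data: one job started in the morning, one started after the cutoff.
example_stop = datetime(2023, 10, 17, 17, 0, tzinfo=timezone.utc)
example_jobs = [
    JobInfo("jobs/morning", datetime(2023, 10, 17, 9, 0, tzinfo=timezone.utc)),
    JobInfo("jobs/evening", datetime(2023, 10, 17, 18, 0, tzinfo=timezone.utc)),
]
to_cancel = select_jobs_to_cancel(example_jobs, example_stop)
```

With this rule a job started after the cutoff, such as a deliberate overnight run, is left alone, while anything started earlier in the day is selected for cancellation.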

Customization and Best Practices:
When implementing this automation, you can customize the script to suit your specific needs. Some best practices include:

  1. Regularly review and update the script as your work hours may change.
  2. Notify users and stakeholders about the automation to avoid unexpected interruptions.
  3. Monitor the script’s performance and resource savings to ensure it’s effective.
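
One way to support the second and third practices is to give the script a dry-run mode and a log of what it did. The following is a minimal sketch under stated assumptions: the function `cancel_jobs`, the logger name, and the injected `cancel_fn` callable are all hypothetical names for this example. In a real script `cancel_fn` could wrap the client library's cancel call.

```python
import logging
from typing import Callable, Iterable, List

# Hypothetical logger name; route it to whatever log sink your team monitors.
logger = logging.getLogger("vertex_shutdown")


def cancel_jobs(
    job_names: Iterable[str],
    cancel_fn: Callable[[str], None],
    dry_run: bool = True,
) -> List[str]:
    """Cancel each job via `cancel_fn` and return the names actually cancelled.

    With dry_run=True nothing is cancelled; the function only logs what it
    would have done, which is useful while announcing the automation.
    """
    cancelled: List[str] = []
    for name in job_names:
        if dry_run:
            logger.info("DRY RUN: would cancel %s", name)
            continue
        try:
            cancel_fn(name)
        except Exception:
            # Log and keep going so one failure does not skip the other jobs.
            logger.exception("Failed to cancel %s", name)
            continue
        logger.info("Cancelled %s", name)
        cancelled.append(name)
    return cancelled
```

Running in dry-run mode for a few days gives stakeholders a preview of which jobs would be interrupted, and the returned list gives a simple count to track against your billing reports.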

Conclusion:
Automating the shutdown of Vertex AI VMs at the end of the day is a practical way to optimize cloud resources and reduce costs. With the power of Python and the Google Cloud Vertex AI Python client library, you can implement this cost-saving automation to align your cloud usage with your work schedule, reduce resource waste, and contribute to sustainability efforts.

By following the steps and best practices outlined in this blog post, you can harness the full potential of Google Cloud Vertex AI while maintaining control over your costs.

References
1. Google Cloud Vertex AI Documentation — https://cloud.google.com/vertex-ai/docs

Note: This blog post provides only a simplified overview of the process. The actual implementation may require further configuration and considerations based on your organization’s needs and policies.
