Does Vertex AI Use Schema Markup?

Does Vertex AI Use Your Schema Markup? Clarifying Data Needs for SEOs

Why your schema markup still matters for search, even if Vertex AI works differently: Clearing up the confusion between Schema.org for Google Search and data schemas within Vertex AI.

The rapid rise of Generative AI and platforms like Google Cloud’s Vertex AI is changing how businesses think about data and search. For SEO professionals and marketers accustomed to using Schema.org markup to enhance visibility in traditional Google Search, a critical question emerges: Does Vertex AI, Google’s powerful AI platform, directly use the Schema.org markup embedded on websites?

The short answer is no, not in the way Google Search does. This article clarifies the distinction and explains how each system uses “schemas.”

Does Google Vertex AI Rely on Schema Markup?

Many linked data cloud users want to know whether Vertex AI services directly consume or depend on the Schema.org markup found on websites. The question matters because how search works is changing: LLM integration is a reality, and the "10 blue links" search engine result pages (SERPs) are being replaced.

What does “updating schemas” mean in Vertex AI?

Which schemas is that phrase talking about? Schema markup? No. The key distinctions need some context.

In the context of Vertex AI, “updating schemas” refers to modifying the structure or organization of data. This could involve adding new fields, changing data types, or altering relationships between data elements, typically within a database or data pipeline.

The “schemas” referred to in Google Cloud’s “Update a schema” documentation are NOT the same as the Schema.org markup used by Google Search on public web pages. The documentation relates to Google’s Generative AI App Builder, which is now part of Vertex AI Search and Conversation.
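To make the idea of "updating a schema" concrete, here is a minimal sketch of a pre-flight check you might run before submitting a schema change. It assumes a simplified, hypothetical schema shape (a plain mapping of field name to type) and encodes the rule, noted later in this article, that updates must stay backward compatible: adding fields is fine, while removing a field or changing its type is not. The function name `find_incompatible_changes` is illustrative and not part of any Google library.

```python
from typing import Dict, List

# Hypothetical, simplified schema shape: field name -> declared type.
Schema = Dict[str, str]


def find_incompatible_changes(current: Schema, proposed: Schema) -> List[str]:
    """List the changes in `proposed` that would break existing fields."""
    problems = []
    for name, declared_type in current.items():
        if name not in proposed:
            problems.append(f"removed field: {name}")
        elif proposed[name] != declared_type:
            problems.append(
                f"changed type of {name}: {declared_type} -> {proposed[name]}"
            )
    return problems


current_schema = {"title": "string", "price": "number", "category": "string"}

# Additive change: one new field, nothing existing is touched.
additive = dict(current_schema, brand="string")

# Breaking change: price changes type and category disappears.
breaking = {"title": "string", "price": "string"}

print(find_incompatible_changes(current_schema, additive))
print(find_incompatible_changes(current_schema, breaking))
```

An empty list means the proposed schema only adds to the current one. Anything else is worth reviewing before you trigger a reindex.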

Table Demonstrating the Different Contexts and Purposes for Schemas

| Criterion | Schema.org Markup (for Google Search) | Data Store Schemas (for Vertex AI Search / Gen App Builder) |
| --- | --- | --- |
| Purpose | A standardized vocabulary (using formats like JSON-LD, Microdata, RDFa) embedded in the HTML of public web pages. | Defines the structure, data types, and indexing options for the specific data you upload into your private data store within the Vertex AI Search service. |
| Goal | To help general web crawlers (like Googlebot) recognize the content and context of a web page to improve search results (e.g., enabling rich snippets, Knowledge Graph entries). | To tell Vertex AI Search how to index, filter, facet, search, and retrieve the documents you provide for your specific search application. |
| Scope | Public web, standardized types (Product, Recipe, Event, etc.). | Private to your Google Cloud project and specific data store. User-defined or inferred based on uploaded data (often JSON). |
| Consumption | Primarily by Google Search crawlers. | By the Vertex AI Search service to power the search/recommendation/conversational application you are building. |

What I understand is that the “schemas” in that documentation refer to the configuration of your specific data store within a Google Cloud service, not the public Schema.org vocabulary used by traditional Google Search.
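A side-by-side sketch may help show how different the two kinds of "schema" are. The first object is Schema.org markup describing one product, ready to embed in a page. The second is a simplified data store schema that describes fields, not content. The annotation names (`keyPropertyMapping`, `retrievable`, `indexable`) follow my reading of Google's schema documentation at the time of writing, and the product itself is made up, so treat both as assumptions to verify.

```python
import json

# Schema.org markup: public vocabulary describing ONE thing on a web page.
product_json_ld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Watercolor Starter Kit",
    "offers": {"@type": "Offer", "price": "24.99", "priceCurrency": "USD"},
}


def as_script_tag(markup: dict) -> str:
    """Wrap JSON-LD in the script tag that would sit in a page's HTML."""
    return '<script type="application/ld+json">' + json.dumps(markup) + "</script>"


# Data store schema: private configuration describing the SHAPE of every
# record in a data store (simplified and hypothetical).
data_store_schema = {
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "keyPropertyMapping": "title",
            "retrievable": True,
        },
        "price": {"type": "number", "indexable": True},
    },
}

script_tag = as_script_tag(product_json_ld)
print(script_tag)
print(json.dumps(data_store_schema, indent=2))
```

Notice that the data store schema never mentions the watercolor kit. It only says what fields a record may have and how the service should treat them.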

So, how we prepare and ground our data for Vertex AI or Agentic AI matters.

How Vertex AI Actually Gets Its Data

So, if Vertex AI isn’t scraping your website’s Schema.org markup, how does it get information? As AI platforms become primary tools for search, structuring your content aids parsing and agentic understanding of the information presented. This is crucial for providing direct and clear answers in AI-driven search experiences.

Vertex AI primarily relies on structured data explicitly provided by the user through specific channels:

  1. APIs: Directly sending structured data (often JSON) via API calls.
  2. Google Cloud Storage (GCS): Uploading files (like JSON Lines, CSV) containing structured data.
  3. BigQuery: Connecting directly to tables containing your structured data.
  4. Vertex AI Feature Store: Ingesting pre-defined features.[1]

Within these methods, you typically provide structured data using fields like struct_data (using Google’s standard Protobuf Struct format) or json_data (providing a JSON string) within the Document objects you send to services like Vertex AI Search.
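As a rough illustration, the same product record can travel either way. This sketch uses plain Python dictionaries rather than the real client library objects, so the shape is an approximation; the `id` value and the product fields are hypothetical.

```python
import json

# A hypothetical product record for a kids' art retailer.
product = {
    "title": "Watercolor Starter Kit",
    "price": 24.99,
    "category": "art supplies",
}

# Option 1: the payload travels as a nested object (struct_data).
doc_with_struct = {"id": "sku-001", "struct_data": product}

# Option 2: the same payload travels as a JSON string (json_data).
doc_with_json = {"id": "sku-001", "json_data": json.dumps(product)}

# Both carry identical information once the string is parsed again.
print(json.loads(doc_with_json["json_data"]) == doc_with_struct["struct_data"])
```

Which field you use is mostly a matter of convenience: a nested object is easier to inspect, while a JSON string is easy to pass through systems that only handle text.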

The focus is on providing data formatted according to Vertex AI’s requirements, not embedding Schema.org on external websites.

While Vertex AI doesn’t directly consume your website’s Schema.org markup like Google Search does, understanding the underlying concepts of how AI systems process structured information helps clarify why both types of structured data are valuable in their respective contexts. This needs to be working knowledge for AI SEO strategists.

The Graph Concept as a Representation Model

Let’s take a look at how Graph Databases or Knowledge Graphs play a part in underlying AI technologies.

Both Google Search (when processing Schema.org and other signals) and Vertex AI services (when processing your uploaded data) need to recognize entities, facts, and relationships connecting them. The Google Knowledge Graph strives to organize this data.

In theory, Vertex AI uses graph concepts internally to model relationships within your private, structured data for better machine learning. Google Search uses schema markup externally, on the open web, to recognize the semantic meaning of public content.

Google Search may leverage a standardized graph vocabulary (Schema.org) to “feed” its Knowledge Graph and enhance search results. Google may use the data from your schema markup to update, augment, or confirm information about that entity within the Knowledge Graph, but it doesn’t directly ingest it.

Both use graphs to represent structured information, but for fundamentally different audiences and objectives.

Components of structured information:

  • Entities as Nodes: Real-world things (your company, a specific product, a location) are represented as points, or nodes, in a network. A node is a single point within the graph.

    The “entire known data graph of an entity” would encompass the node itself, all its properties, all its direct relationships (edges) to other nodes, and potentially those neighboring nodes as well.

  • Facts as Properties & Edges: Information about these entities (e.g., a product’s price, a company’s address) can be seen as properties attached to these nodes.

    How entities relate to each other (e.g., ‘Product X’ is manufactured by ‘Company Y’, ‘Event Z’ takes place at ‘Location A’) are represented by connections or edges between the nodes.

  • Numerical values as factual representations (Embeddings): In machine learning contexts related to graph embeddings or knowledge graph embeddings (for potential training or use within Vertex AI), nodes are frequently mapped to numerical vectors (embeddings).

    These dense numerical vectors encode information about the node’s properties and relationships within the graph structure.
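The node, property, and edge components above can be sketched in a few lines. This is a toy in-memory structure for illustration only, not how Google Search or Vertex AI store data, and every name in it (`Graph`, `neighborhood`, the sample entities) is hypothetical. Embeddings are left out because they are learned from data rather than written by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    node_id: str
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Each edge is (source node id, relation, target node id).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, **properties) -> None:
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def neighborhood(self, node_id: str) -> dict:
        """Return the node, its direct edges, and the ids of its neighbors."""
        direct = [e for e in self.edges if node_id in (e[0], e[2])]
        neighbors = sorted({e[2] if e[0] == node_id else e[0] for e in direct})
        return {"node": self.nodes[node_id], "edges": direct, "neighbors": neighbors}


graph = Graph()
graph.add_node("product_x", name="Product X", price=24.99)  # facts as properties
graph.add_node("company_y", name="Company Y")
graph.add_edge("product_x", "manufactured_by", "company_y")  # relationship as edge

print(graph.neighborhood("product_x"))
```

The `neighborhood` call approximates the "entire known data graph of an entity" described above: the node, its properties, its direct edges, and its neighbors.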

In traditional Google Search (and possibly in AI Overview answers), this structured information is often supplied by schema markup on a website. When Google and Bing draw data from a website, schema markup helps overcome content and entity ambiguity issues.

This graph-like way of representing structured information allows systems to recognize context and answer complex questions. Using semantic triples (subject, predicate, object) to explain data relationships is extremely helpful, and it is part of creating good content structure. I like to focus on continuing to build on the web’s core data infrastructure while increasing the capacity of a web application.
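For readers who have not worked with semantic triples, here is a small sketch that flattens a JSON-LD-style object into (subject, predicate, object) statements. It is a deliberate simplification: real JSON-LD processing also expands contexts and handles lists, which this skips. The function name `to_triples` and the identifiers `event:z` and `place:a` are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, object]


def to_triples(entity: dict, subject: str) -> List[Triple]:
    """Flatten a nested, JSON-LD-style dict into subject/predicate/object triples."""
    triples: List[Triple] = []
    for key, value in entity.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @type and @id
        if isinstance(value, dict):
            # A nested entity becomes its own subject, linked by an edge.
            child = value.get("@id", f"{subject}/{key}")
            triples.append((subject, key, child))
            triples.extend(to_triples(value, child))
        else:
            triples.append((subject, key, value))
    return triples


event = {
    "@type": "Event",
    "name": "Event Z",
    "location": {"@type": "Place", "@id": "place:a", "name": "Location A"},
}

for triple in to_triples(event, "event:z"):
    print(triple)
```

Each printed line reads like a short sentence, for example "event:z, location, place:a", which is the kind of unambiguous statement that helps a system connect entities.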

Table: Vertex AI vs. Google Search: Data Handling Comparison

| Feature | Vertex AI | Google Search |
| --- | --- | --- |
| Primary Data Source | Structured data explicitly provided by the user. | Content crawled from public websites. |
| Data Input Method | Via APIs, Google Cloud Storage (GCS), BigQuery uploads. | Automated web crawling (Googlebot). |
| Relevant “Schema” Type | Platform-specific configurations (e.g., defining structure for struct_data, json_data fields). | Schema.org markup (JSON-LD, Microdata, RDFa) embedded in web page HTML. |
| Main Purpose of Using Data | To power specific AI applications (search, recommendations, etc.) built by the user on the platform. | To recognize and interpret public web content, enable Rich Results in SERPs, and feed the Knowledge Graph. |

How to prepare data for ingesting when using Vertex AI Agent Builder:

If you plan to import data from Cloud Storage with metadata, put a JSON file that contains the metadata into a Cloud Storage bucket whose location you provide during import.
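A metadata file of this kind might be generated along the following lines. This is a hedged sketch: the one-object-per-line layout and the `structData` and `content` field names follow my understanding of Google's import documentation at the time of writing, and the bucket name, file path, and record values are invented. Check the current documentation before relying on the exact field names.

```python
import json
from typing import Dict, List


def build_metadata_lines(records: List[Dict]) -> str:
    """Build one JSON object per line, pairing metadata with a file location."""
    lines = []
    for record in records:
        entry = {
            "id": record["id"],
            "structData": record["metadata"],
            "content": {"mimeType": record["mime_type"], "uri": record["uri"]},
        }
        lines.append(json.dumps(entry))
    return "\n".join(lines)


# Hypothetical records; the bucket and path below do not exist.
records = [
    {
        "id": "doc-001",
        "metadata": {"title": "Watercolor Starter Kit Guide", "category": "guides"},
        "mime_type": "application/pdf",
        "uri": "gs://example-bucket/catalog/watercolor-kit.pdf",
    }
]

metadata_file_contents = build_metadata_lines(records)
print(metadata_file_contents)
```

You would save the resulting text as a file in the bucket and point the import at it. Keeping the builder separate from the upload step makes the output easy to inspect first.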

Let’s look at specific tools that can help.

Tools for Creating Internal Data Schemas for Vertex AI

Tool/Mechanism: Custom Schemas (MetadataSchemas)
Purpose & Context: Used to standardize and type-check metadata properties associated with MLOps resources, such as Artifacts, Executions, or Contexts, within a MetadataStore. They allow users to query resources by schema, for instance, to “list all Artifacts of type MyCustomModel”. This is part of tracking and analyzing ML metadata in Vertex AI.
Implementation Details:
  • The schema format adheres to a subset of the OpenAPI 3.0 specification.
  • The top-level schema must be of type object.
  • They are defined and registered as a MetadataSchema resource via the REST API or the Vertex AI SDK for Python.
  • The schema can be versioned by providing a different schema_version.

Tool/Mechanism: Vertex AI Agent Builder Console
Purpose & Context: Provides a graphical interface to manage schemas for data stores containing structured data or websites with structured data. This ensures the underlying Retrieval Augmented Generation (RAG) system correctly processes and retrieves data for AI grounding.
Implementation Details:
  • Allows users to manually map key properties (such as title, uri, and description). Google strongly recommends updating the schema with key property mappings, especially for title, to ensure correct display and better generative results.
  • Users can update field annotations (like Retrievable, Indexable, Dynamic Facetable, and Searchable).
  • Adding new fields before importing documents can shorten reindexing time.
  • Editing the schema triggers reindexing of the data store, which can take several hours for large data stores and may incur costs.

Tool/Mechanism: REST API (schemas.patch and create_metadata_schema)
Purpose & Context: Used for updating schemas, especially for website data stores or custom definitions in Vertex AI Agent Builder (also referred to as AI Applications). It is also used to manage MLOps schemas (create_metadata_schema).
Implementation Details:
  • For Vertex AI Agent Builder, schema updates use the schemas.patch API method.
  • The schema is provided as a JSON object, often conforming to the JSON Schema structure.
  • API client libraries are available for multiple languages, including C#, Go, Java, Python, and Ruby, to integrate schema management into production environments.
  • Schema updates must be backward compatible; unsupported changes include changing a field’s type or removing an existing field.

Tool/Mechanism: Python Functions (Function Calling)
Purpose & Context: When building AI agents using code-first approaches like LangChain on Vertex AI (Reasoning Engine), Python functions serve as tools that define the agent’s capabilities. These implicitly define the schema the agent uses to perform actions.
Implementation Details:
  • The functions interact with external systems and APIs to retrieve real-time information.
  • The functions can perform RAG by retrieving indexed documents from a vector database or query external APIs (e.g., exchange rates, Google Drive).
  • The LangChain template in Reasoning Engine introspects the function’s name, arguments, docstrings, and type hints to create the tool description, which the Gemini model uses for decision-making (tool selection).
  • Using the Reasoning Engine abstracts away the need to write an OpenAPI specification for the API call.
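To show what "a function implicitly defines a schema" can look like, here is a sketch that builds a tool description from a function's name, arguments, docstring, and type hints using only the standard library. It is not the Reasoning Engine's actual implementation, and `describe_tool` and `get_exchange_rate` are hypothetical names; it only illustrates the kind of introspection described above.

```python
import inspect
from typing import get_type_hints


def get_exchange_rate(currency_from: str = "USD", currency_to: str = "EUR") -> float:
    """Return the exchange rate between two currencies."""
    # Stub for illustration: a real tool would call a rates API here.
    raise NotImplementedError("connect this to a real exchange-rate source")


def describe_tool(func) -> dict:
    """Derive a simple tool description from a function's signature."""
    hints = get_type_hints(func)
    parameters = {}
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name, object)
        parameters[name] = {
            "type": getattr(hint, "__name__", str(hint)),
            # A parameter with no default must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": parameters,
    }


tool = describe_tool(get_exchange_rate)
print(tool)
```

The practical takeaway is that clear function names, docstrings, and type hints are doing schema work: they are what a model would see when deciding whether to call the tool.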

“Structured data

Prepare your data according to the import method that you plan to use. If you plan to ingest media data, also see Structured media data.

You can import structured data from the following sources:

  • Cloud Storage.
  • Local JSON data.
  • Third-party data sources.

When you import structured data from BigQuery or from Cloud Storage, you are given the option to import the data with metadata. (Structured with metadata is also referred to as enhanced structured data.)” – Vertex AI Agent Builder: Prepare data for ingesting

While the Vertex AI and Schema.org distinction is key, it also helps to understand how it fits into the broader evolution of search. Here are related considerations for SEOs:

The future of AI SEO, or Generative Engine Optimization (GEO), or Answer Engine Optimization (AEO), lies in the strategic integration of AI into broader marketing and content strategies.

As AI continues to evolve, its technology and insights will become a central part of shaping more effective SEO campaigns. Keeping pace with AI-driven updates to search engine algorithms, and shifting from tactical execution to strategic oversight, means that SEO professionals need to adapt nonstop.

A human with imagination, creativity, and a keen understanding of this evolving landscape must always be at the helm.

To help SEOs and businesses transition effectively, Hill Web Marketing recommends:

  • Creating comprehensive, factual, well-sourced content that is human curated and edited, and that converts while remaining AI-friendly. This is where human editors play a vital role in refining AI-generated content, ensuring accuracy, adding nuance, and aligning the output with specific goals.
  • Focusing on niche expertise articles and using topic-expert authors.
  • Optimizing to satisfy user experience signals, contextual meaning and the intent behind a user’s search query.
  • Checking your data quality, which includes steps to ensure data accuracy, completeness, relevance, and freshness. Go to the Data quality page in the Search for commerce console and check your “Critical Threshold” score. [2]
  • Prioritizing credibility and trust by avoiding low-effort content and adding clear value to what is already on the web.

Gaining a strong data footprint in Google’s AI Overviews means adapting your strategies now to secure a competitive advantage and maintain SERP visibility. Think of schemas as predefined templates or blueprints that define the structure and type of information to be stored, like a form with specific fields for a kids’ art retailer (e.g., “Products,” “Customers,” “Orders,” “Timestamp”).

What Can We Conclude About the Relationship Between Google Vertex AI and Schemas?

  • Direct answer: Within my current experience, I cannot say that Google Vertex AI directly relies on Schema.org markup embedded on websites as a primary data source.
  • Clarification of what we do know: Vertex AI heavily relies on structured data formats. This data is typically provided through dedicated channels (APIs, GCS, BigQuery, etc.). It is consumed in formats defined by Vertex AI services themselves.
  • Schema markup’s indirect connection: Schema.org data could potentially be extracted from websites, processed, structured into a suitable format (e.g., BigQuery table), and then fed into Vertex AI; however, this is an indirect data pipeline, versus a native reliance within Vertex AI.
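The indirect pipeline mentioned above (extract, process, restructure) might be sketched like this, using only the standard library. It pulls JSON-LD out of a page and reshapes it into a flat record. The sample HTML, the `to_record` helper, and the output field names are all hypothetical, and the sketch assumes a single `offers` object, whereas real markup can also hold a list of offers or be malformed.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of application/ld+json script tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


def to_record(item: dict, doc_id: str) -> dict:
    """Reshape one Schema.org item into a flat, hypothetical import record."""
    offers = item.get("offers") or {}
    return {
        "id": doc_id,
        "struct_data": {
            "title": item.get("name"),
            "entity_type": item.get("@type"),
            "price": float(offers["price"]) if "price" in offers else None,
        },
    }


html = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Product", '
    '"name": "Watercolor Starter Kit", "offers": {"price": "24.99"}}'
    "</script></head><body></body></html>"
)

extractor = JsonLdExtractor()
extractor.feed(html)
record = to_record(extractor.items[0], "sku-001")
print(record)
```

Every step here is work you do yourself, outside Vertex AI, which is the point: the markup can be a convenient source, but nothing consumes it automatically.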
  • Key takeaway: Focus on preparing and structuring data that aligns with the current specific requirements of the Vertex AI service being used. It is best not to assume it consumes or is reliant on web Schema.org markup.

Gaining “word” clarity: “Schema” is often used informally to refer to the structure you put inside struct_data or json_data. However, the actual field names for transmitting that data in the Vertex AI API are struct_data and json_data.

“Vertex AI uses tabular (structured) data to train a machine learning model to make predictions on new data.

You can import data either from your computer or from Cloud Storage in an available format (CSV or JSON Lines) with the labels (and bounding boxes, if necessary) inline. For more information on import file format, see Preparing your training data. If you want to split your dataset manually, you can specify the splits in your CSV or JSON Lines import file.” – Google AutoML Beginner’s Guide

Legacy schema terms and methods need clarifying as AI is rapidly evolving

Official Google Cloud documentation for Vertex AI Search and Conversation clearly outlines the use of struct_data and json_data for incorporating structured data into documents for indexing and search features (like filtering, faceting, boosting).

How Schema Markup Helps Search Engines Recognize Content

I had a great conversation with Jarno Van Driel about how Google uses schema markup to “recognize content” versus “understand content.”

Jarno is a true expert at articulating the value and use of structured data markup. AI agents do not possess consciousness, subjective experience, emotions, or intentionality in the same way that humans do when we “think.” They operate based on complex algorithms, the data they have been trained on, and the rules that govern their functionality.

AI can perform increasingly sophisticated processing, such as Google’s ability to “understand” or “recognize” content through schema markup, or Vertex AI data models. Even so, we are talking about a level of interpretation and contextual awareness, not understanding in the human sense.

The term “intelligent agents” refers here to the ability of Vertex AI agents to perform tasks effectively and autonomously within their defined parameters. Their “reasoning logic” describes the algorithmic processes they follow to arrive at conclusions or actions, which is different from human reasoning that involves consciousness and a broader understanding of how people find solutions, products, and services.

AI agents do not “think” or “understand” in the human sense of the word, which involves consciousness, subjective experience, and genuine understanding.

My guess is that Google uses schema markup to better process and interpret the content on a webpage, which goes beyond mere “text recognition.”

For example, your organization schema markup can influence your Google Knowledge Graph by using schema vocabulary to describe your organization’s key entities and properties.

Why and when is schema markup still important?

Just because Vertex AI doesn’t directly rely on your website’s Schema.org markup doesn’t mean it’s unimportant! Schema markup remains critical for several reasons:

  • Helpful to Google Search: Its key benefit is providing semantic meaning that Google’s core search engine consumes, and signals indicate that it continues to support visibility, rich snippets, and entity disambiguation.
  • Potential future use: Vertex AI does not consume it today, but future integrations are possible (though speculative).
  • Feeds the Knowledge Graph: It potentially influences how you appear in Knowledge Panels and other entity-focused search features.
  • Good data practice: Implementing schema markup reflects good structured data practices, which might facilitate easier extraction and transformation for other systems (like Vertex AI pipelines) later.

While Google is getting better at content recognition without it, schema markup remains a powerful tool recommended by Google itself. Ignoring it means potentially missing out on significant visibility and CTR benefits in Google Search.

Your schema might confirm or add details like the official logo, founder information, contact details, location, product attributes, event details, etc., to the Knowledge Graph’s representation of that local business entity. When using Vertex AI, the best thing is to adhere to its specific requirements for structured data.

How web data is organized is advancing so fast that there is one more thing to mention here: MCP.

AI Agents and the Model Context Protocol for Organized and Contextualized Data

Search is evolving from human users typing into a Google Search box to agents acting on their behalf; the latter requires structured, machine-actionable context. Model Context Protocol (MCP) provides another standardized, structured interface for AI agents to access various types of business data.

MCP is a way for businesses to expose their data (product catalogs, stock levels, etc.) through a standardized interface for AI agents. The protocol relies on the same underlying principle: data is intentionally structured and prepared for agents’ consumption. Think of it as utilizing dynamic knowledge graphs and ontologies to build a foundation linking data through semantic connections, which is crucial for providing context to AI agents.
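To give a feel for what "a standardized interface" means, here is a toy handler for two MCP-style requests: listing the available tools and calling one. MCP messages are built on JSON-RPC 2.0, and the method names `tools/list` and `tools/call` come from the public specification as I understand it. Everything else (the `get_stock_level` tool, the inventory data, the `handle_request` function) is hypothetical. This is not an MCP server; the official SDKs handle transport, initialization, and much more.

```python
INVENTORY = {"sku-001": 12}  # hypothetical stock data

TOOLS = [
    {
        "name": "get_stock_level",
        "description": "Return the number of units in stock for a SKU.",
        "inputSchema": {
            "type": "object",
            "properties": {"sku": {"type": "string"}},
            "required": ["sku"],
        },
    }
]


def handle_request(request: dict) -> dict:
    """Answer a JSON-RPC style request for tool discovery or a tool call."""
    method = request.get("method")
    params = request.get("params", {})
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call" and params.get("name") == "get_stock_level":
        sku = params["arguments"]["sku"]
        result = {"content": [{"type": "text", "text": str(INVENTORY.get(sku, 0))}]}
    else:
        # -32601 is the standard JSON-RPC "Method not found" error code.
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": "Method not found"},
        }
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_stock_level", "arguments": {"sku": "sku-001"}},
}
response = handle_request(call)
print(response)
```

The tool's `inputSchema` plays the same role a data store schema plays in Vertex AI Search: it tells the consuming system, in advance, what shape the data takes.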

The semantic structuring inherent in Schema.org could potentially facilitate easier extraction and transformation of website data into formats suitable for MCP or for direct ingestion into Vertex AI. Structured and semantically rich data is paramount for the effective functioning of AI systems, whether they are public search engines, private AI platforms, or MCP’s contextual capability bounding.

While the specific protocols and methods of data consumption differ, the fundamental need for well-organized and contextualized data is consistent.

“With the AI revolution sweeping everything in marketing and technology, I believe a better framing is systems of context and systems of truth. But the defining characteristic of the martech “stack” in an AI world is going to be context and the truth it’s wrapped around.” – Scott Brinker [4]

Let’s wrap it up.

Table of Key Differences: Vertex AI Graph Concept vs. Google Search Schema

Key Differences Summarized
| Feature | The Graph Concept in Vertex AI (Structured Data) | Schema Markup Consumption by Google Search |
| --- | --- | --- |
| Primary Goal | Improve ML model performance on a specific dataset | Enhance search engine recognition and enable Rich Results |
| Data Source | Internal structured data (BigQuery, SQL, CSV) | Public web page content (HTML) |
| Representation | Often implicit graph traversal for features; explicit for GNNs | Explicit semantic markup using Schema.org vocabulary |
| Consumer | ML models, Data Scientists within Vertex AI / GCP | Google Search engine, Google Knowledge Graph |
| Scope | Internal to a specific ML project/pipeline | Public Web / Global Search Ecosystem |
| Control | User controls data and processing within Vertex AI | Website owner provides markup; Google controls consumption |
| Nature of Graph | Represents relationships within the provided dataset | Represents semantic meaning on a webpage for external use |

In my opinion, weak, vague, SEO’d, malformed, obsolete, or redundant schema markup may be a contributing factor to Google relying on it less in the future. I’d rather see Google announce that markup will be used to enrich AIO results.

Google’s Open Multimodal Model with Long Context & Vision seeks to reduce hallucinations by implementing in-context attribution techniques to minimize factual errors. It may consider this a more trustworthy approach to data facts than manually influenced schema markup. [3]

SUMMARY: Explore Google Vertex AI and Useful Schema Markup

Focus on preparing data according to the specific requirements of the Vertex AI service you’re using, while continuing to implement robust Schema.org markup on your website for its crucial role in Google Search visibility and entity search. The key takeaway is to leverage either or both methods to transform raw data into actionable insights for business growth.

As an AI integration consultant, Hill Web Marketing helps businesses sort through what effort, impact, and timing will make a worthwhile difference. We look for real results that matter to you.

Call us at 651-206-2410 to incorporate a semantic learning approach by Using Artificial Intelligence for Improved Search Visibility

 

Resources:

[1] https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview

[2] https://cloud.google.com/retail/docs/data-quality

[3] https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data

[4] https://chiefmartec.com/2025/02/meet-the-new-martech-stack-systems-of-context-and-systems-of-truth/