SEO That Starts in the Markup
SEO is often treated as something separate from engineering. In practice, many of the things that improve SEO are the same things that improve accessibility, structure and overall clarity.
Why? Because good HTML tends to be easier for users to navigate, for browsers to render and for search engines to understand.
Many SEO discussions online still focus on isolated tactics or ranking tricks, but modern search systems are far more sophisticated than that. The goal today is usually not to “optimize for an algorithm” but to help search engines clearly understand useful content.
SEO starts with structure
Search engines primarily consume HTML.
That means the structure of the document matters. If you haven’t read the second post in this series, which covers semantic HTML, you can find it here: [Part2: Semantic HTML as the foundation]
A page with meaningful headings, semantic landmarks and a clear content hierarchy is easier to understand than a page built entirely from generic containers.
<main>
  <article>
    <h1>Building a Modern Website</h1>
    <section>
      <h2>Semantic HTML</h2>
      <p>...</p>
    </section>
  </article>
</main>
This creates context not only for users but also for search engines trying to understand the purpose of the page. This is one reason semantic HTML remains important even in the modern frontend world.
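One way to make "clear content hierarchy" checkable is to list the headings of a page and flag places where a level is skipped. The following is a minimal sketch, not a full audit tool: the names `extractHeadings` and `findSkippedHeadings` are hypothetical, and it assumes a regular expression is good enough for simple, well-formed markup. A real check would more likely use a proper HTML parser.

```typescript
// Hypothetical helpers for illustration only.
interface Heading {
  level: number;
  text: string;
}

// Collect <h1>..<h6> elements in document order.
// Assumption: the markup is simple enough for a regular expression.
function extractHeadings(html: string): Heading[] {
  const pattern = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi;
  const headings: Heading[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    headings.push({
      level: Number(match[1]),
      // Strip nested tags so only the visible text remains.
      text: (match[2] ?? "").replace(/<[^>]+>/g, "").trim(),
    });
  }
  return headings;
}

// Report headings that jump more than one level deeper than the previous one,
// for example an <h2> followed directly by an <h4>.
function findSkippedHeadings(headings: Heading[]): Heading[] {
  const skipped: Heading[] = [];
  let previousLevel = 0;
  for (const heading of headings) {
    if (heading.level > previousLevel + 1) {
      skipped.push(heading);
    }
    previousLevel = heading.level;
  }
  return skipped;
}
```

Run against the example above, `findSkippedHeadings` should return an empty list, since the `<h1>` is followed by an `<h2>`.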
Search engines have evolved
A lot of SEO advice still comes from an older understanding of how search engines work.
Modern search systems are significantly better at understanding context, the relationships between topics, structure, intent and overall page quality.
Google repeatedly emphasizes creating content for people first, not content designed primarily to manipulate rankings.
For us as developers, this changes the role of SEO. Instead of trying to “optimize for algorithms”, the goal becomes helping search engines clearly understand useful content.
Metadata still matters
Metadata helps describe the page before the content is even parsed. Metadata alone will not make a page rank highly, but it helps establish clear context and improves how pages appear when discovered.
At minimum, most pages should define a title and a description:
<title>Building a Modern Website</title>
<meta
  name="description"
  content="A practical series about accessibility, performance, and maintainable frontend architecture."
/>
This information is commonly used in search engine previews, social sharing, browser tabs and bookmarking systems.
Open Graph and social sharing
There is an important distinction here. Standard metadata, as in the example above, is primarily for browsers and search engines. Open Graph metadata is primarily for social platforms and link previews.
Open Graph metadata controls how pages appear when shared on platforms like:
- Slack
- Discord
- Teams
Without Open Graph metadata, these platforms often try to guess the title, description and preview image from the page itself. That can lead to inconsistent or incomplete previews. Standard metadata is still important but it was not originally designed for modern social sharing systems.
Take this example:
<meta property="og:title" content="Building a Modern Website" />
<meta property="og:description" content="A practical frontend engineering series." />
<meta property="og:image" content="https://example.com/cover.jpg" />
This allows platforms to generate more predictable and visually consistent link previews. Open Graph provides a more explicit format specifically intended for rich previews across platforms.
This does not directly influence ranking, but it improves presentation, consistency and shareability, especially when content is shared externally.
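In a generated site, these tags are usually produced from page data rather than written by hand. Here is a minimal sketch of that idea. The `SharePreview` shape and the `renderOpenGraphTags` and `escapeAttribute` names are assumptions made for illustration and do not come from any particular framework.

```typescript
// Hypothetical shape and helper names, used only for illustration.
interface SharePreview {
  title: string;
  description: string;
  imageUrl: string;
}

// Escape characters that could break out of a double-quoted attribute value.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render the three Open Graph properties discussed above, one tag per line.
function renderOpenGraphTags(preview: SharePreview): string {
  const properties: Array<[string, string]> = [
    ["og:title", preview.title],
    ["og:description", preview.description],
    ["og:image", preview.imageUrl],
  ];
  return properties
    .map(([property, content]) => `<meta property="${property}" content="${escapeAttribute(content)}" />`)
    .join("\n");
}
```

Keeping the values in one object makes it harder for the title shown in a link preview to drift away from the title used elsewhere on the page.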
Canonical URLs
A canonical URL helps indicate the preferred public version of a page.
<link
  rel="canonical"
  href="https://example.com/posts/semantic-html"
/>
So what problem do canonical URLs solve?
Imagine the same content is accessible through multiple URLs:
- /posts/semantic-html
- /posts/semantic-html/
- /posts/semantic-html?ref=linkedin
- /posts?id=123
Even if those pages visually look identical to users, search engines may initially treat them as separate URLs. That creates ambiguity around:
- which URL should appear in search
- which URL should collect ranking signals
- which version should be indexed
A canonical URL is essentially a hint saying: “This is the preferred version of this content.”
Important note: canonical tags are treated as hints, not strict directives. Search engines may still choose a different canonical URL if other signals suggest a better alternative. Canonical tags are not a ranking boost, a way to “prioritize” pages or a versioning system like Git.
So couldn’t we intentionally create many near-identical URLs to dominate the search results?
Historically, people absolutely tried this.
Modern search systems are significantly better at detecting duplicate or near-duplicate content and consolidating those pages automatically, choosing one canonical version and ignoring the others in their search results.
At a high level, canonicalization helps search engines like Google to:
- consolidate duplicate or near-duplicate content
- combine ranking signals
- reduce crawl inefficiency
- choose which URL to display in search
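To emit a consistent canonical URL, a site needs some rule that maps the URL variants above onto one preferred form. The sketch below shows one possible rule set using the standard `URL` API. The `toCanonicalUrl` name, the list of tracking parameters and the choice to drop trailing slashes are all assumptions; each site has to decide these for itself. An alias such as `/posts?id=123` cannot be resolved by string normalization and would need a lookup on the server.

```typescript
// Hypothetical normalization rules. Which parameters count as tracking
// parameters, and whether trailing slashes are dropped, are site-specific choices.
const TRACKING_PARAMS = ["ref", "utm_source", "utm_medium", "utm_campaign"];

function toCanonicalUrl(rawUrl: string): string {
  const url = new URL(rawUrl);

  // Remove parameters that do not change the content of the page.
  for (const param of TRACKING_PARAMS) {
    url.searchParams.delete(param);
  }

  // Fragments never identify a separate document.
  url.hash = "";

  // Drop a trailing slash, except for the root path.
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }

  return url.toString();
}
```

With these rules, `/posts/semantic-html/` and `/posts/semantic-html?ref=linkedin` both map to `https://example.com/posts/semantic-html`, which is the value that would go into the `href` of the canonical link.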
Crawlability still matters
Search engines still fundamentally rely on discovering and parsing HTML documents. As previously mentioned, this is one reason semantic HTML and predictable structure remain important even in modern frontend architectures. This is also why a static site approach, as we wrote about in posts 6 and 7, works well for search engines. Most of the content is rendered immediately and only small sections (islands) depend on JavaScript.
In contrast, if content:
- only exists after heavy client-side rendering
- depends entirely on JavaScript execution
- is hidden behind inaccessible navigation
then it becomes harder to discover and understand. Modern search engines can execute JavaScript, but relying entirely on client-side rendering still introduces additional complexity.
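A simple way to test this is to look at the HTML the server delivers, before any JavaScript runs, and check whether the important text is already there. The following sketch does that on a string of HTML. The function names are hypothetical, and the tag stripping is a rough approximation rather than a real HTML parser.

```typescript
// Hypothetical check: does the HTML delivered by the server already contain
// the text we expect, before any JavaScript has run?
function visibleTextOf(html: string): string {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ") // script contents are not page text
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ") // drop the remaining tags
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the phrases that are NOT present in the initial HTML.
function missingFromInitialHtml(html: string, expectedPhrases: string[]): string[] {
  const text = visibleTextOf(html).toLowerCase();
  return expectedPhrases.filter((phrase) => !text.includes(phrase.toLowerCase()));
}
```

A statically rendered page should return an empty list for its main headings, while a page that only ships an empty root element plus a script bundle would report every phrase as missing.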
Structured data still matters
Structured data helps search engines understand what a page represents.
This is commonly implemented using JSON-LD, which stands for JavaScript Object Notation for Linked Data.
In practice, JSON-LD is a structured way to describe a page or website’s entities, relationships, content types and purpose. This information is typically added inside a script tag:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "SEO That Starts in the Markup",
  "author": {
    "@type": "Person",
    "name": "John Doe"
  }
}
</script>
Unlike visible HTML content, this data is primarily intended for machines rather than users. The goal is not to manipulate rankings but to provide clearer context about the page and its content.
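When the JSON-LD is generated from page data, the object has to be serialized into the script tag safely. Below is a minimal sketch of that step. `renderJsonLd` is a hypothetical helper name, and escaping `<` as `\u003c` is one common precaution so that a value containing `</script>` cannot close the element early.

```typescript
// Hypothetical helper: serializes a schema object into a JSON-LD script tag.
function renderJsonLd(schema: Record<string, unknown>): string {
  // Escape "<" so a value such as "</script>" cannot end the script element early.
  // "\u003c" is still valid JSON and parses back to "<".
  const json = JSON.stringify(schema, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

// The same BlogPosting object as in the example above.
const blogPosting = {
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  headline: "SEO That Starts in the Markup",
  author: { "@type": "Person", name: "John Doe" },
};

const jsonLdTag = renderJsonLd(blogPosting);
```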
When to use @graph
As structured data grows, multiple schema objects often need to reference each other.
For example, you might have an article (a BlogPosting object) that exists on a web page (a WebPage object), which is part of a website (a WebSite object), which is owned by an organization (an Organization object) where people work (Person objects).
You could of course define these objects independently and create references between them, but over time that can become very complex. If entities are related, if objects reference each other, or if you have a global reusable schema, this is where @graph becomes useful.
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "url": "https://example.com",
      "name": "Example"
    },
    {
      "@type": "Organization",
      "@id": "https://example.com/#organization",
      "name": "Example Company",
      "member": [
        {
          "@id": "https://example.com/#john-doe"
        }
      ]
    },
    {
      "@type": "BlogPosting",
      "@id": "https://example.com/#blog-post",
      "headline": "SEO That Starts in the Markup",
      "isPartOf": {
        "@id": "https://example.com/#website"
      },
      "author": {
        "@id": "https://example.com/#john-doe"
      }
    },
    {
      "@type": "Person",
      "@id": "https://example.com/#john-doe",
      "name": "John Doe",
      "worksFor": {
        "@id": "https://example.com/#organization"
      }
    }
  ]
}
Compare this to a setup without @graph, where the BlogPosting object becomes nested and hard to reuse.
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "SEO That Starts in the Markup",
  "author": {
    "@type": "Person",
    "name": "John Doe",
    "worksFor": {
      "@type": "Organization",
      "name": "Example Company"
    }
  },
  "isPartOf": {
    "@type": "WebSite",
    "name": "Example"
  }
}
This is just for one entity. Imagine how the full implementation would look.
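One trade-off of @graph is that it relies on @id references, and a reference can silently point at an entity that was renamed or removed. A small check can catch that before publishing. The sketch below is an assumption about how such a check could look. `findDanglingReferences` is a hypothetical name, and it treats every "@id" found anywhere in the graph as something that must be defined by a top-level node.

```typescript
// Hypothetical validator for a JSON-LD @graph array.

// Walk any JSON value and collect every "@id" string it contains.
function collectIds(value: unknown, ids: Set<string>): void {
  if (Array.isArray(value)) {
    for (const item of value) {
      collectIds(item, ids);
    }
  } else if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const child = record[key];
      if (key === "@id" && typeof child === "string") {
        ids.add(child);
      } else {
        collectIds(child, ids);
      }
    }
  }
}

// Return every referenced @id that no top-level node in the graph defines.
function findDanglingReferences(graph: unknown[]): string[] {
  const defined = new Set<string>();
  for (const node of graph) {
    if (typeof node === "object" && node !== null) {
      const id = (node as Record<string, unknown>)["@id"];
      if (typeof id === "string") {
        defined.add(id);
      }
    }
  }

  const mentioned = new Set<string>();
  collectIds(graph, mentioned);
  return Array.from(mentioned).filter((id) => !defined.has(id));
}
```

For the @graph example above the result should be an empty list. If the Person node were removed, the `#john-doe` reference used by `author` and `member` would be reported.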
When @graph is unnecessary
For smaller pages, a single schema object is often enough. If we only have a simple blog post, then the previous example is actually a perfectly valid example of when not to introduce @graph.
Why? Because in this case the relationships are simple, the entities are only used once and there is little benefit in separating everything into reusable objects. Adding @graph here would increase complexity without adding much value.
A useful rule is:
Use the simplest structure that accurately describes the page.
The goal of structured data is clarity, not complexity.
Structured data is not a ranking shortcut
Structured data can help search engines understand content more clearly, but on its own it does not guarantee stronger rankings. If the underlying content is weak, adding JSON-LD will not solve that problem.
Clear structure and useful content still matter more. Google emphasizes that structured data should reflect real content that exists on the page.
- If the page is not an article, it should not pretend to be one.
- If there is no FAQ section, it should not expose FAQ schema.
The purpose is to describe content accurately.
Ranking is not a single signal
Search ranking is not determined by one factor. Put differently, there is no single SEO score, perfect metadata setup, exact heading count or magical word count.
Search systems evaluate many signals together.
- Is the page relevant?
- Is the page well structured?
- Is the content of good quality?
- Does the page have internal links?
- Does it comply with accessibility rules?
- Is it usable on mobile devices?
- How good is the overall page experience?
- How many external pages link to our page and lend it reputation?
This is one reason why good engineering practices often support SEO naturally.
Performance and page experience
Performance alone does not guarantee strong rankings, but poor performance can create a worse user experience, especially on slower devices or networks.
Search engines increasingly consider:
- responsiveness
- stability during loading
- mobile usability
- overall page experience
Heavy JavaScript bundles, layout shifts and delayed rendering affect both users and crawlability, which is why frontend architecture decisions matter long before launch.
Internal linking matters more than most tricks
One of the simplest SEO improvements is often internal linking.
<a href="/posts/semantic-html">
  Read more about semantic HTML
</a>
Internal linking not only helps users navigate, it also helps search engines discover pages and establish relationships between topics. Google explicitly states that links help search engines discover pages and understand content relationships across a site. [Google documentation]
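One practical use of this idea is to look for pages that no other page links to, since those are harder for both users and crawlers to find. The sketch below assumes the rendered HTML of every page is available as a string keyed by its path. The names `extractInternalPaths` and `findOrphanPages` are hypothetical, and the regular expression only handles double-quoted `href` attributes.

```typescript
// Hypothetical helpers for illustration only.

// Collect the paths of all same-origin links found in a page.
function extractInternalPaths(html: string, pageUrl: string): string[] {
  const base = new URL(pageUrl);
  const pattern = /<a\b[^>]*\bhref="([^"]*)"/gi;
  const paths: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      // Resolve relative links against the URL of the page they appear on.
      const target = new URL(match[1] ?? "", base);
      if (target.origin === base.origin) {
        paths.push(target.pathname);
      }
    } catch {
      // Ignore href values that are not valid URLs.
    }
  }
  return paths;
}

// Return the paths of pages that no OTHER page links to.
// `pages` maps a path such as "/posts/semantic-html" to that page's HTML.
function findOrphanPages(pages: Record<string, string>, siteOrigin: string): string[] {
  const linked = new Set<string>();
  for (const path of Object.keys(pages)) {
    const html = pages[path] ?? "";
    for (const target of extractInternalPaths(html, siteOrigin + path)) {
      if (target !== path) {
        linked.add(target);
      }
    }
  }
  return Object.keys(pages).filter((path) => !linked.has(path));
}
```

Note that the home page may show up as an orphan if no page links back to it, so the result is a starting point for review rather than a verdict.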
Common SEO misconceptions
Many commonly repeated SEO ideas are misunderstood or outdated. There are still examples on the internet that emphasize adding keywords to metadata and spending time deciding exactly which keywords to add. Not to mention SEO checkers that measure and score arbitrary word counts or exact heading counts.
Most modern SEO improvements are less about “hacks” and more about:
- What a page is about and what its purpose is.
- How content is structured and how pages relate to each other.
- How well it complies with accessibility rules and whether it works well on mobile devices.
- How reputable and trustworthy the site is.
- Whether other relevant sites reference or link to it.
Most SEO improvements in modern websites emerge naturally from semantic HTML, clear hierarchy, accessible navigation, fast performance and meaningful content, most of which we have already covered in this series. The technical implementation matters, but the underlying structure matters more.
Summary
SEO is often treated as something separate from engineering. In practice, good SEO is often the result of good engineering decisions.
Semantic structure, accessible navigation, crawlable content, and clear metadata all help search engines understand a site more effectively.
Trying to outsmart ranking systems usually creates more complexity than value. Clear structure and useful content tend to age better than most SEO tactics.
Up Next
In the next part, we’ll look at performance engineering and why many performance problems are introduced long before a site reaches production.
This is part of our “Building a Modern Website” series where we share how we approach software development, architecture and frontend engineering. If you’re building something similar, we also work as software development consultants. Check our Services for more details.
