Robots.txt: A Complete Guide for SEO (2026)

Robots.txt is a small file with an outsized impact on SEO. A single wrong line can block Google from crawling your entire website.

Used correctly, though, robots.txt can:
✅ Optimize crawl budget
✅ Reduce crawling of duplicate content
✅ Keep crawlers out of admin and other non-public sections (it is not a security control; the file itself is publicly readable)
✅ Improve indexing efficiency

This article covers everything you need to know about robots.txt for SEO.

What Is Robots.txt?

Robots.txt is a plain-text file that tells search engine bots:
- Which pages they may crawl
- Which pages they may not crawl
- Where the XML sitemap is located

The file must live in the root directory of the website: https://yourdomain.com/robots.txt

A Simple Robots.txt Example:

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

Sitemap: https://jasaseo.id/sitemap.xml

Explanation:
- User-agent: * = Applies to all bots
- Disallow: /admin/ = Do not crawl the /admin/ folder
- Allow: / = Everything else may be crawled
- Sitemap: = Location of the XML sitemap
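
To see how a crawler reads the example above, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper name can_crawl and the URLs being checked are illustrative assumptions, not part of the original example. Python's parser applies rules in file order, while Google picks the most specific match; for this simple file both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser

# The simple example from this article, as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

Sitemap: https://jasaseo.id/sitemap.xml
"""


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Calling can_crawl(ROBOTS_TXT, "https://jasaseo.id/admin/users") reports the URL as blocked, while a URL under /blog/ is reported as crawlable.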

Important

Robots.txt is a set of directives, not an enforced command. Well-behaved bots follow it, but bad bots can simply ignore it.

Syntax Robots.txt

1. User-agent

Specify which bot a group of rules targets:

User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /admin/

User-agent: *
Disallow:

Common user-agents:
- Googlebot: Google Search
- Googlebot-Image: Google Images
- Googlebot-News: Google News
- Bingbot: Bing
- *: All bots

2. Disallow

Block bots from crawling specific paths:

Disallow: /admin/           # Block folder
Disallow: /page.html        # Block specific file
Disallow: /*.pdf$           # Block all PDF files
Disallow: /*?               # Block URLs with parameters
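
The wildcard patterns above can be hard to reason about, so here is a small sketch of how they might be evaluated. It assumes the matching rules used by Google and RFC 9309: a pattern is matched from the start of the path, `*` stands for any run of characters, and a trailing `$` pins the end of the URL. The function names pattern_to_regex and matches are hypothetical helpers for illustration only.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a regex anchored at the start.

    `*` matches any run of characters; a trailing `$` anchors the end of the
    URL. Everything else is matched literally, as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where `*` appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if the robots.txt `pattern` matches the URL `path`."""
    return pattern_to_regex(pattern).match(path) is not None
```

With this sketch, `/*.pdf$` matches /files/report.pdf but not /files/report.pdf?download=1, because the `$` requires the URL to end in .pdf.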

3. Allow

Override a Disallow rule for specific paths:

User-agent: *
Disallow: /private/
Allow: /private/public-page.html

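This allows /private/public-page.html even though the rest of /private/ is blocked. Google resolves such conflicts by applying the most specific (longest) matching rule.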
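
The precedence between Allow and Disallow can be sketched in a few lines. This is an illustration of the longest-match rule described in RFC 9309 and Google's documentation, assuming plain path prefixes without wildcards; the Rule type and is_allowed function are hypothetical names, not an official API.

```python
from typing import Iterable, Tuple

# (directive, path prefix), for example ("disallow", "/private/")
Rule = Tuple[str, str]


def is_allowed(rules: Iterable[Rule], path: str) -> bool:
    """Apply the longest matching rule; on a tie, Allow wins.

    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty pattern matches nothing
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/public-page.html"),
]
```

The order of the rules in the file does not matter under this model: the Allow line wins for /private/public-page.html only because its path is longer than /private/.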

4. Sitemap

Specify the location of your XML sitemap(s):

Sitemap: https://jasaseo.id/sitemap.xml
Sitemap: https://jasaseo.id/sitemap-blog.xml
Sitemap: https://jasaseo.id/sitemap-products.xml

You can list multiple sitemaps.
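
If you want to confirm which sitemaps a parser actually picks up, Python's urllib.robotparser exposes them through site_maps() (available since Python 3.8). The sketch below wraps that call; list_sitemaps is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://jasaseo.id/sitemap.xml
Sitemap: https://jasaseo.id/sitemap-blog.xml
Sitemap: https://jasaseo.id/sitemap-products.xml
"""


def list_sitemaps(robots_txt: str) -> list:
    """Return every Sitemap URL declared in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []
```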

5. Crawl-delay (Ignored by Google)

User-agent: *
Crawl-delay: 10

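Google does not support Crawl-delay and ignores the line. The crawl rate limiter setting in Google Search Console has also been retired, so Googlebot's crawl rate is now managed automatically. Some other crawlers may still honor Crawl-delay.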

Robots.txt Best Practices for SEO

1. Block Admin & System Pages

User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /cgi-bin/

Why: These pages have no SEO value and waste crawl budget.

2. Block Duplicate Content

Disallow: /print/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /search?

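Why: Prevents Google from crawling duplicate versions of the same content. Note that a pattern such as /*?sort= only matches when sort is the first query parameter.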

3. Block Low-Value Pages

Disallow: /cart/
Disallow: /checkout/
Disallow: /thank-you/
Disallow: /account/

Why: User-specific pages do not need to be crawled or indexed.

4. Allow Important Resources

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

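Why: Some resources are needed to render pages correctly. On WordPress, front-end features can call /wp-admin/admin-ajax.php, so it stays crawlable even though the rest of /wp-admin/ is blocked. The same principle applies to CSS, JS, and image files.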

5. Specify Sitemap Location

Sitemap: https://jasaseo.id/sitemap.xml

Why: Helps bots discover all important pages faster.

Pro Tip

Test robots.txt before deploying, for example with the robots.txt report in Google Search Console (Google's standalone Robots.txt Tester has been retired).

Common Robots.txt Mistakes (& How to Fix Them)

❌ Mistake #1: Blocking Entire Website

User-agent: *
Disallow: /

Impact: Google stops crawling the whole website. Pages gradually drop out of search results or appear without a description.

Fix:

User-agent: *
Disallow:
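
Because this mistake is so costly, a deploy pipeline can check for it automatically. The sketch below is one possible guard, not a full robots.txt parser: it flags a file when the group for a given user-agent contains a bare `Disallow: /` and no Allow rule at all. The function name blocks_everything is a hypothetical helper.

```python
def blocks_everything(robots_txt: str, agent: str = "*") -> bool:
    """True if a group for `agent` has a bare `Disallow: /` and no Allow rule."""
    current_agents = []
    in_rules = False
    disallow_all = False
    has_allow = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rule lines starts a new group.
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if agent.lower() in current_agents:
                if field == "disallow" and value == "/":
                    disallow_all = True
                elif field == "allow" and value:
                    has_allow = True
    return disallow_all and not has_allow
```

A CI step could read the file that is about to be published and abort the release when blocks_everything returns True for the production build.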

❌ Mistake #2: Blocking CSS/JS

Disallow: /assets/
Disallow: *.css
Disallow: *.js

Impact: Google cannot render pages correctly, which can hurt rankings.

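Fix: Remove the Disallow rules for CSS and JS. If the rest of /assets/ has to stay blocked, explicitly allow the stylesheet and script folders: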

Allow: /assets/css/
Allow: /assets/js/

❌ Mistake #3: Blocking Important Pages

Disallow: /blog/

Impact: Blog posts are not crawled, so Google cannot index their content.

Fix: Remove the line, or block only the specific pages that need blocking.

❌ Mistake #4: Wrong File Location

File location: https://yourdomain.com/assets/robots.txt ❌

Fix: Move to root: https://yourdomain.com/robots.txt ✅

❌ Mistake #5: Case Sensitivity

Disallow: /Admin/

But the actual folder is: /admin/

Impact: The rule does not match (paths in robots.txt are case-sensitive).

Fix: Match the exact case.

Advanced Robots.txt Strategies

1. Optimize Crawl Budget (Large Sites)

For websites with 10,000+ pages, prioritize important pages:

User-agent: *
# Block low-value pages
Disallow: /tag/
Disallow: /author/
Disallow: /page/
Disallow: /*?

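# High-value sections (/blog/, /products/, /services/) are crawlable by default.
# No Allow rules are needed for them. Adding Allow: /blog/ and similar lines
# would outrank the shorter Disallow: /*? rule and re-allow parameter URLs
# under those paths.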

Sitemap: https://jasaseo.id/sitemap.xml

2. E-commerce Faceted Navigation

User-agent: *
# Block filter combinations
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?price=
Disallow: /*&

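# Main category pages such as /products/ stay crawlable by default.
# Do not add Allow: /products/ here: it is longer than the filter patterns
# above, so under longest-match precedence it would re-allow filtered URLs
# under /products/.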

Why: Prevents crawling of millions of filter combinations.
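
Filter rules interact with Allow rules in a way that is easy to get wrong, so it helps to test them against sample URLs. The sketch below combines wildcard matching with the longest-match precedence used by Google and RFC 9309. It is a simplified model (no `$` handling), and the names is_allowed and FACET_RULES as well as the sample URLs are illustrative assumptions.

```python
import re


def _matches(pattern: str, path: str) -> bool:
    """Match a robots.txt pattern from the start of `path`; `*` = any characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, path) is not None


def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best = (-1, True)  # (pattern length, allowed?)
    for directive, pattern in rules:
        if pattern and _matches(pattern, path):
            candidate = (len(pattern), directive == "allow")
            if candidate > best:
                best = candidate
    return best[1]


FACET_RULES = [
    ("disallow", "/*?color="),
    ("disallow", "/*?size="),
    ("disallow", "/*?price="),
    ("disallow", "/*&"),
]
```

Two things stand out when running sample URLs through this model. First, /*?color= only catches URLs where color is the first parameter; combinations such as ?brand=acme&color=red are caught by the /*& rule instead. Second, appending ("allow", "/products/") to the rules makes /products/shoes?color=red crawlable again, because /products/ (10 characters) is longer than /*?color= (9 characters).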

3. Block Search Results Pages

Disallow: /search?
Disallow: /search/
Disallow: /*?s=

Why: Internal search results are duplicate/low-value content.

4. Block Staging/Development Sites

User-agent: *
Disallow: /

Why: Prevents the staging site from competing with the production site.
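
One way to keep the staging rule from leaking into production is to choose the file content from the deployment environment. The sketch below is only an illustration: the environment names, the constants, and the robots_for function are assumptions, and the production body is a placeholder. Unknown environment names raise an error instead of silently picking either file.

```python
PRODUCTION_ROBOTS = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""

BLOCK_ALL_ROBOTS = """\
User-agent: *
Disallow: /
"""


def robots_for(environment: str) -> str:
    """Return the robots.txt body for a named deployment environment."""
    if environment == "production":
        return PRODUCTION_ROBOTS
    if environment in ("staging", "development"):
        return BLOCK_ALL_ROBOTS
    # Refuse to guess: a typo should stop the deploy, not block or expose a site.
    raise ValueError(f"unknown environment: {environment!r}")
```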

5. Different Rules for Different Bots

# Google: Allow everything except admin
User-agent: Googlebot
Disallow: /admin/

# Bing: More restrictive
User-agent: Bingbot
Disallow: /admin/
Disallow: /search/

# Block SEO-tool crawlers you do not want (these honor robots.txt; truly bad bots ignore it)
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /
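
To check how per-bot groups play out, the rules above can be loaded into Python's standard-library urllib.robotparser and queried once per user-agent. This is a sketch: access_matrix is a hypothetical helper, and DuckDuckBot is included only as an example of a crawler that has no group in the file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: Bingbot
Disallow: /admin/
Disallow: /search/

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /
"""


def access_matrix(robots_txt: str, agents, url: str) -> dict:
    """Map each user-agent name to whether it may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

Since this file has no `User-agent: *` group, a crawler that is not named in it is not restricted at all.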

Robots.txt vs Meta Robots vs X-Robots-Tag

Comparison:

Method	Scope	Use Case
Robots.txt	Site-wide rules	Block crawling (save crawl budget)
Meta Robots	Per-page rules	Control indexing (`noindex`, `nofollow`)
X-Robots-Tag	HTTP header	Control indexing for non-HTML files (PDFs, images)

When to Use Each:

Robots.txt:

✅ Block low-value sections (admin, search, filters)
✅ Optimize crawl budget
❌ Do NOT use it to prevent indexing (use noindex instead)

Meta Robots:

<meta name="robots" content="noindex, follow">

✅ Prevent specific pages from being indexed
✅ Control link equity flow

X-Robots-Tag:

X-Robots-Tag: noindex

✅ Control indexing for PDFs, images, videos
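
How the header gets attached depends on your server or framework, so the following is only a framework-neutral sketch of the decision itself. The extension set is a hypothetical policy and extra_headers is an illustrative helper; in practice the returned dictionary would be merged into the response headers by whatever stack serves the file.

```python
from pathlib import PurePosixPath

# Hypothetical policy: file types that should stay out of the index.
NOINDEX_EXTENSIONS = {".pdf"}


def extra_headers(url_path: str) -> dict:
    """Return the extra response headers to send for `url_path`."""
    suffix = PurePosixPath(url_path).suffix.lower()
    if suffix in NOINDEX_EXTENSIONS:
        return {"X-Robots-Tag": "noindex"}
    return {}
```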

Warning

Robots.txt blocks crawling, NOT indexing. If a page is already indexed, blocking it in robots.txt will not remove it from the index. Use noindex instead, and leave the page crawlable so Google can actually see the noindex directive.

How to Test & Validate Robots.txt

1. Google Search Console

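Open Settings and go to the robots.txt report (it replaced the retired Robots.txt Tester)
Check when robots.txt was last fetched and whether the fetch succeeded
Test specific URLs with the URL Inspection tool
Check for errors/warnings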

2. Manual Check

Visit: https://yourdomain.com/robots.txt

Verify:

✅ File is accessible (200 status code)
✅ Syntax is correct
✅ Sitemap URL is correct
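
The manual check can also be scripted. The sketch below separates fetching from reviewing so the review logic can be run on any status code and body; fetch and review are hypothetical helper names. It treats a relative Sitemap value as a problem because the Sitemap line expects a fully qualified URL.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0):
    """Fetch `url` and return (status_code, body_text)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except HTTPError as err:
        return err.code, ""


def review(status: int, body: str) -> list:
    """Return a list of problems; an empty list means the basics look fine."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    sitemaps = [
        line.split(":", 1)[1].strip()
        for line in body.splitlines()
        if line.lower().startswith("sitemap:")
    ]
    if not sitemaps:
        problems.append("no Sitemap line found")
    for sitemap in sitemaps:
        if not sitemap.startswith(("http://", "https://")):
            problems.append(f"sitemap URL is not absolute: {sitemap}")
    return problems
```

A typical call would be review(*fetch("https://yourdomain.com/robots.txt")), with yourdomain.com replaced by the real host.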

3. Third-Party Tools

Screaming Frog: Crawl website & check robots.txt compliance
Ahrefs Site Audit: Identify blocked resources
SEMrush Site Audit: Check robots.txt issues

4. Common Validation Checks

✅ File location: /robots.txt (root directory)
✅ File name: lowercase robots.txt (not Robots.txt)
✅ Syntax: No typos, correct directives
✅ Sitemap URL: Valid & accessible
✅ Important pages: NOT blocked
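
The syntax items in this checklist lend themselves to a small linter. The sketch below flags misspelled directives and paths that do not start with `/` or `*`. The directive list is an assumption covering only the directives discussed in this article, and lint is a hypothetical helper name.

```python
# Directives covered in this article; anything else is reported as unknown.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint(robots_txt: str) -> list:
    """Return a list of human-readable issues found in the robots.txt text."""
    issues = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if ":" not in line:
            issues.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        name = field.lower()
        if name not in KNOWN_DIRECTIVES:
            issues.append(f"line {number}: unknown directive '{field}'")
        elif name in ("allow", "disallow") and value and not value.startswith(("/", "*")):
            issues.append(f"line {number}: path should start with '/': {value}")
    return issues
```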

Robots.txt Templates

Template 1: Small Business Website

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Template 2: E-commerce Site

User-agent: *
# Block admin & checkout
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

# Block search & filters
Disallow: /search?
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?price=

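# Products & categories stay crawlable by default. Explicit Allow rules for
# /products/ or /categories/ would outrank the shorter filter patterns above
# and re-allow filtered URLs under those paths.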

Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/sitemap-products.xml

Template 3: Blog/Content Site

User-agent: *
# Block admin
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Block low-value pages
Disallow: /tag/
Disallow: /author/
Disallow: /search/

# Allow blog content
Allow: /blog/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Template 4: Staging Site (Block Everything)

User-agent: *
Disallow: /

Monitoring & Maintenance

Monthly Tasks:

✅ Check Google Search Console for crawl errors
✅ Verify the sitemap URL is still valid
✅ Review blocked URLs (was anything blocked by accident?)

Quarterly Tasks:

✅ Audit crawl budget usage
✅ Review & update blocked sections
✅ Test robots.txt against new pages/features

After Major Updates:

✅ Verify robots.txt was not changed by accident
✅ Re-check it with the robots.txt report in Google Search Console
✅ Monitor indexing status in GSC
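
Detecting an accidental change can be automated by comparing the live file against a fingerprint of the last approved version. The sketch below uses SHA-256 from Python's hashlib; fingerprint and has_changed are hypothetical helper names, and where the approved fingerprint is stored is left to your setup.

```python
import hashlib


def fingerprint(robots_txt: str) -> str:
    """SHA-256 of the text with line endings and trailing spaces normalized."""
    normalized = "\n".join(line.rstrip() for line in robots_txt.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(current: str, approved_fingerprint: str) -> bool:
    """True if `current` differs from the approved robots.txt content."""
    return fingerprint(current) != approved_fingerprint
```

Normalizing line endings keeps a switch between Unix and Windows line breaks from raising a false alarm, while any change to a rule still alters the fingerprint.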

Conclusion: Robots.txt Is a Powerful Tool (When Used Correctly)

Robots.txt can significantly improve SEO by:
1. Optimizing crawl budget (block low-value pages)
2. Reducing crawling of duplicate content (block filter pages, search results)
3. Keeping crawlers out of non-public sections (admin, user accounts)
4. Guiding bots to important content (via sitemap)

But one mistake can take the entire website out of search. Always test before you deploy!

Action Items:

✅ Audit your current robots.txt (any mistakes?)
✅ Block low-value sections (admin, search, filters)
✅ Add sitemap URL
✅ Test with the robots.txt report in Google Search Console
✅ Monitor crawl stats in Google Search Console

Need help with a technical SEO audit? Book a free consultation with our team or try the Free SEO Audit Tool.
