
Robots.txt: A Complete Guide for SEO (2026)
TL;DR (Quick Summary)
Used correctly, robots.txt can:
- ✅ Optimize crawl budget
- ✅ Prevent duplicate content issues
- ✅ Keep bots out of sensitive sections (it is not access control)
- ✅ Improve indexing efficiency
Robots.txt is a small file with a big impact on SEO. One small mistake can block Google from crawling the entire website.
But used correctly, robots.txt can:
- ✅ Optimize crawl budget
- ✅ Prevent duplicate content issues
- ✅ Keep bots out of sensitive sections
- ✅ Improve indexing efficiency
This article covers everything you need to know about robots.txt for SEO.
What Is Robots.txt?
Robots.txt is a plain-text file that tells search engine bots:
- Which pages they may crawl
- Which pages they may not crawl
- Where the XML sitemap is located
The file must live in the root directory of the website: https://yourdomain.com/robots.txt
A Simple Robots.txt Example:
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://jasaseo.id/sitemap.xml
Explanation:
- User-agent: * = Applies to all bots
- Disallow: /admin/ = Do not crawl the /admin/ folder
- Disallow: /private/ = Do not crawl the /private/ folder
- Allow: / = All other pages may be crawled
- Sitemap: = Location of the XML sitemap
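To see how a crawler might read the example above, here is a minimal sketch using Python's standard urllib.robotparser module. The test paths (/admin/login, /private/notes.html, /blog/robots-guide) are made up for illustration. This parser applies the first matching rule in file order, which is simpler than Google's own matching, so treat the output as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The simple example from this article, embedded as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://jasaseo.id/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `path` under the rules above."""
    return parser.can_fetch(agent, "https://jasaseo.id" + path)


# Hypothetical paths, used only to exercise the rules.
for path in ("/admin/login", "/private/notes.html", "/blog/robots-guide"):
    print(path, "->", "crawl" if allowed(path) else "blocked")
```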
Read also: Crawlability Masterclass
Robots.txt Syntax
1. User-agent
Specifies which bot a group of rules targets:
User-agent: Googlebot
Disallow: /private/
User-agent: Bingbot
Disallow: /admin/
User-agent: *
Disallow:
Common User-agents:
- Googlebot - Google Search
- Googlebot-Image - Google Images
- Googlebot-News - Google News
- Bingbot - Bing
- * - All bots
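A bot follows the group that names it and falls back to the * group only when no named group matches. The sketch below checks this against the three groups shown above using Python's urllib.robotparser. The domain, the test paths, and the DuckDuckBot agent are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /admin/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` (hypothetical domain)."""
    return parser.can_fetch(agent, "https://yourdomain.com" + path)


for agent in ("Googlebot", "Bingbot", "DuckDuckBot"):
    for path in ("/private/report", "/admin/"):
        print(f"{agent:12} {path:16} allowed={allowed(agent, path)}")
```

Note that the Googlebot group does not inherit the Bingbot rule: each bot reads only its own group.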
2. Disallow
Blocks bots from crawling specific paths:
Disallow: /admin/ # Block folder
Disallow: /page.html # Block specific file
Disallow: /*.pdf$ # Block all PDF files
Disallow: /*? # Block URLs with parameters
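The * and $ wildcards in the patterns above can be confusing, so here is a small sketch that converts a pattern into a regular expression and tests it against sample paths. It assumes the wildcard semantics Google documents (* matches any run of characters, a trailing $ anchors the end of the URL). The helper names and sample paths are hypothetical. Python's built-in urllib.robotparser does not interpret these wildcards, which is why the sketch uses its own matcher.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """True if the pattern matches starting at the beginning of the path."""
    return pattern_to_regex(pattern).match(path) is not None


samples = [
    ("/admin/", "/admin/users"),
    ("/page.html", "/page.html"),
    ("/*.pdf$", "/files/report.pdf"),
    ("/*.pdf$", "/files/report.pdf?download=1"),
    ("/*?", "/shop?color=red"),
]
for pattern, path in samples:
    print(f"{pattern:12} {path:32} {rule_matches(pattern, path)}")
```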
3. Allow
Overrides Disallow for specific paths:
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
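This allows /private/public-page.html even though /private/ is blocked, because Google and other crawlers that follow RFC 9309 apply the most specific (longest) matching rule.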
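The precedence between Allow and Disallow can be sketched in a few lines. This is an illustrative model of the longest-match rule described in RFC 9309, with ties going to Allow. It handles plain prefixes only (no wildcards), and the function name and test paths are assumptions.

```python
def is_allowed(rules, path):
    """Apply longest-match precedence; on a tie, Allow wins."""
    best_length = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_length
        tie_prefers_allow = len(prefix) == best_length and directive == "allow"
        if longer or tie_prefers_allow:
            best_length = len(prefix)
            allowed = directive == "allow"
    return allowed


RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/public-page.html"),
]

for path in ("/private/public-page.html", "/private/secret.html", "/blog/"):
    print(path, "->", is_allowed(RULES, path))
```

Some simpler parsers, including Python's urllib.robotparser, use the first matching line instead, so placing the more specific Allow line first keeps behavior consistent across parsers.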
4. Sitemap
Specifies the location of the XML sitemap:
Sitemap: https://jasaseo.id/sitemap.xml
Sitemap: https://jasaseo.id/sitemap-blog.xml
Sitemap: https://jasaseo.id/sitemap-products.xml
You can list multiple sitemaps.
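As a quick check that every Sitemap line is picked up, the sketch below reads them back with urllib.robotparser. It assumes Python 3.8 or later, where site_maps() is available. The User-agent group in the sample text is added only to make the file complete.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://jasaseo.id/sitemap.xml
Sitemap: https://jasaseo.id/sitemap-blog.xml
Sitemap: https://jasaseo.id/sitemap-products.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when no Sitemap line exists, hence the fallback.
sitemaps = parser.site_maps() or []
for url in sitemaps:
    print(url)
```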
5. Crawl-delay (Not Supported by Google)
User-agent: *
Crawl-delay: 10
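Google ignores Crawl-delay. Googlebot adjusts its crawl rate automatically based on how the server responds, and the crawl rate limiter in Google Search Console was retired in early 2024. Some other crawlers, such as Bingbot, still honor the directive.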
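If you run your own crawler, honoring Crawl-delay is straightforward. The sketch below reads the value with urllib.robotparser. The polite_pause helper, the MyCrawler agent name, and the one-second default are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Crawl-delay: 10"])


def polite_pause(agent: str, default_seconds: float = 1.0) -> float:
    """Seconds to wait between requests for `agent`."""
    delay = parser.crawl_delay(agent)  # None when no Crawl-delay applies
    return float(delay) if delay is not None else default_seconds


print(polite_pause("MyCrawler"))
```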
Read also: XML Sitemap Optimization
Robots.txt Best Practices for SEO
1. Block Admin & System Pages
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /cgi-bin/
Why: These pages have no SEO value and waste crawl budget.
2. Block Duplicate Content
Disallow: /print/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /search?
Why: Prevents Google from crawling duplicate versions of the same content.
3. Block Low-Value Pages
Disallow: /cart/
Disallow: /checkout/
Disallow: /thank-you/
Disallow: /account/
Why: User-specific pages do not need to be indexed.
4. Allow Important Resources
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Why: Some resources (CSS, JS, images, and endpoints such as admin-ajax.php) are needed to render pages correctly.
5. Specify Sitemap Location
Sitemap: https://jasaseo.id/sitemap.xml
Why: Help bots discover all important pages faster.
Common Robots.txt Mistakes (& How to Fix Them)
❌ Mistake #1: Blocking Entire Website
User-agent: *
Disallow: /
Impact: No bot can crawl any page, so the website gradually drops out of Google's results.
Fix:
User-agent: *
Disallow:
❌ Mistake #2: Blocking CSS/JS
Disallow: /assets/
Disallow: *.css
Disallow: *.js
Impact: Google cannot render pages correctly → rankings drop.
Fix:
Allow: /assets/css/
Allow: /assets/js/
❌ Mistake #3: Blocking Important Pages
Disallow: /blog/
Impact: Blog posts cannot be crawled, so they will not rank.
Fix: Remove the line, or block only the specific pages that need blocking.
❌ Mistake #4: Wrong File Location
File location: https://yourdomain.com/assets/robots.txt ❌
Fix: Move to root: https://yourdomain.com/robots.txt ✅
❌ Mistake #5: Case Sensitivity
Disallow: /Admin/
But the actual folder is: /admin/
Impact: The rule does not match (paths in robots.txt are case-sensitive).
Fix: Match the exact case.
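Two of the mistakes above (blocking the whole site and blocking CSS/JS) are easy to catch automatically before deploying. The following is a rough lint sketch, not a full validator. The function name and warning texts are made up, and it checks only these two patterns.

```python
def lint_robots(text: str) -> list:
    """Return warnings for two common robots.txt mistakes."""
    warnings = []
    agents = []
    collecting_agents = False  # True while reading consecutive User-agent lines
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not collecting_agents:
                agents = []  # a new group starts here
            agents.append(value)
            collecting_agents = True
            continue
        collecting_agents = False
        if field == "disallow":
            if value == "/" and "*" in agents:
                warnings.append("blocks the entire site for all bots")
            if value.endswith((".css", ".js", ".css$", ".js$")):
                warnings.append(f"blocks render resources: {value}")
    return warnings


print(lint_robots("User-agent: *\nDisallow: /\n"))
print(lint_robots("User-agent: *\nDisallow: *.css\nDisallow: *.js\n"))
```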
Read also: Technical SEO Best Practices
Advanced Robots.txt Strategies
1. Optimize Crawl Budget (Large Sites)
For websites with 10,000+ pages, prioritize important pages:
User-agent: *
# Block low-value pages
Disallow: /tag/
Disallow: /author/
Disallow: /page/
Disallow: /*?
# Allow high-value pages
Allow: /blog/
Allow: /products/
Allow: /services/
Sitemap: https://jasaseo.id/sitemap.xml
2. E-commerce Faceted Navigation
User-agent: *
# Block filter combinations
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?price=
Disallow: /*&
# Allow main category pages
Allow: /products/
Why: Prevent crawling millions of filter combinations.
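One subtlety in the filter rules above: a pattern such as /*?color= matches only when color is the first query parameter, and it is the /*& rule that catches URLs where the filter appears later. The sketch below shows which rule blocks each sample URL. The matcher, the helper names, and the sample URLs are illustrative assumptions, and it handles the * wildcard only.

```python
import re

RULES = ["/*?color=", "/*?size=", "/*?price=", "/*&"]


def to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern with '*' wildcards into a regex (no '$' support)."""
    return re.compile(".*".join(re.escape(p) for p in pattern.split("*")))


def first_blocking_rule(path_and_query: str):
    """Return the first Disallow pattern that matches, or None."""
    for rule in RULES:
        if to_regex(rule).match(path_and_query):
            return rule
    return None


for url in (
    "/products?color=red",
    "/products?brand=x&color=red",
    "/products?brand=x",
    "/products/",
):
    print(f"{url:32} blocked by: {first_blocking_rule(url)}")
```

A single-parameter URL such as /products?brand=x stays crawlable under these rules, which may or may not be what you intend.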
Read also: Faceted Navigation SEO
3. Block Search Results Pages
Disallow: /search?
Disallow: /search/
Disallow: /*?s=
Why: Internal search results are duplicate/low-value content.
4. Block Staging/Development Sites
User-agent: *
Disallow: /
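Why: Prevents the staging site from competing with the production site. Because robots.txt only asks bots not to crawl, password-protecting the staging site is the more reliable safeguard.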
5. Different Rules for Different Bots
# Google: Allow everything except admin
User-agent: Googlebot
Disallow: /admin/
# Bing: More restrictive
User-agent: Bingbot
Disallow: /admin/
Disallow: /search/
# Block bad bots
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
Robots.txt vs Meta Robots vs X-Robots-Tag
Comparison:
| Method | Scope | Use Case |
|---|---|---|
| Robots.txt | Site-wide rules | Block crawling (save crawl budget) |
| Meta Robots | Per-page rules | Control indexing (noindex, nofollow) |
| X-Robots-Tag | HTTP header | Control indexing for non-HTML files (PDFs, images) |
When to Use Each:
Robots.txt:
- ✅ Block low-value sections (admin, search, filters)
- ✅ Optimize crawl budget
- ❌ Do NOT use it to prevent indexing (use noindex instead)
Meta Robots:
<meta name="robots" content="noindex, follow">
- ✅ Prevent specific pages from being indexed
- ✅ Control link equity flow
X-Robots-Tag:
X-Robots-Tag: noindex
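- ✅ Control indexing for PDFs, images, videos
Note: a bot can only see noindex on a URL it is allowed to crawl. To keep a page out of the index, do not block it in robots.txt; use noindex instead.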
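To audit which indexing directives a page actually sends, you can combine the X-Robots-Tag header with the meta robots tag. The sketch below works on a headers dictionary and an HTML string that you supply. The function names are hypothetical, the header lookup assumes the exact key X-Robots-Tag, and bot-specific prefixes such as "googlebot: noindex" are not handled.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def indexing_directives(headers: dict, html: str) -> list:
    """Directives from the X-Robots-Tag header followed by meta robots."""
    header_value = headers.get("X-Robots-Tag", "")
    found = [d.strip().lower() for d in header_value.split(",") if d.strip()]
    parser = MetaRobotsParser()
    parser.feed(html)
    return found + parser.directives


def is_indexable(headers: dict, html: str) -> bool:
    """False when any source says noindex (or none, which implies it)."""
    return not {"noindex", "none"} & set(indexing_directives(headers, html))


print(indexing_directives({}, '<meta name="robots" content="noindex, follow">'))
print(is_indexable({"X-Robots-Tag": "noindex"}, ""))
```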
Read also: Canonical Tag Guide
How to Test & Validate Robots.txt
1. Google Search Console
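- Open the robots.txt report (Settings > robots.txt), which replaced the retired Robots.txt Tester
- Confirm the file was fetched successfully and review the version Google last crawled
- Test specific URLs with the URL Inspection tool
- Check for errors/warnings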
2. Manual Check
Visit: https://yourdomain.com/robots.txt
Verify:
- ✅ File accessible (200 status code)
- ✅ Correct syntax
- ✅ Sitemap URL correct
3. Third-Party Tools
- Screaming Frog: Crawl website & check robots.txt compliance
- Ahrefs Site Audit: Identify blocked resources
- SEMrush Site Audit: Check robots.txt issues
4. Common Validation Checks
- ✅ File location: /robots.txt (root directory)
- ✅ File name: lowercase robots.txt (not Robots.txt)
- ✅ Syntax: No typos, correct directives
- ✅ Sitemap URL: Valid & accessible
- ✅ Important pages: NOT blocked
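Several of these checks can be scripted. The sketch below fetches a robots.txt file and reports basic problems: a non-200 status, misspelled directives, and relative sitemap URLs. The list of known fields is an assumption covering only the directives discussed in this article, and the function names are hypothetical.

```python
import urllib.error
import urllib.request

# Directives covered in this article; other fields would be flagged.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def fetch(url: str, timeout: float = 10.0):
    """Return (status_code, body_text) for a robots.txt URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        return error.code, ""


def validate(status: int, body: str) -> list:
    """Return a list of human-readable problems (empty when none found)."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    for number, raw in enumerate(body.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blanks
        if not line:
            continue
        field, separator, value = line.partition(":")
        name = field.strip().lower()
        if not separator or name not in KNOWN_FIELDS:
            problems.append(f"line {number}: unrecognized directive")
        elif name == "sitemap" and not value.strip().startswith(("http://", "https://")):
            problems.append(f"line {number}: sitemap URL is not absolute")
    return problems


print(validate(200, "User-agent: *\nDissalow: /admin/\nSitemap: /sitemap.xml\n"))
```

In practice you would call validate(*fetch("https://yourdomain.com/robots.txt")) against your own domain.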
Robots.txt Templates
Template 1: Small Business Website
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
Template 2: E-commerce Site
User-agent: *
# Block admin & checkout
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
# Block search & filters
Disallow: /search?
Disallow: /*?color=
Disallow: /*?size=
Disallow: /*?price=
# Allow products & categories
Allow: /products/
Allow: /categories/
Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/sitemap-products.xml
Template 3: Blog/Content Site
User-agent: *
# Block admin
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# Block low-value pages
Disallow: /tag/
Disallow: /author/
Disallow: /search/
# Allow blog content
Allow: /blog/
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
Template 4: Staging Site (Block Everything)
User-agent: *
Disallow: /
Monitoring & Maintenance
Monthly Tasks:
- ✅ Check Google Search Console for crawl errors
- ✅ Verify the sitemap URL is still valid
- ✅ Review blocked URLs (was anything blocked by accident?)
Quarterly Tasks:
- ✅ Audit crawl budget usage
- ✅ Review & update blocked sections
- ✅ Test robots.txt against new pages/features
After Major Updates:
- ✅ Verify robots.txt was not changed by accident
- ✅ Check the robots.txt report in Google Search Console
- ✅ Monitor indexing status in GSC
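One simple way to notice an accidental change after a deployment is to compare the live file against a known-good copy. The sketch below does this with a SHA-256 fingerprint that ignores line-ending and trailing-whitespace differences. The function names are hypothetical, and how you store the baseline is up to you.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the file content after normalizing whitespace and line endings."""
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(baseline_text: str, current_text: str) -> bool:
    """True when the current robots.txt differs from the approved baseline."""
    return fingerprint(baseline_text) != fingerprint(current_text)


baseline = "User-agent: *\nDisallow: /admin/\n"
print(has_changed(baseline, "User-agent: *\r\nDisallow: /admin/\r\n"))
print(has_changed(baseline, "User-agent: *\nDisallow: /\n"))
```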
Conclusion: Robots.txt Is a Powerful Tool (When Used Correctly)
Robots.txt can significantly improve SEO by:
1. Optimizing crawl budget (block low-value pages)
2. Preventing duplicate content (block filter pages, search results)
3. Keeping bots out of sensitive sections (admin, user accounts)
4. Guiding bots to important content (via sitemap)
But one mistake can cut the entire website off from search engines. Always test before you deploy!
Action Items:
- ✅ Audit the current robots.txt (any mistakes?)
- ✅ Block low-value sections (admin, search, filters)
- ✅ Add sitemap URL
- ✅ Check the robots.txt report in Google Search Console
- ✅ Monitor crawl stats in Google Search Console
Need help with a technical SEO audit? Book a free consultation with our team or try the Free SEO Audit Tool.
Read also:
- XML Sitemap Optimization
- Crawlability Masterclass
- Canonical Tag Guide
- Technical SEO Best Practices
- Faceted Navigation SEO