
Use of the /robots.txt File

Thursday, October 28, 2010


Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
It works like this: a robot wants to visit a Web site URL, say http://www.example.com/index.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
There are two important considerations when using /robots.txt:
robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
So don't try to use /robots.txt to hide information.
See also:
o Can I block just bad robots?
o Why did this robot ignore my /robots.txt?
o What are the security implications of /robots.txt?
The details
The /robots.txt is a de-facto standard, and is not owned by any standards body. There are two historical descriptions:
the original 1994 A Standard for Robot Exclusion document.
a 1997 Internet Draft specification A Method for Web Robots Control
In addition there are external resources:
HTML 4.01 specification, Appendix B.4.1
Wikipedia - Robots Exclusion Standard
The /robots.txt standard is not actively developed. See What about further development of /robots.txt? for more discussion.
The rest of this page gives an overview of how to use /robots.txt on your server, with some simple recipes. To learn more see also the FAQ.
How to create a /robots.txt file
Where to put it
The short answer: in the top-level directory of your web server.
The longer answer:
When a robot looks for the "/robots.txt" file for a URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.
For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", replace it with "/robots.txt", and end up with "http://www.example.com/robots.txt".
So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.
Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT".
See also:
  • What program should I use to create /robots.txt?
  • How do I use /robots.txt on a virtual host?
  • How do I use /robots.txt on a shared host?
What to put in it
The "/robots.txt" file is a text file, with one or more records. It usually contains a single record looking like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded.
Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that globbing and regular expressions are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".
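
A side note: although wildcards are not part of the original protocol, some major crawlers, Googlebot among them, do support pattern extensions such as '*' within a path and a '$' end-of-URL anchor. Treat these as crawler-specific extensions rather than something every robot will understand; a purely illustrative example:

User-agent: Googlebot
Disallow: /*.gif$

Robots that implement only the original protocol will ignore or misinterpret such patterns, so don't depend on them for anything important.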

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

To exclude all robots from the entire server
User-agent: *
Disallow: /

To allow all robots complete access
User-agent: *
Disallow:
(or just create an empty "/robots.txt" file, or don't use one at all)

To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

To exclude a single robot
User-agent: BadBot
Disallow: /

To allow a single robot
User-agent: Google
Disallow:

User-agent: *
Disallow: /

To exclude all files except one
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/stuff/

Alternatively you can explicitly list every page to be disallowed:

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
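
A related side note: the original protocol has no "Allow" field, but some major crawlers (Googlebot, for example) honor a non-standard "Allow" line, which makes this case less awkward. A purely hypothetical example that keeps a single file /~joe/keep.html visible while blocking the rest of the directory:

User-agent: *
Disallow: /~joe/
Allow: /~joe/keep.html

Because support for "Allow" varies between robots, the directory-based approach shown above remains the more portable choice.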


How Is an XML Sitemap Useful in Search Engine Optimization?

Wednesday, October 27, 2010
Search engine sitemaps are organized listings of a website's pages that help search engines index its content more thoroughly. A search engine sitemap (meant only for search engines to see) is a mechanism supported by Google, Yahoo!, and MSN that allows webmasters to suggest how often, and with what priority, a search engine should spider each page within their websites.
Google Sitemaps has become a popular tool in the world of webmasters. The service helps them continually submit fresh content to the search engine.

How to build Sitemaps?

Creating a search engine sitemap is quite simple thanks to a free online tool at XML-Sitemaps.com that will automatically spider your website and create the sitemap for you. The free version spiders up to 500 pages.

Alternative tools :

1. Simply create a sitemap with the free Search Engine Sitemap Generator and upload it to your server.
2. The Google Sitemap Generator is an open-source tool that can help you build a sitemap from scratch. It is a Python script that creates valid search engine sitemaps for your sites using the Sitemap protocol.

Upload the generated file (say sitemap.xml) to your Web site, and tell Google how to find it. Note the file name and the URL you will use to reach it. If you upload it to the root of your domain, the URL will be http://www.seo-expert-gaurav.blogspot.com/sitemap.xml.
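
For reference, a minimal sitemap.xml following the Sitemap protocol looks something like this (the URL, date and other values are purely illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2010-10-27</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Each additional page gets its own <url> entry; <lastmod>, <changefreq> and <priority> are optional hints to the crawler rather than commands.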

Now log into your Google sitemap account (Google Webmaster Tools), and point Google to the sitemap stored on your site, using the URL that you already know.

Should you get stuck at any point, feel free to browse through Google's official documentation and tutorials on the subject at Using the Sitemap Generator.

Advantages of a search engine sitemap:

The advantage of Search Engine Sitemaps (or XML sitemaps) over a normal “page of links” sitemap is that you can:

1. Specify the priority of pages to be crawled and/or indexed.
2. Exclude lower priority pages.
3. Ensure that search engines know about every page on your website.

Until recently, only Google, Yahoo! and MSN supported this protocol. Now, however, there is a new member of the family: Ask.com.

Better still, Vanessa Fox announced that all of these search engines have agreed to accept Sitemap submissions through the robots.txt file on your server, so you no longer have to submit your sitemap to each engine individually.

The robots.txt file is a search engine industry standard file and the very first file a legitimate search engine requests when it visits your website. Now, you can simply add your sitemap URL to this file in the form of:
Sitemap: http://www.seo-expert-gaurav.blogspot.com/sitemap.xml

Simply create a sitemap with the free Search Engine Sitemap Generator and upload it to your server. Then open the robots.txt file on your server and add the address as above. This makes it as simple as ever to ensure that all search engines know about your site, know what pages are in your site, and know what pages of your site to list in the search results.
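
Put together, a robots.txt file that combines ordinary crawl rules with sitemap autodiscovery might look like this (the domain and path are illustrative):

User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.example.com/sitemap.xml

The Sitemap line is independent of any User-agent section and can appear anywhere in the file.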

How to Check Your Google Sitemap Reports:

Google will identify any errors in your site and publish the results to you in the form of a report.
Steps to check your sitemap report:
1. Visit the Google Webmaster Tools section of the site. This can be found at www.google.com/webmasters.
2. Add your site to Google if you haven't already done so.
3. Verify that you are the site's owner by either uploading an HTML file to your site or adding a Meta tag.
4. View the statistics and report that Google has already generated about your sitemap. It details the last time the spider went to your page. If your site is not new, then chances are Google has already crawled your site.
5. Change the option to "Enable Google Page Rank" when prompted in the install process of the program. Then hit Finish and wait.
6. Click the Add sitemap link to create a new Google sitemap.
7. Enter your sitemap to tell Google all about your pages.
8. Visit again to view the reports on your pages.


What Is a Backlink or Inbound Link?

Friday, October 22, 2010
Backlinks are links to a website or web page. Inbound links were originally important (prior to the emergence of search engines) as a primary means of web navigation; today their significance lies in search engine optimization (SEO). The number of backlinks is one indication of the popularity or importance of that website or page (for example, this is used by Google to determine the PageRank of a webpage). Outside of SEO, the backlinks of a webpage may be of significant personal, cultural or semantic interest: they indicate who is paying attention to that page.
In basic link terminology, a backlink is any link received by a web node (web page, directory, website, or top level domain) from another web node [1]. Backlinks are also known as incoming links, inbound links, inlinks, and inward links.

Search engine rankings
Search engines often use the number of backlinks that a website has as one of the most important factors for determining that website's search engine ranking, popularity and importance. Google's description of their PageRank system, for instance, notes that Google interprets a link from page A to page B as a vote, by page A, for page B.[2] Knowledge of this form of search engine rankings has fueled a portion of the SEO industry commonly termed linkspam, where a company attempts to place as many inbound links as possible to their site regardless of the context of the originating site.
Websites often employ various techniques (called search engine optimization, usually shortened to SEO) to increase the number of backlinks pointing to their website. Some methods are free for use by everyone, whereas others, like linkbaiting, require quite a bit of planning and marketing to work. Some websites stumble upon "linkbaiting" naturally; the sites that are the first with a tidbit of 'breaking news' about a celebrity are good examples of that. When "linkbait" happens, many websites will link to the 'baiting' website because there is information there that is of extreme interest to a large number of people.
There are several factors that determine the value of a backlink. Backlinks from authoritative sites on a given topic are highly valuable.[3] If both sites have content geared toward the keyword topic, the backlink is considered relevant and believed to have strong influence on the search engine rankings of the webpage granted the backlink. A backlink represents a favorable 'editorial vote' for the receiving webpage from another granting webpage. Another important factor is the anchor text of the backlink. Anchor text is the descriptive labeling of the hyperlink as it appears on a webpage. Search engine bots (i.e., spiders, crawlers, etc.) examine the anchor text to evaluate how relevant it is to the content on a webpage. Anchor text and webpage content congruency are highly weighted in search engine results page (SERP) rankings of a webpage with respect to any given keyword query by a search engine user.
Increasingly, inbound links are being weighed against link popularity and originating context. This transition is reducing the notion of one link, one vote in SEO, a trend its proponents hope will help curb linkspam as a whole.
It should also be noted that building too many backlinks over a short period of time can get a website's ranking penalized, and in extreme cases the website is de-indexed altogether. Anything above a couple of hundred a day is considered "dangerous".
Technical
When HTML (Hyper Text Markup Language) was designed, there was no explicit mechanism in the design to keep track of backlinks in software, as this carried additional logistical and network overhead.
Some website software internally keeps track of backlinks. Examples of this include most wiki and CMS software.
Most commercial search engines provide a mechanism to determine the number of backlinks they have recorded to a particular web page. For example, Google can be searched using link:wikipedia.org to find the number of pages on the Web pointing to http://wikipedia.org/. Google only shows a small fraction of the number of links pointing to a site; it credits many more backlinks than it shows for each website.
Other mechanisms have been developed to track backlinks between disparate webpages controlled by organizations that aren't associated with each other. The most notable example of this is TrackBacks between blogs.


Page Rank Based On Popularity

Saturday, October 16, 2010

The web search technology offered by Google is often the technology of choice of the world’s leading portals and websites. It has also benefited the advertisers with its unique advertising program that does not hamper the web surfing experience of its users but still brings revenues to the advertisers.


When you search for a particular keyword or phrase, most search engines return a list of pages in order of the number of times the keyword or phrase appears on the website. Google's web search technology instead relies on its own PageRank technology and hypertext-matching analysis, which perform many instantaneous calculations without any human intervention. Google's structural design also expands as the internet expands.


PageRank technology uses an equation of more than 500 million variables and more than 3 billion terms to produce an objective measurement of the significance of web pages. Unlike some other search engines, Google does not simply count links, but uses the extensive link structure of the web as an organizational tool. When a link on Page A points to Page B, that link is treated as a vote for Page B cast by Page A.
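
For the curious, the published PageRank formulation is iterative: a page's rank is (1 - d) plus d times the sum, over every page linking to it, of that page's rank divided by its number of outgoing links, where d is a damping factor commonly quoted as 0.85. The following toy Python sketch illustrates the idea only; the graph, damping factor and iteration count are illustrative and have nothing to do with Google's actual system:

def pagerank(links, damping=0.85, iterations=50):
    # links maps each page to the list of pages it links out to
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - damping for p in pages}
        for page, targets in links.items():
            for target in targets:
                # each outgoing link passes on an equal share of this page's rank
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Tiny example: A links to B and C, B links to C, C links back to A
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))

Running it on this three-page graph shows how the page that attracts the most votes from well-linked pages ends up with the highest score, which is exactly the "vote" intuition described above.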





GOOGLE Algorithm Is Key

Saturday, October 16, 2010

Google has a comprehensive and highly developed technology, a straightforward interface and a wide-ranging array of search tools which enable the users to easily access a variety of information online.

Google users can browse the web and find information in various languages, retrieve maps and stock quotes, read news, search for a long-lost friend using the phonebook listings available on Google for all US cities, and basically surf the 3 billion odd web pages on the internet! Google boasts the world's largest archive of Usenet messages, dating all the way back to 1981. Google's technology can be accessed from any conventional desktop PC as well as from various wireless platforms such as WAP and i-mode phones, handheld devices and other Internet-equipped gadgets.


Ultimate Benefits of SEO

Saturday, October 16, 2010

When selecting search engine optimization to promote your business on the internet, one must know the ultimate benefits of an SEO campaign.

Perspective (Global / Regional)
By selecting keywords or phrases that target your audience, search engine optimization ensures that you and your company are found, globally or regionally, by those who require exactly what you offer. SEO has many benefits for any organization which wants to reach all potential customers locally or globally. You can reach the targeted customers of your own choice.

Targeted Traffic
A search engine optimization campaign can increase the number of visitors to your website for the targeted keyword(s) or phrase, and converting those visitors into customers is one of the arts of search engine optimization. SEO is the campaign best suited to driving targeted traffic to your website. Essentially, more targeted traffic equals more sales.

Increase Visibility
Once a website has been optimized, its visibility in search engines increases. More people will visit your website, giving international recognition to your products and services.

High ROI (Return on Investment)
An effective SEO campaign can bring a higher return on your investment than any other type of marketing for your company. This will therefore increase your volume of sales and profit overall.

Long term positioning
Once a website obtains a good position through an SEO campaign, it should stay there for the long term, as opposed to PPC (Pay Per Click), where visibility ends as soon as you stop paying. SEO is a cheaper, longer-lasting solution than any other search engine marketing strategy.

Cost-effective
One of the great benefits of search engine optimization is that it is cost effective and requires the minimum amount of capital for the maximum exposure of your website.

Flexibility
It is possible to reach an audience of your own choice through an SEO campaign. You can attract traffic according to your organizational strategy, to meet the needs and requirements of your choice.

Measurable results
A unique quality of SEO campaigns is that the results are quantifiable: you can measure them through search engine position reports, visitor conversions, and other factors of this nature.


Types of Search Engines

Friday, October 15, 2010
Three Types of Search Engines

The term "search engine" is often used generically to describe crawler-based search engines, human-powered directories, and hybrid search engines. These types of search engines gather their listings in different ways, through crawler-based searches, human-powered directories, and hybrid searches.

Crawler-based search engines

Crawler-based search engines, such as Google (http://www.google.com), create their listings automatically. They "crawl" or "spider" the web, then people search through what they have found. If web pages are changed, crawler-based search engines eventually find these changes, and that can affect how those pages are listed. Page titles, body copy and other elements all play a role.
The life span of a typical web query normally lasts less than half a second, yet involves a number of different steps that must be completed before results can be delivered to a person seeking information. The following steps (adapted from http://www.google.com/corporate/tech.html) illustrate this life span; a toy sketch of the index idea follows the steps:
1. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book - it tells which pages contain the words that match the query.
2. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.

3. The search results are returned to the user in a fraction of a second.
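
To make the "index in the back of a book" comparison from step 1 concrete, here is a toy inverted index in Python. It is only a teaching sketch with made-up page names and says nothing about how Google's index servers are actually built:

# Each word maps to the set of pages that contain it.
pages = {
    "page1.html": "cheap flights to paris",
    "page2.html": "paris travel guide",
    "page3.html": "cheap hotels in rome",
}

index = {}
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

def search(query):
    # Return only the pages that contain every word in the query.
    hits = [index.get(word, set()) for word in query.split()]
    return set.intersection(*hits) if hits else set()

print(search("cheap paris"))   # -> {'page1.html'}

A real engine adds ranking, snippets and enormous scale on top, but the core lookup is the same: match query words against a precomputed index rather than scanning pages at query time.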

Human-powered directories

A human-powered directory, such as the Open Directory Project (http://www.dmoz.org/about.html) depends on humans for its listings. (Yahoo!, which used to be a directory, now gets its information from the use of crawlers.) A directory gets its information from submissions, which include a short description to the directory for the entire site, or from editors who write one for sites they review. A search looks for matches only in the descriptions submitted. Changing web pages, therefore, has no effect on how they are listed. Techniques that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.

Hybrid search engines

Today, it is extremely common for crawler-type and human-powered results to be combined when conducting a search. Usually, a hybrid search engine will favor one type of listings over another. For example, MSN Search (http://www.imagine-msn.com/search/tour/moreprecise.aspx) is more likely to present human-powered listings from LookSmart (http://search.looksmart.com/). However, it also presents crawler-based results, especially for more obscure queries.


How Google Works?

Friday, October 15, 2010

The technology behind Google's great results

As a Google user, you're familiar with the speed and accuracy of a Google search. How exactly does Google manage to find the right results for every query as quickly as it does? The heart of Google's search technology is PigeonRank™, a system for ranking web pages developed by Google founders Larry Page and Sergey Brin at Stanford University.

Building upon the breakthrough work of B. F. Skinner, Page and Brin reasoned that low cost pigeon clusters (PCs) could be used to compute the relative value of web pages faster than human editors or machine-based algorithms. And while Google has dozens of engineers working to improve every aspect of our service on a daily basis, PigeonRank continues to provide the basis for all of our web search tools.

Why Google's patented PigeonRank™ works so well

PigeonRank's success relies primarily on the superior trainability of the domestic pigeon (Columba livia) and its unique capacity to recognize objects regardless of spatial orientation. The common gray pigeon can easily distinguish among items displaying only the minutest differences, an ability that enables it to select relevant web sites from among thousands of similar pages.

By collecting flocks of pigeons in dense clusters, Google is able to process search queries at speeds superior to traditional search engines, which typically rely on birds of prey, brooding hens or slow-moving waterfowl to do their relevance rankings.

When a search query is submitted to Google, it is routed to a data coop where monitors flash result pages at blazing speeds. When a relevant result is observed by one of the pigeons in the cluster, it strikes a rubber-coated steel bar with its beak, which assigns the page a PigeonRank value of one. For each peck, the PigeonRank increases. Those pages receiving the most pecks are returned at the top of the user's results page with the other results displayed in pecking order.

Integrity

Google's pigeon-driven methods make tampering with our results extremely difficult. While some unscrupulous websites have tried to boost their ranking by including images on their pages of bread crumbs, bird seed and parrots posing seductively in resplendent plumage, Google's PigeonRank technology cannot be deceived by these techniques. A Google search is an easy, honest and objective way to find high-quality websites with information relevant to your search.

PigeonRank Frequently Asked Questions

How was PigeonRank developed?

The ease of training pigeons was documented early in the annals of science and fully explored by noted psychologist B.F. Skinner, who demonstrated that with only minor incentives, pigeons could be trained to execute complex tasks such as playing ping pong, piloting bombs or revising the Abatements, Credits and Refunds section of the national tax code.

Brin and Page were the first to recognize that this adaptability could be harnessed through massively parallel pecking to solve complex problems, such as ordering large datasets or ordering pizza for large groups of engineers. Page and Brin experimented with numerous avian motivators before settling on a combination of linseed and flax (lin/ax) that not only offered superior performance, but could be gathered at no cost from nearby open space preserves. This open space lin/ax powers Google's operations to this day, and a visit to the data coop reveals pigeons happily pecking away at lin/ax kernels and seeds.

What are the challenges of operating so many pigeon clusters (PCs)?

Pigeons naturally operate in dense populations, as anyone holding a pack of peanuts in an urban plaza is aware. This compactability enables Google to pack enormous numbers of processors into small spaces, with rack after rack stacked up in our data coops. While this is optimal from the standpoint of space conservation and pigeon contentment, it does create issues during molting season, when large fans must be brought in to blow feathers out of the data coop. Removal of other pigeon byproducts was a greater challenge, until Page and Brin developed groundbreaking technology for converting poop to pixels, the tiny dots that make up a monitor's display. The clean white background of Google's home page is powered by this renewable process.

Aren't pigeons really stupid? How do they do this?

While no pigeon has actually been confirmed for a seat on the Supreme Court, pigeons are surprisingly adept at making instant judgments when confronted with difficult choices. This makes them suitable for any job requiring accurate and authoritative decision-making under pressure. Among the positions in which pigeons have served capably are replacement air traffic controllers, butterfly ballot counters and pro football referees during the "no-instant replay" years.

Where does Google get its pigeons? Some special breeding lab?

Google uses only low-cost, off-the-street pigeons for its clusters. Gathered from city parks and plazas by Google's pack of more than 50 Phds (Pigeon-harvesting dogs), the pigeons are given a quick orientation on web site relevance and assigned to an appropriate data coop.

Isn't it cruel to keep pigeons penned up in tiny data coops?

Google exceeds all international standards for the ethical treatment of its pigeon personnel. Not only are they given free range of the coop and its window ledges, special break rooms have been set up for their convenience. These rooms are stocked with an assortment of delectable seeds and grains and feature the finest in European statuary for roosting.

What's the future of pigeon computing?

Google continues to explore new applications for PigeonRank and affiliated technologies. One of the most promising projects in development involves harnessing millions of pigeons worldwide to work on complex scientific challenges. For the latest developments on Google's distributed cooing initiative, please consider signing up for our Google Friends newsletter.

51 SEO Tips: Just a Starter Kit

Friday, October 15, 2010

1. Place your keyword phrases in the title tag. The title must be less than 65 characters.

2. Place the most important keyword phrase close to the beginning of the page title.

3. Put your main keywords in the keywords meta tag.

4. Write a good description for the meta description tag; the description must be unique to each page.

5. Keep the meta description short and meaningful; write only 1 or 2 sentences in the description (see the example below).
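
To make tips 1 to 5 concrete, the head of a page might end up looking like this (the text values are placeholders, not recommendations for any real site):

<head>
  <title>Cheap Paris Flights | Example Travel Co</title>
  <meta name="description" content="Compare cheap flights to Paris and book in minutes with Example Travel Co.">
  <meta name="keywords" content="cheap paris flights, paris airfare">
</head>

Each page on the site would get its own title and description rather than sharing this one.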

6. Target the most important competitive keyword phrase on the home page.

7. Target one or two keyword phrases per page.

8. Use only one H1 header per page.

9. Place the most important keyword in the H1 tag.

10. Use H2 and H3 for sub-headers where required.

11. Use bold / italic / underline on your keyword phrases for extra weight in the content.

12. Use bulleted lists to make content easier to read.

13. Use the ALT attribute for images, so that crawlers can know what the images are about (example below).
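
A quick illustration of tip 13, with a placeholder file name and text:

<img src="eiffel-tower-at-night.jpg" alt="Eiffel Tower lit up at night">

The ALT text is what a crawler (and a screen reader) reads in place of the image, so describe the image in relevant words rather than stuffing it with keywords.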

14. Don't use Flash on your website, because crawlers can't read Flash.

15. Keep the navigation of your website easy.

16. Use text-based navigation.

17. Use CSS to create the navigation menu instead of JavaScript.

18. Use keyword phrases in file names; you can use hyphens (-) to separate the words in file names.

19. Create a valid robots.txt file.

20. Create an HTML site map for crawlers and users.

21. Create an XML sitemap for the Google crawler.

22. Add text links to other pages in the footer of the site.

23. Use keyword phrases in anchor text.

24. Link all the pages to each other.

25. Use keyword-rich breadcrumb navigation to help search engines understand the structure of your site (a small example follows).
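
A breadcrumb trail is usually just a short line of links mirroring the page's position in the site hierarchy; a minimal, purely illustrative version:

<div class="breadcrumb">
  <a href="/">Home</a> &gt; <a href="/travel/">Travel</a> &gt; <a href="/travel/paris/">Paris Guide</a>
</div>

Because each crumb is a plain text link containing a keyword, it gives both users and crawlers a readable map of where the page sits in the site.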

26. Add a feedback form and place a link to this form on all the pages.

27. Add a bookmark button.

28. Add a subscription form on every page to grow your mailing list.

29. Add an RSS feed button so that users can subscribe easily.

30. Add social media sharing buttons.

31. Use images on every page, but don't forget to use the ALT attribute.

32. Use videos on your site that are related to your niche.

33. Write informative, fresh, unique, useful content on your site.

34. Write page content of between 300 and 500 words.

35. Keep keyword density at around 3 to 5%.

36. Don't copy any content from other websites; fresh and unique content is the key to your success.

37. Add deep links to related articles.

38. Regularly update your website with fresh content.

39. Use CSS to improve the look of your website.

40. Write your content for humans, not for robots.

41. Buy a country-level domain name if your website is targeting local users.

42. Use a good keyword suggestion tool to find good keyword phrases for your website.

43. Use a 301 redirect to redirect http://www.domainname.com to http://domainname.com (or the other way round), so that only one version of the domain gets indexed; a sketch of one way to do this follows below.
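
On an Apache server with mod_rewrite enabled, one common way to do this is a rule in the site's .htaccess file; this is only a sketch, and the domain is a placeholder:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.domainname\.com$ [NC]
RewriteRule ^(.*)$ http://domainname.com/$1 [R=301,L]

Other web servers and many hosting control panels offer their own ways to set up the same permanent redirect.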

44. Try to buy a local hosting server for your website if your site is targeting local people.

45. Use keyword-rich URLs instead of dynamic URLs.

46. Break your article into paragraphs if it is long.

47. Add a full contact address on the contact page, with a direction map.

48. Validate your XHTML and CSS at http://validator.w3.org/.

49. Don't use hidden text or hidden links.

50. Avoid graphic links, because text in an image cannot be indexed by the spiders.

51. Don't create multiple pages with the same content.
