15 September 2007

Searching non .edu web resources

The following are tips for searching academic web resources (that is, resources publicly available on the web) hosted by institutions whose domains do not end in ".edu". That top-level domain is used almost exclusively in the US, so restricting a search to it limits its scope.

Some web engines provide syntax to restrict a web search to a specific class of sites.
For instance, the Google query "quantum +site:.org" restricts the results to sites whose domain address ends in .org. The + sign means that the site criterion is required. Without it, results matching the site: criterion are ranked higher, but the result set may also include non-matching results.

Other search engines use different names for the site criterion:
Yahoo
AltaVista: "host:"
WebCrawler
Google: "site:"
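
The query syntax described above is simple string assembly. The following Python sketch builds such queries for the two engines whose operators are given in the list; `build_query` and `SITE_OPERATORS` are hypothetical helper names, and applying the "+" prefix to AltaVista queries is an assumption, since the post only describes it for Google.

```python
# Operator names taken from the list above. Yahoo and WebCrawler are left
# out because their site-restriction syntax is not given there.
SITE_OPERATORS = {
    "google": "site:",
    "altavista": "host:",
}


def build_query(terms: str, domain: str, engine: str = "google",
                required: bool = True) -> str:
    """Append a site/host restriction such as ".org" to a free-text query.

    If required is True, the criterion gets a "+" prefix so that
    non-matching pages are excluded rather than merely ranked lower.
    Raises KeyError for an engine not listed in SITE_OPERATORS.
    """
    criterion = SITE_OPERATORS[engine.lower()] + domain
    if required:
        criterion = "+" + criterion
    return f"{terms} {criterion}"


print(build_query("quantum", ".org"))               # quantum +site:.org
print(build_query("quantum", ".org", "altavista"))  # quantum +host:.org
```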

Restricting the search to universities alone omits many research centers, which often have .org domains. Still, it is a good first step in focusing a web search on academic resources.

The best-known academic top-level domain is .edu. Unfortunately, almost all sites with domains ending in .edu belong to American universities, so a Google query like "your search terms +site:.edu" limits the result set to academic resources in the US.

To expand the result set to other parts of the world, we first identify the domain naming pattern used by a given country's universities, then use that pattern in the site criterion.

For Belgium, Japan and the UK, academic website domains end in ".ac" followed by the country top-level domain: "*.ac.be", "*.ac.jp" and "*.ac.uk" respectively.

A web search query covering these three countries (and any other country that follows the same convention) would include the criterion "site:.ac.*".
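
To see which hosts a criterion like "site:.ac.*" is meant to cover, the sketch below approximates it with Python's shell-style globbing (`fnmatch.fnmatchcase`). This is an assumption about the semantics: search engines may treat wildcards differently, or not accept them at all, and the glob "*" also matches dots. The helper `matches_site` and the example.* hostnames are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_site(hostname: str, criterion: str) -> bool:
    """Approximate whether hostname falls under a site criterion like ".ac.*".

    A criterion starting with "." is read as "preceded by any subdomain",
    so ".ac.*" becomes the glob "*.ac.*". Both sides are lowercased because
    host names are case-insensitive.
    """
    glob = "*" + criterion if criterion.startswith(".") else criterion
    return fnmatchcase(hostname.lower(), glob.lower())


for host in ["www.example.ac.uk", "lab.example.ac.jp", "www.example.ac.be",
             "www.example.edu", "www.ac.example.com"]:
    print(host, matches_site(host, ".ac.*"))
# True, True, True, False, True
```

The last case shows how loose the pattern is: any host with an "ac" label somewhere in the middle matches, not only academic ones.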

In Germany, university domain names often begin with "uni-", following the pattern "uni-*.de". So a query might look like "host:*.uni-*.de" (the AltaVista-style criterion listed above).

Other countries, like Argentina, use the pattern "*.edu.country-tld", where country-tld is the country top-level domain (for Argentina, ".edu.ar"). To search all countries that use this convention, use "site:.edu.*".
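
If an engine does not accept a wildcard in the site criterion, or you want only some of the countries that share a convention, the wildcard can be expanded into one explicit criterion per country. This is a minimal sketch; `CONVENTIONS` and `explicit_criteria` are hypothetical names, and the two templates simply restate the ".ac" and ".edu" conventions described above.

```python
# Academic naming conventions from this post. "{cc}" stands for a country
# top-level domain such as "uk" or "ar".
CONVENTIONS = {
    "ac": ".ac.{cc}",    # e.g. Belgium, Japan, UK
    "edu": ".edu.{cc}",  # e.g. Argentina
}


def explicit_criteria(convention: str, country_codes: list[str],
                      operator: str = "site:") -> list[str]:
    """Expand one convention into one site criterion per country code.

    Country codes are lowercased and any leading "." is dropped, so
    "UK" and ".uk" both give ".ac.uk" for the "ac" convention.
    """
    template = CONVENTIONS[convention]
    return [operator + template.format(cc=cc.lower().lstrip("."))
            for cc in country_codes]


print(explicit_criteria("ac", ["be", "jp", "uk"]))
# ['site:.ac.be', 'site:.ac.jp', 'site:.ac.uk']
print(explicit_criteria("edu", ["ar"]))
# ['site:.edu.ar']
```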

In general, if you are curious about information from a given country, search for its universities, see how their domain names are formed, and then search for that pattern (the top-level parts of the domain).
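
The discovery step in the previous paragraph can be partly automated: given a handful of university URLs found by an ordinary search, count the trailing domain labels and take the most common suffix. The sketch below assumes the URLs include a scheme (so `urlparse` can find the hostname) and that the convention lives in the last `labels` labels; the helper name `common_suffix` and the example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def common_suffix(urls: list[str], labels: int = 2) -> str:
    """Return the most frequent domain suffix (the last `labels` labels).

    Each URL must include a scheme such as "https://"; otherwise urlparse
    finds no hostname and the URL is ignored.
    """
    suffixes = Counter()
    for url in urls:
        host = urlparse(url).hostname  # lowercased, or None
        if not host:
            continue
        parts = host.split(".")
        # Require at least one label in front of the suffix, so a bare
        # "edu.ar" host does not count as an example of ".edu.ar".
        if len(parts) > labels:
            suffixes["." + ".".join(parts[-labels:])] += 1
    if not suffixes:
        raise ValueError("no usable hostnames in urls")
    return suffixes.most_common(1)[0][0]


urls = [
    "https://www.example-one.edu.ar/",
    "http://example-two.edu.ar/about",
    "https://www.example-three.com.ar/",
]
print(common_suffix(urls))  # .edu.ar
```

Prefix the result with the engine's operator (for example "site:" + ".edu.ar") to get the criterion. Prefix-style conventions such as the German "uni-*.de" will not show up this way, because each university then has a different second-to-last label.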
