RessourcesMoteurRechercheWeb – Etudes_
Three researchers have just published results in the excellent online journal First Monday that should worry us. They show that search engines offering ranking by document date (Google, AltaVista) do not use the date the document was created but, most of the time, the date it was last indexed. This is obviously a problem when citing documents, particularly in the humanities.
Here are a few excerpts from their conclusion:
"We have shown that the search engines AltaVista and Google systematically relocate the time stamp of Web documents in their databases from the more distant past into the present and the very recent past. Second, they also delete documents from the year they were initially assigned to. This leads to the loss of information in the historical record on the Web as represented in the search engine databases. Third, information also gets lost in the sense of loss of structure in the semantic networks.
This has major consequences for the use of search engines in social science research. In short, search engines are unreliable tools for data collection for research that aims to reconstruct the historical record or for research that aims to analyze the structure of information at a particular moment in history. Only those Web pages that contain the date of the publishing document in question (for example, in various Web archives and citation index databases), can be used for this purpose (Hellsten, 2003). This unreliability is not caused by sudden instabilities of search engines, but precisely by their operational stability in systematically updating the Internet. For many types of social science research, it is therefore necessary to build tailor made archiving tools that are not based on the available commercial search engines."
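To make the gap between the two dates concrete, here is a minimal sketch, not the authors' method. It assumes a page declares its own date in a `<meta>` tag, and that the date a search engine reports is available separately. The tag names in `DATE_META_NAMES` and the function names `embedded_date` and `date_drift_days` are hypothetical, chosen for illustration only.

```python
from datetime import date
from html.parser import HTMLParser

# Assumption: meta names that may carry a publication date (illustrative list).
DATE_META_NAMES = {"date", "dc.date", "dcterms.created"}


class _MetaDateParser(HTMLParser):
    """Collects the first date-like <meta> value found in a page."""

    def __init__(self):
        super().__init__()
        self.found = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.found is not None:
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in DATE_META_NAMES:
            self.found = attr.get("content")


def embedded_date(html):
    """Return the date stated in the page itself, or None if absent or unparseable."""
    parser = _MetaDateParser()
    parser.feed(html)
    if not parser.found:
        return None
    try:
        # Only the leading YYYY-MM-DD part is read.
        return date.fromisoformat(parser.found.strip()[:10])
    except ValueError:
        return None


def date_drift_days(html, indexed_on):
    """Days between the page's own date and the engine's index date, or None."""
    stated = embedded_date(html)
    if stated is None:
        return None
    return (indexed_on - stated).days
```

A page with no usable embedded date yields `None` rather than falling back to the index date, which reflects the quoted point that only pages carrying their own publication date are usable for historical reconstruction.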