10 April, 1998

Dear Dr. English:

I am writing to express my concern about a critical omission from Canada's current institutional arrangements for preserving its heritage: an important collection of Canadian research data. As we approach the year 2000, the media have given tremendous coverage to the so-called Millennium bug, a self-inflicted problem arising from memory-saving programming practices such as storing years as two digits. However, very little attention has been paid to the large volume of Canadian research data that are in grave danger, not because of a 'bug' but because the necessary stewardship for these data has not been provided. Canada simply has not built the infrastructure required to ensure the long-term preservation and care of valuable research data. As a consequence, data that uniquely describe aspects of Canadian society and the environment in which we live are being lost or are at risk of being lost.

In a submission to the Social Sciences and Humanities Research Council of Canada in June 1997, the Canadian Association of Public Data Users (CAPDU) expressed its concern about the general failure in Canada to preserve publicly funded research data. In this submission, CAPDU urged the Council to adopt the advancement of a National Social Science Data Archive as a strategic direction, presenting five primary reasons in support of this position.

  1. Among the world's leading information-based economies, Canada lags negligently behind in providing the infrastructure to preserve its digital research heritage. In Europe, national social science data archives have existed for up to thirty years (see footnote 1). The United Kingdom, Norway, the Netherlands, Denmark, Sweden, Germany, and France, for example, all have well-established data archives, as does Australia. The United States has several prominent data archives, although the archive of the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan functions unofficially as the national data archive of the United States.
  2. The emergence of a global economy has heightened the value of comparative research among nations. A Canadian national social science data archive would play a meaningful role in housing Canadian data that enable cross-cultural comparisons and data exchanges with other countries. I have argued elsewhere that the absence of a Canadian national data archive hindered Canada's participation in the Human Dimensions component of the International Global Change Program (see footnote 2). Without an institution responsible for the systematic collection of national data resources, Canadian researchers are at a disadvantage when participating in some of the new international research projects. The Metropolis project, which is investigating immigration and integration across nations, is another example of global research that requires national data resources to permit international comparisons (see footnote 3).
  3. For future generations of Canadians, research data are an important element of our nation's collective memory and provide another resource to help us understand who we are as a society. The recent formation of the South African Data Archive (SADA) by the post-apartheid government in that country attests to the importance of research data in comprehending the identity of a society. A recent conference announcement from the SADA states, "The new democratic South Africa is faced with a myriad of challenges as it develops and reconstructs its society and economy. Data play a major role in monitoring the country's progress in social transition." Clearly, the expectation in South Africa is that research data will help the country assess the society it is becoming.
  4. As the Information Age matures, a movement is afoot to engage citizens throughout Canada in policy formation. One aspect of this movement is the creation of a 'data culture' to enrich policy analysis. In his article, "Phase Two of the Data Liberation Initiative: Extending the Data Culture," Dr. Paul Bernard describes this model of participatory democracy (see footnote 4). A major premise of the data culture movement is that data should be readily accessible to researchers, policy analysts, decision-makers, journalists, and the public. Without an institution such as a national data archive to preserve and provide access to Canadian data, efforts to build a society that is actively involved in policy analysis will be greatly hampered.
  5. Finally, these research data have been the raw material on which a segment of Canadian scholarship has been built. A basic tenet of science is replication, and without the preservation of research data, this principle is undermined. This is particularly true for data that cannot be collected again, such as cross-sectional surveys of Canadian society at a specific time or climatic data for a certain period.

Why are Canadian research data at such great risk? The urgency of this problem arises from several factors, including the volatility of digital media, the rapid pace at which computing technology changes, and the dependence of research data on supplementary documentation.

* Our cultural heritage is, to a certain extent, held captive on perishable media, but the shelf-life of the media used to store research data has until recently been among the shortest of all, posing a preservation problem greater than that of acidic paper. For example, a large volume of the data gathered in the 1970s and 1980s was stored on magnetic tapes with a typical life expectancy of five years.

* Computing changes at a pace measurable in dog years, and as a result, research data can quickly become stranded and irretrievable in outmoded technology. The problem is compounded by obsolescence in both hardware (for example, the demise of eighty-column punched card readers, seven- and nine-track tape drives, 5.25-inch floppy disk drives, etc.) and software (for example, system upgrades that are incompatible with earlier versions, or software systems that are not converted to operate on replacement technology).

* Research data require extensively detailed documentation to be understood. Without accompanying documentation, research data are less intelligible than hieroglyphics were without the Rosetta Stone. Research computer files are deliberately constructed to be machine-readable for data processing, so they tend not to be directly readable by humans. Documentation is needed to translate the machine-oriented design into something humans can readily grasp. Unfortunately, data documentation is an often-neglected practice, and preparing the accompanying documentation is a major task in preserving data.

A comprehensive report on the risks facing research data is available in a discussion paper published in 1996 by the Data and Information Systems Panel of the Canadian Global Change Program, entitled Data Policy and Barriers to Data Access in Canada: Issues for Global Change Research (see footnote 5).

The value of Canadian research data lies in their scientific worth, in the public investment expended on their creation, and in their important contribution to the overall record of our society. The problem we face in Canada is that without the intentional collection, systematic preservation, intellectual organization, and purposeful management of our research data, we will lose this part of our heritage.

  1. For further information, see the Web homepage of the Council of European Social Science Data Archives (CESSDA).
  2. See "A Case for a Canadian National Social Science Data Archive," published in the electronic journal Government Information in Canada.
  3. More information about the Metropolis project is available on the Internet at http://international.metropolis.globalx.net/
  4. This article appears in the electronic journal Government Information in Canada and is available on the Internet.
  5. Copies of this report can be obtained from The Canadian Global Change Program, c/o The Royal Society of Canada, 225 Metcalfe #308, Ottawa, Ontario K2P 1P9. This report is also available on the Internet in French or English.