10 April, 1998
Dear Dr. English:
I am writing because of my concern about the critical
omission of an important collection of Canadian research
data from the current institutional arrangements for
preserving Canada's heritage. As we approach the year
2000, a tremendous amount of media coverage has focused
on the so-called Millennium bug, a phenomenon self-inflicted
by computer memory-saving programming practices. However,
very little attention has been given to the large volume
of research data in Canada that are in grave danger -- not
because of a 'bug' -- but because of the failure to provide
the necessary stewardship for these data. Canada simply
has not provided the required infrastructure to ensure the
long-term preservation and care of valuable research data.
As a consequence, data that uniquely describe aspects of
Canadian society and the environment in which we live are
being lost or are at risk of being lost.
In a submission to the Social Sciences and Humanities
Research Council of Canada in June 1997, the Canadian
Association of Public Data Users (CAPDU) expressed its
concern about the general failure in Canada to preserve
publicly funded research data. In this submission, CAPDU
urged the Council to adopt as a strategic direction the
advancement of a National Social Science Data Archive.
Five primary reasons were presented in defense of this
position.
- Among the world's leading information-based economies,
Canada is negligently behind in providing the
infrastructure to preserve its digital research
heritage. In Europe, national social science data
archives have been in existence for up to thirty
years (see footnote 1).
For example, the United Kingdom, Norway, the
Netherlands, Denmark, Sweden, Germany, France, and
Australia all have well-established data archives.
The United States has several prominent data archives,
although the Archive of the Inter-university
Consortium for Political and Social Research (ICPSR)
at the University of Michigan functions unofficially
as the country's national data archive.
- The emergence of a global economy has heightened the
value of comparative research among nations. A
Canadian national social science data archive would
play a meaningful role in housing Canadian data that
enable cross-cultural comparisons and data exchanges
with other countries. I have argued elsewhere that
the absence of a Canadian national data archive
hindered Canada's participation in the Human
Dimensions component of the International Global
Change Program (see footnote 2). Without an institution
responsible
for the systematic collection of national data
resources, Canadian researchers are at a disadvantage
when participating in some of the new international
research projects. The Metropolis project, which is
investigating the social phenomenon of immigration and
integration across nations, is another example of this
type of global research requiring national data
resources to permit international comparisons (see
footnote 3).
- For future generations of Canadians, research data are
an important element of our nation's collective memory
and provide another resource to help understand who we
are as a society. The recent formation of the South
African Data Archive (SADA) by the post-apartheid
government in that country attests to the importance
of research data in comprehending the identity of a
society. A recent conference announcement from the
SADA states, "The new democratic South Africa is faced
with a myriad of challenges as it develops and
reconstructs its society and economy. Data play a
major role in monitoring the country's progress in
social transition." Clearly, the perception is that
research data will contribute to the assessment of the
society that South Africa will become.
- As the Information Age matures, a movement is afoot in
Canada to engage citizens throughout Canada in policy
formation. One aspect of this movement is the
creation of a 'data culture' to enrich policy
analysis. In his article, "Phase Two of the Data
Liberation Initiative: Extending the Data Culture,"
Dr. Paul Bernard describes this model of participatory
democracy (see footnote 4). A major premise of the data
culture movement is that data be readily accessible to
researchers, policy analysts, decision-makers,
journalists, and the public. Without an institution
such as a national data archive that preserves and
provides access to Canadian data, building a society
that is actively involved in policy analysis will be
greatly hampered.
- Finally, these research data have been the raw
material on which a segment of Canadian scholarship
has been founded. A basic tenet of science is
replication, and without the preservation of research
data, this principle is undermined. This is
particularly true for data that cannot be collected
again, such as cross-sectional surveys of Canadian
society at a specific time or climatic data for a
certain period.
Why are Canadian research data at great risk? The urgency
of this problem arises from several factors, including the
volatility of digital media, the rapid pace at which
computing technology changes, and the dependence of
research data on supplementary documentation.
* Our cultural heritage to a certain extent is held
captive on some form of perishable medium, but the
shelf-life of the media used to store research data
has until recently been among the shortest and poses
a preservation problem greater than acidity in paper.
For example, a large volume of the data gathered in
the 1970s and 1980s were stored on magnetic tapes
that had a typical life expectancy of five years.
* Changes in computing move at a pace measurable in
dog years and as a result, research data can quickly
become dead-ended and irretrievable in outmoded technology.
This factor is complicated by obsolescence both in
hardware (for example, the demise of the eighty-column
punched card reader, seven- and nine-track tape drives,
5.25-inch floppy disk drives, etc.) and software (for
example, system upgrades that are incompatible with
earlier versions or software systems that are not
converted to operate on replacement technology).
* Research data require extensively detailed
documentation to be understood. Without accompanying
documentation, research data are less intelligible
than hieroglyphics were without the Rosetta stone.
Research computer files are by design constructed
to be machine-readable for data processing. As a
result, they tend not to be directly readable by
humans. Rather, documentation is necessary to
decipher the machine design into something more
easily grasped by humans. Unfortunately, data
documentation is an often-neglected practice and a
major task in preserving data is the preparation of
accompanying documentation.
A comprehensive report of the risks facing research data is
available in a discussion paper published in 1996 by the
Data and Information Systems Panel of the Canadian Global
Change Program entitled, Data Policy and Barriers to Data
Access in Canada: Issues for Global Change Research (see
footnote 5).
The value of Canadian research data lies in their scientific
worth, in the public investment expended on their creation,
and in their important contribution to the overall record of
our society. The problem we face in Canada is that without
the intentional collection, systematic preservation,
intellectual organization, and purposeful management of our
research data, we will lose this part of our heritage.
1. For further information, see the Internet Web homepage for
   the Council of European Social Science Data Archives (CESSDA).
2. See "A Case for a Canadian National Social Science Data
   Archive", published in the electronic journal, Government
   Information in Canada.
3. More information about the Metropolis project is available
   on the Internet at
   http://international.metropolis.globalx.net/
4. This article appears in the electronic journal, Government
   Information in Canada, and is available on the Internet.
5. Copies of this report can be obtained from The Canadian
   Global Change Program, c/o The Royal Society of Canada,
   225 Metcalfe #308, Ottawa, Ontario K2P 1P9. This report
   is also available on the Internet in French or English.