Assembling the Digital Sky
By David Essex
November 22, 2002
The Sloan Digital Sky Survey telescope in Sunspot, NM. (Photo courtesy of Fermilab Visual Media Services)
U.S. astronomers are gathering terabytes of data into a worldwide “virtual observatory” that will
be accessible to scientists and laymen alike.
Scientists in the United States, armed with a $10 million grant from the National Science Foundation, are
building a National Virtual Observatory (NVO) that will make the world’s huge store of astronomical data
available to anyone with a Web browser.
“History has shown us that the greatest leaps forward have occurred not when you observe the universe
through just one window, but when you compare the views of the universe obtained through different
windows,” says Ray Norris, deputy director of the Australia Telescope National Facility in Epping, New
South Wales, Australia. “The NVO will enable any astronomer to do this easily, combining all available
data on one object or one region of the sky, or perhaps even using data-mining techniques to look for
subtle correlations between the properties of a class of objects when viewed through different windows.”
The hope is to dramatically advance this computational approach to astronomy. “I can imagine entire
research projects being done from NVO data,” says Bob Hanisch, the NVO project manager and an
astronomer at the Space Telescope Science Institute in Baltimore.
The inspiration for the NVO is the Sloan Digital Sky Survey, an electronic catalog of images in multiple
wavelengths spanning half the northern sky—100 million celestial objects in all, encoded in four databases
and viewable from a Web portal. The NVO will take the Sloan survey and combine it with other, smaller
U.S. and international surveys, including some maintained by the United Kingdom, Australia, India, and
As the virtual in NVO suggests, the project is more about computing than the optical telescope images
and gamma ray, infrared, radio, ultraviolet, and X-ray snapshots of the heavens collected in the surveys.
The main hardware platform will be the emerging “grids” that federate research centers’ supercomputers,
servers, and high-speed networks into single, powerful computing resources. The NVO will both depend on
grid computing and demonstrate its usefulness, astronomy being an uncommonly good test case, say
NVO advocates, because of its large yet manageable universe of free, publicly available data.
Building out grids is more the task of participating grid-computing hotbeds such as the San Diego
Supercomputer Center (SDSC). For their part, NVO architects will instead tackle other challenges on the
bleeding edge of computing, most of which involve managing large distributed databases. The trick is to
make a collection of fundamentally different databases (some in Oracle, others in SQL Server, for
example) work uniformly with the software that displays and analyzes the information. The databases
themselves will usually remain in separate locations to avoid clogging network bandwidth, but
performance will still be an issue, especially when researchers want to run complex queries. In response,
Hanisch says, NVO data centers plan to offer additional services that take over such jobs from remote
Other database-intensive disciplines, such as bioinformatics, astrophysics, and the earth sciences, stand
to gain from potential advances in grid computing and database technology. Bioinformatics is eyeing the
NVO for new approaches to storing and exchanging multi-gigabyte maps of the human genome. Earth
scientists are also involved in the NVO research effort because, Hanisch says, like astronomers, they
work by comparing data from different instruments.
The initial NSF-funded work focuses on data interoperability, a key component of which is VOTable 1.0, a
data-exchange standard released on April 15 that uses the Extensible Markup Language (XML) to
represent large datasets. “We are putting VOTable into practical, everyday use now,” Hanisch says. Next
on tap: the Simple Image Access Prototype specification, an image-handling complement to VOTable now
under discussion with international partners. In addition, Hanisch expects within a year or two to see Web
services directories that will make it easier to deliver and search through newly published data.
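To make the idea of an XML data-exchange format concrete, here is a minimal sketch in Python of reading a small table laid out in the VOTable style, with FIELD elements describing the columns and TR/TD elements holding the rows. The table name, column names, and values are invented for illustration, and the document is simplified; it is not taken from any NVO service or from the full specification.

```python
import xml.etree.ElementTree as ET

# A tiny, simplified VOTable-style document. The table name, columns,
# and values are made up for illustration only.
VOTABLE_XML = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="demo_objects">
      <FIELD name="id" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>obj-1</TD><TD>10.5</TD><TD>-2.25</TD></TR>
          <TR><TD>obj-2</TD><TD>185.0</TD><TD>12.75</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_votable_rows(xml_text):
    """Return the rows of the first table as a list of dicts keyed by field name."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    fields = table.findall("FIELD")
    names = [f.get("name") for f in fields]
    types = [f.get("datatype") for f in fields]

    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        row = {}
        for name, dtype, cell in zip(names, types, cells):
            # Convert numeric columns; leave everything else as text.
            row[name] = float(cell) if dtype in ("double", "float") else cell
        rows.append(row)
    return rows


rows = read_votable_rows(VOTABLE_XML)
for row in rows:
    print(row["id"], row["ra"], row["dec"])
```

Because the column descriptions travel with the data, a program that has never seen this table can still tell which values are numbers and what units they carry.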
Metadata (data about data) and Semantic Web technology are two other elements the NVO team has
deemed essential in its ambitious effort to federate the data of an entire scientific discipline. “The rate at
which services are being defined is limited by how fast the community can reach consensus on difficult
semantic and knowledge-management issues,” says Reagan Moore, an associate director at SDSC.
“Given the need for a consensus across multiple groups, the services that are being implemented are very
impressive.” One promising example: researchers at the University of Strasbourg in France created Unified
Column Descriptors (UCDs), standard names for the columns in astronomical tables, which Alex Szalay of
Baltimore’s Johns Hopkins University, one of the NVO’s two principal investigators, has semantically
mapped to 1,300 Sloan items.
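As a rough illustration of why shared column names matter, the Python sketch below renames the columns of two hypothetical surveys to a common set of descriptors so that the same code can read both. The survey names, column names, and descriptor labels are all invented for this example; they are not drawn from the actual UCD list or from the Sloan mapping.

```python
# Hypothetical mapping from each survey's own column names to shared
# descriptor labels. All names here are placeholders for illustration.
DESCRIPTOR_MAP = {
    "survey_a": {"ra": "position_ra", "dec": "position_dec", "rmag": "magnitude_r"},
    "survey_b": {"RAJ2000": "position_ra", "DEJ2000": "position_dec", "Kmag": "magnitude_k"},
}


def normalize_row(survey, row):
    """Rename a row's columns to the shared descriptors, dropping unmapped columns."""
    mapping = DESCRIPTOR_MAP[survey]
    return {mapping[col]: value for col, value in row.items() if col in mapping}


row_a = normalize_row("survey_a", {"ra": 10.5, "dec": -2.25, "rmag": 17.8})
row_b = normalize_row("survey_b", {"RAJ2000": 10.5, "DEJ2000": -2.25, "Kmag": 14.1, "flag": 0})

# Both rows now expose the position under the same keys.
print(row_a["position_ra"], row_b["position_ra"])
```

Once every archive publishes such a mapping, a tool can ask for "the right ascension column" without knowing what each survey happens to call it.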
With so many sites providing content, the NVO will also need a way to indicate how reliable its data is,
cautions Michael Skrutskie, principal investigator of the Two Micron All Sky Survey and a professor of
astronomy at the University of Virginia in Charlottesville. “People will need to know how much trust they
can put into those data points.” Skrutskie suggests the issue might be solved with a labeling system, and
Hanisch says a peer-review process for one is in the works.
Proponents say the NVO could be up in two to three years, especially if there’s money for the operational
phase. They plan to demonstrate real-time analysis of clustered galaxies at a January 2003 meeting of the
American Astronomical Society in Seattle. The first showing of interoperability among international VOs
should be ready for the July 2003 general assembly of the International Astronomical Union in Sydney,
Australia. “I think by the end of two years, we’ll have interoperable data centers and a bunch of toolkits,”
predicts Jim Annis, an astrophysicist at the Fermi National Accelerator Laboratory in Batavia, IL. Longer
term, the project could still falter if the NVO’s middleware standards make it too expensive for institutions
to prepare their survey data, the fault that doomed the pre-Web Astrophysics Data System, in Hanisch’s
Regardless of how it gets assembled, astronomers seem excited about the NVO’s potential as a research
tool, sometimes referring to it as an instrument on a par with the telescope. “The important part of it is just
being able to do searches and queries and being able to get all that information on one object,” says Dave
Turnshek, a professor in the astrophysics and astronomy department at the University of Pittsburgh, one of
17 research centers sharing the NSF grant. Turnshek’s school paid to get access to the Sloan survey, and
he uses it heavily for his research in quasar and galaxy formation. “The exciting thing about the NVO is,
eventually everybody will be able to do that,” he says.
Adds Hanisch: “My wildest dreams of success are that the VO stuff becomes just part of doing
astronomy. It will be just like going to Google.”
David Essex is a technology writer based in New Hampshire.