Unlocking the grid

e-Science

Scotland's capital has recently hosted Portals and Portlets 2003 and is gearing up for summer schools in e-Science and Neuroinformatics, thanks to the good offices of the National e-Science Centre (NeSC). Established in 2001 with £5.5m from the DTI and private business, the NeSC is jointly administered by the University of Edinburgh and the University of Glasgow: its stated mission is to promote and support e-Science in Scotland and the UK. Its "good offices" are situated in central Edinburgh, but the two dozen senior staff are drawn from world-class departments in both cities.

The NeSC defines e-Science as "large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation". It's basically a grown-up version of the World-wide Prime Numbers Search, which runs as a screen-saver on home PCs, except that e-Science is out to solve much more complex and interesting problems. Current projects include the genomics of heart defects, animated computer graphics, search engines for large databases (including the old Internet), and the ultimate challenge of working out how to make buses run on time.

The Grid is a global computing infrastructure which will support e-Science. It's not the only one: the Americans have a bigger one of course, and there are others around the world. All of them are very different from the slow and leaky Internet we all know and tolerate. The Grid needs to be secure: e-Scientists don't want their results stolen by eavesdroppers. It also needs to be free of the spam, advertising and flashy graphics which slow down Internet traffic: you don't want to be gridlocked when you're busy revolutionising science.

Mountains don't move, not even virtual ones

Although some e-Science is possible with the bad old Internet, most serious computing projects require a more flexible and organised environment. For a start, they need mountains of data. As an example, consider a proposed project to analyse aircraft engine faults: it could prevent crashes and optimise maintenance schedules. If you estimate that there are 100,000 engines flying five commercial journeys a day, and each engine could provide around 3 gigabytes of data per flight, that's 1,500,000 gigabytes (1.5 petabytes) a day. To transfer that amount of data between two computers would take years with today's Internet. Even if transfer speeds increase dramatically (and they will, but that's another story) they will still be too slow, and data mountains are growing all the time.
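The arithmetic behind the engine example is easy to check for yourself. The Python sketch below reproduces the daily total from the figures above and then estimates how long one day's worth of data would take to move. The link speeds are assumptions chosen for illustration, not figures from the NeSC, and the sums use decimal units (1 gigabyte = 1,000,000,000 bytes).

```python
# Back-of-envelope check of the aircraft-engine example.
ENGINES = 100_000        # engines in service (estimate from the text)
FLIGHTS_PER_DAY = 5      # commercial journeys per engine per day
GB_PER_FLIGHT = 3        # gigabytes of data per engine per flight


def daily_volume_gb(engines=ENGINES, flights=FLIGHTS_PER_DAY,
                    gb_per_flight=GB_PER_FLIGHT):
    """Total data produced per day, in gigabytes."""
    return engines * flights * gb_per_flight


def transfer_days(volume_gb, link_mbit_per_s):
    """Days needed to move volume_gb over a link of the given speed."""
    bits = volume_gb * 1e9 * 8                  # gigabytes -> bits
    seconds = bits / (link_mbit_per_s * 1e6)    # megabits/s -> bits/s
    return seconds / 86_400                     # seconds -> days


if __name__ == "__main__":
    volume = daily_volume_gb()
    print(f"{volume:,} GB per day = {volume / 1e6} PB")
    # Assumed link speeds in megabits per second, for illustration only.
    for mbit in (1, 100, 10_000):
        print(f"{mbit:>6} Mbit/s: {transfer_days(volume, mbit):,.1f} days")
```

On an assumed 100 Mbit/s link, a single day's output would take roughly 1,389 days to transfer, which is close to four years. Meanwhile the engines keep flying.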

If the data can't move, the analysis must. Complex programs must be executed on remote machines, wherever the data lives. This causes serious problems for the old Internet, because programs currently executed on remote machines are usually known as viruses. Today's Internet can't cope with remote execution: it's only designed for transferring data, so e-Science needs a new Internet with new protocols. It also needs a way of ensuring that programs are not viruses, and the only obvious way to do that is to restrict execution to a set of tried and tested programs. The NeSC, along with the Global Grid Forum and others, is working on both the protocols and the programs.
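To make the idea of "a set of tried and tested programs" concrete, here is a minimal Python sketch of one possible approach: a machine keeps a register of fingerprints (SHA-256 digests) of vetted programs and refuses anything that doesn't match. This is an illustration only, not a description of how the Grid actually does it. The `ApprovedPrograms` class and the program names are hypothetical.

```python
import hashlib


def digest(program_bytes: bytes) -> str:
    """Fingerprint a program by the SHA-256 hash of its contents."""
    return hashlib.sha256(program_bytes).hexdigest()


class ApprovedPrograms:
    """Hypothetical register of vetted programs, keyed by digest."""

    def __init__(self):
        self._approved = {}

    def approve(self, name: str, program_bytes: bytes) -> None:
        """Record a program as tried and tested."""
        self._approved[digest(program_bytes)] = name

    def may_run(self, program_bytes: bytes) -> bool:
        """Allow execution only if the exact bytes were approved earlier."""
        return digest(program_bytes) in self._approved


if __name__ == "__main__":
    registry = ApprovedPrograms()
    vetted = b"print('analyse engine vibration data')"
    registry.approve("engine-analysis", vetted)
    print(registry.may_run(vetted))
    print(registry.may_run(vetted + b" # tampered with"))
```

Because the check is on the exact bytes, a program altered by even one character no longer matches its approved fingerprint. A real system would also need to decide who is allowed to approve programs in the first place, which this sketch ignores.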

So the Grid requires new protocols and a complete set of bona fide programs. What else? Well, there's the question of data integration. To continue our example, different engines may provide data in different formats. For e-Science to work effectively, it must be able to translate data into a common format. There are already tools to do this: the best known is XML, the godfather of mark-up languages, of which HTML (the language of Internet browsers) is a poor relation. Applying XML to the types of data required by e-Scientists is a big project, but the NeSC is working on that too. Once all these problems are solved, everyone will be able to use the Grid, as long as their project is approved and they can find the money and expertise to make it happen.
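As a rough illustration of what translating into a common format might look like, the Python sketch below takes temperature readings from two imaginary engine makers and converts both into the same small XML record. The vendor formats, the element names and the attribute names are all invented for this example. They are not taken from any NeSC project or real engine data feed.

```python
import xml.etree.ElementTree as ET


def to_common_xml(engine_id: str, temperature_c: float) -> str:
    """Build the shared XML record (element names are assumptions)."""
    reading = ET.Element("reading", {"engine": engine_id})
    temp = ET.SubElement(reading, "temperature", {"unit": "celsius"})
    temp.text = f"{temperature_c:.1f}"
    return ET.tostring(reading, encoding="unicode")


def from_vendor_a(line: str) -> str:
    """Vendor A (hypothetical): 'ENGINE_ID,TEMP_IN_CELSIUS' as text."""
    engine_id, temp_c = line.split(",")
    return to_common_xml(engine_id.strip(), float(temp_c))


def from_vendor_b(record: dict) -> str:
    """Vendor B (hypothetical): a dict with the temperature in Fahrenheit."""
    temp_c = (record["temp_f"] - 32) * 5 / 9
    return to_common_xml(record["id"], temp_c)


if __name__ == "__main__":
    print(from_vendor_a("A-17, 650"))
    print(from_vendor_b({"id": "B-42", "temp_f": 1202}))
```

Once both feeds have been through their translators, an analysis program only has to understand one format, however many engine makers there are.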

Connecting to the Grid

Ultimately, the Grid should be accessible to commercial and domestic users. But that's a long way off. For the foreseeable future (about three years in Internet matters) the Grid will be the preserve of large corporations and national research projects. Over 300 organisations worldwide are already interested in exploiting this resource. NeSC's pilot projects involve several universities together with IBM, Oracle, Sun Microsystems, Hewlett-Packard and Microsoft. If you want to play with the big boys and girls, you can submit a proposal for Grid Net funding: at the moment the funders are looking for solutions to the problems of remote execution and data integration, and for pilot e-Science projects to test those solutions. Or you could develop your own Grid, if you have a few gigadollars to spare. Otherwise you'll just have to watch and wait.

Portals and Portlets 2003

Neuroinformatics Summer School

National eScience Centre

World-wide Prime Numbers Search

Genomics

What is a Petabyte?

Global Grid Forum

XML

Grid Net funding

Last updated 30 Nov 1999