C++ for science -- DAIMS overview

Wed Aug 31 04:30:09 AEST 1988

>[ So why don't you Scientists all use C++? ]

As an enthusiastic bystander, I promised a summary of the Oceanography
project.  Almost as an afterthought, I asked Bruce Eckel for a summary
of the project.  I got back a summary from Bruce and a second summary
from Tom Keffer.  Thanks, guys!  I've gotten a zillion e-mail requests,
so I'm posting this to comp.lang.c and comp.lang.c++, since it seems
to be of pretty general interest.  Here goes:
-----------------------------------------------------------------------
[ Bruce: ]

	The DAIMS acronym stands for Data Analysis and Interactive
Modeling System.  We are developing tools to manipulate and analyze
large data sets (e.g., ocean measurements, satellite data).  The tools
will include:

	1) A Data-Storage Standard
	2) Graphics for easy display of data, mapping, etc.
	3) A simple interpretive programming language to allow easy
	   access for non-programmer types.  The language will be 
	   extensible without too much trouble for programmer types.
	4) As many data-related classes as we can create (e.g.,
	   matrices, vectors, oceans)
	5) Interfaces to existing Fortran libraries
	6) A model of the ocean programmed with C++

	We won't accomplish all of this, so we are trying at least to
establish a framework in which we can build the thing.

	We chose C++ because it is supposed to generate maintainable
and extensible code, and make programmers more efficient.  The latter
is only true, it seems, if the programmer already understands the
language and/or OO programming.  The learning curve has slowed us down
considerably. 

	This is more of an experiment and an example.  We can't really
hope to convert all the scientific Fortran programmers out there to
C++.  Their productivity would immediately go to zero for the amount
of time it takes to learn the language.  (Using someone else's
classes, however, is remarkably easy -- this might be the saving
grace). 

	At present, our code is designed to be freely distributed,
assuming the university lawyers don't complain.  My understanding is
that this project is intended to create public-domain code.  You can download
stuff from sperm.ocean.washington.edu via anonymous ftp; it is by no
means a finished package but there are numerous useful C++ examples
and classes.

[ Note: get and read the READ_ME.  For the other files, be sure to put
  ftp in `binary' mode, as the files are compressed ]

	We would *love* to have contributions for the project; we are
actively seeking other groups/individuals to contribute code.  Since
there are really only two of us working on it (and we could probably
keep six or eight busy on a project this size) any contributions are
extremely welcome.
------------------------------------------------------------------------
[ Tom: ]

That seems like a reasonable summary.  

Another thing worth adding is that the framework/architecture we
develop should be useful for a variety of sciences, not just
oceanography.  For example, we have matrix classes that know how to
invert themselves --- that's something useful in a zillion different
fields.  
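
[ Sketch: to make "a matrix that knows how to invert itself" concrete,
  here is a minimal illustration in present-day C++.  It is not the
  DAIMS matrix class.  The class name, its members, the choice of
  Gauss-Jordan elimination with partial pivoting, and the singularity
  threshold are all assumptions made for this example. ]

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical square-matrix class, invented for illustration only.
class Matrix {
public:
    explicit Matrix(std::size_t n) : n_(n), a_(n * n, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < n_ && j < n_);
        return a_[i * n_ + j];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);
        return a_[i * n_ + j];
    }
    std::size_t size() const { return n_; }

    // Return the inverse, computed by Gauss-Jordan elimination with
    // partial pivoting.  Throws if a pivot is smaller than an assumed
    // threshold, i.e. the matrix is numerically singular.
    Matrix inverse() const {
        Matrix work(*this);   // copy that is reduced to the identity
        Matrix inv(n_);       // starts as the identity, ends as the inverse
        for (std::size_t i = 0; i < n_; ++i) inv(i, i) = 1.0;

        for (std::size_t col = 0; col < n_; ++col) {
            // Choose the row at or below the diagonal with the largest entry.
            std::size_t pivot = col;
            for (std::size_t r = col + 1; r < n_; ++r)
                if (std::fabs(work(r, col)) > std::fabs(work(pivot, col)))
                    pivot = r;
            if (std::fabs(work(pivot, col)) < 1e-12)
                throw std::runtime_error("matrix is singular");

            // Move the pivot row into place in both matrices.
            if (pivot != col) {
                for (std::size_t j = 0; j < n_; ++j) {
                    std::swap(work(pivot, j), work(col, j));
                    std::swap(inv(pivot, j), inv(col, j));
                }
            }

            // Scale the pivot row so the diagonal entry becomes 1.
            const double p = work(col, col);
            for (std::size_t j = 0; j < n_; ++j) {
                work(col, j) /= p;
                inv(col, j) /= p;
            }

            // Clear this column in every other row.
            for (std::size_t r = 0; r < n_; ++r) {
                if (r == col) continue;
                const double f = work(r, col);
                for (std::size_t j = 0; j < n_; ++j) {
                    work(r, j) -= f * work(col, j);
                    inv(r, j) -= f * inv(col, j);
                }
            }
        }
        return inv;
    }

private:
    std::size_t n_;
    std::vector<double> a_;   // row-major storage
};
```

[ A caller only writes `Matrix inv = m.inverse();` and never sees the
  elimination, which is the point of putting the operation inside the
  class. ]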

Pardo asks what packages we are building.  That would most easily be
answered in person.  There are a zillion useful tasks I can think of
(example: a C++ LINPACK interface; another example: a high-level C++
graphics interface to InterViews / X Windows to do things such as draw
axes, arbitrary projections & views of data, etc.)

Finally, here is an abstract I prepared for the INO summer colloquium:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
			A B S T R A C T

	The Data Analysis and Interactive Modeling System (DAIMS) is a
project led by Thomas Keffer (Univ. of Washington) and Dale Haidvogel
(Chesapeake Bay Institute) under the sponsorship of the Institute for
Naval Oceanography to develop a system of software with five main
goals:

	* to act as the user interface for an operational ocean
	forecasting system;

	* to serve as a productivity booster for the development of
	new models;

	* to verify the skill level of these models;

	* to assimilate, archive, and statistically analyze real-time
	data to be used with these models; and

	* to act as an educational tool for exploring models and large
	data sets.

	Our approach has been twofold.  First, to develop a
high-level interpreter to manipulate complex data structures and
perform a suite of analysis techniques on them.  Second, to develop
two highly interactive ocean general circulation models, a
quasi-geostrophic model and a primitive-equation model, that can be
easily modified and reconfigured, even at run-time.  The general goal
is to develop models that are robust and easy to reconfigure without
introducing errors.

	Operational constraints are system portability, efficiency,
extensibility, and adherence to standards.  It is our intention to
keep the results of DAIMS in the public domain, distributing the
software through the Internet.  This requires using a minimum of
proprietary software, which avoids licensing restrictions.

	There have been two meetings of the DAIMS working group.  At
these meetings it was decided to adopt an object-oriented architecture
for the models and interpreter.  This is the only practical way of
managing a large project with a minimum of manpower, while still
meeting the goals of the project.  It is the goal of DAIMS to define this
object-oriented architecture in such a manner as to make the
construction of models and analysis software easy and dependable.

	The DAIMS project has chosen C++ as its system programming
language.  This is an object-oriented language developed at AT&T that
runs on a wide variety of UNIX and MS-DOS machines.  It is a superset
of the popular "C" programming language.  Like other object-oriented
languages, it offers inheritance of objects, encapsulation of
information, and polymorphism (the functionality of a function call
depending on object type).  Other advantages are a systematic way of
organizing large programs, strong type-checking to reduce errors,
run-time efficiency, and easy interfacing with old C and FORTRAN code.
The intention is to develop the high-level abstract views of the
model using C++, but to call existing FORTRAN routines to do the
intricate numerics (e.g., elliptic equation solvers).
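
[ Sketch: one way a C++ class can hand its numerics to a FORTRAN
  routine.  The routine SCALEV is hypothetical, invented for this
  example.  The linkage shown (lowercase name, trailing underscore,
  every argument passed by reference) is a common convention but is
  compiler-specific, so treat it as an assumption.  A stand-in body
  written in C++ is included only so the sketch links without a
  FORTRAN compiler. ]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// How C++ would declare a FORTRAN subroutine
//       SUBROUTINE SCALEV(N, ALPHA, X)
// under the assumed convention described in the note above.
// SCALEV is a hypothetical routine, not part of any real library.
extern "C" void scalev_(const int* n, const double* alpha, double* x);

// Stand-in definition so this sketch is self-contained.  In real use
// this body would be deleted and the FORTRAN object file linked instead.
extern "C" void scalev_(const int* n, const double* alpha, double* x) {
    for (int i = 0; i < *n; ++i) x[i] *= *alpha;
}

// Hypothetical high-level class: callers see scale(), not the
// by-reference FORTRAN calling sequence.
class Vector {
public:
    explicit Vector(std::size_t n, double fill = 0.0) : data_(n, fill) {}

    double& operator[](std::size_t i) {
        assert(i < data_.size());
        return data_[i];
    }
    double operator[](std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }
    std::size_t size() const { return data_.size(); }

    // Multiply every element by alpha using the FORTRAN-style routine.
    Vector& scale(double alpha) {
        const int n = static_cast<int>(data_.size());
        scalev_(&n, &alpha, data_.data());
        return *this;
    }

private:
    std::vector<double> data_;
};
```

[ The design choice illustrated: the FORTRAN calling sequence is
  confined to one member function, so the rest of the program deals
  only with Vector objects. ]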

	We now have an interpreter up and running.  At this time, it
is capable of only a few operations on a few simple objects.  We will
be extending its repertoire of objects and methods in the future.

	While our eventual goal is to write a highly interactive
general circulation model, we have chosen to experiment with a simple
one-dimensional ocean spinup demonstration model that can be run
entirely on a workstation.  The intention was to explore the
object-oriented architecture and the user interface in an
easy-to-manage environment.  The model is a spectral model (using
Chebyshev polynomials) that solves for the depth of the thermocline in
zonal cross-section.  The type and shape of forcing, time-stepping,
boundary conditions, friction and other parameters can all be selected
at run-time, using a mouse.  It is designed to run on a Sun-3
workstation.  It makes a useful teaching aid for a course on planetary
waves and general circulation.  The model is available via anonymous
ftp from the host sperm.ocean.washington.edu (128.208.2.7).  A
technical report on the model is available directly from T. Keffer
(School of Oceanography; WB-10; Univ. of Washington; Seattle, WA
98195).
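
[ Sketch: a spectral model of this kind represents a field as a sum of
  Chebyshev polynomials, f(x) = c_0 T_0(x) + c_1 T_1(x) + ... on
  -1 <= x <= 1.  The function below evaluates such a sum with the
  standard Clenshaw recurrence.  It is not taken from the spinup model;
  the function name and the coefficient layout are assumptions made
  for this example. ]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluate sum_k c[k] * T_k(x), where T_k is the Chebyshev polynomial
// of the first kind, using the Clenshaw recurrence:
//     b_k = c_k + 2 x b_{k+1} - b_{k+2},   k = N, ..., 1
//     f(x) = c_0 + x b_1 - b_2
// Hypothetical helper written for illustration only.
double chebyshev_eval(const std::vector<double>& c, double x) {
    assert(x >= -1.0 && x <= 1.0);   // natural domain of the expansion

    double b1 = 0.0;   // holds b_{k+1}
    double b2 = 0.0;   // holds b_{k+2}
    for (std::size_t k = c.size(); k-- > 1; ) {
        const double bk = c[k] + 2.0 * x * b1 - b2;
        b2 = b1;
        b1 = bk;
    }
    const double c0 = c.empty() ? 0.0 : c[0];
    return c0 + x * b1 - b2;
}
```

[ The recurrence avoids computing each T_k(x) separately, so a series
  with N coefficients costs about N multiply-adds per point. ]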

	Experience with this model has helped define the emerging
architecture of the system.  Different physical domains within the
system --- ocean turbulent boundary layers, atmospheres, basins, etc.
--- are incorporated as different "objects" that can be created,
manipulated, and displayed.  In turn, these objects incorporate other
objects that are more tuned to the numerics of the project --- grids
of temperature, u velocity, etc.  All objects include "state"
variables (e.g., temperature) and independent parameters (e.g.,
diffusivity).  Because of the object-oriented approach, each object is
responsible for its own privately controlled data.  Another object
learns about it only by interrogating it in a systematic way.  This
approach allows easy changes to the model, minimizing
"ripple-effects".  For example, an active turbulence closure model can
easily be substituted for parameterized boundary conditions.
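
[ Sketch: the substitution described above, in miniature and in
  present-day C++ (C++11).  Every class name, the heat-flux formulas,
  and the time-stepping rule are toy assumptions invented for this
  example; none of it is DAIMS code.  The point is only that the basin
  interrogates its boundary layer through one interface, so one kind
  of layer can replace another without touching the basin. ]

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <utility>

// Interface that every boundary-layer object must provide (hypothetical).
class BoundaryLayer {
public:
    virtual ~BoundaryLayer() = default;
    // Heat flux for a given temperature difference across the layer.
    virtual double heatFlux(double deltaT) const = 0;
};

// Parameterized boundary condition: flux proportional to deltaT.
class ParameterizedLayer : public BoundaryLayer {
public:
    explicit ParameterizedLayer(double diffusivity)
        : diffusivity_(diffusivity) {}
    double heatFlux(double deltaT) const override {
        return diffusivity_ * deltaT;
    }
private:
    double diffusivity_;   // independent parameter, privately held
};

// Toy stand-in for an "active" closure: the effective diffusivity
// grows with the size of the temperature difference.
class TurbulenceClosureLayer : public BoundaryLayer {
public:
    TurbulenceClosureLayer(double base, double gain)
        : base_(base), gain_(gain) {}
    double heatFlux(double deltaT) const override {
        return (base_ + gain_ * std::fabs(deltaT)) * deltaT;
    }
private:
    double base_;
    double gain_;
};

// A basin owns a state variable and a boundary-layer object.
class Basin {
public:
    Basin(double temperature, std::unique_ptr<BoundaryLayer> layer)
        : temperature_(temperature), layer_(std::move(layer)) {
        assert(layer_ != nullptr);
    }

    double temperature() const { return temperature_; }

    // Swap in a different boundary layer, even at run time.
    void setBoundaryLayer(std::unique_ptr<BoundaryLayer> layer) {
        assert(layer != nullptr);
        layer_ = std::move(layer);
    }

    // One forward-Euler step toward the air temperature.
    void step(double airTemperature, double dt) {
        temperature_ += dt * layer_->heatFlux(airTemperature - temperature_);
    }

private:
    double temperature_;                     // state variable
    std::unique_ptr<BoundaryLayer> layer_;   // learned about only via heatFlux()
};
```

[ Basin::step() is unchanged by the swap; only the object handed to
  setBoundaryLayer() differs, which is what keeps the ripple effects
  small. ]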

------------------------------------------------------------------------

Bruce and Tom haven't given me permission to give out their names and
e-mail addresses, but I will anyway, so please be reasonable in what
you send them (reasonable enough that *I* don't get in trouble, anyway
:-).  Personally, I think this project is a really great thing and I'd
like to see a lot of people get involved.

    Bruce Eckel: eckel at sperm.ocean.washington.edu
    Tom Keffer: keffer at sperm.ocean.washington.edu

  ;-D on  ( Strange, I had that keyboard here just a minute ago... )  Pardo
-- 
		    pardo at cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo