LASC IDE Project

The LASC (Los Angeles Scientific Center) IDE (Integrated Development Environment) project was the focus of my work from 1978 to about 1984. The project defined a language which was intended for use at all stages of business system development, from specification to implementation, and combined equivalents of classic programming facilities with data base access.

In retrospect, this was a culmination of a sequence of projects I was involved in, beginning circa 1972. In the first few years of the sequence I learned about data base management systems, including the various ways of viewing and implementing the kinds of persistent data structures data in use at the time. Then I worked (primarily) with Bill Kent on designing a more convenient "data model" (i.e., a way of conceptualizing data). The work was intended for use in a "data dictionary", that is, for a repository for descriptions of structures in other data bases, because built-in provisions for such descriptions were often difficult to use. However, the primary results of our project were general ones not specific to data dictionaries, and the project was eventually cancelled in favor of an approach tailored specifically for describing IMS data bases. I then transferred to LASC, and, after a bit, began to incorporate a version of the data model and bits of the accessing language into the much more ambitious effort which is the subject of this chapter.

But before I get to IDE, I'll take up a little space introducing the IBM Scientific Centers in general, and the Los Angeles Center in particular. And after discussing the IDE project I'll talk a bit about a conference I was involved in organizing, and its results. So, to begin...

Place and People

The IBM Scientific Centers were planned and initiated in 1964 to help understand and meet the needs of customers in scientific areas. As explained by Harwood Kolsky and Richard Mackinnon [Kolsky_89], these goals were to be addressed "by establishing long-range contacts with leading scientific customers, understanding their problems, defining solutions, ...., and ensuring IBM's responsiveness to their needs." To do this, centers in the US often grew out of existing IBM facilities near universities and other organizations with serious scientific computing requirements. For example, the Los Angeles Scientific Center was an offshoot of the joint IBM/UCLA "Western Digital Data Processing Center", but was initially located near major aircraft firms.

The centers were intentionally small relative to other IBM research or development locations. This was for purposes of agility, based on "The idea of having a small, entrepreneurial organization within a much larger company", which "has appeared again and again in American industry"[ibid]. So projects were not the large, sometimes overplanned and overstaffed development lab efforts, and employees were not divided into the long-term established groups of the research labs. And the agile approach did have concrete, useful results (besides creating goodwill). For example, the Cambridge (Massachusetts) Scientific Center developed the CP67/CMS timesharing system, leading to VM/370 [Creasy_81], whose simple, straightforward interface was a boon to engineers and researchers for decades. Similarly, the Palo Alto Scientific Center made major contributions to IBM's Fortran optimizing compilers [Scar_80, Scar_86], the long-term mainstay of scientific programming. Scientific centers were also established outside the US; these seemed (at least to me) more concerned with IBM's image, tending to work on projects related to national interests. For example, the Madrid center participated in a project to provide scholars with display-based access to the "Archivo de Indias"... the original accounts of the 16th century Spanish voyages of exploration and conquest. And, later, NLP (natural language processing) work in the various national languages became a concern of a number of centers.

When I arrived at LASC in 1977, it was located in the IBM Aerospace Building, designed by Eliot Noyes in 1963, presumably to suggest a punch card. At the time I joined, a major organizational change had just occurred; a group developing CAD/CAM software products had been split off into a separate organization. Most of the remaining groups were working on topics unrelated to aircraft-industry-specific needs, and somewhat later there was a renewed focus on cultivating relationships with academia, to the extent that the US centers became part of "Academic Computing Information Systems" marketing. One reason for the shift was a then-alarming (to IBM) tendency for universities to acquire DEC (Digital Equipment Corporation) hardware, which in some cases seemed more easily expanded as computational needs grew. Rather than replacing a machine by a larger one, another machine could just be added, with the change invisible to users.

The People of LASC. In 1977 the center manager was someone named Lou Leeburg, who left shortly thereafter, and his replacement, named Bernie Rudin, did not remain long. For most of my time as LASC, the center manager was Jim Jordan. While we didn't have a particularly cordial relationship, for a possible reason mentioned further on, that wasn't the case with the other local managers, or with people at scientific center headquarters. Burt Whipple, my first manager at LA, was an older guy who commuted to work over the Santa Monica mountains on a motorcycle. John Kepler was a long-term Los Angeleno who occasionally moonlighted by acting in TV commercials. (A seemingly endless source of amusement was that John had portrayed a recovered alcoholic in an ad for Alcoholic Anonymous...). Arvid Schmalz was very perceptive and kind person. Finally, a manager who arrived somewhat later, Peter Woon, was a charming, intelligent guy whom I knew from compiler work, and considered a personal friend.

The small headquarters group, which oversaw all the scientific centers and reported to a larger marketing organization, was helpful not only in sponsoring projects they considered worthy, but in creating connections with other efforts within IBM. So, in general, LASC was a very pleasant place to work. It provided considerable intellectual freedom, was mostly staffed by capable, ethical people, and was largely free of the various inter-group conflicts that beset development organizations (and, as I'll talk about in a later chapter, also affected the large IBM Research Division as well).

A possible reason for my rather remote relationship with Jim Jordan, the center manager, was that he had moved from the Houston Scientific Center, which, like other US scientific centers (besides LASC), did not employ many women, so he might not have been accustomed to taking their work seriously. For example, in overviews of center activities, he tended to give more prominence to projects headed by men, even if projects headed by women might be more substantive. And at LASC women did play major roles. Rita Summers was the highly respected, (perhaps unofficial) leader of a group that focused on computer security, and was completing probably the first book on the subject [Fern_81]. Marilyn Parker led studies of business management relationships with computer systems, e.g., [Park_82]. And Nan Shu, originally from the research division, was working on and publishing extensively in the area of visual programming [e.g., Shu_85, Shu_88].

But I can't leave this overly short list of people without mentioning Al Inselberg. A serious mathematician, and a sort of "larger than life" person, he developed something called "parallel coordinates" for use in solving problems and visualizing data in multidimensional spaces [Inse_2009]; one application was for overhead displays for airline pilots. See [Schn_2020] for an appreciation of Al's personality and work.

IDE Beginnings

It took me a year or so to sort out the direction I wanted to pursue and how to go about it. In the meantime, I participated briefly in a few efforts; one of these is discussed in Footnote 1, because of its relationship to computer history. During that year I also resumed my (very part time) MBA studies, after transferring credits from the University of Santa Clara to USC.

Finding my own direction began with realizing that the LASC group concerned with business system specification was not directly related to my interests. Not that there was anything wrong with their work; it involved studies of relationships between business management and computer systems, and of high level tools intended to elicit and clarify business-system-related needs (see [Park_82] and [Saka_82]). Rather, by that time I had become almost obsessed by the complexities of building business systems, in particular those related to the multiple ways needed to express what a system was supposed to do. System requirements and basic designs were usually expressed (a) in prose and diagrams by management and systems analysts. Then these were translated (b) into programs which declared and accessed data in programming-language-specific ways, and also translated (c) into the declarations and accessing instructions needed by data management systems, which looked at the data differently from the programming languages. And all or part of that design/implementation process might iterate. There were the beginnings of automated management tools which elicited system requirements, but the results were generally just documents, for use as guidance to the people charged with actually implementing the systems.

Thus in mid-1978 I submitted a proposal for (initially) a limited-length project to specify a detailed direction for a VHLL, i.e., a Very High Level Language, subsequently called PL/IDE (Programming Language for an IDE), to address the above multiplicity of system descriptions. The proposal was accepted and I got to work. The project motivation and direction was eventually published [Newm_82] in an IBM Systems Journal Issue devoted to "Enterprise Analysis.

The actual work of the project sorted itself out into three phases. In the first phase I looked at what seemed to be the first priority. Because the goal was a programming language which could be used to not only specify, but also instantiate, a wide range of data-oriented business systems, I looked at how to syntactically combine manipulation of program-local and data base data. And, also, as the language began to take shape, a talented graduate student intern studied implementation issues. (I should mention that the scientific center interest in cultivating relationships with universities was helpful in allowing us to hire excellent part-time staff... all highly qualified PHD candidates.)

In the second phase of the project, I and another graduate student worked on cleaning up the core language definition (a major task) and then tackling the challenge of specifying the means of communicating among programming units. And in the final phase we got more down to earth, in the sense that we tackled the jobs of writing a language manual, and beginning to implement a prototype via a compiler generator.

Aspects of these phases are described in the next sections.

IDE Phase 1: Sources and Adaptations 1978-1980

To begin to define the proposed high level programming language, I looked to two major sources. One was my earlier work with Bill Kent on a data dictionary, as mentioned above, in which we defined a data model for a repository to contain descriptions of the data base structures being used by an enterprise (as contrasted with a repository for the data itself).

The data model we defined in that project was similar to others which were being explored by many at the time (see [Kent_79] for some references). Like those other models, it focused on individual facts, partly because increasingly capable random access devices made such models potentially practical; one could define the information of interest without worrying about grouping it into records, and into hierarchies of records, for purposes of access efficiency. So the data in our model consisted of "things" and their "relationships".

Things had types and names. Relationships had types and the names of the types they related. So if, for example, the dictionary were being used to describe, e.g., a simple file system, it might contain thing types "Record" and "Field", with instances such as "Record:Dept" and "Record:Employee". And it might contain a relationship type "ContainsField", with an instance like "ContainsField(Record:Dept, Field:DeptName)". Relationships could have more than two participants, and could relate not only things, but other relationhips.

These (thing and relationship) types were established via "meta-information", indicating the types of things and relationships that could be used. The meta-information consisted of definitions effectively establishing, for example: "ThingType:Record", and "RelationshipType:ContainsField". And therefore the vehicle could be used to include additional kinds of information, or could be used directly as a data base, having nothing to do with describing other data.

In the dictionary project, to accompany the model, we also defined a predicate-calculus-like query language which allowed various kinds of elisions. For example, if there were a database (rather than a dictionary) built using our model, and it contained information about employee salaries and employee managers, then references like:

"ALL ?x where GTR( EmpSalary(?x), EmpSalary(EmpManager(?x))) "

would refer to "all the ?x such that the second participant of the relationship "EmpSalary(?x, ...), namely a salary, would be greater than the salary of the manager of that ?x." (That ?x's had to be employees would have been established in the definitions of EmpSalary and EmpManager.) We also defined an update syntax which was less pretty, but one aspect was adapted to the new language.

The other major source for this phase of the PL/IDE work was a project originated and headed by Professor Jacob Schwartz at NYU's Courant Institute. I had worked with Professor Schwartz briefly in 1969, when he consulted to the ACS compiler project and wrote, with John Cocke, a book about compiling and optimizing programming languages [Cocke_70] . However, in the 1970s he and his students became deeply involved in the design and implementation of a language called SETL, for "SET language", in which the basic structures of the language were sets: sets of scalars and sets of tuples. SETL was an enormously powerful, mathematically-oriented language; a good introduction is [Kenn_75] and many publications by Jack and others have been collected and made available at the SETL archives of the Computer History Museum Software Preservation Group, headed by Paul McJones (who has collected the content of many of the archives).

While SETL as a whole did not look suitable to the purpose, as it would be rather forbidding to the non-mathematically-inclined, the basic idea was very intriguing; restricting the data manipulated by a program to scalars, sets of scalars, and sets of tuples, had the potential of merging traditional programming language data types with data base data types. Especially as one could then view references to sets of tuples with missing elements as function references, like the references to EmpSalary and EmpManager above.

Beyond that, a very innovative trademark of SETL was that termed "sinister calls", for invocation of a function on the left side of an assignment statement (the term "sinister" from the Latin for "on the left side"). Then one could update a set of tuples by references like:

EmpSal += <"Jones", "150000">
(to add the right-hand-side pair to the EmpSal set of 2-tuples)

EmpSal("Jones") = "5000"
(to remove any pair <"Jones",nn> from EmpSal and then add <"Jones", "5000">)

I was fascinated with the possibilities. Given the above, and one other device suggested by SETL, namely qualified names for data groupings (e.g., "MyDB.EmpSal += ...") one could imagine a language that minimally distinguished (syntactically) between accesses to program-local and external data.

It was not just that such a language could remove the gap between programming languages and provisions for accessing persistent data. I also had some confidence that it could be used by systems analysts for high-level business system specification as well. This confidence was inspired by a 1976 task force I had participated in with people from a number of different IBM research and applied research groups, where we easily worked out, in detail (but in prose and diagrams) the application functions and basic data structures (in that context, relational structures) of a reasonably complex business system.

Initial Language Results The initial form of the language was documented in a 1980 Scientific Center Report [Newm_80]. It began with a brief introduction to the model, then outlined a significant part of the language, and finally provided a deeper discussion of the model and its rationale as I then understood it. The part of the language covered included

expressions, including literals, set references, functional references to relationships, operators, and selection clauses
assignment statements, including "functional assignment" (i.e, the "sinister calls"), and "factored assignment" that could establish a group of set members and their relationships, for example:
Agent +=
    "Alice Epstein"
       (HasTerritory = "N.Y.", "N.J."
        HasCommmision = .10),
    "Steven Miller"
       (HasTerritory = etc.
provisions for reference to non-local "data groups" by name qualification
control statements and statement blocks, including repetition over sets, and various, somewhat innovative, forms of conditionals.

(See the report, here, for discussion of the model and examples of the initial language.)

A version of the report was submitted for presentation at the 1980 VLDB (Very Large Data Bases) Conference, but was not accepted. However, the lack of acceptance has an associated strange story. It relates to a few paragraphs of the paper describing the approach used for applying statistical functions to the data, adapted from the data dictionary work. For example, given a relationship between employees and their salaries, it provides a succinct way of requesting the average employee salary. An expression such as Avg(EmpSal(Emp)) doesn't work, because that asks for the average of a set of employee salaries, so duplicate salaries would be removed before taking the average....* In Footnote 2 I discuss the PL/IDE solution and then the evidence suggesting that a reviewer or conference committee member afterwards adopted and published the approach in a major publication. And an awful end to the story.

*(Note: There were two kinds of functional references in the language, and they had to be treated differently. If a reference was to a function represented by a program, the arguments would be submitted to the function as is, and the result type would be that described in the definition of the function. However, for references to sets of tuples (like EmpSal, above), the result would be the set of values of the missing arguments (here just the set of scalars representing the salaries of one or more employees).

Related Work. The above language work took place in parallel with considerable other research on using similar data models as a basis for data accessing syntax. Three of the five such efforts referenced in the initial paper on PL/IDE were languages which (a) were characterized as data base sublanguages, and (b) directly viewed relationships as functions. (The other two were intended for system specification alone.) In contrast, to smoothly integrate persistent data access into a full programming language, we simply permitted sets of tuples to be referenced functionally; they were not themselves considered functions. Our approach also had the useful property of permitting the use of relationships of degree greater than two. Examples of the data base sublanguages referenced in the first paper are given in Footnote 3, to illustrate the differences.

But I can't leave the subject of related work without mentioning that the use of sets as the basic data type for a programming language turned out to be much older than I had first assumed. It in fact had been specified long before, as part of the language SIMSCRIPT, which was initially designed by Harry Markowitz and early colleagues at RAND [Mark_62], and went through multiple versions. When I met Harry in the early 1980s he was at IBM, working on a new version of SIMSCRIPT called EAS-E [Mark_81]. And it is necessary to add that Harry's career alternated between economics, which was his academic concentration, and computer science. So a few years after he retired from IBM, in the mid-1980s, he shared the 1990 Nobel Prize for Economics, based on his work in the 1950s on portfolio theory.

Implementation Considerations. As mentioned earlier, while the initial definitional work was proceeding, a very capable graduate student intern, Farhad Arbab, looked at approaches to implementation. Because we understood the problem as one of defining a programming language which combined references to program-local and data base data with almost identical syntax, he first considered the semantics of the language statements as they would be expressed in their LISP equivalents, and then how such LISP equivalents could be optimized in programming language style; the result appeared as a Scientific Center Report [Arbab_80]. Then, because we knew any implementation would actually be split between implementation of program local operations and accesses to potentially large collections of persistent data, Farhad wrote a first-cut review of existing data base storage and access optimizations as another Scientific Center Report, and then further developed the review in a draft paper [Arbab_81]. The latter draft paper may be sufficiently comprehensive to serve as a useful introduction to the state of that art at the time. (Soon afterward Farhad obtained his doctorate in CS from UCLA, and soon after moved to the Netherlands, where he has been a Professor of Computer Science for most of his career.)

Travels. One other aspect of the first phase of the IDE project, and a very enjoyable one, was the opportunity to get involved in concerns beyond local ones, and to do quite a bit of traveling. Most of the travel was domestic, and related to IBM-internal and external programming language and software engineering workshops and conferences.

But one trip was unforgettable, because of the destination. I was asked to look into a project called FST (Functional Specification Technique)[Bert_78] because of its possible relationship to IDE. I unfortunately don't recall much about the project, largely because of the neighborhood in which it was being developed, in La Gaude, France, just inland from Nice, on the French Riviera. So I spent a week at the lab, and the two surrounding weekends pinching myself.. "this is work?". Driving an excellent rental car, one day I toured the famous Riviera towns... Nice, Cannes, St. Tropez, etc. Another day I traveled down the upper corniche to Monaco, visiting the medieval walled city of Eze on the way, mostly photographing cobblestones. And, finally I drove a bit inland, first to St. Paul de Vence, a another lovely medieval village. And somewhere in the vicinity I almost destroyed a bicycle race.... my French wasn't equal to what the guy on the side of the road was shouting (some equivalent of "pull over").

IDE Phase 2: Programs and Communication (1980-1982+)

In the second phase of the project I was assisted by another outstanding UCLA graduate student, Ingrid Zukerman. We expanded the core language especially in the area of control structures (which became the foundation for transaction specification), and sorted out syntactic details such as operator precedence as well as aspects of expression semantics. (See the informal writeup [Newm_83] for details of some of these, as well as a long external presentation [Newm_83a] that included examples drawn from an imagined adventure game.) We also specified provisions for communication among procedural units (and data bases) outlined below. Unfortunately, both kinds of developments introduced complexities that were not suited to a language for use in system specification at a high level, which was one of the initial goals. If the project had been extended, we probably would have realized that problem explicitly and defined a subset language for the specification task. (After Ingrid received her doctorate from UCLA, she moved to Melbourne, Australia and a long career teaching computer science at Monash University.)

Thus about inter-program communication. The period in which IDE was designed was one of considerable ferment in that area. Old methods of synchronous operation and associated communication methods (call statements and function references) persisted, but provisions for asynchronous, parallel operation and communication were being specified.

All of the above developments were justifiable. However, together they constituted a bewildering array of provisions. So the goal in identifying programming unit types and communication provisions for IDE was to coherently subsume existing and newer capabilities. To do this, we defined, as a base, the most general kind of program structure and communication method we could conceive of, namely one that accepted and sent typed "messages" from and to asynchronously executing programs. And then we defined more limited versions of these.

The approach was described in a Scientific Center report [Newm_84]. Programs were "modules" which could contain other modules. Any program could send a message to a process executing in parallel via a non-blocking "SEND" statement (actually "SENDL" or "SENDF", depending on whether the message type was defined by the sender or receiver). Some kinds of programs defined processes, which were executable units which, once created, executed continuously, and had an associated queue. Processes accepted typed messages from other modules via RECEIVE statements accessing their queue.

Thus one program might request an item from a process maintaining a stack, and accept it some time later via a sequence such as:

SENDF StackPtr.Pop()
....do something else...
....for a while...
RECEIVE
ON StackDef.Ok(TopOfStack) WHERE SENDER() = StackPtr THEN
ON StackDef.Empty () WHERE SENDER() = StackPtr THEN ...
END;

But the requesting program could abbreviate the sequence by one of several kinds of blocking requests, which included acceptance of a response or responses, specifically:

Request Structure (allows for typed responses)

REQUEST StackPtr.Pop ()
ON Ok (TopOfStack) THEN ....
ON Empty () THEN ....
Single Line Request + Response (CALL equivalent)

StackPtr.Pop ((null)//TopOfStack)
(sends nothing..i.e.null, receives top of stack, assumes empty stack is exception)
Function Request

X = StackPtr.Pop() + Y

Also, structurally constrained modules, called PROCs (procedures) were defined. The entire executable part of a PROC (aside from initialization) consisted of a single "SRCV" statement, which accepted messages in the order received, and processing them depending on type, via:

SRCV
    ON msgtype1 THEN DO
       all processing for message
       RETURN
       END DO
    ON msgtype2
       ....
       ....
END SRCV

Finally, we provided for "generated" modules, described in another LASC Report [Newm_84a]. The code of each such module specified a reference to a program generator, plus inputs to that generator, which was understood to translate the input to the core language. These were intended to obtain compatible modules implementing emerging display interfaces and supporting other program forms, as well as serving as the way of "explaining" persistent data bases consisting only of declarations (which were assumed to be handled by a built-in generator)

IDE Phase 3: Towards Implementation (1982-1984)

As summarized earlier, in this final phase, which actually overlapped the second phase, we began implementation-related activities. Another grad student, Alfonso Di Mare, from Costa Rica, tackled the job of writing a manual. And either he or someone else started to implement the language via a compiler generator. I say "either" because I don't recollect much of the work on PL/IDE in that period; certainly in late 1983 and early 1984 I was very busy organizing a major internal conference.

But I do have considerable saved documentation. The work on the manual is reflected in many drafts of different sections, and each draft includes the proposed text, plus my lengthy comments and questions. If ever time permits, it would be interesting to analyze that very long saved conversation.

Also, the beginnings of a prototype are evidenced in a formal grammar [PLIDE_84], written for use by a compiler generator called TWS (Translator Writing System) developed by Metaware (incorporated in 1979 in Santa Cruz, California). We intended to use the grammar plus TWS translation directives to translate PL/IDE programs to the language MainSail, which seemed particularly suited to the problem. MainSail was a modification, for purposes of portability, of SAIL, the "Stanford AI Language". And SAIL, in turn, was an Algol-like language extended to include sets and associations, which would have made it very useful for our purposes.

Language Technology Workshop

Now about that conference. In mid-1983 Robert Tabory, who was a program manager in the Information Systems Group Technology Programs and often worked closely with the Scientific Centers, asked me to co-organize (but I did much of the work), a corporate-wide workshop on programming languages and compilers as they were being pursued within IBM.

There was a concern, whose validity was underlined by discussions at the conference, that while IBM had significant talent and work in those areas, the work was far from well-planned and coordinated. As a generalization, in the years 1972-75, the FS project, discussed in a previous chapter, tried to centralize all hardware and software development. When that project was discontinued, decentralization seemed to become more the norm and that was reflected in the state of programming language work almost ten years later.

Organizing the conference was a big job. The workshop sponsors wanted to establish bases for discussions of how to improve the situation, and to begin those discussions. This meant getting everyone involved up to speed, via presentations on technical foundations and on current work, mostly from the major research groups, and getting sizable participation from development organizations. Which involved some scheming. One part of the scheme was the location and season. With many of the necessary people working in the Hudson Valley (of New York State), and even farther north (in Toronto), holding a conference in Los Angeles in the winter, near the beach, seemed like a good idea. (As I am writing this, in February 2022, the temperature in LA is 73 degrees Farenheit. In Toronto it is 36, with snow flurries.) Also, at least in initial invitations to research people, I possibly omitted to list all the other researchers who were also being so honored. (At least one researcher was annoyed at seeing all the others who appeared... and organized an impromptu discussion session on the sand.)

My responsibilities ranged from writing invitations and (cooperatively) designing the program to talking to the conference hotel about breakfast menus to assisting in the final reports. Fortunately, the conference, which was held in February 1984, was a success, attended by approximately 130 invitees. There were two and a half days of presentations in different areas related to language design, implementation, and business considerations. The presentations were interspersed with panels on area-relevant technical and strategic questions. Then there was a full afternoon of simultaneous, area-oriented discussions with all attendees about the questions raised. And on the final day the discussion leaders presented discussion results by area.

Many of the results related to ongoing problems of stemming from language and language processor proliferation. Language proliferation (which was correctly assumed to be ongoing) was illustrated by three separate sessions devoted to language design:

Procedural languages reflecting contemporary concepts. This session concerned languages integrating such concepts as abstraction, concurrency, and distribution. One talk was devoted to a survey of such languages. Another was devoted to ADA [Wegn_80], the language edicted for all new government contracts, because IBM had an entire division (Federal Systems) devoted to that work. And a third talk was on NIL [Parr_83], a language being developed for concurrent programming.
Languages related to data bases and visual presentation. These presentations covered interfaces to IBM data base offerings, new data-related languages such as EAS/E [Mark_84], and PL/IDE (this language), and a language for visual presentation, SUNDANCE.
"Non-conventional languages". This session included presentations about a list processing language (LISP concepts and IBM's YKTLISP [Albe_83]), a pattern matching language (SNOBOL [Gris_78]), a logic programming language (PROLOG [Colm_96]), and NIAL, an IBM evolution from APL and LISP.

Summarizing some discussion results:

Programming Language Design: Because there will continue to be new languages, a forum was needed to debate desirable language features and how they might be expressed. And, in general, planners should try to ensure language designs are based on the best of current work, both internal and external, and management should try to identify and nurture good language designers. There was also some consideration of the operating environment problems posed by new types of languages.
Language Processors (i.e., compilers): Not least because language proliferation will continue, it was necessary to increase emphasis on common automated techniques where applicable. Working against this were tendencies of groups to buy compilers from outside sources, and/or to decide on product implementations based on short-term ROI. These tendencies also created inconsistencies among the same language as packaged by different groups for different hardware contexts. The language-processor discussions also mentioned the need to better nurture talented personnel, as above, which included maintaining ongoing ad-tech groups, with some people periodically migrating to development.
Overall Language Strategy: The discussion focused on general goals, such as more streamlined product development, better technology transfer from research groups .... Also, some specific organizational recommendations were made, namely (a) to resume the maintenance of an overall corporate language strategy, and (b) to create a Programming Language ITL (Interdivisional Technical Liaison), to hold periodic conferences. ITLs were the established general venue for inter-organizational technical communication and collaboration.

The recommendation to create a language ITL was pursued immediately. The Toronto lab assumed initial responsibility, and the (initial, standing) organizing committee consisted of Nick Cooper (Toronto), Fran Allen (Yorktown Research), Barbara Meyers (Santa Teresa Lab), and myself. The first meeting was held in Tucson in 1987 (continuing the southwest "scheme") and had a number of good results. One was the transfer of some compiler optimization technology from Palo Alto Scientific Center to Toronto. (Another was the meeting of Randy Scarborough (Palo Alto) and Leslie Toomey (Kingston lab), leading to a marriage.)

And Then What Happened

Sometime in 1983-1984 the Scientific Centers were directed to devote some effort towards AI work and I was asked to refocus on some AI area of my choice. I guess I could have resisted, even though the likelihood of IDE becoming a product was minimal. I had made a number of successful IBM-internal presentations of IDE, and finally got a paper accepted at a recognized external conference (the 1985 Entity/Relationship Conference [Newm_85]) where my presentation was publicly lauded by an attendee as "brilliant" (rather an exaggeration). However, by that time, I had become very interested in Natural Language Processing (NLP), which qualified as AI. Dealing with the syntax and semantics of programming and data base languages was certainly interesting, but dealing with the same topics with respect to the languages we speak and write promised to be even more so. And so it turned out to be.

Footnotes

Footnote 1: OPD Consulting

Sometime in 1978, the people who were working on the computer security book were asked to visit the Austin (Texas) Office Products Division (OPD) laboratory to consult on a new development project, and I was asked to join them. The immediate results of the visit illustrate an aspect of computer history, and a subsequent related event exemplifies how computer history can be erased.

As far as an aspect of computer history. The OPD project was intended to replace IBM's current and planned secretarial workstations with a version of the 801, the Research Division's RISC (Reduced Instruction Set Computer). Considering how the replacement might be structured led to a new LASC project that worked out a (then-)innovative approach to multi-user local computing that illustrated the state of, and impending changes to, computer use at the time. In the evolved approach, for small offices, instead of terminals connected to "virtual machines" via shared minicomputers (which tended to be slow), the future lay in "RM", i.e., "real machines", sharing peripheral devices on a Local Area Network (LAN). Specifics of the architecture were published in internal and external publications [for example, Summ_83, Summ_87].

And as far as history erasure. The 801 computer, the machine that was to be adapted to OPD needs, existed at the time of the visit in 1978, and was a direct outgrowth of John Cocke's RISC project (see [Cocke_2000] for the history). However, the June, 2018 issue of the "Communications of the ACM", the flagship publication of the Association for Computing Machinery, devoted an article [Savage2018] to the winners of the 2018 Turing Award, David Patterson and John Henessey, claiming that they invented RISC in 1981, and did not mention the earlier work, even though it had been cited as one of the reasons for John Cocke's own, prior Turing award.. While the error may have been innocent, it is actually difficult to imagine that neither the author of the article nor a reviewer on the Communications editorial staff were not aware of the history. One or two "letters to the editor" were published in later issues correcting the error to some extent... but who reads "letters to the editor". This serious distortion of the origins of RISC in a major publication illustrates how computer history can be erased.

Footnote 2: Statistical Functions and Suspicions

This footnote, about an occurrence in the early 1980s, itself has a contemporary (2023) footnote. So, to begin. As mentioned in the main text, in 1980 (actually in about April of 1980) I submitted a paper on the language being designed for IDE. It included a description of an approach to conveniently referencing statistical functions of participants in relationships.

To repeat the problem addressed: Let's say you want to determine the average salaries of employees in some department, given a set of relationships EmpSal between employees (in the set Emp) and their salaries (e.g.,in the set DollarAmt). Unfortunately, the simple statement:

AvgSal = AVG(EmpSal(Emp))

doesn't work because the inner expression EmpSal(Emp) gives a set of unique salaries, i.e., it is not a multiset. Duplicates would be removed before taking the average. As another example, we would like a succinct way of completing the request:

CostOfRaises = .10 * sum_of_employee_salaries

in order to estimate the total cost of giving 10% raises to all employees.

To address the problem we divided it into two parts. Under the assumption that most relationships of interest here were binary, and the first participants were unique, we defined a set of statistical functions operating on the second participants of relationships, "AVG2", "SUM2". So our CostOfRaises statement would become:

CostOfRaises = .10 * SUM2[EmpSal]

(and later language refinement eliminated the need for the square brackets). Also, if the values of interest were not already the second participants of binary relationships with unique first participants, we defined a function reference modifier NONUNIQUE@, which added a first participant, consisting of an unique integer, to each individual result of the function reference.

For example, if there were a ternary relationship EmpSalHist(Emp,Year,Salary) and one wanted to know the average of all salaries for some year, one might use a "function modifier" and write

S = NONUNIQUE@EmpSalHist(Emp,2015,?)

producing a relationship set S consisting of <1, number>, <2, number> .... with one member for each different employee/salary pair for the year 2015. Then one could use SUM2(S) to obtain the desired information.

Now the suspicious follow-on. The paper was given two reviews and was rejected in both. One review was negative but brief and less than convincing. The other was quite long and rather condescending. The major subdivisions of the review were complaints (justified or not) about a lack of clarity, about the organization, and about similarities to other work. But the reviewer did not quite dismiss the paper. The introduction stated that

Too many ideas with no emphasis,. Some are well known, e.g.page 8,11,12, some are interesting, e.g, integrating functional aspects with "standard" language aspects, some are raised and not dealt with.

Then, after a year or two, I saw some published papers about a project at the University of Wisconsin focusing on statistical data base queries. The project was headed by a person from the conference program committee. The examples were familiar, but the query expressions were quite different from those in my paper... so no problem.

However, I later saw, in a paper by the same person, the introduction as new of a version of the PL/IDE statistical function provisions. That theoretical paper, entitled "Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions" [Klug_82] was published in the Journal of the ACM, then the most prestigious of ACM publications.

To briefly explain the title: in the world of relational data base, in a "relational algebra" queries are stated as the result of (pseudo-physical) operations on relations, like JOIN and PROJECT, while in the relational calculus, queries are stated in terms of the characteristics of the intended results, like "?x where ...". Long before, E.F. Codd had demonstrated the equivalence of the two kinds of relational languages [Codd_72]. So the purpose of the paper was to add aggregate functions and then do the same.

In the paper, the author defined two query languages representing his versions of the relational algebra and relational calculus, respectively, with the addition of our statistical function syntax to the relational algebra, and a version to the relational calculus, and then proved the equivalence of the resulting two sorts of expressions. The reason he gives for introducing this syntax to the algebra was that the alternative would require operations returning duplicates, which would be problematic both from both theoretical and implementation perspectives. And thus, to quote:

we provide a parameterized family of sum functions: sum₁, sum₂, sum₃, . . . , sum_i, . . . . The function sum_i sums the numbers in the ith column of its input. Now there is no need for the notion of "duplicates." .... For example, to determine the sum of salaries (column 3) in the relation R ..... we would write sum₃(R).

Ahem. I had submitted the paper containing the SUM2, AVG2, etc. idea in April 1980, receiving the rejection notice in May 1980. Meanwhile, the statistically-related papers appeared in 1982, AND the author of the above 1982 paper was on the 1980 conference committee. So it seemed that this might not have been accidental, and I was determined to find out what had happened and take appropriate action (like writing to the editor of the journal and to the head of the author's academic department). I had kept the handwritten review, which could hve been used to determine whether the reviewer and the author of the ACM Journal article were the same person and, if not, whether there was some other connection.

And then. Later that year I opened a copy of a DB journal, probably "ACM Transactions on Database Systems" (TODS), and saw, on the first page, a black-outlined memorial for the author of the journal article, who had died in a bicycle accident while on vacation. Seeing this made me quite sad, of course because of the tragedy (he was quite young), but maybe a little because it made no sense to follow up on the problem.

And a footnote to the above footnote: in checking references now, with the advantage of the internet, I found that the author of the 1982 journal paper had written a differently organized earlier version, dated June 1980, as a local university report [Klug_1980]. So what did that mean? Was it an amazing coincidence? Or?

Footnote 3: Related Work Cited in First Paper

As mentioned in the body of the text, the PL/IDE language work took place in parallel with other research on using similar data models as a basis for referencing syntax. However, the languages referenced in my initial paper on PL/IDE intended for operational data base access (as contrasted with high level specification) differed from PL/IDE in two important ways. First, they were intended as data base sublanguages rather than full programming languages, and thus, presumably, had more limited applicability. Second, they viewed relationships directly as functions, rather than, as we did, allowing function-like reference to sets of tuples representing relationships. The view of ALL data in PL/IDE as made up of scalars, sets of scalars, and sets of tuples was the fundamental property of the language that allowed full integration of traditional programming language data types and data base data types. (So even arrays, which could be declared as such, were understood as relationships between tuples of integers, and values).

Looking at some examples of the cited related work, first, in TASL [Hous_79] the functional orientation was made very explicit by referencing the functions in terms of their DOMAINs and RANGEs. Thus if F2 was defined as a function between parts and their weights, one would access the weight of a part P by a reference

PWGT = RANGE(F2, P)

A query in another related work, FQL (Functional Query Language) [Bune_79], was not, to my eyes, very readable. For example, to produce the department names and salaries of each married employee one would write:

Ql 🠖* [CHAR,NUM] = !EMPLOYEE.|MARRIED.*[DEPT.DNAME,SAL];

The same expression in PL/IDE would be something like (under the assumption that the participant types of "Married" and InDept and "HasSalary" were separately declared.

ALL <.?e, ?d, ?s> where Married(?e,TRUE) AND InDept(?e, ?d) AND HasSalary(?e, ?s)*

*(Note: This would be used under ths assumption that the participant types of "Married" and "InDept" and "HasSalary" were separately declared and restricted to employees. If not, the query would add "?e ISIN Employee").

DAPLEX [Ship_77, Ship_81 was closer in spirit to PL/IDE. While, like those above, it directly viewed relationships as functions, and was focussed on describing data base accessing, it was more readable and looked more like a programming language than the others. However it was rather verbose. This was partly because it incorporated the idea that, because entities could be differently named in different applications, it distinguished between entities and their names. While this idea has been used in various data models, and has some justification, it can create complications where unnecessary.

Consider the DAPLEX way of adding a new student named Bill to the EE department and enrolling him in ‘Systems Analysis’ and ‘Semiconductor Physics’ courses, drawn from the reference.

FOR A NEW Student
     BEGIN
          LET Name (Student) = “Bill”
          LET Dept (Student) = THE Department SUCH THAT
              Name (Department) = “EE”
          LET Course (Student) =
              {THE Course SUCH THAT Name (Course) = “Systems Analysis”
,               THE Course SUCH THAT Name (Course) = “Semiconductor Physics”}
END

To see the difficulty with using "entities" of this type in an ordinary data base, consider that, in the "real world", university students would have student identifiers with a particular syntax, as well as names, and courses would have course numbers (like "CS45a") as well as names. So a similar operation in PL/IDE, recognizing that more pragmatic method of naming in that situation, and using the nested form to add several properties at once could be expressed as something like:

Students += NewStudent() (
StudentName("Bill")
StudentCourses("CS535", "Physics351"))

which would create a new student identifier by a function specified for the purpose, add that identifier (let's call it e) to the local data set "Students", and also (a) add the tuple <e, "Bill"> to the tuple set "StudentName", and the tuples <e, "CS535"> and <e, "Physics351"> to the tuple set StudentCourses.