A GUI for the Gurus



Regardless of where you think computer interfaces are going in the future (voice activation, artificially intelligent avatars, interactive 3d projections), the fundamental principles behind those interfaces will have to inherit a lot from current designs. Unless humanity undergoes a sudden and overwhelming psychological and neurological shift, the things that make sense now will continue to do so.

Of course, that's no reason not to correct some of the awful mistakes we have in place right now. But first, some history.

Some History

Personal Computer GUIs have come directly from a couple of places: the Macintosh, where the guts and innards of the computer are hidden away deliberately, and everything is controlled by the pervasive image of a desktop with folders and a trash can and papers, and Windows 95, which is slowly becoming more than a prettification of DOS. (The Original GUIs came from fun places like Xerox PARC.)

DOS, of course, was a stripped down, single-user ripoff of a real command line interface. Shell scripts on Unix are infinitely more powerful than .BAT files, for example. Unix comes with a treasure chest of small utilities you'd never think of until you need them -- then they're there. Even better, you can connect them together like little building blocks (or pipes, if you want to get into a plumbing metaphor), manipulating and massaging a stream of data until you have exactly what you want.

A GUI is supposed to make some of the difficult things of a command line easier. It would be pretty difficult to draw a logo with just the cursor keys (or the Emacs bindings or the vi navigation keys). Lynx and w3m are good for getting quick information off of the web, but they have their own limitations.

As a matter of fact, I'm typing this in a vim session in an Eterm in my Linux GUI right now. It's rare, but I only have one other window open at the moment (Netscape). Usually, I have at least two terminal windows open as a matter of habit, with three (or four) open if I am doing some programming. That's a lot easier than switching back and forth between virtual consoles -- I can program in one and debug in another.

The point of a GUI is to let you get your work done more easily. Because most of us have visual leanings, GUI designers have come up with metaphors and icons and ways for us to move a little mouse around and explore the relationships between sometimes-pretty-sometimes-ugly pictures. There are some things you can do in a GUI that you can't do with a command line (presupposing that your command line interface isn't as broken and limiting as DOS is), but there aren't a lot.

In fact, there may be MORE limitations in the GUI, if you stop to think about it.

We Like Flexible

Suppose I want to count the number of words in this essay. All I have to do on the command line is 'wc -w GuruGUI.txt'. It's not even that hard. Because I'm using a shell (that's what gives me the command line) that supports tab completion, all I had to do was type 'wc -w Gu' and then hit the TAB key. Since there are no other files in that directory that start with 'Gu', it picked 'GuruGUI.txt' -- it did what I meant.

In a GUI, I probably could have dragged an icon representing this file to an icon representing wc -w (which refers to the WordCount program and tells it to count only Words, not Lines or Characters, which it also can do). Maybe that would have been faster or easier, if I didn't already know the secret, but probably not.

Now imagine if I wanted to count all of the lines with the word GUI in them. That's a little trickier. One command for that might be 'grep GUI GuruGUI.txt | wc -l'. (That means, "show me every line that contains 'GUI' in the file GuruGUI.txt and send them to the WordCount program, which needs to tell me how many lines there are.") Again, that's pretty straightforward if you're used to Unix. The funny vertical line is called the pipe character. I mentioned the idea of pipes before, and it's one of the best things everyone takes for granted in Unix. It connects two commands together -- that is, the output of the grep command becomes the input for the wc command.
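For anyone following along at a terminal, here is a minimal sketch of that pipeline. The sample file and its three lines are made up for illustration (they are not this essay), so the count applies to the sample only.

```shell
#!/bin/sh
# Build a small, hypothetical sample file so the example is self-contained.
sample=$(mktemp)
printf '%s\n' \
    'A GUI is a picture of a program.' \
    'A command line is a conversation.' \
    'Some GUI designers forget that.' > "$sample"

# grep prints every line containing "GUI"; the pipe hands those lines
# to wc, and -l asks wc for the number of lines only.
gui_lines=$(grep GUI "$sample" | wc -l)

echo "lines mentioning GUI: $gui_lines"

rm -f "$sample"
```

The intermediate results never touch the disk or the screen; they only exist inside the pipe, which is exactly the part that gets fuzzy in the drag-and-drop version.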

Now how would I have done that in a traditional GUI? Again, I might have dragged the icon representing my file to another icon representing grep. I probably would have had to tell it that I wanted to find 'GUI'. Then it gets fuzzy. Would it have printed the results on the screen? To a file? Either way, I'd probably have to make an icon for the intermediary results (which I only care about in passing) and drag that over to the WordCount icon, telling it that I want the number of lines, or I would have had to count the lines myself. That's not so bad with only five lines to count, but what if I'd specified a whole directory full of files to search?

The classical response to that is, "That's why you have a terminal window. Just open up your pretty little Eterm window with the colorful backgrounds and type in your happy little command." That's a good answer until you think about it for a while.

You *could* create a 32-bit PNG with alpha-blending and anti-aliased fonts and lots of glorious gradients (that's a fairly complicated graphic, for those of you who aren't into the jargon) with a command line application. It may even be possible with a console program that supports mouse input and uses SVGAlib. You can also debug your nice program using only print statements instead of a debugger. Just because something is possible doesn't mean that it is the right way to do it. More on this later.

Besides that, one of the reasons there's such a push right now to make good desktop environments for GNU/Linux (such as GNOME and KDE) is to make the system easier to use for new users. That's a noble goal, but some of us have a question about the way those desktop environments are doing things.

Easy Isn't

It's no secret that the Windows 95 interface sits on the majority of PCs these days ('PC' being defined as 'an individual machine sitting on someone's desk'). Lots of people are familiar with it, whether they use Windows 95, Windows 98, Windows CE, or even Windows NT. To the extent that they have learned to work around its limitations and to see the world of their computers through its quirks, they consider it easy because it is familiar.

At the risk of oversimplifying things, the Windows 95 interface is not a whole lot more than a prettification of DOS. Yes, there are things like running multiple programs at once and automatically launching applications based on filenames and even sharing information between programs, but these weren't built in from the beginning. (Some of this doesn't specifically apply to NT, but humor me here.)

Compare this to Unix, which was designed as a multi-user, multi-tasking operating system from the very start, way back in the very late 1960s. Now ask yourself which is the more impressive example of engineering -- a huge Gothic cathedral built in the Middle Ages by people who didn't know what we know today but who took the time to do things right, or a skyscraper that started out as a straw-thatched hovel that's had lots of plywood stapled on to it over the years by a half-million worker ants? (Impressive, not "shocking in that it's still standing".)

So what's the point of a GUI? As above, it's to let you get your work done more easily. How does it do that? Abstraction.

Cars and Folders

Abstraction is one reason your car has a hood and an ignition switch. Instead of going out every morning, leaning into the exposed engine, and pressing two wires to the battery to get it to start, you can climb in the door, insert your key, and twist. Much nicer, no? Safer too.

In computer terms, it's the difference between being able to see a whole page of text at once as opposed to a line at a time. As mentioned above, I'm typing this essay in a program called vim. That stands for vi iMproved. vi (not capitalized -- that's a Unix convention) stands for VIsual editor. Once upon a time, back in the dark ages of Unix, the standard text editor was ed. It worked, but it only worked on a line at a time. As soon as you reached the end of a line, boom, you were on to the next one. That was (relatively) easy to program, but slightly less than intuitively easy to use, unless you were the Unix Buddha. Real programmers use cat > filename, and so forth.

Then came Bill Joy (perhaps the Unix Kali), who wrote vi. Suddenly, you could see a whole page of lines at a time! You could scroll forwards and backwards. For the gurus (or anyone who wanted to save a file), you could even use your old friends, the ed commands! There's one layer of abstraction.

Now some people these days will tell you that vi is hard to learn. They're partially right -- it's different to learn. Does it seem intuitive to you that to save this file I have to hit the Escape key, then the colon and then the W key before hitting Enter? Probably not -- but after using vi for a while, it's second nature to me. How about erasing a single word? I just have to hit dw (for Delete Word) and it's gone. Is that easier than moving my hand to the mouse, moving the cursor to the start of the word, hitting the mouse button, moving the cursor to the end of the word, releasing the mouse button, and hitting the delete button on the menu bar? It's faster, anyway.

That doesn't work for everyone, and if vi didn't have some serious advantages, I wouldn't use it. I can type a command like this, though: .,$s/GUI/Graphical User Interface/g and change every instance of 'GUI' from here to the end of the file into (you guessed it) 'Graphical User Interface' -- and that's just the tip of the iceberg. (Emacs is another popular program, though it fills a different ecological niche. In my opinion, it would be easier to use if I had as many arms as Kali. I'd also be a better guitarist.)

For the bulk of people involved in computing today, vi doesn't meet their *perceived* needs. They're looking for something in a text editor that lets them see what their document is supposed to look like when they print it out as they are creating it. (That seems like a marketing point rather than a technical advantage to me). They're not interested in remembering that it is ESC-:wq-ENTER that saves a file and exits vi, not as long as they have a little picture that looks like a floppy disk and a big black X in the upper right corner of their screens. (So where's the picture of a hard drive? What if I don't want to save to a floppy? The "intuitive" user interface breaks down when you think about the symbols.)

On the other hand, I've seen lots of people writing proposals in Microsoft's PowerPoint, which, as far as I can tell, exists only to make bland-looking slides that can have inane animations and bad sound effects, as if meetings weren't bad enough already. I guess that just goes to show you that you can give people the wrong tool for the job, and if enough other people are misusing it, they'll misuse it.

Tangents aside, abstraction is important to a GUI, and to computers in general. It's a good thing for (at least) two reasons: first, it keeps us away from the gory details we may not need to know to get our jobs done, and second, it helps us work better by appealing to different parts of our brains.

How much work would you get done if, to save a file, you had to tell the read/write heads of your hard drive to move to a certain cylinder and write particular sectors? ESC-:wq-ENTER seems a lot easier compared to that, doesn't it? How much easier still is going to the File menu and selecting the Save command? Now imagine explaining how to save a file to someone who had never used a computer before. "Up there, on the left is the File menu. Things you can do with the whole file are in that menu. You can open a new one, save this one, save it as a different file, close it, print it, or exit the program."

That's not so bad, either. Everything is, basically, a file in that metaphor. And that's pretty good for what most people are likely to be doing.

What Would We Do Without Things To Do?

There's another thing people are likely to be doing, and that is running programs. There are four ways that users can do that. (The system can also run them automatically when the computer starts or at a specific time or when something asks for it... but those aren't really things most users even know about.)

First, they can type the name of the program. This presumes that there is some place where the users can type. It also assumes that the users know where the program is located, if it's not in their path (which is just a list of places where the programs they want to run are normally located) and that they can give the program the right options it might want. (As in the 'wc -l' example, 'wc' by itself gives the cryptic output '59 2220 12176 GuruGUI.txt' which is the number of lines, the number of words, and the number of characters of this file as of the last time I saved it. If you want just one field, you have to give the program those options.)

Second, the users can find the name of the program (or some mnemonic or nickname) in menus provided. Think of the mostly ubiquitous Windows Start menu, for one. KDE has a menu button with a pretty uppercase K superimposed on some shiny metal gears, and GNOME has a menu with a G-shaped footprint. It's all the same sort of thing. The users have to navigate through a hierarchy of menus which is supposed to be organized in an intuitive taxonomic fashion. In reality, any program you're likely to install on Windows wants to be at the top level menu, which means that your tree is way too top heavy. (GNOME and KDE seem to be better about this, for the most part). Assuming that things organize themselves properly, it's relatively easy for users to browse through the menus, finding things they want. (Want to play a CD? Start, Programs, Accessories, Multimedia, CD Player. My intuition isn't quite the same as the person who came up with that path for Windows NT.)

Third, the users can create shortcuts on their desktop. Desktop is just a fancy term for 'the bits of your screen that don't do anything when you click on them that you see when you don't have any programs running.' It's part of the metaphor again. All a shortcut is is a little bit of data that tells the system "Open up that program over there and maybe tell it something" when someone clicks on it. They're easy to find, which makes them good for programs people use all of the time (like a word processor or an e-mail program), but people only have so much screen space for them. Besides that, lots of programs people install like to create their own shortcuts. They do often have colorful little icons, but after you install a few programs and create a few documents, you'll be out of screen space.

One interesting point which illustrates a fundamental difference between Windows and Unix is the idea of shortcuts. Fun little beasties called 'symlinks' (symbolic links) exist under Unix. On the surface, they look the same, but they're much more convenient. Instead of saying, "Hey, you, go open up this file over here!" they say "Here is a file." I could edit my X11 settings with the command 'vi /etc/X11/XF86Config' or with 'vi /usr/X11R6/lib/X11/XF86Config'. It turns out that the latter is actually the former -- the directory entry for the latter actually points to the first file. Subtle, and probably orthogonal to the entire argument, but just realize that as far as the operating system cares, /usr/X11R6/lib/X11/XF86Config is actually the real XF86Config file, instead of a little text file that says "open up that file over there instead, thank you!" -- the fundamental difference being that Unix got it right instead of crufting it in years later.
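A small sketch of that behavior, for the curious. It works in a throwaway directory rather than touching any real X11 configuration, and every file name in it is hypothetical.

```shell
#!/bin/sh
# Work in a throwaway directory; the names below are made up.
dir=$(mktemp -d)
echo 'real settings' > "$dir/XF86Config.real"

# ln -s creates a symbolic link: a directory entry that points at another path.
ln -s "$dir/XF86Config.real" "$dir/XF86Config"

# Reading through the link yields the target's contents...
via_link=$(cat "$dir/XF86Config")

# ...and writing through the link changes the target itself.
echo 'edited through the link' >> "$dir/XF86Config"
target_lines=$(wc -l < "$dir/XF86Config.real")

echo "read via link: $via_link"
echo "lines in target after edit: $target_lines"
rm -rf "$dir"
```

Neither cat nor the shell's redirection needed to know a link was involved; any program that opens the path gets the target.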

The fourth way users can launch programs is by opening a file associated with that program. As you may be able to guess by now, Unix does this slightly differently from Windows. Under Windows, if I have a file named GuruGUI.txt and I open it (whether by double-clicking or hitting enter on its icon), Windows will launch a program which knows how to read txt files. (That's what is known as the associated program -- some of these are set up by default. Also, as you can probably tell, lots of programs associate themselves with particular files when they're installed. That can be a headache, if the program you just installed associates itself with files for another program which you prefer.)

Of course, if I take a file called GuruGUI.png (which might be a graphic I create later, for some reason) and rename it to GuruGUI.txt, Windows will try to open it as a text file, even though it is really a graphic. Yes, Windows has a shallow association -- it goes by just the file name. Unix, on the other hand, actually looks inside of the file for a magic number, and then performs the appropriate action based on that. (A magic number is just the numerical value of a couple of the first characters in a file that's associated with a type of file. For example, a Perl program probably starts out with the line: "#!/usr/bin/perl -w". The hash-bang -- meaning the #! -- tells Unix that the file is a script and that it needs to launch the perl program in the /usr/bin directory and use the -w option. Not too complicated.)

There are magic numbers for executable programs like those in a.out format (one type of Linux program), ELF format (a newer type of Linux program) and even Java class files (not the coffee, the programming language from Sun). It's not particularly deep magic, because somewhere, someone had to say 'This number goes with this program and that number goes with that program,' but it's better than 'Part of the name of this file goes with this program.' (That somewhere is in the kernel, which is why it works on the Unix command line and in the GUI. I don't know for sure how Unix desktop environments like KDE and GNOME handle associations.)
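Here is a rough sketch of the hash-bang case only (it says nothing about how a desktop environment picks a program). It uses /bin/sh as the interpreter instead of Perl so that it runs on a machine without Perl, the file name hello.txt is invented, and it assumes the temporary directory allows files to be executed.

```shell
#!/bin/sh
dir=$(mktemp -d)

# A script whose name claims to be a text file. Its first two bytes, #!,
# are what gets inspected when the file is executed.
cat > "$dir/hello.txt" <<'EOF'
#!/bin/sh
echo "run by the interpreter named on line one"
EOF
chmod +x "$dir/hello.txt"

# Peek at the first two bytes of the file: the "magic number".
magic=$(head -c 2 "$dir/hello.txt")

# Executing the file works despite the misleading .txt name.
output=$("$dir/hello.txt")

echo "magic: $magic"
echo "output: $output"
rm -rf "$dir"
```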

Of these four ways, which do you think people use the most? Probably the shortcuts on their desktop. You can expect that the three or four programs people use the most will have happy little icons hanging around somewhere on the root window (that's what a Unix GUI calls the 'desktop').

Now Help Me Find My Keys

Now I'm a pretty sophisticated fellow, with an arguably pretty desktop. I have eight icons down the right side of my screen on this machine. The first is shaped like a computer. If I click on it with the left mouse button, it brings up a shiny new Eterm. The middle mouse button brings up rxvt, and the right mouse button brings up the Old Faithful of terminals, Xterm. Did you catch that? One icon, one click, but I can have my choice of three different terminals depending on which button I click. (To Microsoft's credit, you can go for years without even realizing that you have more than one mouse button. You can also drive for miles without releasing the emergency brake.)

Some readers might object that it's too hard to remember things like which mouse button on which icon summons which program. There's a solution for that, too. If I rest my cursor over an icon for a little while ('little while' being a number of milliseconds between zero and 'the amount of time before you give up completely'), a fuzzy little thought bubble pops up, telling me that the computer icon represents Terminal Emulators (which is exactly what Eterm, rxvt, and Xterm are, if you didn't already know) and which button summons which term.

Unfortunately, most of the recent incarnations of this particular style of desktop (simplifying what it really is for the sake of the argument) have given up those nifty little icons in favor of the GNOME or KDE bars. If you're not familiar with either, think of the happy little Windows Start bar, except not nearly as ugly and slightly more customizable. Once, I made a GNOME bar look like it was covered in green Swiss moon cheese.

While both GNOME and KDE provide desktop icons, they're not as flexible as the Enlightenment icons I've described -- while there is probably a menu and a happy little green Swiss moon cheese looking dialog somewhere where I can configure them, I wasn't exactly thrilled with their usability. They also don't open the programs I want.

You open programs with GNOME and KDE, for the most part, by clicking on a happy little not-a-Start-button Start button (as I mentioned earlier, either a metallic K or a G-shaped foot) and navigating through a hierarchy of menus. They are arranged better than the Windows menus, strangely enough, but I can get to something like Eterm faster by left clicking once on the Terminal Emulators icon-button than I can by clicking G-shaped foot, Utilities, Eterm.

Also, what if I wanted to run a utility that shipped with KDE instead of GNOME? The two play together to some degree -- but I'd have to go to G-shaped foot, KDE Menus, Applications, Multimedia, CD Player, for example. Meanwhile, there's a reasonable place for a CD Player under G-shaped foot, Applications, Multimedia, CD Player. Which one do I want? Do I know the differences between the two? (Answer -- I think the one GNOME knows about probably looks a little prettier, but that's not a big deal. Both play CDs.)

Why the trade off? If it's easier to click on an icon or a button to launch a program, why go to a hierarchical menu?

There are two answers. The first is, there is a limited amount of screen space available for those icons -- especially with the desktop metaphor. If the screen is supposed to be a desktop, shouldn't you have folders and files and papers all over it? (Where do the programs fit in that metaphor -- in a coffee cup pencil holder? In a drawer? Shaped like a typewriter?) The second answer is, it's supposed to be easier to find things.

There's a big assumption here, namely that the way you organize things in your head is the same way the menu and application and desktop environment designers organize things in their heads and is also the way computers can store and display and manipulate things. (If you count to ten, do you start at zero or one? Your computer starts at zero, if it speaks the C programming language or some derivative.)

The bigger assumption, of course, is that your computer desktop has anything in common with what your real desk looks like. Mine certainly doesn't. I only have one file folder on my desk, and I have a whole pile of books on most of my shelves. If I want to open the file, I don't move my hand over it and tap it. But I guess some abstractions are worse than others.

Taking Things To Task

Some interfaces have taken a task-based approach to make things easier. For example, if you want to write a letter with pen and paper, do you think to yourself, "Sit down at the desk, launch the pen, summon a piece of blank paper, and brush a talking paperclip out of the way"? Or do you think of it in terms of the task? "I'm going to write a letter."

Simple interface solution, yes? Just name things after their tasks, and everyone will be happy. Everyone, that is, who only uses the particular tasks you set up and means the same things you do by them.

If I created a task to check for new files in a particular directory, run them through a Perl script, and upload the resulting files to my web server, what would I call it? Would you find it useful? Would my Grandmother?

While there is a subset of tasks everyone is likely to use (log on, log off, fetch e-mail, buy me jellybeans from a website), you're either going to stop defining them after you get a small number of generic tasks defined, or you're going to go crazy trying to satisfy everyone.

Now don't get me wrong -- the nifty GUIs are configurable. There are nice control panels and control centers where you can change your Green Swiss Moon Cheese theme to a Starry Starry Night theme. (If that's not a theme, it should be.) But where do you go to make a fundamental change in its behavior?

Suppose that I want to use my 'How many times have I written the word GUI?' example all over the place -- I use it five or six times per day. In a strictly command-line environment, I would write a script to do it. Instead of typing it in from scratch every time, I'd type it into vim once and save it. I'd give it a good name like GUIcount and perform the magic required to make it executable and accessible. That's a two minute job, if you include the time it's taken to write this paragraph.
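A sketch of what that script might look like. The essay never shows GUIcount itself, so the body here is an assumption: it counts the lines that mention 'GUI' (the grep and wc -l pipeline from earlier) and takes the file name as an optional argument, falling back to GuruGUI.txt.

```shell
#!/bin/sh
# GUIcount: print how many lines of a file mention "GUI".
# Usage: GUIcount [file]   (defaults to GuruGUI.txt)
guicount() {
    grep GUI "${1:-GuruGUI.txt}" | wc -l
}

# Run only when a readable file is available, so the sketch is
# harmless to try in a directory that has no GuruGUI.txt.
file=${1:-GuruGUI.txt}
if [ -r "$file" ]; then
    guicount "$file"
fi
```

The magic required to make it executable and accessible is usually 'chmod +x GUIcount' followed by moving the file into a directory that is already on the path.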

Most Of The Power, All Of The Eye Candy

Here is where things get abstract. Take a deep breath.

What if I want to do the same thing in my GUI?

Where would I go? I could use the same script and put a shortcut to it on my desktop. I could put a shortcut to that script in the little G-shaped foot menu, or the metallic K menu. That would probably also be a two minute job. (I'd be more likely to put it under one of the half-dozen slick looking metal buttons I have on my home machine -- there's a certain amount of style there that comes from not imitating the Start menu.)

That's all well and good for someone like me, who knows how to do this, but what if my Grandma wanted to do the same thing? (Humor me on this, all of you who know much about her.) One of the benefits of computers is that they can do and even *like* doing repetitive tasks. But Grandma doesn't know how to write scripts. She, in my example, just knows that she wants to count how many times I have used the word 'GUI' in this essay.

As far as I can tell, she's out of luck, unless she wants to muck about on the command line. She's welcome to borrow my Unix in a Nutshell book anytime, but should she have to? Again, she just knows that she wants to Count Words.

One of the interesting developments in programming has been visual development. (Yes, this is going somewhere, but there's yet more background to absorb.) GUIs for programmers have sprung up from somewhere, allowing aspiring coders to reach into a visual toolbox, dragging buttons and fields all over the place to create more GUIs.

Suppose I wanted to create a small survey. I could use a visual programming environment! I would draw a small frame to hold my stuff. I would drag some checkboxes onto my frame, and give them the appropriate labels. For each of those, I could give them a separate key ('what kind of data are you?') and value ('what data do you hold?'). Finally, I could add the magic GO button, and tell it that, when pressed, it writes the keys and values to a file. Whee, there's ten minutes worth of work there (presuming you're familiar with the tools -- otherwise, give yourself another ten minutes).

Sun's Java has taken this one step further. It supports something called JavaBeans (look, abstraction, metaphor, and a silly pun! That's good coding!) which are simply software components that conform to a specific standard. A JavaBeans programming environment reads data from these Beans and lets you know what they do. You can then drag them out of the BeanBox (what, not the roaster?) and hook them together and make a program that does something more than the sum of its parts. And when I say 'drag', I actually mean that. You click on one, hold down the button, and move the mouse and the Bean to where you want it. Then let go of the mouse.

Back to Grandma. What if her Linux GUI had a little toolbox in the corner and a little workbench? She would probably be able to find a little Tool labelled 'Count Words' there. She could drag it to the workbench, where it might prompt her for additional information. She could also drag a little icon from whatever program she's using to read my essay to the workbench, and attach it to the Count Words tool. Bingo, she's just duplicated my quick and easy 'wc -w GuruGUI.txt', without having to call me and ask me how to do it. (Any Grandma who actually *does* this without having to call is substantially cooler than most.)

Suppose she wants to save this. No problem! The workbench supports that -- and it puts the new tool into her toolbox. Nifty.

These tools absolutely must support piping -- that is, if I wanted to duplicate the 'count the number of lines with this word on them' behavior, I could connect the tool for Find Word to the Count Lines tool. If I wanted to put the results of that into the current document, I could add the little file icon for Redirect Output Here. It doesn't take much imagination to see that this is just the tip of the iceberg.
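To make the plumbing concrete, here is one way the connections behind such a workbench could be modeled in the shell. This is only an illustration of the idea, not a design for the GUI: the function names are invented to match the tool labels, and the result goes to a separate scratch file rather than into the current document.

```shell
#!/bin/sh
# Hypothetical tool names, modeled on the labels in the text.
find_word()   { grep -- "$1"; }     # Find Word: pass along lines containing $1
count_lines() { wc -l; }            # Count Lines: count whatever arrives

# Connecting two tools on the workbench and saving the result as a new
# tool amounts to giving a pipeline a name.
count_lines_with() { find_word "$1" | count_lines; }

# Redirect Output Here: send the result into a file instead of the screen.
doc=$(mktemp)
result=$(mktemp)
printf '%s\n' 'GUI one' 'plain' 'GUI two' 'GUI three' > "$doc"
count_lines_with GUI < "$doc" > "$result"

saved=$(cat "$result")
echo "saved result: $saved"
rm -f "$doc" "$result"
```

Each tool only reads its input and writes its output, so the saved tool can be connected to further tools in the same way as the originals.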

The Joy Is In The Destination

Instead of working within the limitations of the programs written to take advantage of our GUIs, we'd be able to surpass them by connecting together lots and lots of little programs that each do one thing well. That's the Unix command-line philosophy, and it really ought to be the philosophy of our Unix GUIs too.

From a programming standpoint, it's really not that hard. All it takes is some sort of standard (the beautiful thing is that there are so many from which to choose!) of getting information from one place to another. But we're spending more time copying the look and feel of GUIs which don't really have any power underneath them, instead of taking advantage of the fact that our Unix systems are fully functional even without pretty icons. (Last month, I set up a webserver on a Debian GNU/Linux box that has never ever even had a GUI installed. Try that from DOS or Windows NT's DOS emulator.)

I'll know we've arrived at the point where our GUIs are Guru friendly when I have the choice of doing something with the toolbox or writing a shell script with vim in my Eterm window. Let's make that day come sooner.

---
version 1.01
copyright 19 December 1999, chromatic
thanks to tmr and Chilli for suggestions

For the most recent version of this document, please visit: http://snafu.wgz.org/chromatic/essays/

Please send suggestions and corrections to chromatic@snafu.wgz.org

This work may be redistributed in whole, provided that the copyright notice remains. This work may also be modified in whole or part, provided that the original copyright notice is preserved and the original is provided, or a link to the original is preserved, and provided that the derivative is clearly labeled as a derivative work.