Core: noun, the most important part of a thing, the essence; from the Latin cor, meaning heart.

 Volume 1.17  This View’s Column June 3, 2002 

Seeing is Believing

And It Always Will Be

Video technology has been advancing at a dizzying pace. Computer-generated (CG) imagery enhances many movies in ways we (some of us, anyway) have come to expect. It also enhances live television broadcasts, including news and sports, in ways to which we have become accustomed — assuming we are aware of them. The technologies will continue to advance and will converge, some day, with momentous consequences.

Computer-Generated Imagery in the Movies

You could swear the Tyrannosaurus rex in Jurassic Park is real. Even close up. When it looks for the children in the car, during the rainstorm, its eyes seem to be the real searching eyes of a real gigantic dinosaur. It snorts, and the people and things in the shot respond as if a real animal has really snorted. It roars, and the actors cover their ears; it chases, and they flee. Or get gobbled up.

But, of course, that T. rex is not even there. Well, it wasn’t there when the scene was originally filmed. The actors and things in the shot are reacting to simulations, by other persons or things, of what a real dinosaur would have done — and of what a special-effects dinosaur will do in the scene as it eventually plays in the finished movie.

Industrial Light & Magic

Many months, if not years, of work go into making a movie with a lot of complicated CG effects. An interesting and informative article at the HowStuffWorks website lists the different kinds of specialists that are employed at one of the more famous and successful CG special-effects companies, George Lucas’ Industrial Light & Magic (ILM):

  • Visual effects supervisors
  • Technical directors
  • Software developers
  • Scientists
  • Art directors
  • Producers
  • Modelmakers
  • Animators
  • Editors
  • Camera operators
  • Stage technicians

Centropolis FX

Another interesting article at the same website, on another CG special-effects company called Centropolis FX, describes some of the high-tech hardware required by the company, and explains why it is all so necessary:

  • The scanned film and the different layers that the team creates require gigantic amounts of disk space. A single frame of a film, once scanned and stored on a disk, consumes on the order of 10 megabytes of disk space. All of the shots of The Patriot together consume 1.6 terabytes (trillions of bytes) of disk space.
  • Individual artists need high-end desktop machines to work on and render their individual models and layers.
  • Rendering requires massive CPU resources. To render any animated 3-D figure or any effect like water or smoke, the CPU must generate millions of polygons, lines, points, etc., and then light them correctly. And it must do this over and over again for each frame of the shot! For example, in The Patriot certain scenes incorporate hundreds of soldiers as well as things like boats, tents, flags, and so on. Each one moves independently, according to mathematical models expressed in the form of thousands or millions of pixels that are each calculated specifically.
  • Compositing — Compositing combines dozens of layers into a single shot. Because of the resolution involved — millions of pixels and tens of millions of bytes per frame — and the layering, both the CPU workload and the storage requirements are immense.
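To get a feel for those storage figures, here is a rough back-of-the-envelope sketch. It takes the two numbers quoted above (about 10 megabytes per scanned frame, 1.6 terabytes in all) and assumes decimal units and the usual film rate of 24 frames per second; the frame rate and the constant names are my assumptions, not figures from the article.

```python
# Back-of-the-envelope check of the quoted storage figures.
BYTES_PER_FRAME = 10 * 10**6   # ~10 megabytes per scanned frame (quoted above)
TOTAL_BYTES = 16 * 10**11      # 1.6 terabytes for all shots (quoted above)
FILM_FPS = 24                  # assumed: standard film frame rate


def frames_stored(total_bytes, bytes_per_frame):
    """Number of whole frames that fit in the given storage."""
    return total_bytes // bytes_per_frame


def footage_minutes(frame_count, fps):
    """Running time, in minutes, of that many frames at the given rate."""
    return frame_count / fps / 60


frames = frames_stored(TOTAL_BYTES, BYTES_PER_FRAME)
minutes = footage_minutes(frames, FILM_FPS)
print(f"{frames:,} frames, about {minutes:.0f} minutes if played end to end")
```

Since the total also covers the many layers that go into each shot, the amount of finished footage it represents would be considerably less than that running time suggests.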

Ever Improving Results

The results of the special-effects wizardry get better and better — that is, more and more realistic — as time goes by. For instance, some of the scenes in the original Jurassic Park don’t strike me as being quite so realistic as similar scenes in the sequels: I am thinking of the scenes with animals running in herds, or even of any scene with a very large number of animals, running or not. In such scenes in the original movie, some of the animals have some faint, almost indescribable quality of looking as if they are not really there — of appearing to be the fakes they really are.

I hardly notice this at all in the sequels. The technology improves, and the professionals gain in skill with experience, giving us better, more realistic, results.

Computer-Generated Imagery on Television

For reasons similar to those already discussed, you could swear the English-speaking Stenonychosaurus librarian in ABC’s Dinotopia is real, though not all the characters are quite so realistic.

But some television broadcasts also present an entirely different mode of CG effect: making the viewer see something in a live broadcast that is not really there. This kind of special effect falls largely, so far, into two categories: (1) advertising and “branding”, and (2) visual enhancement of sporting events.

Advertising and “Branding”

Over the past few years, insertion of digital advertising images into live broadcasts of sporting events has become more and more common. Called virtual ads, they are seen only by the TV viewing audience.

Their use is not confined to sports, according to a New York Times article, January 12, 2000:

CBS News is using the technology as part of a broad agreement the network signed last year with a technology company, Princeton Video Image, to provide branding services for a variety of CBS programs. The technology has been used regularly on The Early Show and the news magazine 48 Hours and was used on the Evening News on Dec. 30 and 31, according to CBS news executives. The Early Show has been using it almost every day since the show’s debut on Nov. 1.

News show logos that appear real are being inserted on the sides of structures, like the General Motors building, on the back of a horse-drawn carriage in Central Park, in the fountain outside the Plaza Hotel and, yesterday, in the center of Wollman Rink. In some instances, the logo clearly resembles a large billboard advertising CBS News.

The First-Down Line

Nor is digital insertion in sports confined to advertising. During some professional hockey games in the 1996-1997 time frame, a digitally enhanced image of the puck was broadcast. Used by Fox TV, it was officially called FoxTrax but was also dubbed The Blue Blob, which gives us some idea of what it must have looked like, at least to its critics. As far as I can tell (not being at all a hockey fan), the practice was controversial among the viewing audience, so it was soon discontinued.

Another digital enhancement of the playing field has been much more successful: the first-down line in broadcasts of football games.

The first-down yard line can be especially difficult for TV viewers to spot. Enter SporTVision, which has been providing a special-effects service to Fox Sports and ESPN since 1998 that “draws” a yellow or orange line across the field to mark the first-down line for the TV audience.

Think of it as a giant virtual highlighter. HowStuffWorks provides some idea of just how complicated an affair it is to draw with this virtual highlighter:

  • The system has to know the orientation of the field with respect to the camera so that it can paint the first-down line with the correct perspective from that camera's point of view.
  • The system has to know, in that same perspective framework, exactly where every yard line is.
  • Given that the cameraperson can move the camera, the system has to be able to sense the camera’s movement (tilt, pan, zoom, focus) and understand the perspective change that results from the movement.
  • Given that the camera can pan while viewing the field, the system has to be able to recalculate the perspective at a rate of 30 frames per second as the camera moves.
  • A football field is not flat — it crests very gently in the middle to help rainwater run off. So the line calculated by the system has to appropriately follow the curve of the field.
  • A football game is filmed by multiple cameras at different places in the stadium, so the system has to do all of this work for several cameras.
  • The system has to be able to sense when players, referees or the ball cross over the first-down line so it does not paint the line right on top of them.
  • The system has to also be aware of superimposed graphics that the network might overlay on the scene.

Eight computers are required to do all that, and four people to run the system:

  • A spotter and an operator work together to manually input the correct yard line into the system. The spotter is in the press box and the operator is in the production truck physically keying in the correct number.
  • Two other SporTVision operators are on hand to make any adjustments or corrections necessary during the course of the game. These adjustments might include adding colors to the color palettes due to changing field conditions, such as snow or mud.
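One of the requirements listed above, not painting the line on top of players, can be illustrated with a toy sketch. It assumes a simple colour-keying idea suggested by the mention of color palettes: a pixel is painted only if its colour is close to a known field colour. This is an illustration under that assumption, not a description of how the actual system works; the colours, the tolerance, and the function names are all hypothetical, and the line is drawn straight down one pixel column rather than in true camera perspective.

```python
# Toy frame: a grid of (R, G, B) pixels. All colour values are made up.
GRASS = (40, 140, 50)
JERSEY = (200, 30, 30)
LINE_YELLOW = (255, 220, 0)


def is_field_colour(pixel, palette, tolerance=40):
    """True if the pixel is within `tolerance` of any palette colour, per channel."""
    return any(
        all(abs(channel - ref_channel) <= tolerance
            for channel, ref_channel in zip(pixel, ref))
        for ref in palette
    )


def paint_line(frame, line_column, palette, line_colour=LINE_YELLOW):
    """Return a copy of the frame with the line painted only over field pixels."""
    painted = [row[:] for row in frame]  # leave the original frame untouched
    for row in painted:
        if is_field_colour(row[line_column], palette):
            row[line_column] = line_colour
    return painted


frame = [
    [GRASS, GRASS, GRASS],
    [GRASS, JERSEY, GRASS],  # a player stands on the line in this row
    [GRASS, GRASS, GRASS],
]
result = paint_line(frame, line_column=1, palette=[GRASS])
```

In this sketch, adding snow or mud colours to `palette` is the counterpart of the operators' adjustments for changing field conditions.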

Ever Improving Results?

I cannot say that live-broadcast CG special effects have been improved dramatically over the years, as have the effects in movies. (I cannot say, because I do not know.) Apparently, the first version of FoxTrax was very much in need of improving, as reported in an article at Canadian Online Explorer (CANOE), January 15, 1997:

“I think that the biggest criticism, from our perspective as we look back at last season, was the stability of the system, that regardless of color or size it would be jumpy to the point of distraction,” Goren said Wednesday. [Ed Goren was executive producer for Fox network sports.] Goren said engineers have redesigned the system, and tests have shown a much more stable, smaller dot around the puck. “I think that is a major, major improvement,” Goren said. “You won’t see that dot jumping around the way you did last year — unless the puck jumps.”

I am sure, though, that a great deal of research, trial and error, and continuing improvement have gone into the in-house test, beta, and original production versions of live-broadcast effects: how else does anything really get accomplished? Perhaps there have not been any dramatic improvements because the effects themselves have been inherently simpler and more subtle than those desired in movies.

Computer-Generated Imagery in the Future

The implications of these technological trends have been floating around my mind for quite some time now. A column by Fred Reed, “Surveillance in Digital Times,” was a catalyst:

A few days ago, on the web site of The Register, a British site that covers developments in computers, I discovered the following story, also in many US papers:

“Super Bowl 2001 fans were secretly treated to a mass biometric scan in which video cameras tied to a temporary law-enforcement command center digitized their faces and compared them against photographic lists of known malefactors.”

Bingo. Not good, not good at all.

But Fred, you might say, what a convenient way to catch bad guys. It sure is. Hidden cameras could be put in all manner of public places. If a wanted criminal, or missing child, or suspected terrorist walked past, an alarm would go off, and the gendarmes would appear. Note the words, “fans were secretly treated” in the Register’s story. The public needn’t — apparently didn’t in Tampa — know it was being watched. After a while, we would get used to it.

I have learned since that this technology is employed widely in casinos, to watch for known cheats. And the New York Times reported, May 25, that a trial run of such technology was being undertaken at one of the two points of entry to Liberty Island, home of the Statue of Liberty:

In response to a warning of a potential terrorist attack on the Statue of Liberty, the National Park Service activated a face recognition surveillance system yesterday that takes pictures of visitors and compares them with a database of terror suspects.

Just in time for the crush of visitors on the [Memorial Day] holiday weekend, federal parks officials installed two cameras, mounted on tripods, at the ferry dock in Battery Park, where visitors leave Manhattan for Liberty Island. The cameras are focused on the line of tourists waiting to board the ferry, immediately before they pass through a bank of metal detectors.

After the pictures are taken, the images are checked against a photographic database of terror suspects compiled by the federal government. If the system finds a match, a United States Park Police officer will be notified and the visitor will be detained. The officer will then decide if the visitor’s face matches the database image, and if the officer decides it does, the visitor will be questioned further.

Naturally, this has civil libertarians up in arms, so to speak, for reasons I will not get into here. (If you really want some thought-provoking reading on this, have a look at Fred Reed’s article “Just Because They Aren’t Out To Get You” in conjunction with the one quoted above.)

This is what really caught my attention about the high-tech photography used at the Super Bowl in 2001: they can program computers, and machines controlled by computers, to photograph faces, then scan a database looking for matches — and they can accomplish this search-and-find almost instantaneously.
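The scan-and-search step can be pictured with a minimal sketch. It assumes each face has already been reduced to a short list of numbers (a "template"), and that a match means the nearest stored template lies within some distance threshold; how real systems compute such templates is not covered here, and the names, numbers, and threshold below are purely hypothetical. As in the Liberty Island trial described above, a hit would only flag the person for a human officer to review.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length face templates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_match(probe, watchlist, threshold=0.5):
    """Return the closest watchlist name if it is within threshold, else None."""
    best_name, best_dist = None, float("inf")
    for name, template in watchlist.items():
        d = distance(probe, template)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


# Hypothetical database of templates; real ones would be far longer.
watchlist = {
    "suspect_a": (0.1, 0.9, 0.3),
    "suspect_b": (0.8, 0.2, 0.5),
}

flagged = find_match((0.12, 0.88, 0.31), watchlist)  # very close to suspect_a
cleared = find_match((0.5, 0.5, 0.9), watchlist)     # not close to anyone
```

The loop is a plain linear scan, which is fast enough for a small list; the point is only that comparing numbers is the kind of work computers do almost instantaneously.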

Wow. Do you see the implications?

Consider (1) the lightning speed at which this digital process (scan and search) happens and (2) the digital special-effects capabilities in movies and TV. These technologies will only improve. But they will not only improve: they will converge.

The day will come, I thought, when somebody will be able to take real, live imagery (such as a politician giving a speech) and substitute phony, digital imagery so quickly and seamlessly that nobody seeing the broadcast will be able to tell.

I suppose that may seem to be a pretty big leap. I never thought it was. But if it was a big leap, a very large step has already been taken towards getting us to where somebody can make it. So reports The Boston Globe, May 15:

Scientists at the Massachusetts Institute of Technology have created the first realistic videos of people saying things they never said — a scientific leap that raises unsettling questions about falsifying the moving image. In one demonstration, the researchers taped a woman speaking into a camera, and then reprocessed the footage into a new video that showed her speaking entirely new sentences, and even mouthing words to a song in Japanese, a language she does not speak. The results were enough to fool viewers consistently, the researchers report....

“This is really groundbreaking work,” said Demetri Terzopoulos, a leading specialist in facial animation who is a professor of computer science and mathematics at New York University. But “we are on a collision course with ethics. If you can make people say things they didn’t say, then potentially all hell breaks loose.” ....

Currently, the MIT method is limited: It works only on video of a person facing a camera and not moving much, like a newscaster. The technique only generates new video, not new audio. But it should not be difficult to extend the discovery to work on a moving head at any angle, according to Tomaso Poggio, a neuroscientist at the McGovern Institute for Brain Research, who is on the MIT team and runs the lab where the work is being done. And while state-of-the-art audio simulations are not as convincing as the MIT software, that barrier is likely to fall soon, researchers say.

What Are the Implications?

I might as well tell you outright: I do not know what the implications are. Who could? But I have some observations that might help us to have some idea how to think about these issues.

Computing capabilities — memory, storage, speed — have advanced with astonishing rapidity over the past 15 years. I remember (I think) my first PC: it was an 8 MHz 80286 with 2 megabytes of RAM and a 40 Mbyte hard drive. And Windows 2.0. I remember when I got its replacement: a 66 MHz 80486 with 16 Mbytes of RAM and a 512 Mbyte hard drive. And Windows 3.1. Honestly, it was so much faster than my first machine, it would perform the same operations so much more quickly, I sometimes thought something was wrong: I couldn’t see it performing the operations on the monitor, or hear the activity of the disk drive, so I worried that it hadn’t done them at all! And that machine would be considered a slow-as-molasses-in-January old dog compared to the 400-MHz machine I’m using now. (And I have files on this machine that would not fit on that 512-Mbyte disk!) And this 3-year-old machine is a slow old dog compared to the machines being sold today. And computing capabilities — memory, storage, speed — will continue to advance with astonishing rapidity.

Computers can learn from people. Not as people learn from people — student from teacher, child from parent, young from old, anybody from books. No. But programmers can “teach” computers how to do what only people could have done before. You want some cash? Forty years ago, you went to the bank and got some from the teller. Today, you can go to a machine and get some from the withdrawal slot: people taught machines how to do what only people had done before. (Of course, computers aren’t the only machines involved here. But they are all, ultimately, computer controlled.) Computers will, eventually, learn from people to do what people have done in running computer-generated special-effects imagery. And they will do it much, much faster than people have been doing it.

Human beings will continue to be... human. We will continue to engage in politics of all kinds: strictly political, religious, racial, ethnic, cultural, etc. And some of us will not scruple for a moment to resort to ruthless strategies and deceptive tactics to advance our goals. When — I do say when, not if — when the technology exists to make (for instance) Hillary Rodham Clinton look to all the world in a live broadcast as if she is praising Karl Marx and Josef Stalin to the highest heavens, or to make (for another instance) Cubans turn to one another and say “Gee, it’s amazing that Castro is still going so strong for a man of 97 years of age” — well, somebody, somewhere will be happy to do their best to make it so. Do you doubt that? Unscrupulous persons, no matter their political or other persuasions, will be able to convince thousands, if not millions, if not hundreds of millions of people, simultaneously, that they have seen with their own eyes what in reality never happened at all.

And seeing is believing. And it always will be.

ELC 2002


The View from the Core, and all original material, © E. L. Core 2002. All rights reserved.

Cor ad cor loquitur J. H. Newman — “Heart speaks to heart”