Compression
This was written over twenty years ago, but on seeing it again I was struck by how much it anticipated ChatGPT and other LLMs. See what you think ...
Sunday 03 September 2000
The Wheatgrass conference room was on the third floor, a large low-ceilinged room overlooking the main parking lot. Being Saturday, the lot was half deserted, its yellow dividing lines aggressive in their unaccustomed visibility. Children on bicycles circled and shouted in a far corner. The opposite wall of the room had two large windows on to a corridor. Few people passed, and they did not look in. Both doors were closed.
There were twenty-three people there, men and women, aged between low thirties and mid-fifties. All were casually dressed in the style of computer professionals in the informal Californian environment. When they had seated themselves and fallen silent, the chairman, John Armstrong, rose. A tall, thin man in a plaid shirt and jeans, he divided his gaze between his audience and the table in front of him as he spoke.
“Thanks for coming, everyone. I gather those of you who haven’t had personal experience of this effect have been partially briefed by those who have, so I’ll keep this introduction short and general, call on Sylvia for a more detailed description, then we’ll break for a while. As you all know, this meeting has been called to discuss anomalous results from the use of MPg7.
“MPg7 is our latest proprietary compression format. It has been developed over the last nine months by the advanced products division in Mountain View. Its purpose, given the increasing use of electronic books and journals, was to create very highly compressed files for the electronic, which is to say Web-based, distribution of textual intellectual property, particularly books.”
Armstrong paused and drank briefly from the glass of water by his notes. He did not look up, and there was no sound or movement from the people around him.
“MPg7 was, as I say, created over the last nine months. For the first six nothing unusual happened; the team was carrying accepted compression techniques further in the light of faster hardware and new mathematical discoveries. It was good, if uninspired, work that made existing compression a little better. But in January one member of the team made a remarkable suggestion – documented in the reprint you have in front of you – which promised to be so effective that work on the previous track was halted and all efforts turned to basing MPg7 on this new technique. We will discuss its details later, but for the present it suffices to say that it appeared to offer incredible compression of text. A lossy compression, as it turns out, from which comes the problem before us.”
Armstrong coughed, drank again, and nodded to the woman by his side. “Sylvia, perhaps you would take over.” He sat down abruptly. A slim, blonde woman in white blouse and chinos rose, glanced around her colleagues uncertainly, and spoke. Her voice was clear and educated, and strengthened as she proceeded.
“Current compression techniques go back fifty years, to the middle of the twentieth century, when there was first the need to send large amounts of information rapidly over simple phone lines,” she said. “The original large-scale hardware implementation was the common modem in the household PC. We can divide compression formats broadly into two groups: those that use lossless compression, in which the compressed file contains and can rebuild all information in the original, and those that use lossy compression, in which information considered inessential or unnoticeable is discarded in the compression process. In the second, the decompressed file is close to the original, but degraded in subtle ways that are designed to be unnoticeable in use. Lossy compression has been limited to images and sound files, as fuzziness in a picture or background noise in a song does not, within certain limits, affect its value. That is an acceptable trade-off for receiving it over the Web, or for storing it in a space-saving format. The more lossy the format, the more degraded the image or sound, but the essential information is preserved by the process.
“What John Lim proposed was a lossy compression algorithm for text, a completely new concept. Until then, lossy compression for a book or newspaper article was impossible. Text that is not exactly the same as the original is garbage, random letters on a screen. A book has to be recreated precisely as it was made, and from this has come the problem with rapid distribution and storage of digitized books and journals.
“The details of Lim’s process are in front of you. In essence, his technique is a translation to the textual world of artificial intelligence techniques already used to compress images: pattern recognition, redundancy elimination, style acquisition, and inference. This is not to understate the unprecedented originality or practicality of his approach. It is rigorous, measured, and successful. However, in its implementation we have encountered a curious problem, one not foreseen and not – so far as we can tell – easily overcome.”
Her voice slowed, as if approaching a part of her story not easily told. Perhaps sensing this, the audience shifted in its seats. When the faint breeze of movement had ceased she continued.
“Early trials of MPg7 in January were successful, and it appeared to recreate book-length texts from files that were an order of magnitude smaller than those from the best current compression algorithms. The texts were coherent and at first glance identical to the originals. In fact the team was so delighted with its success they had to be restrained from early announcement of the technology to our internal news service. Then one of the team members with a degree in English literature compared a rebuilt story with its original word by word. This had not previously been done, a general reading and spell checking being felt sufficient to find the kind of textual noise typical of lossy compression. This person found deviations in the text – a word here and there, not enough to alter sense or affect grammar – that caused her to raise the subject at the next scheduled development meeting. It then turned out that the team had not considered the technology complex enough to create a logically and grammatically valid file that still differed from the original. They reexamined the process and found the program rapidly built very complex and short-lived database files that were deleted when the process was complete. Clearly it was by no means as simple as they had assumed.
“Using as their original The Library of Babel, a twenty-page story by Borges, they increased the compression ratio by stages to its maximum, and created a series of six files of decreasing size. The shortest is the equivalent of no more than twenty words of average length. These files exist on the CD you have, and the stories rebuilt from them are in the folders before you. A copy of the original is also provided. I recommend you take time to study and compare them. They are, in my experience, quite unprecedented in this field.
“At lower compression settings, levels five and six, the decompressed files have a few words on each page altered to synonyms consistent with the writer’s vocabulary and style. Occasionally a phrase is reworded. But all with perfect spelling, grammar, and sense. When the compression is increased to level four more phrases start to change; paragraphs move around. A character’s actions may change, but always consistent with the story’s style and intent.
“It is at three that we see the real effects of MPg7. Not only words and phrases, but entire scenes and characters are gone or transformed. New ones appear, but none so you could pick them out as being new if you did not know the original. Which can be discerned, but as if through a mist or a faded memory. What is preserved is the writer’s style and something more than his style. Perhaps it could be called intent.
“After rebuilding from the highest compression settings, two and one, nothing of the original remains. MPg7 creates completely new texts in which not a word matches the original. They are wonderful, fascinating stories – but they are not Borges. And yet… in some curious way, they are. No-one else could have written them; they are undeniably his. His character and genius have been preserved through the compression process, as intended. But it is hard to know what to call the result; perhaps there is no word for it.”
There was silence round the table. Someone cleared their throat, then fell silent. Pages turned.
Armstrong rose again and nodded at the woman, who resumed her seat. Again he looked around at the people in front of him, then down at the papers before him.
“Thank you, Sylvia. So there you have it. We are here to decide what to do… with what we have. Clearly a compression process that writes new material is not suitable for the distribution of copyright intellectual property. And yet… have we not discovered something of equal if not greater value? Some essence of the creative process distilled and preserved.”
A bald man halfway down the table shook his head vigorously. “No, I don’t agree. Is not creative process, any more than low res JPG is artistic process. These are decompression artifacts, nothing more.”
“Have you read the stories, Viktor?” Sylvia asked. When the bald man shook his head again she added, “Do that, and then say if these are just compression artifacts. There is something more at work here. There is new meaning, new purpose.”
“Purpose without intelligence?” he said. “You contradict yourself.”
Sylvia looked at Armstrong, who spoke slowly. “I think we all need to re-examine our prejudices – no, let me call them our assumptions – in the light of this development. I suggest we break for one hour to read this material and reconvene at three.”
The meeting broke up, its members carrying their thick binders, each with Strictly Confidential printed across its blue cover. In a few minutes only Sylvia and Armstrong were left. They gathered their papers slowly, appreciating each other’s presence.
“What does it mean, John?” she asked. “Are we moving to a new plane of development, one in which creativity appears in the computing process?”
“I think that’s inevitable,” he said. “But I didn’t reckon on it appearing – if it has – so soon. I should be happy, but something about this disturbs me. Perhaps our unique humanity is being diluted.”
“Diluted or shared?”
He shrugged. “Let’s just say my assumptions about what it is to be human have been shaken yet again. Much more of this and they may collapse completely.”
Smiling, each carrying a blue binder, they left the conference room to its afternoon sunlight and the distant shouts of children.