Internet Archive Raises Copyright Concerns

It started out as a run-of-the-mill trade secret dispute. In a 2003 lawsuit, a small Philadelphia-based consumer advocacy company called Healthcare Advocates Inc. (HAS) alleged that employees of a rival company called Health Advocate Inc. (HA) fraudulently entered into discussions of a merger or joint-marketing agreement with HAS to gain a competitive advantage by stealing its trade secrets.

A key piece of evidence in the case was the contents of an old version of HAS' Web site. The defendant argued that the information it gained in the merger talks, such as marketing materials and business strategies, was not protected as trade secret information because it was once freely available to the public on HAS' Web site.

Although the information was not present on the current version of the HAS Web site, HA's attorneys at the firm Harding, Earley, Fuller & Frailey used the Internet Archive to find an exact copy of the HAS site as it had looked when the two companies were in merger talks. The Internet Archive is a non-profit organization that preserves archival copies of Web sites that may later be removed or modified by their owners. HA's lawyers sought to use the archived versions of the site to defend the trade secret case. Ultimately, that never came to pass because the trade secret claims were dropped.

But that wasn't the end of HAS' case. The company turned around and filed suit against two new defendants: Harding Earley and the Internet Archive.

Filed in 2005 in the Eastern District of Pennsylvania and now in the early stages of discovery, that suit is the first to challenge the legality of the Internet Archive. The result will have a significant impact on who gets to control companies' online identities.

Information Dump

Right now, the Internet Archive has significant sway over what information remains on the Web. The site maintains a freely searchable database of 55 billion archival copies of Web sites. This means that if a company updates or deletes a portion of its site, the old version may be out there for anyone to access.

HAS learned that the hard way when opposing counsel accessed archival copies of its site.

Harding Earley's activities, HAS alleges, constitute a violation of the Digital Millennium Copyright Act, a violation of the Computer Fraud and Abuse Act and common law copyright infringement. In facilitating that access, Internet Archive broke the law as well, HAS contends.

"Internet Archive makes copies of Web pages whether or not they are copyrighted," says Scott Christie, a partner at McCarter & English in Newark, N.J., who represents HAS. "Making a copy of copyright protected material, storing it, and making it publicly available to anyone with access to the Internet violates the Copyright Act. We take issue with that, as do many others."

While many companies may not like the idea of unauthorized copies of old versions of their Web sites floating around cyberspace, there is another competing policy interest at hand: The Internet Archive's database of Web pages is a valuable resource for researchers and historians from which no one turns a profit. Many people believe that activity is fair use.

"Internet Archive has not been involved in much infringement litigation, presumably because most people understand the legality of its business model and the value of its mission," says Stefani Shanberg, a partner at Perkins Coie who represents the archive.

She says her client plans to ask for summary judgment and cites specifically Field v. Google and Parker v. Google, in which two federal courts recently found that the cache of Web sites Google maintains does not infringe copyrights.

"The Google cache is analogous to the Internet Archive collection," Shanberg says.

Many outside observers agree the copyright claim is unlikely to prevail.

"The Internet Archive is essentially an educational resource," says Ross Dannenberg, an IP partner at Banner & Witcoff in Washington, D.C. "And I don't see how it has any effect on the market value of the copyrighted sites. This favors a determination that this is fair use."

Protecting Yourself

If the archive is found to be fair use, there are still a few things companies can do--short of testing the uncertain waters of an infringement suit--to try to keep old copies of their sites off the Internet.

The most common method of blocking sites such as the Internet Archive from indexing and copying a site is placing a simple text file in the root directory of the Web site. That file, named robots.txt, tells automated crawlers such as Internet Archive's not to access the site. Internet Archive provides detailed instructions for how to do this on its site.
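As a rough illustration of how such an exclusion file works, the Python sketch below parses a two-line robots.txt and asks whether a given crawler may fetch a page. The crawler name ia_archiver, the www.example.com address and the is_allowed helper are assumptions made for the example; they do not come from the case record or from Internet Archive's own instructions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. "ia_archiver" is assumed here to be
# the name the archive's crawler announces; "Disallow: /" asks that
# crawler to stay away from every page on the site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page on a hypothetical site.
page = "http://www.example.com/index.html"

archive_allowed = is_allowed(ROBOTS_TXT, "ia_archiver", page)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", page)

print(archive_allowed)  # False: the named crawler is asked to keep out
print(other_allowed)    # True: no rule applies to other crawlers
```

Note that the file is only a request. Nothing in it technically prevents a crawler from fetching the pages anyway; it works only if the crawler checks the rules and chooses to honor them.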

HAS says it followed those instructions, but Internet Archive made copies of its sites anyway. Internet Archive doesn't dispute this. However, it says it never guaranteed the efficacy of using a robots.txt exclusion. Nor does it have to, it contends.

"Internet Archive is under no legal obligation to provide the ability for site owners to deny the public future access to this once freely available information," Shanberg says. "Nonetheless, Internet Archive does its best to accommodate exclusion requests."

If HAS has its way, the Internet Archive will become a slightly less scary place for corporations. Christie says his client advocates that Internet Archive take out licenses to use companies' copyrighted Web content. Alternatively, Christie would like to see Internet Archive adopt a system in which a company would have to opt in before the archive makes copies of any of its Web sites.

"We simply want Internet Archive to respect copyright owners' rights to work product," he says.

Whether it will be forced to do so now lies in the hands of the federal courts.

Staff Writer
