(This is an excerpt of an Osterman Research Inc. White Paper sponsored by Reed Tech Web Archiving Services and LexisNexis.)
The web has become the primary communication and commerce channel for businesses and government agencies. Digital media (web sites and other web-based content) has all but replaced print media as the primary mode of communication with customers, constituents, prospects, investors and others. The web is also becoming the primary channel for transacting business, managing commerce for everything from online purchases to tax payments.
However, many businesses and governments do not yet understand that they are liable for everything they publish online. Organizations that do not archive web content run the risk of failing to preserve a record of the claims, offers and other content posted on their web sites. Retaining this content has become both a legal and a regulatory requirement, so the question is not whether web content should be retained, but how much of it should be retained and for how long.
Web archiving has been practiced for quite some time, but enterprise-class solutions have only recently become available. This new technology has the power and flexibility to meet both existing and emerging web archiving requirements. As a result, any organization that uses the web to communicate or to conduct commerce should consider developing a web archiving policy and deploying the appropriate technology to support it.
The fundamental message of this white paper is:
ABOUT THIS WHITE PAPER
This white paper discusses the importance and benefits of web archiving and various use cases for it. It also briefly discusses the sponsors of this white paper and their relevant offerings in this space.
Why the Web Represents the Next Phase of Archiving
WHAT IS WEB ARCHIVING?
Web archiving is what its name implies: the capture and archival storage of web-based content. This can include individual web pages, entire web sites, content from Web 2.0 applications such as social networking sites, and other web-based content that is important to capture and retain, normally for long periods.
The concept of web archiving is not new. For example, the Wayback Machine - a web archiving service maintained by the non-profit organization Internet Archive based in San Francisco, California - has been archiving web content since 1996[i]. However, the Wayback Machine has several limitations for use in a business context:
As a result, while the Wayback Machine is a good first step toward archiving web content, more sophisticated - and enterprise-class - web archiving is becoming a necessity for a growing number of applications, as discussed below.
WHAT DRIVES THE NEED FOR WEB ARCHIVING?
Many of the drivers for web archiving are fundamentally the same as those for email and other electronic content archiving:
WEB ARCHIVING vs. SERVER BACKUPS
There are several significant differences between server backups of web content and web archives:
WEB ARCHIVING: THE NEXT STEP
Web archiving can rightly be considered the next logical extension of an organization's traditional archiving of email, files and other electronic content. While email and other types of electronic content archiving tend to focus on internal content (emails sent and received by employees, word processing files and presentations created for internal use, and so forth), web archiving tends to focus much more on publicly available content. Because the web, including static web sites, web applications and social networking content, is primarily public-facing, web archiving focuses primarily on content that the public has already seen or has had the opportunity to see.
As a result, web archiving is focused to a greater degree than traditional electronic content archiving on issues like brand protection; reputation management; policy enforcement; preserving a record of when content was created, posted and taken down; business continuity; and corporate memory.
Archiving Is Already an Established Best Practice
THE WEB IS GROWING RAPIDLY
The amount of content on the web has grown enormously in recent years. For example, as of December 2009 there were 234 million web sites, 47 million of which were added in 2009 alone[ii], an average of nearly 129,000 new web sites every day. Further, as early as 2008 there were more than one trillion unique URLs on the web, and the number continues to grow rapidly.
Growth of the web is being driven by a number of factors, including the ubiquity of web access, the ease and low cost with which content can be published and updated, and greater cultural acceptance of the web as a medium for information sharing and commerce. For these reasons, both business and government are increasingly reliant on the web as their primary means of communication and process management.
Consequently, the market for web archiving - as well as archiving of email, files, SharePoint content and other information - is growing at a healthy pace. Web archiving, currently a small segment of the total content archiving market, is poised to become an enormous area of growth, driven by the issues discussed in this white paper.
GROWTH IN THE MARKET IS DRIVEN BY A VARIETY OF FACTORS
For just about any company, government agency or educational institution, there are four primary drivers for archiving electronic content. However, the importance of these drivers will vary with an organization's size, the industries in which it participates, the advice of its internal and external legal counsel or compliance officers, and the locales in which it operates:
Electronic content stores, including web sites, contain a growing proportion of business records that must be preserved for long periods of time. Further, this content is frequently requested during discovery under the Federal Rules of Civil Procedure (FRCP) and state versions of the FRCP. As a result, it is critical that all relevant electronic content be available for e-discovery.
Further, when a hold on data is required, an organization must be able to begin preserving all relevant data immediately. For example, if a dispute arises because of a claim made on a page of a company's web site, that content must be preserved for as long as a court, regulator or other authorized entity deems necessary. An enterprise-class web archiving system allows organizations to place a hold on data immediately when requested by a court or on the advice of legal counsel.
If an organization cannot place a hold on data when it is obligated to do so, it can suffer serious consequences, ranging from embarrassment to major legal sanctions or heavy fines. Litigants that fail to preserve electronic content properly face brand damage, additional costs for third parties to review or search for data, court sanctions, directed verdicts, or instructions to a jury that it may view a defendant's failure to produce data as evidence of culpability.
In addition to supporting e-discovery and legal holds, an enterprise-class web archiving system allows an organization to perform formal or informal early case assessment. For example, if a customer makes a claim against a company based on a statement made on the company's web site, senior managers can search the archive for information that will help them determine the potential liability they face. If this assessment shows that the company was indeed wrong in making the claim, management can instruct legal counsel to pursue a quick settlement. If, on the other hand, the assessment uncovers information that supports the company's position, that information can be used to convince the customer to drop the case, or it can help win the case if it goes to trial. In either case, the archive helps the organization understand its position early, so that it can avoid unnecessary legal fees or an adverse judgment, or reduce its costs by demonstrating the strength of its case.
For just about every organization, there is a large and growing number of regulatory obligations to preserve electronic content. Some of the more important requirements are:
The obligations and penalties associated with HIPAA violations have been expanded dramatically under the HITECH Act. For example, if a covered entity or one of its business associates loses 500 or more patient records, the covered entity must notify HHS and a "prominent media outlet" of what has occurred. Section 13402 of HITECH requires that if a "covered entity has insufficient or out-of-date contact information for 10 or more individuals, the covered entity must provide substitute individual notice by either posting the notice on the home page of its web site or by providing the notice in major print or broadcast media where the affected individuals likely reside." Fines for HIPAA violations can now reach as high as $1.5 million per calendar year.
Recent FINRA Disciplinary Actions Related to Web Content
Securities and Exchange Commission Rules
Members of national securities exchanges, brokers and dealers must preserve certain records for at least six years and others, including business communications, for at least three years; in both cases the first two years of records must be kept in an easily accessible place (SEC Rule 17a-4). The affected records are broad and encompass originals of communications generated and received by individuals within financial institutions, including inter-office memoranda and internal audit working papers. Also included are automated messages sent to all customers, which could include email blasts. The records may be "immediately produced or reproduced on 'micrographic media' [microfilm, microfiche or similar] or by means of 'electronic storage media.'" As noted above, the Securities Exchange Act of 1934 has been amended to specifically include the requirement to post certain types of content on the web.
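To make the retention arithmetic concrete, the following is a minimal Python sketch, not a compliance tool and not any vendor's implementation, of how an archive might classify a captured web page using the six-year retention period and two-year easily accessible window that SEC Rule 17a-4 sets for certain records, with a legal hold overriding disposal. The `ArchivedCapture` record, the `storage_tier` function and the tier names are hypothetical; actual retention periods depend on the record type and on the advice of counsel.

```python
from dataclasses import dataclass
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later; Feb 29 rolls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only Feb 29 can fail here, when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


@dataclass
class ArchivedCapture:
    """Hypothetical archived snapshot of one web page."""
    url: str
    captured_on: date
    legal_hold: bool = False  # set when a court, regulator or counsel requires preservation


def storage_tier(capture: ArchivedCapture, today: date,
                 retention_years: int = 6, accessible_years: int = 2) -> str:
    """Classify a capture as 'accessible', 'retained' or 'disposable' (illustrative tiers).

    'accessible': still inside the easily accessible window.
    'retained':   past that window but inside the retention period, or on legal hold.
    'disposable': retention period over and no hold in place.
    """
    if today < add_years(capture.captured_on, accessible_years):
        return "accessible"
    if capture.legal_hold or today < add_years(capture.captured_on, retention_years):
        return "retained"
    return "disposable"


if __name__ == "__main__":
    page = ArchivedCapture("https://www.example.com/offers", date(2010, 3, 1))
    for day in (date(2011, 3, 1), date(2014, 3, 1), date(2017, 3, 1)):
        print(day, storage_tier(page, day))
    # A legal hold keeps the capture even after the normal schedule has expired.
    page.legal_hold = True
    print(date(2017, 3, 1), storage_tier(page, date(2017, 3, 1)))
```

Run as a script, this prints `accessible`, `retained` and `disposable` for the three dates, then `retained` once the hold is set. Keeping the hold check separate from the date arithmetic reflects the legal hold point made earlier in this paper: a hold lasts as long as a court or regulator requires, regardless of the normal retention schedule.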