University of California, San Francisco.
Legacy Tobacco Documents Library.
About the Data
   
Data Origins  
 
In 1998, a Master Settlement Agreement (MSA) was signed by the Attorneys General of 46 states and the nation's seven major tobacco industry organizations: Philip Morris, R.J. Reynolds, Brown & Williamson, Lorillard, the American Tobacco Company, the Council for Tobacco Research, and the Tobacco Institute.

As part of this agreement, the industry agreed to release documents on its own websites and to provide a "digital snapshot" of these sites as they existed in July 1999. Initially, the individual tobacco companies provided the data to the National Association of Attorneys General (NAAG), and the data was then given to the UCSF Library to create the Legacy Tobacco Documents Library (LTDL). Since then, UCSF has collected index records and document images directly from the industry documents websites using spidering software. The Master Settlement Agreement required the industry to maintain and update these documents websites only until 2010, but with support from the American Legacy Foundation, the LTDL will maintain this data in a permanent, stable interface.

In addition to the MSA-mandated documents, the LTDL includes several other document collections: UCSF Brown & Williamson, Mangini ("Joe Camel") Documents, a Multimedia Collection, and the Tobacco Depositions and Trial Testimony Archive (DATTA). For more information about the origins of these collections, please see our About the Collections page.
 
Data Format
 
Most documents in the LTDL have an index record created by the tobacco industry. This index record contains the metadata for each document: descriptive information about the content, such as title, author, date, and persons mentioned, as well as information about litigation usage and case names. Because different companies created the index records, the data reached the LTDL in different formats. To give users consistent search options, equivalent data fields from the different formats were "mapped" together. Information on the fields and the data they contain can be found on the Fields in Each Collection page.

When the LTDL began, users could search the metadata (title, author, date, etc.) associated with the documents but not the full text of the document pages. In 2004, the Library conducted a large optical character recognition (OCR) project that made approximately 98% of LTDL documents full-text searchable. The project also led to a revision of the LTDL website into a more robust document research tool.

 
Data Policy
 
To enhance usability or to improve the quality of information, original indexing records may be changed under the following circumstances:

  • To correct data that has been corrected on industry documents websites
  • To augment data that has been augmented on industry documents websites
  • To correct widespread indexing errors so that records conform to the information in the documents
  • To systematically augment indexing records to enhance usability
The Library will not remove document records from the LTDL. To protect privacy, the Library may remove document images if they contain confidential medical information.

The Library will not alter any document images except to "redact" Social Security numbers or personal bank account numbers. In such cases, the Library will note redactions prominently both in the record and on the document.

The following notice appears in the indexing record of any document that has been so altered:
SOCIAL SECURITY NUMBERS & PERSONAL BANK ACCOUNT NUMBERS REMOVED

In addition, these documents are marked REDACTED wherever information has been removed.
 

University of California, San Francisco ©2007 The Regents of the University of California