...understand the system of standards and methods used to control and create information structures and apply basic principles involved in the organization and representation of knowledge
Meaning and Importance of Competency
Information pervades our environment. As sensing beings, we are constantly bombarded by information in many forms, from speech and bird songs to the daily headlines and incoming email. One of the primal tendencies of the human mind is to organize this ceaseless flow of information in a meaningful way. One of the fundamental ways in which we organize is by aggregating, or grouping like things with like. Even our homes and offices are designed to make it easier to sort and store objects such as clothes, books, food, and office and cleaning supplies together in designated areas. The criteria by which we judge the likeness of things, including units of information, vary depending upon contextual factors, such as the intended user group, the habits of the user group, and the amount in need of organization, but whenever we categorize, classify, or group things together, it is based upon some assertion of likeness. In identifying shared qualities, we simultaneously discriminate unlike from unlike; aggregation and discrimination are two faces of the same organizing process. In his book Everything Is Miscellaneous: The Power of the New Digital Disorder (2007), David Weinberger gives a name to this grouping of things: the first order of order. “In the first order of order, we organize things themselves—we put silverware into drawers, books on shelves, photos into albums” (pp. 17-18).
The deep-seated human drive to organize information in order to make it more useful and accessible leads us to describe, analyze, and create representations of information. These representations are known in the field of library and information science as surrogates and can include line items on a list, rows on a spreadsheet, or cards in a card catalog. Weinberger calls this type of systematic representation the second order of order (p. 18). The creation of surrogates involves both description and analysis. When assigning values to different attributes of a particular information resource, some attributes (e.g., title, author, or publication date) can be taken directly from the resource itself and are therefore descriptive, while others (e.g., subject headings or difficulty level) are the result of independent judgment and are therefore analytical. Analysis of information requires the use of subjective criteria such as previous knowledge, research, or opinion and introduces the possibility of bias into the process of organizing and describing information.
Unfortunately, second-order organizational systems are not always sufficient for large and complex collections of information. Retrieval times can be slow, and users may require a more diverse set of access points than is possible in such a system. For example, a list of titles and authors may suffice for organizing a small personal library, but in a multi-branch public library, users may need to be able to search for resources by keyword, topic, ISBN, or format. Thus arises what Weinberger calls the third order of order, which, unlike the first and second orders, is made up not of “atoms” but rather of “bits” (p. 19). The third order of order is accomplished by computers and involves the organization of information into databases which break down units of information into records and fields which can be described, organized, and searched individually. Examples of third-order information structures include library OPACs, research databases, and Web-based thesauri.
The third order of order, the creation of digital information structures, benefits from adherence to standards and brings about the need to exert control. Adherence to standards helps to ensure that the systems created are effective and user-friendly, and exerting control through the use of thesauri, subject headings, and standard encoding schemes helps to ensure consistency and appropriate levels of specificity and exhaustivity in information retrieval for specified user groups. Some standards that play an important role in the field of library and information science today include MAchine-Readable Cataloging (MARC), the Library of Congress Subject Headings (LCSH), classification systems such as Dewey Decimal Classification and Library of Congress Classification, and the Anglo-American Cataloguing Rules. Other standards of importance to our field include Web standards such as Standard Generalized Markup Language (SGML), Hypertext Markup Language (HTML), and Extensible Markup Language (XML). The use of controlled vocabularies such as LCSH and Faceted Application of Subject Terminology (FAST) helps to improve precision and recall. Adherence to standards and rules and the use of control measures even allow for the sharing of resources between libraries and other organizations through shared cataloging, bibliographic utilities such as OCLC, and federated search.
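Precision and recall, mentioned above, have standard definitions in information retrieval: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that are retrieved. The following minimal sketch illustrates the arithmetic. The function name and the record identifiers are hypothetical and are not drawn from any system discussed in this statement.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers the search returned
    relevant:  identifiers a user would judge relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
retrieved_ids = ["r1", "r2", "r3", "r4"]
relevant_ids = ["r2", "r3", "r5"]
precision, recall = precision_recall(retrieved_ids, relevant_ids)
```

In this made-up search, two of the four retrieved records are relevant (precision of one half), and two of the three relevant records were found (recall of two thirds). A controlled vocabulary is intended to raise both figures by mapping users' varied search terms onto the same preferred headings.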
Preparation and Evidence
To demonstrate my understanding of the system of standards and methods used to control and create information structures and my ability to apply principles involved in the organization and representation of knowledge, I present two assignments that I completed for courses in the online School of Library and Information Science (SLIS) at San Jose State University (SJSU). My first piece of evidence is a data structure with attributes, values, and rules that I completed for LIBR 202, Information Retrieval. My second piece of evidence is a thesaurus construction assignment I completed for LIBR 247, Vocabulary Design, in which I developed a small controlled vocabulary based on established standards.
First Piece of Evidence: Data Structure with Attributes, Values, and Rules, LIBR 202
In Fall 2009, I took the course “Information Retrieval.” Some of the major topics of the course included the organization, representation, and description of information as well as classification, indexing, subject access, controlled vocabularies, and data structures. For one assignment, students in the course were asked to analyze a small collection of images using descriptive terms, to create a simple data structure for a specific user group, to assign values to attributes based on perceived user needs, and to write a narrative description of the process followed. Students were provided with a collection of ten images of both well-known buildings such as the Taj Mahal and Taliesin West and representative examples of common building types such as barns and churches. Students were also provided with a template to use in constructing the data structure. The assignment thus enabled students to understand a system of standards and methods used to control and create an information structure and to apply basic principles involved in the organization and representation of knowledge.
Students were permitted to work in teams, but each student was responsible for turning in an individual assignment. The assignment, which was divided into two parts, specified that the user group consisted of students in a university extension course on architecture appreciation. Major topics of their course included architecture over time and across cultures and the relationship between building function and architectural style. For Part 1 of the LIBR 202 assignment, students were instructed to examine the images in the collection carefully and make a list of words and short phrases describing each one. We were also instructed to include terms describing distinctions and commonalities across the collection as a whole that would be of interest to an architecture student. After compiling this initial list of terms describing each image, we were instructed to analyze our term choices with attention to granularity, non-subject-related terms (such as those describing the quality or kind of photograph), geographic descriptors, architectural categories (e.g., houses of worship or castles), and how to account for multiple images of the same building. From this analysis, we were asked to select and record on a spreadsheet template at least five words or terms best describing each image for the given user group.
Part 2 of the assignment aimed to give students experience in examining how language is used to describe information and how data interacts with the data structure to improve information retrieval. We worked in teams to merge individual term choices and compile master lists of terms. After this point, I worked independently to complete the rest of the assignment. First, I edited my team’s master list of terms. The assignment instructed us to pay attention to several factors, including how frequently particular terms were used, variations in use of language, syntax and spelling to describe the same thing, level of granularity, which terms were common or distinctive among the various images, and whether any important terms had been omitted; we were also instructed to edit and organize the list into a structure of only five database fields and associated data. In my narrative, I describe how I used the description of the user group to distinguish between relevant and non-relevant data. The five fields I chose were 1) Building Name or Image Title, 2) Location, 3) Time Period Completed, 4) Building Functions, and 5) Architectural Features. This process involved reducing the number of terms, placing terms into categories, evaluating and selecting the best terms given the user group and purpose, and being consistent with grammar, syntax, and use of singular and plural forms of nouns.
My next step was to define rules for each database field. For this exercise, students were given two rules to define for each database field: 1) Repeatable/Non-Repeatable, to allow for unique identifiers or for multiple terms to be entered, and 2) Optional/Required, to allow a field to be left blank or to force data entry. In my narrative, I describe my rationale for defining each field as I did. For example, I defined the Building Name or Image Title field as “non-repeatable and required because I wanted each image in the data structure to have a unique identifier, which would allow students from the user group to search for particular buildings by name” (p. 5) and defined the Location field as repeatable “because I wanted to allow for entries at various levels of granularity, such as city, state, region, country, or continent, since users might search the structure using different location terms” (p. 5).
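The two kinds of field rules described above can be sketched in code. This is a minimal illustration and not the data structure submitted for the assignment. The five field names come from the narrative, as do the rules that Building Name or Image Title is non-repeatable and required and that Location is repeatable. Every other rule value, the `FieldRule` and `validate` names, and the sample record are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldRule:
    name: str
    repeatable: bool  # may the field hold more than one value?
    required: bool    # must the field hold at least one value?


RULES = [
    # Stated in the narrative: non-repeatable and required.
    FieldRule("Building Name or Image Title", repeatable=False, required=True),
    # Stated in the narrative: repeatable. Optional status is assumed.
    FieldRule("Location", repeatable=True, required=False),
    # The three rules below are assumed for illustration.
    FieldRule("Time Period Completed", repeatable=False, required=False),
    FieldRule("Building Functions", repeatable=True, required=False),
    FieldRule("Architectural Features", repeatable=True, required=False),
]


def validate(record: Dict[str, List[str]]) -> List[str]:
    """Return a list of rule violations; an empty list means the record is valid."""
    problems = []
    known = {rule.name for rule in RULES}
    for name in record:
        if name not in known:
            problems.append(f"unknown field: {name}")
    for rule in RULES:
        # Ignore blank strings so that "" does not count as data entry.
        values = [v for v in record.get(rule.name, []) if v.strip()]
        if rule.required and not values:
            problems.append(f"missing required field: {rule.name}")
        if not rule.repeatable and len(values) > 1:
            problems.append(f"field is not repeatable: {rule.name}")
    return problems


# Illustrative record; the values are not taken from the submitted assignment.
sample = {
    "Building Name or Image Title": ["Taj Mahal"],
    "Location": ["Agra", "India", "Asia"],
    "Building Functions": ["mausoleum"],
}
sample_problems = validate(sample)
```

The sample record passes because its single name satisfies the required, non-repeatable rule, while the three Location values at different levels of granularity are permitted by the repeatable rule.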
Once I had defined the rules for each field, I entered data into my template, following my own rules. This involved the use of judgment in selecting terms and in the number of terms entered. The final step of the assignment was to write a narrative explaining the decisions I made and why and describing how my data structure would benefit the user group. My final paper, which I submit here as evidence of my mastery of this competency, includes my completed data structure and narrative. For this assignment, I received a perfect score based on my choice of attributes and rules, my choice of terms for describing each image, my narrative, my attention to detail, and my demonstration of in-depth understanding of the assignment. This assignment therefore demonstrates my understanding of the system of standards and methods used to control and create information structures and my ability to apply basic principles involved in the organization and representation of knowledge.
Second Piece of Evidence: Thesaurus Construction Assignment, LIBR 247
In Fall 2011, I took the course “Vocabulary Design.” Thesauri are one of the major types of controlled vocabularies in use in the field of library and information science, and one of the major topics of this course was thesaurus construction. One of the major assignments of the course was to develop a small controlled vocabulary based on well-established standards. The process involved in completing the assignment included term selection and control, facet analysis, and the construction of a thesaurus using the TheW32 software program. This assignment demonstrates my understanding of a system of standards and methods used to control and create an information structure and my application of basic principles involved in the organization and representation of knowledge.
In the assignment, students were given 15 subject statements for facet analysis. The statements were meant to represent concepts from a library and information science collection and included such statements as, “The evolution of evidence-based librarianship in Britain” and “Story-time ideas for toddlers, moms and dads.” The full list of subject statements is included in Appendix 1 of my final report. The user group was defined as including library and information science students, faculty members, and librarians, among others. Students were instructed to use the 15 subject statements as the basis for term selection and term relationships, including broader terms (BTs), narrower terms (NTs), related terms (RTs), and both preferred and non-preferred terms, and to use two Web-based thesauri (the ASIS&T Thesaurus and the thesaurus of the Library Literature and Information Science Full Text database, available from SJSU’s King Library) to find additional terms and relationships. Students were instructed to include about 100 to 140 lead-in terms in their thesauri and to submit a final report with several appendices demonstrating the various steps in the process.
For Step 1 of the assignment, following the model presented in our course instructional materials by Dr. Ali Shiri, I performed facet analysis on the 15 subject statements, placing terms into such fundamental categories as “Abstract entities,” “Artifacts,” “Agents,” and “Operations.” My list of main facets and sub-facets is included as Appendix 2 of my final report. In my report, I describe the steps I completed and justify the choices I made, referring to the guidelines discussed in class and to examples from my finished product. My decisions for this step included rules for noun forms, spelling, the inclusion of slang terms, and the use of hyphens.
Step 2 of the assignment involved the use of TheW32 thesaurus construction software as well as the two Web-based thesauri mentioned above. The objective of Step 2 was to construct relationships between terms and to identify BTs, NTs, and RTs. As I consulted the two online thesauri and entered terms and relationships into TheW32 software, I also began the process of identifying preferred and non-preferred terms, making note of instances where I found a preferred term for one I had selected in Step 1. I also made decisions about the number of terms to admit into my thesaurus and about factoring terms, and I modified or discarded some of the terms that I had selected in Step 1 based on what I found in the two Web-based thesauri. In my report, I discuss the decisions I made when I found discrepancies between the two. The final list of relations I constructed is included as Appendix 3 of my final report.
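The BT, NT, and RT relationships constructed in this step are reciprocal by convention: if one term is a broader term of another, the second is a narrower term of the first, and related-term links run in both directions. The sketch below shows one way this bookkeeping could be modeled. It does not reproduce how TheW32 works internally, and the class, its method names, and the example terms are hypothetical placeholders, not entries from the thesaurus in Appendix 3.

```python
from collections import defaultdict


class Thesaurus:
    """Toy term store that keeps BT/NT/RT links reciprocal."""

    def __init__(self):
        self.bt = defaultdict(set)  # term -> its broader terms
        self.nt = defaultdict(set)  # term -> its narrower terms
        self.rt = defaultdict(set)  # term -> its related terms

    def add_hierarchy(self, narrower, broader):
        """Record 'narrower BT broader' and the reciprocal 'broader NT narrower'."""
        if narrower == broader:
            raise ValueError("a term cannot be broader than itself")
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def add_related(self, a, b):
        """Record a symmetric RT link between two distinct terms."""
        if a == b:
            raise ValueError("a term cannot be related to itself")
        self.rt[a].add(b)
        self.rt[b].add(a)

    def entry(self, term):
        """Return the term's relationships in alphabetical order."""
        return {
            "BT": sorted(self.bt.get(term, ())),
            "NT": sorted(self.nt.get(term, ())),
            "RT": sorted(self.rt.get(term, ())),
        }


# Placeholder terms, for illustration only.
demo = Thesaurus()
demo.add_hierarchy("public libraries", "libraries")
demo.add_related("libraries", "librarians")
libraries_entry = demo.entry("libraries")
```

Because each link is written in both directions at the moment it is added, the display for any one term can never disagree with the display for its partner term.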
For Step 3 of the assignment, I moved on to entering scope notes and lead-in terms with “Use” and “UF” (Used For) notes. Here again, I relied heavily upon the two Web-based thesauri to select preferred terms, to find additional lead-in terms, and to identify terms requiring scope notes. In my report, I describe an example in which I started with the term “seniors” in Step 1: “while ‘seniors’ was on my list in Step 1, based on what I found in the Library Literature and Information Science Full Text database thesaurus, I made the preferred term ‘aged,’ used ‘senior citizens’ rather than ‘seniors’…as a lead-in term, and found the additional lead-in terms ‘elderly’ and ‘older people,’ which seemed important to include based on variances in individual users’ choice of search terms” (p. 4). I also describe other examples which demonstrate my thought process in making decisions about terms and relationships. My final thesaurus, which shows all relationships between terms, is included as Appendix 4 of my final report.
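The “aged” example above can be expressed as a small lookup table in which each lead-in term points to its preferred term (the “Use” reference) and the reverse listing is derived from it (the “UF” reference). This is a minimal sketch under the assumption that terms are matched case-insensitively. The function names are hypothetical, and only the four terms quoted from the report are used.

```python
# Lead-in (non-preferred) term -> preferred term, i.e. the "Use" references.
USE = {
    "senior citizens": "aged",
    "elderly": "aged",
    "older people": "aged",
}


def preferred(term):
    """Resolve a user's search term to the preferred term, if one is defined."""
    key = term.strip().lower()
    return USE.get(key, key)


def used_for(use_map):
    """Derive the reciprocal "UF" listing: preferred term -> lead-in terms."""
    uf = {}
    for lead_in, pref in use_map.items():
        uf.setdefault(pref, []).append(lead_in)
    return {pref: sorted(lead_ins) for pref, lead_ins in uf.items()}


resolved = preferred("Older People")
uf_listing = used_for(USE)
```

A user who searches for “older people” is routed to the single heading “aged,” which is how lead-in terms accommodate the variance in individual users’ choice of search terms noted in the report.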
I received a grade of A on this assignment. Dr. Shiri’s comments affirmed that I demonstrated understanding of the principles behind facet analysis and created an “excellent” and “very well developed” thesaurus. Because of its emphasis on selecting terms, carrying out facet analysis, controlling terms, and using a software application to construct a thesaurus based on well-established standards, this assignment demonstrates my understanding of the system of standards and methods used to control and create information structures and my ability to apply basic principles involved in the organization and representation of knowledge.
Conclusion
The knowledge I have gained about the organization, description, and analysis of information as well as standards and methods used to create and control information structures is sure to be useful to me as I continue on my journey as an information professional. Because information organization and retrieval are such a fundamental part of what we do as librarians, the principles that I have learned and the standards with which I have become familiar will enable me to be more effective in my work. As cooperation and collaboration become increasingly important in our field, familiarity with and adherence to standards such as MARC and XML and the use of controlled vocabularies such as LCSH and FAST will grow in importance as well. I am therefore grateful to have received excellent instruction in this area through my studies in the SLIS program.
References
Weinberger, D. (2007). Everything is miscellaneous: The power of the new digital disorder. New York: Holt Paperbacks.