Schedule

Digital Libraries as Phenotypes for Digital Societies

The research and development community has been actively creating and deploying digital libraries for more than two decades, and many digital libraries have become indispensable tools in the daily life of people around the world. Today's digital libraries include interactive multimedia and powerful tools for searching and sharing content and experience. As such, digital libraries are moving beyond personal intellectual prostheses to become much more participative and reflective of social history. Digital libraries not only acquire, preserve, and make available informational objects, but also invite annotation and interaction, and leverage usage patterns to better serve patron needs. These various kinds of usage patterns serve two purposes: first, they serve as context for finding and understanding content, and second, they themselves become content that digital libraries must manage and preserve. Thus, digital library research has expanded beyond technical and informational challenges to consider new opportunities for recommendations, support of affinity groups, social awareness, and cross-cultural understanding, as well as new challenges related to personal and group identity, privacy and trust, and curating and preserving ephemeral interactions. This trend makes digital libraries cultural institutions that reveal and hopefully preserve the phenotypes of societies as they evolve.

Session 2A: Interaction

Hear it is: Enhancing Rapid Document Browsing with Sound Cues
Parisa Eslambochilar, George Buchanan and Fernando Loizides

Document navigation has become increasingly commonplace as the use of electronic documents has grown. Speed-Dependent Automatic Zooming (SDAZ) is one popular method for providing rapid movement within a digital text. However, there is evidence that details of the document are overlooked as the pace of navigation rises. We produced document reader software in which sound complements the visual cues that a user searches for.
This software was then evaluated in a user study that provides strong supportive evidence that non-visual cues can improve user performance in visual seeking tasks.

Creating Visualisations for Digital Document Indexing

Indexes are a well-established method of locating information in printed literature, just as find is a popular technique when searching in digital documents. However, document reader software has seldom adopted the concept of an index in a systematic manner. This paper describes an implemented system that not only facilitates user-created digital indexes but also uses colour and size as key factors in their visual presentation. We report a pilot study that was conducted to test the validity of each visualisation and analyse the results of both the quantitative analysis and subjective user reviews.

Document Word Clouds: Visualising Web Documents as Tag Clouds to Aid Users in Relevance Decisions

Information Retrieval systems spend a great effort on determining the significant terms in a document. When, instead, a user is looking at a document, he cannot benefit from such information. He has to read the text to understand which words are important. In this paper we take a look at the idea of enhancing the perception of web documents with visualisation techniques borrowed from the tag clouds of Web 2.0. Highlighting the important words in a document by using a larger font size allows the user to get a quick impression of the relevant concepts in a text. As this process does not depend on a user query, it can also be used for explorative search. A user study showed that even simple TF-IDF values used as a notion of word importance helped the users to decide more quickly whether or not a document is relevant to a topic.

Session 2B: Knowledge Organization

Exploratory Web Searching with Dynamic Taxonomies and Results Clustering

This paper proposes exploiting both explicit and mined metadata for enriching Web searching with exploration services. On-line results clustering is useful for providing users with overviews of the results and thus allowing them to restrict their focus to the desired parts. On the other hand, the various metadata that are available to a WSE (Web Search Engine), e.g. domain/language/date/document information, are commonly exploited only through the advanced (form-based) search facilities that some WSEs offer (and users rarely use). We propose an approach that combines both kinds of metadata by adopting the interaction paradigm of dynamic taxonomies and faceted exploration. This combination results in an effective, flexible and efficient exploration experience.

Developing Query Patterns

Query patterns enable effective information tools and provide guidance to users interested in posing complex questions about objects. Semantically, query patterns represent important questions, while syntactically they impose the correct formulation of queries. In this paper we address the development of query patterns at successive representation layers so as to expose dominant information requirements on the one hand, and structures that can support effective user interaction and efficient implementation of query processing on the other. An empirical study for the domain of cultural heritage reveals an initial set of recurrent questions, which are then reduced to a modestly sized set of query patterns. A set of Datalog rules is developed in order to formally define these patterns, which are also expressed as SPARQL queries.

Matching Multi-lingual Subject Vocabularies

Most libraries and other cultural heritage institutions use controlled knowledge organisation systems, such as thesauri, to describe their collections. Unfortunately, as most of these institutions use different such systems, unified access to heterogeneous collections is difficult.
Things are even worse in an international context when concepts have labels in different languages. In order to overcome the multi-lingual interoperability problem between European Libraries, extensive work has been done to manually map concepts from different knowledge organisation systems, which is a tedious and expensive process.

Session 3: Special Session - Services

Leveraging the Legacy of Conventional Libraries for Organizing Digital Libraries
Arash Joorabchi and Abdulhussain E. Mahdi

With the significant growth in the number of available electronic documents on the Internet, intranets, and digital libraries, the need for developing effective methods and systems to index and organize E-documents is felt more than ever. In this paper we introduce a new method for automatic text classification for categorizing E-documents by utilizing classification metadata of books, journals and other library holdings that already exists in online catalogues of libraries. The method is based on identifying all references cited in a given document and, using the classification metadata of these references as catalogued in a physical library, devising an appropriate class for the document itself according to a standard library classification scheme with the help of a weighting mechanism. We have demonstrated the application of the proposed method and assessed its performance by developing a prototype classification system for classifying electronic syllabus documents archived in the Irish National Syllabus Repository according to the well-known Dewey Decimal Classification (DDC) scheme.

Annotation Search: The FAST Way

This paper discusses how annotations can be exploited to develop information access and retrieval algorithms that take them into account. The paper proposes a general framework for developing such algorithms that specifically deals with the problem of accessing and retrieving topical information from annotations and annotated documents.
wikiSearch – From Access to Use
Elaine G. Toms, Lori McCay-Peet & R. Tayze Mackenzie

A digital library (DL) facilitates a search workflow process. Yet many DLs hide much of the user activity involved in the process from the user. In this research we developed an interface, wikiSearch, to support that process. This interface flattened the typical multi-page implementation into a single layer that provided multiple memory aids. The interface was tested by 96 people who used the system in a laboratory to resolve multiple tasks. Assessment was through use, usability testing and closed and open perception questions. In general, participants found that the interface enabled them to stay on track with their task, providing a bird's-eye view of the events: queries entered, pages viewed, and pertinent pages identified.

An Empirical Study of User Navigation During Document Triage

Document triage is the moment in the information seeking process when the user first decides the relevance of a document to their information need. This paper reports a study of user behaviour during document triage. The study reveals two main findings: first, that there is a small set of common navigational patterns; second, that certain document features strongly influence users' navigation.

Improving OCR Accuracy for Classical Critical Editions

This paper describes a workflow designed to populate a digital library of ancient Greek critical editions with OCR-scanned text that reflects a high accuracy score. The most recently available OCR engines, after suitable training, are now able to deal with polytonic Greek fonts used in 19th and 20th century editions, but further improvements can also be achieved with postprocessing. In particular, the progressive multiple alignment method applied to different OCR outputs based on the same images is discussed in this paper.

A Visualization Technique for Quality Control of Massive Digitization Programs

Massive digitization programs need massive visualization techniques for quality control. We describe the functional prototype of a 3D interactive environment enabling a rapid inspection of page conformity for large batches of digitized books.

Digital Libraries, Personalisation, and Network Effects - Unpicking the Paradoxes

Session 5: Special Session - Infrastructures

Adding Quality-Awareness to Evaluate Migration Web-Services and Remote Emulation for Digital Preservation

Digital libraries are increasingly relying on distributed services to support increasingly complex tasks such as retrieval or preservation. While there is a growing body of services for migrating digital objects into safer formats to ensure their long-term accessibility, the quality of these services is often unknown. Moreover, emulation as the major alternative preservation strategy is often neglected due to the complex setup procedures that are necessary for testing emulation. However, thorough evaluation of the complete set of potential strategies in a quantified and repeatable way is considered of vital importance for trustworthy decision making in digital preservation planning. This paper presents a preservation action monitoring infrastructure that combines provider-side service instrumentation and quality measurement of migration web services with remote access to emulation. Tools are monitored during execution, and both their runtime characteristics and the quality of their results are measured transparently. We present the architecture of this framework and discuss results from experiments on migration and emulation services.

Functional Adaptivity for Digital Library Services in e-Infrastructures: the gCube Approach

We consider the problem of e-Infrastructures that wish to reconcile the generality of their services with the bespoke requirements of diverse user communities. We motivate the requirement of functional adaptivity in the context of gCube, a service-based system that integrates Grid and Digital Library technologies to deploy, operate, and monitor Virtual Research Environments defined over infrastructural resources. We argue that adaptivity requires mapping service interfaces onto multiple implementations, truly alternative interpretations of the same functionality. We then analyse two design solutions in which the alternative implementations are, respectively, full-fledged services and local components of a single service. We associate the latter with lower development costs and increased binding flexibility, and outline a strategy to deploy them dynamically as the payload of service plugins. The result is an infrastructure in which services exhibit multiple behaviours, know how to select the most appropriate behaviour, and can seamlessly learn new behaviours.

Managing the Knowledge Creation Process of Large-Scale Evaluation Campaigns

This paper discusses the evolution of large-scale evaluation campaigns and the corresponding evaluation infrastructures needed to carry them out. We present the next challenges for these initiatives and show how digital library systems can play a relevant role in supporting the research conducted in these fora by acting as virtual research environments.

Session 6A: Resource Discovery

Using Semantic Technologies in Digital Libraries - A Roadmap to Quality Evaluation

In digital libraries semantic techniques are often deployed to reduce the expensive manual overhead for indexing documents and maintaining metadata, as well as later information searches. However, using such techniques may cause a decrease in a collection's quality due to their statistical nature. Since data quality is a major concern in digital libraries, it is important to be able to measure the (loss of) quality of metadata automatically generated by semantic techniques. In this paper we present a user study based on a typical semantic technique used for automatic metadata creation, namely taxonomies of author keywords and tag clouds. We observed experts assessing typical relations between keywords and documents over a small corpus in the field of chemistry. Based on the evaluation of this experiment, we focus on commonalities between the experts' perceptions and thus draw a first roadmap on how to evaluate semantic techniques by proposing some preliminary metrics.

Supporting the Creation of Scholarly Bibliographies by Communities through Online Reputation based Social Collaboration

Bibliographic digital libraries play a significant role in conducting research and, in the past few years, have started to move from closed to more open social platforms. However, in this, they have faced challenges (e.g., from Web spam) in maintaining the level of scholarly precision - the ratio of relevant citations retrieved by search.
This paper describes a hybrid approach that uses online social collaboration and reputation-based social moderation to reduce the cost and to speed up the construction of scholarly bibliographies that are comprehensive, have better quality citations and higher precision. We implemented selected social features for an established digital humanities project (the Cervantes Project) and compared the results with a number of closed and open current bibliographies. We found this can help in building scholarly bibliographies and significantly improve precision outcomes.

Chance Encounters in the Digital Library

While many digital libraries focus on supporting defined tasks that require targeted searching, there is potential for enabling serendipitous discovery that can serve multiple purposes, from aiding with the targeted search to suggesting new approaches, methods and ideas. In this research we embedded a tool in a novel interface to suggest other pages to examine in order to assess how that tool might be used while doing focused searching. While only 40% of the participants used the tool, all assessed its usefulness or perceived usefulness. Most participants used it as a source of new terms and concepts to support their current tasks; a few noted the novelty and perceived its potential value in serving as a stimulant.

DL Education in the EU and in the US: Where Are We?, Where Are We Going?

Session 7A - Architectures

Stress-testing General Purpose Digital Library Software

DSpace, Fedora, and Greenstone are three widely used open source digital library systems. In this paper we report on scalability tests performed on these tools by ourselves and others. These range from repositories populated with synthetically produced data to real world deployment with content measured in millions of items. A case study is presented that details how one of the systems performed when used to produce fully-searchable newspaper collections containing in excess of 20 GB of raw text (2 billion words, with 60 million unique terms), 50 GB of metadata, and 570 GB of images. Lessons learnt from the case study are articulated; most are also relevant to other DL software.

The NESTOR Framework: How to Handle Hierarchical Data Structures

In this paper we study the problem of representing, managing and exchanging hierarchically structured data in the context of a Digital Library (DL). We present the NEsted SeTs for Object hieRarchies (NESTOR) framework, which defines two set data models that we call the "Nested Set Model (NS-M)" and the "Inverse Nested Set Model (INS-M)". Both are based on the organization of nested sets, which enables the representation of hierarchical data structures.

eSciDoc Infrastructure: a Fedora-based e-Research Framework

eSciDoc is the open-source e-Research framework jointly created by the German Max Planck Society and FIZ Karlsruhe. It consists of a generic set of basic services ("eSciDoc Infrastructure") and various applications built on top of this infrastructure ("eSciDoc Solutions"). This paper focuses on the eSciDoc Infrastructure, highlights the differences from the underlying Fedora repository, and demonstrates its powerful and application-centric programming model. Challenges for e-Research Infrastructures are presented and the approaches we took on the eSciDoc project to address them are discussed.

Collaborative Ownership in Cross-Cultural Educational Digital Library Design

This paper details research into building a Collaborative Educational Resource Design (CERD) model by investigating two contrasting Kenyan / UK design case studies with an evaluation of end-users' and designers' perceptions of these digital libraries and their usage patterns. Case study 1 is based on formal learning in an established African university digital library, researched with students, lecturers and support staff over 2 months. Case study 2 is centered on informal learning in an ongoing rural community digital library system which has a collaborative design model that is being designed, developed and reviewed within the UK and Africa. A small-scale in-depth evaluation was done with 21 participants, primarily of the traditional digital library case study but indirectly related to and with implications for the collaborative digital library case study. In-depth user issues of access / ownership, control and collaboration are detailed and reviewed in relation to design implications. Adams & Blandford's 'information journey' framework is used to evaluate high-level design effects on usage patterns. Digital library design support roles and cultural issues are discussed further.

Session 7B - Information Retrieval

A Hybrid Distributed Architecture for Indexing

This paper presents a hybrid scavenger grid as an underlying hardware architecture for search services within digital libraries. The hybrid scavenger grid consists of both dedicated servers and dynamic resources in the form of idle workstations to handle medium- to large-scale search engine workloads. The dedicated resources are expected to have reliable and predictable behaviour. The dynamic resources are used opportunistically without any guarantees of availability. Test results confirmed that indexing performance is directly related to the size of the hybrid grid and that intranet networking does not play a major role. A system-efficiency and cost-effectiveness comparison of a grid and a multiprocessor machine showed that for workloads of modest to large sizes, the grid architecture delivers better throughput per unit cost than the multiprocessor, at a system efficiency that is comparable to that of the multiprocessor.

A Concept for Using Combined Multimodal Queries in Digital Music Libraries

In this paper, we propose a concept for using combined multimodal queries in the context of digital music libraries. Whereas usual mechanisms for content-based music retrieval only consider a single query mode, such as query-by-humming, full-text lyrics search or query-by-example using short audio snippets, our proposed concept allows those different modalities to be combined into one integrated query. Our particular contributions consist of concepts for query formulation, combined content-based retrieval and presentation of a suitably ranked result list. The proposed concepts have been realized within the context of the PROBADO Music Repository and allow for music retrieval based on combining full-text lyrics search and score-based query-by-example search.

A Compressed Self-Indexed Representation of XML Documents

This paper presents a structure we call XML Wavelet Tree (XWT) to represent any XML document in a compressed (using only about 35% of its original size) and self-indexed form. Therefore, any query or procedure that could be performed over the original document can be performed more efficiently over the XWT representation because it is shorter and has some indexing properties. In fact, XWT permits answering XPath queries more efficiently than using the uncompressed version of the documents. XWT is also competitive when compared with inverted indexes over the XML document (if both structures use the same space).

Superimposed Image Description and Retrieval for Fish Species Identification

Fish species identification is critical to the study of fish ecology and management of fisheries. Traditionally, dichotomous keys are used for fish identification. They ask questions about the observed specimen and then, based on the answer, ask more questions until the reader identifies the specimen. However, such keys can be rigid in their approach and often do not focus upon distinguishing characteristics favored by many field ecologists and more user-friendly field guides. This makes learning to identify fish difficult for Ichthyology students. Students usually supplement the use of the key with other methods such as making personal notes, drawings, annotated fish images, and more recently, fish information websites, such as Fishbase. Although these approaches provide useful additional content, it is dispersed across heterogeneous sources and can be tedious to access. Also, most of the existing electronic tools have limited support for managing user-created content, especially that related to parts of images, such as markings on drawings and images and associated notes. We present SuperIDR, a superimposed image description and retrieval tool, developed to address some of these issues. It allows users to associate parts of images with text annotations. Later, they can retrieve images, parts of images, annotations, and image descriptions through text- and content-based image retrieval. We evaluated SuperIDR in an undergraduate Ichthyology class as an aid to fish species identification and found that the use of SuperIDR yielded a higher likelihood of success in species identification than using traditional methods, including the dichotomous key, fish web sites, notes, etc.

Curated Databases

Most of our research and scholarship now relies on curated databases - traditional databases or ontologies that are created and updated with a great deal of human effort.
Most reference works that one traditionally found on the reference shelves of libraries (dictionaries, encyclopedias, gazetteers, etc.) are now curated databases; and because it is now so easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The catalogue or metadata for a digital library is very likely to be a curated database. The value of curated databases lies in the organisation and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area. Given their importance to our work, it is surprising that so little attention has been given to the general problems of curated databases. How do we archive them? How do we cite them? And because much of the data in one curated database is often extracted from other databases, how do we understand the provenance of the data we find in the database? Curated databases raise challenging problems not only in computer science but also in intellectual property and the economics of publishing. I shall attempt to describe these.

Session 9A - Preservation

Significance is in the Eye of the Stakeholder

Custodians of digital content take action when the material that they are responsible for is threatened by, for example, obsolescence or deterioration. At first glance, ideal preservation actions retain every aspect of the original objects with the highest level of fidelity. Achieving this goal, however, can be costly, infeasible, and sometimes even undesirable. As a result, custodians must focus their attention on preserving the most significant characteristics of the content, even at the cost of sacrificing less important ones. The concept of significant characteristics has become prominent within the digital preservation community to capture this key goal [9].
As is often the case in an emerging field, however, the term has become overloaded and remains ill-defined. In this paper, we unpack the meaning that lies behind the phrase, analyze the domain, and introduce clear terminology.

User Engagement in Research Data Curation

In recent years information systems such as digital repositories, built to support research practice, have struggled to encourage participation, partly due to inadequate analysis of the requirements of the user communities. This paper argues that engagement of users in research data curation through an understanding of their processes, constraints and culture is a key component in the development of the data repositories that will ultimately serve them. In order to maximize the effectiveness of such technologies, curation activities need to start early in the research lifecycle, and therefore strong links with researchers are necessary. Moreover, this paper promotes the adoption of a pragmatic approach, with the result that the use of open data as a mechanism to engage researchers may not be appropriate for all disciplinary research environments.

Just One Bit in a Million: On the Effects of Data Corruption in Files

So far, little attention has been paid to file format robustness, i.e., a file format's capability to keep its information as safe as possible in spite of data corruption. The paper at hand reports on the first comprehensive research on this topic. The research work is based on a study of the status quo of file format robustness for various file formats from the image domain. A controlled test corpus was built which comprises files with different format characteristics. The files are the basis for data corruption experiments which are reported and discussed.

Formalising a Model for Digital Rights Clearance

Due to the increasing complexity and world-wide distribution of digital objects, identification and enforcement of digital rights have become too complex to be carried out manually. It is necessary to take into account the case-specific applicable laws, the complete creation history of a work and the existing licenses. However, no formal generic model has been presented so far integrating these aspects. This paper presents an innovative domain ontology of the Intellectual Property Rights. It distinguishes four levels of abstraction or control: (1) the legal framework, (2) the individual rights people hold, (3) the individual usage agreements right holders and others may issue, and (4) the particular actions that are restricted by IPR regulations or bring particular rights into existence. The ontology has the potential to enable wide semantic interoperability of digital repositories for identifying existing rights on digital objects and tracing the impact of particular actions on rights and regulations.

Session 9B - Evaluation

Evaluation in Context

All search happens in a particular context - such as the particular collection of a digital library, its associated search tasks, and its associated users.
Information retrieval researchers usually agree on the importance of context, but they rarely address the issue. In particular, evaluation in the Cranfield tradition requires abstracting away from individual differences between users. This paper investigates if we can bring some of this context into the Cranfield paradigm. Our approach is the following: we will attempt to record the "context" of the humans already in the loop - the topic authors/assessors - by designing targeted questionnaires. The questionnaire data becomes part of the evaluation test-suite as valuable data on the context of the search requests. We have experimented with this questionnaire approach during the evaluation campaign of the INitiative for the Evaluation of XML Retrieval (INEX). The results of this case study demonstrate the viability of the questionnaire approach as a means to capture context in evaluation. This can help explain and control some of the user or topic variation in the test collection. Moreover, it allows breaking down the set of topics into various meaningful categories, e.g. those that suit a particular task scenario, and zooming in on the relative performance for such a group of topics.

Comparing Google to Ask-a-Librarian Service for Answering Factual and Topical Questions

How People Read Books Online: Mining and Visualizing Web Logs for Use Information

This paper explores how people read books online using the International Children's Digital Library (ICDL). We analyzed usage of the ICDL in an attempt to understand how people read books from websites. We propose a definition of reading a book (in contrast to others who visit the website), and report a number of observations about the use of the library in question.

Usability Evaluation of a Multimedia Archive: B@bele
Roberta Caccialupi, Licia Calvi, Maria Cassella, Georgia Conte

In institutional repositories, simple discovery and submission interfaces help increase document deposit, as scholars have very little time to self-archive. So far, however, usability evaluation of such interfaces has been limited. In this paper, we present the usability evaluation of a repository interface, i.e., the interface of B@bele, the DSpace installation of the Multimedia Production Centre (CPM) of the University Milano-Bicocca. The results of this evaluation point out the most important shortcomings of the present DSpace interface: difficulties with browsing within communities and collections; problems with the submission interface due to scarcely familiar terminology (metadata) or terms that are not relevant in the specific academic context (community); problems in the submission process due to some ambiguous buttons, to the lack of authority files, and to the lack of clearly marked compulsory fields. In this way, this study will help improve not only B@bele, but also all other installations of DSpace currently available.