Information Storage and Retrieval
CMPS 561/EECE 561 (Fall 2007)

 

Please Click here for Important Announcements

 

Instructor : Dr. Vijay Raghavan

Office: ACTR 305
Office Hours: Tue 1:30 - 2:30PM,   Thu 2:00 - 3:00PM

Phone: (337) 482-6603
E-mail: raghavan@cacs.louisiana.edu

TA : Shixian Chu

Room:  ACTR 327
Office Hours : MW 10:00AM - 12:00PM

E-mail: shixianchu@gmail.com

Time :  MW, 3:00PM - 4:15 PM
Place :   ACTR 117




Class Roster

Click here to view the Class Roster.

Please check and let me know if your name is not in this list.




Prerequisites

CMPS 460 or consent of the instructor.

Some background on WWW protocols for database access from Web browsers is assumed.




Course Outline

Modern retrieval systems that operate on text databases can provide interactive, user-customizable retrieval techniques. In contrast, many tools available for accessing text databases on the Web use quite primitive techniques. Thus, there is a need to make state-of-the-art search algorithms available over the Internet. In this context, we will explore intelligent information retrieval techniques and the protocols associated with implementing Web-browser-based interfaces to document database servers. It is also important to extend the search algorithms to heterogeneous (multimedia) data. This requires the development of appropriate indexing schemes so that search algorithms applicable to text databases can be extended to other types of data (e.g., pictures, video, sound). The course will consider research issues in this context. We will also look at some aspects of database and text mining.




References

  • * Salton, "Automatic Text Processing", Addison-Wesley, 1989.
  •    Salton and McGill, "Introduction to Modern Information Retrieval", McGraw Hill, 1983.
  •    C. J. van Rijsbergen, "Information Retrieval", Second Edition, Butterworths, 1979.
    http://www.dcs.gla.ac.uk/Keith/Preface.html
  • * R. Baeza-Yates and B. Ribeiro-Neto, "Modern Information Retrieval", paperback, May 1999, ISBN: 020139829X.
  • * Mark T. Maybury, "Intelligent Multimedia Information Retrieval", MIT Press, 1997.
  •    Alistair Moffat, Justin Zobel and David Hawking, "Recommended Reading for IR Research Students", SIGIR Forum 39, 2005.

Note: Lots of relevant materials can be obtained from the Internet. Also, visit my Web home (www.cacs.louisiana.edu/~raghavan) and click on ''Some URLs of Interest to my Students'' and other links that interest you.

Note: The starred books are available for overnight use from the Reserve Section of the Dupre Library. Lingras' book (Ch. 2) is also available from the Reserve Section.




Grading Policy

    • Term Project*: 30-40%
    • Homework Assignments: 25-30%
    • Term Test: 10-15%
    • Final Examination: 20-30%

      *Typically, a term project involves the design and implementation of search and indexing algorithms, interface requirements, or other system components.
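
      As a rough illustration of the kind of indexing and search component mentioned above, here is a minimal sketch of an inverted index with Boolean AND search. It is only an example under stated assumptions, not a project specification: the function names (tokenize, build_index, search) and the toy document collection are hypothetical.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical toy collection, for illustration only.
docs = {
    1: "Automatic text processing",
    2: "Modern information retrieval",
    3: "Intelligent multimedia information retrieval",
}
index = build_index(docs)
print(search(index, "information retrieval"))  # [2, 3]
print(search(index, "multimedia retrieval"))   # [3]
```

      A real project would typically go further, for example with term weighting and ranked retrieval; this sketch only shows the basic index structure.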



      Policies on Cheating

      Cheating: Cheating of any sort will NOT be tolerated. All work you submit must be entirely your own. If a student is found cheating on an assignment (programming or non-programming), he/she will be given a 0 for that assignment. This applies both to the person showing their work and to the person copying it. If a student is found cheating on a test, he/she will be given a grade of 'C' or 'F', and in some cases the matter will also be brought to the attention of the Dean (again, this applies both to the person showing their work and to the person copying it).




      Notes


      Assignments

      A note on assignment submission

      1. All non-programming assignments should be written legibly (Please check Policies on Cheating).
      2. Before submission, make a photocopy of the assignment (for your reference).
      3. Only the original should be submitted.
      4. Retain the photocopy. DO NOT submit it.
      5. Please staple the question paper on top of the answer sheet.
      6. Answer sheets that are not stapled properly will not be graded.
      7. All assignments should be done individually unless otherwise stated.
      8. Academic dishonesty will be prosecuted in accordance with the rules and regulations specified by the university.
      9. All answer sheets should be numbered.
      10. Begin the answer to each question on a separate page.
      11. Please provide an index, stating each question number and the corresponding page number where its answer can be found.

      Projects

      Format for a preliminary report (To be submitted just before the project is started)
      Format for a final report (To be submitted at the end of the project)


      Additional Links for projects:

      1. CGIC libraries can be found at http://www.boutell.com/cgic/
      2. Additional resource for CGI http://cgi.resourceindex.com/
      3. A sample shell script that calls a Java class file: javafromshell.html
      4. An example of how a CGI script is called from within a Java applet: appletcgi.html
      5. An example Perl script illustrating how to connect to a database on the local machine: dbconnection.html
      6. A Perl example from Xiaoyang He for remotely connecting to the Oracle database on Cypress. The key is to set the environment variables and use the right port. It should work in a similar fashion for other languages.
      7. The previous code fragment, modified to work on UCS. The sample script creates a table called "employee" and lists all tables.
      8. Intro to JDBC: http://developer.java.sun.com/developer/onlineTraining/Database/JDBCShortCourse/contents.html
      9. Intro to Servlet usage with Tomcat and JRun: servlet.htm
      10. A tutorial on Java Servlets and Java Server Pages (JSP) http://www.apl.jhu.edu/~hall/java/Servlet-Tutorial/
      11. A simple example of Oracle connectivity using Java. In order to run the example, you need to include the following line in your .cshrc:
        /ora01/login_901
      12. A useful link for learning how to build web pages using different programming and scripting languages: http://www.webdeveloper.com/library/doit.html
      13. Include the following code to kill programs stuck in infinite loops: CGIC, CGI - Perl
      14. First line to be included in CGI - Perl programs:

        For UCS :
        #!/opt/usl/bin/perl 
        For cypress/cacs :
        #!/usr/local/bin/perl
      15. CGI Programming 101
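
      The linked snippets for item 13 are not reproduced on this page. As a rough illustration of the general idea only, the sketch below uses a Unix alarm signal to stop a script that runs past a time limit. It is written in Python rather than Perl or C, it assumes a Unix system and that the code runs in the main thread, and the names run_with_time_limit, TimeLimitExceeded, and runaway are hypothetical, not taken from the course code.

```python
import signal


class TimeLimitExceeded(Exception):
    """Raised when the guarded code runs longer than the allowed time."""


def _on_alarm(signum, frame):
    # Called by the OS when the alarm set below expires.
    raise TimeLimitExceeded("time limit exceeded")


def run_with_time_limit(func, seconds):
    """Run func(); abort it with TimeLimitExceeded after `seconds` seconds.

    Assumption: Unix only (uses SIGALRM) and called from the main thread.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def runaway():
    """Hypothetical stand-in for a script stuck in an infinite loop."""
    while True:
        pass


try:
    run_with_time_limit(runaway, 1)
except TimeLimitExceeded:
    print("script stopped after exceeding its time limit")
```

      In a CGI setting, the same pattern would wrap the main body of the script so that a runaway request ends instead of occupying the server indefinitely.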

      Useful Reading Material




      Information Retrieval Systems

      • MG: public-domain indexing and retrieval system (source code) for text, images, and textual images, from the book "Managing Gigabytes"
      • PRISE: an indexing and search engine developed by NIST
      • Lemur: an information retrieval system supporting language models
      • SMART: a well-known IR test bed (source code) by Prof. Salton
      • Bow: a library of source code useful for writing information retrieval programs



      Term & Final Exam

      Term and final exam papers from previous years.




      Last updated: Aug 27th, 2007