Class Roster
Click here to view the Class Roster. Please check it and let me know if your name is not on the list.
Prerequisites
CMPS 460 or consent of the instructor. Some background on WWW protocols for database access from Web browsers is assumed.
Course Outline
Modern retrieval systems that operate on text databases can provide interactive, user-customizable retrieval techniques. In contrast, many tools for accessing text databases on the Web use quite primitive techniques. Thus, there is a need to make state-of-the-art search algorithms available over the Internet. In this context, we will explore intelligent information retrieval techniques and the protocols associated with implementing Web-browser-based interfaces to document database servers. It is also important to extend the search algorithms to heterogeneous (multimedia) data. This requires developing appropriate indexing schemes so that search algorithms designed for text databases can be extended to other types of data (e.g., pictures, video, sound). The course will consider research issues in this context. We will also look at some aspects of database and text mining.
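As a small illustration of the kind of indexing scheme and search algorithm mentioned above, the sketch below builds an inverted index (each term mapped to the sorted list of ids of documents containing it) over a toy collection and answers conjunctive Boolean queries by intersecting posting lists. It is a minimal sketch under simplifying assumptions (lowercase alphanumeric tokens, no stemming or stop-word removal, the whole index held in memory); the function names tokenize, build_index, and boolean_and are hypothetical and not taken from any system listed on this page.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> sorted list of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, query):
    """Return the sorted ids of documents that contain every query term."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    if not postings:
        return []  # an empty query matches nothing
    return sorted(set.intersection(*postings))
```

For example, with docs = {1: "Text databases on the Web", 2: "Indexing video and sound data", 3: "Search algorithms for text databases"}, the query "text databases" matches documents 1 and 3. A Boolean model only filters documents; ranked models such as the vector space model in the readings below score them instead.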
References
- * G. Salton, "Automatic Text Processing", Addison-Wesley, 1989.
- G. Salton and M. J. McGill, "Introduction to Modern Information Retrieval", McGraw-Hill, 1983.
- C. J. van Rijsbergen, "Information Retrieval", Second Edition, Butterworths, 1979. Available online: http://www.dcs.gla.ac.uk/Keith/Preface.html
- * R. Baeza-Yates and B. Ribeiro-Neto, "Modern Information Retrieval", paperback, May 1999, ISBN 020139829X.
- * M. T. Maybury, "Intelligent Multimedia Information Retrieval", MIT Press, 1997.
- A. Moffat, J. Zobel and D. Hawking, "Recommended Reading for IR Research Students", SIGIR Forum 39, 2005.
Note: Much relevant material can be obtained from the Internet. Also, visit my home page (www.cacs.louisiana.edu/~raghavan) and click on "Some URLs of Interest to my Students" and any other links that interest you.
Note: The starred books are available for overnight use from the Reserve Section of the Dupre Library. Lingras' book (Ch. 2) is also available from the Reserve Section.
Grading Policy
*Typically, a term project involves the design and implementation of search and indexing algorithms, interface requirements, or other system components.
Policies on Cheating
Cheating: Please note that cheating of any kind will NOT be tolerated. All work you submit must be entirely your own. If any student is found cheating on an assignment (either programming or non-programming), he/she will be given a 0 for that assignment. This applies both to the person who shows their work and to the person who copies it. If any student is found cheating on a test, he/she will be given a grade of either 'C' or 'F' and, in some cases, will also be brought to the attention of the Dean (again, this applies both to the person who shows their work and to the person who copies it).
Notes
Assignments
A note on assignment submission
- All non-programming assignments should be written legibly (please also see Policies on Cheating).
- Before submission, make a photocopy of the assignment for your reference.
- Submit only the original.
- Retain the photocopy. DO NOT submit it.
- Please staple the question paper on top of the answer sheet.
- Answer sheets that are not stapled properly will not be graded.
- All assignments should be done individually unless otherwise stated.
- Academic dishonesty will be prosecuted in accordance with the rules and regulations specified by the university.
- All answer sheets should be numbered.
- Begin the answer to each question on a separate page.
- Please provide an index listing each question number and the page number where its answer can be found.
Projects
Format for a preliminary report (to be submitted just before the project is started)
Format for a final report (to be submitted at the end of the project)
Additional Links for projects:
- CGIC libraries can be found at http://www.boutell.com/cgic/
- Additional resources for CGI: http://cgi.resourceindex.com/
- A sample shell script that calls a Java class file: javafromshell.html
- An example of how a CGI script is called from within a Java applet: appletcgi.html
- An example Perl script illustrating how to connect to a database on the local machine: dbconnection.html
- A Perl example from Xiaoyang He for remotely connecting to the Oracle database on Cypress. The key is to set the environment variables and use the right port. It should work in a similar fashion for other languages.
- The previous code fragment, modified to work on UCS. The sample script creates a table called "employee" and lists all tables.
- Intro to JDBC: http://developer.java.sun.com/developer/onlineTraining/Database/JDBCShortCourse/contents.html
- Intro to servlet usage with Tomcat and JRun: servlet.htm
- A tutorial on Java Servlets and JavaServer Pages (JSP): http://www.apl.jhu.edu/~hall/java/Servlet-Tutorial/
- A simple example of Oracle connectivity using Java. To run the example, you need to include the following line in your .cshrc:
/ora01/login_901
- A useful link for learning how to build web pages using different programming and scripting languages: http://www.webdeveloper.com/library/doit.html
- Include the following code to kill programs stuck in infinite loops: CGIC, CGI
- Perl
- First line to include in Perl CGI programs:
For UCS: #!/opt/usl/bin/perl
For cypress/cacs: #!/usr/local/bin/perl
- CGI Programming 101
- Chapter 1 from Salton's book (scanned PDF): pages 1, 2, 3, 4, 5, 6, 7, 8, 9
- What do you say after you say, 'I work in IR'? (PDF)
- Information Retrieval on the World-wide Web (PDF)
- A General Mathematical Model for Information Retrieval Systems (scanned PDF)
- Boolean Retrieval Model (scanned PDF)
- Fuzzy Set Theory to Document Retrieval (scanned PDF)
- A Critical Analysis of Vector Space Model for Information Retrieval (scanned PDF)
- On Modeling of Information Retrieval Concepts in Vector Spaces (PDF)
- RUBRIC: A Rule System for Information Retrieval (PDF)
- A Critical Investigation of Recall and Precision as Measures of Retrieval System Performance (PDF)
- Linear Structure for Information Retrieval (scanned PDF)
- The Shape of the Web and Its Implications for Searching the Web (PDF)
- Meta Search (1)
- Meta Search (2)
- Enhancing Internet Search Engines to Achieve Concept-based Retrieval
- Content and Link Structure Analysis for Searching the Web
- Recovering from Disasters
- Crawling the Hidden Web
- Personalized Search
- Pattern Recognition: Statistical, Structural, and Neural Approaches: pages 1, 2, 3, 4, 5, 6
- Text Retrieval Quality: A Primer
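The CGIC and CGI code referred to above for killing programs stuck in infinite loops is not reproduced on this page. As one hedged alternative on Unix-like systems, a CGI program can cap its own CPU time with a resource limit: on Linux, exceeding the soft limit sends SIGXCPU (whose default action terminates the process) and reaching the hard limit sends SIGKILL. The Python sketch below illustrates that idea only; it is not the linked code, and the 10-second budget, the helper name limit_cpu_seconds, and the #!/usr/bin/env python3 interpreter line are hypothetical (for Perl programs, use the UCS or cypress/cacs paths listed above).

```python
#!/usr/bin/env python3
import resource
import sys

CPU_LIMIT_SECONDS = 10  # hypothetical CPU budget; tune for your server


def limit_cpu_seconds(seconds):
    """Cap this process's CPU time (soft limit = seconds, hard limit = seconds + 1).

    The existing hard limit is never raised, since unprivileged processes
    may only lower it. Returns the (soft, hard) pair that was set.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CPU)
    if hard == resource.RLIM_INFINITY:
        new_hard = seconds + 1
    else:
        new_hard = min(seconds + 1, hard)
    new_soft = min(seconds, new_hard)
    resource.setrlimit(resource.RLIMIT_CPU, (new_soft, new_hard))
    return new_soft, new_hard


def main(out=sys.stdout):
    # Set the limit first, so a runaway loop later in the script gets killed.
    limit_cpu_seconds(CPU_LIMIT_SECONDS)
    # A CGI response is a header block, a blank line, then the body.
    out.write("Content-Type: text/plain\n\n")
    out.write("Hello from a CPU-limited CGI program\n")


if __name__ == "__main__":
    main()
```

A CPU-time limit only stops computation-bound loops; a script that hangs while waiting (e.g., on a blocked database connection) uses little CPU and would need a wall-clock timeout instead.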
Information Retrieval Systems
- MG: a public-domain indexing and retrieval system (source code) for text, images, and textual images, from the book "Managing Gigabytes"
- PRISE: an indexing and search engine developed by NIST
- Lemur: an information retrieval system supporting language-model-based retrieval
- SMART: a well-known IR test bed (source code) by Prof. Salton
- Bow: a library of source code useful for writing information retrieval programs
These are term/final exam papers from previous years.
Last updated: Aug 27th, 2007