CMPS 150/151                HOMEWORK ASSIGNMENT #7

Date Assigned: Wednesday, November 5, 1997
Due Date:      Wednesday, November 19, 1997
Due Time:      10:00pm

CMPS 150:
--------

Program file name to be submitted: hmwk7.cc

In addition to the standard initial documentation, your program file
should include the Certification of Authenticity described in hmwk4.

The availability of computers with text manipulation capabilities has
resulted in some rather interesting approaches to analyzing the writings
of great authors. Much attention has been focused on whether William
Shakespeare ever lived. Some scholars believe there is substantial
evidence indicating that Christopher Marlowe actually penned the
masterpieces attributed to Shakespeare. Researchers have used computers
to find similarities in the writings of these two authors, as well as
other authors.

Your assignment is to write a program that reads several lines of text
and outputs the number of occurrences of each letter of the alphabet in
the text. For example, the phrase

    "To be, or not to be: that is the question:"

contains one 'a', two 'b's, no 'c's, ..., seven 't's (counting the
capital 'T'), etc.

(Hint: You may use the library function tolower(ch), which returns the
lowercase equivalent of ch if ch is an uppercase letter, and ch
otherwise.)

Additionally, your program should calculate the relative frequency of
each letter. The relative frequency of a letter is calculated by dividing
the number of occurrences of this letter in the text by the total number
of letters in the text.

You should modularize your program by introducing several functions as
follows:

  + a function to be used in initializing the variable objects used in
    your program.
  + a function used to calculate the number of letter occurrences.
  + a function to calculate the relative frequencies.
  + a function to output the results in a readable format.

You should not use global variables.

3. Compile and test your program using your own test data.

4. Once you are satisfied that your program works, create a hmwk7.test
   file using the command:

       script hmwk7.test

   Run your program against the files

       Shakespeare:-Hamlet.txt
       Shakespeare:-King-Lear.txt
       Shakespeare:-Macbeth.txt
       Shakespeare:-Othello.txt
       Shakespeare:-Romeo-and-Juliet.txt

   Then run your program against the files

       Mark-Twain:-A-Tramp-Abroad.txt
       Mark-Twain:-The-Tragedy-of-Puddnhead-Wilson.txt
       Mark-Twain:-Tom-Sawyer-Abroad.txt
       Mark-Twain:-Tom-Sawyer-Detective.txt

   These files will appear on the class webpage a few days before the
   due date.

5. As a final preparation of the hmwk7.test file, edit it. At the bottom
   of the file, discuss your conclusions regarding your program's
   potential in author attribution. In other words, is it possible to
   tell the difference between works written by two different authors
   simply based on your program's output? Discuss how this program could
   be improved.

6. Submit the two files (hmwk7.cc and hmwk7.test) by the due date and
   time by changing to the working directory in which the files reside
   and issuing the appropriate submit command (based on your section).

CMPS 151 only:
-------------

Instead of the above functionality, your program should output a table
indicating the relative frequency of one-letter words, two-letter words,
three-letter words, etc. appearing in the text. For example, the phrase

    "Whether 'tis nobler in the mind to suffer"

contains zero one-letter words, two two-letter words, one three-letter
word, two four-letter words (including "'tis"), zero five-letter words,
two six-letter words, and one seven-letter word.

You should consider all word sizes up to ten. Words longer than ten
characters do not get an entry of their own in the table, but they still
count toward the total number of words used to calculate the relative
frequencies of the other word sizes.

A word may be delimited by whitespace or by any of the following
characters: comma, period, exclamation mark, semicolon, and colon.
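One possible reading of the size-ten rule above, sketched in C++ under
stated assumptions: every word adds one to the divisor, but only sizes 1
through 10 get a table entry. The names MAX_SIZE and RelativeFrequencies
are hypothetical (they do not come from the assignment), and the sketch
assumes the word sizes have already been measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int MAX_SIZE = 10;  // largest word size that gets a table entry

// Hypothetical helper: fills freq[1..MAX_SIZE] with the relative
// frequency of each word size. Words longer than MAX_SIZE have no entry
// of their own but are still included in the total (the divisor).
// freq[0] is unused and always set to 0.
void RelativeFrequencies(const std::vector<int>& wordSizes,
                         double freq[MAX_SIZE + 1])
{
    int counts[MAX_SIZE + 1] = {0};
    int total = 0;

    for (std::size_t i = 0; i < wordSizes.size(); ++i) {
        assert(wordSizes[i] > 0);       // a word has at least one character
        ++total;                        // every word counts toward the total
        if (wordSizes[i] <= MAX_SIZE)
            ++counts[wordSizes[i]];     // only sizes 1..10 are tallied
    }

    for (int s = 0; s <= MAX_SIZE; ++s)
        freq[s] = (total > 0) ? static_cast<double>(counts[s]) / total : 0.0;
}
```

For the example phrase, whose eight words have sizes 7, 4, 6, 2, 3, 4, 2
and 6, this gives 0.25 for two-letter words and 0.125 for three-letter
words.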
You should modularize your program by introducing several functions as
follows:

  + a function to be used in initializing the variable objects used in
    your program.
  + a function to count words; this function should use two helper
    functions:
      + one to skip word delimiters (one or more consecutive
        word-delimiting characters), and
      + another one to read a word from the input. Since you are only
        interested in the size of the word (as opposed to the actual
        word), a good name for this helper function is
        GetSizeOfNextWord.
  + a function to output the results in a readable format.
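
The decomposition above could be sketched roughly as follows. This is
only one way to arrange it, not a model solution. GetSizeOfNextWord is
the name suggested in the assignment; Initialize, IsDelimiter,
SkipDelimiters, CountWords and MAX_SIZE are hypothetical names, and
passing the input as a std::istream is an assumption. The apostrophe is
treated as part of a word, following the "'tis" example. The output
function is left out.

```cpp
#include <cctype>
#include <iostream>
#include <sstream>

const int MAX_SIZE = 10;  // largest word size that gets a table entry

// Set every tally to zero.
void Initialize(int counts[MAX_SIZE + 1])
{
    for (int i = 0; i <= MAX_SIZE; ++i)
        counts[i] = 0;
}

// True for whitespace and the five punctuation delimiters listed in the
// assignment: comma, period, exclamation mark, semicolon, colon.
bool IsDelimiter(char ch)
{
    return std::isspace(static_cast<unsigned char>(ch)) ||
           ch == ',' || ch == '.' || ch == '!' || ch == ';' || ch == ':';
}

// Consume consecutive delimiters, leaving the stream at the first
// character of the next word (or at end of input).
void SkipDelimiters(std::istream& in)
{
    char ch;
    while (in.get(ch)) {
        if (!IsDelimiter(ch)) {
            in.putback(ch);   // first character of a word: leave it unread
            break;
        }
    }
}

// Read characters up to the next delimiter or end of input and return
// how many were read. Returns 0 when no word is left.
int GetSizeOfNextWord(std::istream& in)
{
    int size = 0;
    char ch;
    while (in.get(ch)) {
        if (IsDelimiter(ch)) {
            in.putback(ch);   // the delimiter belongs to SkipDelimiters
            break;
        }
        ++size;
    }
    return size;
}

// Tally words by size in counts[1..MAX_SIZE] and return the total number
// of words, including those longer than MAX_SIZE.
int CountWords(std::istream& in, int counts[MAX_SIZE + 1])
{
    int total = 0;
    for (;;) {
        SkipDelimiters(in);
        int size = GetSizeOfNextWord(in);
        if (size == 0)
            break;            // end of input
        ++total;
        if (size <= MAX_SIZE)
            ++counts[size];
    }
    return total;
}
```

A caller might declare the counts array locally (no global variables are
needed), call Initialize on it, and then pass std::cin or a
std::istringstream holding test text to CountWords.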