Text Analysis of the Iliad and the Odyssey
View the original report here.
Data analysis and visualization done in RStudio with R.
Report made with HTML and CSS.
The Most Used Words in the Iliad and the Odyssey and their Sentiments
The Iliad, sometimes referred to as the Song of Ilion or Song
of Ilium, is an ancient Greek epic poem traditionally attributed to
Homer. Set during the Trojan War, the ten-year siege of the city of Troy
(Ilium) by a coalition of Greek states, it tells of the battles and
events during the weeks of a quarrel between King Agamemnon and the warrior Achilles.
Although the story covers only a few weeks in the final year of the war, the Iliad mentions or alludes to many of the Greek legends about the siege; the earlier events, such as the gathering of warriors for the siege, the cause of the war, and related concerns tend to appear near the beginning. Then the epic narrative takes up events prophesied for the future, such as Achilles’ imminent death and the fall of Troy, although the narrative ends before these events take place. However, as these events are prefigured and alluded to more and more vividly, when it reaches an end the poem has told a more or less complete tale of the Trojan War.
(Summary from Wikipedia.)
The Odyssey is one of two major ancient Greek epic poems
attributed to Homer. It is, in part, a sequel to the Iliad. The Odyssey
is fundamental to the modern Western canon; it is the second-oldest
extant work of Western literature, while the Iliad is the oldest.
The poem mainly focuses on the Greek hero Odysseus (known as Ulysses in Roman myths), king of Ithaca, and his journey home after the fall of Troy. It takes Odysseus ten years to reach Ithaca after the ten-year Trojan War. In his absence, it is assumed Odysseus has died, and his wife Penelope and son Telemachus must deal with a group of unruly suitors, the Mnesteres or Proci, who compete for Penelope’s hand in marriage.
(Summary from Wikipedia.)
I hypothesized that, because the plot of the Iliad centers on the Trojan War, its most frequent words would largely be associated with war, having to do with soldiers, fighting, weapons, etc.
I expected the Odyssey’s most frequent words to, in contrast, be more associated with travel, seafaring, and homesickness, reflecting the plot’s focus on Odysseus’ long journey home.
Because the Iliad and Odyssey are Greek epic poems, I further hypothesized that a secondary subset of frequent words would be associated with Greek mythology.
With regard to sentiment, I hypothesized that both poems would have an overall negative sentiment, as they deal with themes of war, battle, and turmoil.
These hypotheses were influenced by my experience translating the Aeneid and the Iliad from Latin to English, as well as reading the Odyssey during high school. I’ve also taken a number of college courses regarding art history, Greek art, and Roman art, mythology, and culture. My attempts to remember details from any of these individual experiences are inevitably muddled with details from the others.
To establish the most frequently used words in both texts, I chose
to exclude stop words and character or god names. These would show up
more often than other words for no other reason than how the English
language is structured, and stop words are innately uninteresting and
I chose to leave in the names of locations, such as “Troy”, and references to groups of people belonging to those locations, such as “Greeks”. Including these was important, in my opinion, to establish where Homer’s focus was and how it was distributed.
For the frequency graphs, I chose to only display the 40 most-used words in both poems simply because I found the size and density of graphs containing 50 or more words to be overwhelming and difficult to read. The word clouds I generated were a much easier-to-digest presentation of the same data with all words included, so I consider the frequency graphs to be an aid to the word clouds, which gives context and clearer values to the most frequent words, rather than a substitute.
I chose to use the AFINN sentiment lexicon because, although it can evaluate the smallest number of words, I found that a simple positive/negative association was easier to read, and having the degree to which a word was positive or negative made comparison easier.
AFINN’s lack of words became apparent when I first tried to obtain values for the top 40 words in each poem. I knew that I would need a larger set of values to find meaningful trends and draw comparisons, but due to the nature of the poems, running the same number of most frequent words between the two gave a different number of words with sentiments per poem. I decided that it would be manipulating the data too much if I increased the total number of words analyzed in order to get an equal number of words in each sentiment graph.
With these things in mind, I tested total-word values at reasonable increments, aiming for at least 10 words with sentiment scores per poem so that patterns could be found and conclusions drawn. Using the top 100 words in each poem achieved this goal.
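The increment test described above can be sketched roughly as follows. This is a minimal illustration in base R, not the code used for the report: `toy_counts` and `toy_lexicon` are made-up stand-ins for the filtered word counts and the AFINN lexicon, the scores are invented for the example, and `count_scored` is a hypothetical helper name.

```r
# Toy stand-ins (assumptions, not data from the report):
# word counts already sorted from most to least frequent.
toy_counts <- data.frame(
  word = c("son", "war", "ships", "fight", "god", "brave", "city", "death"),
  n    = c(90, 80, 70, 60, 50, 40, 30, 20),
  stringsAsFactors = FALSE
)

# A tiny lexicon in the spirit of AFINN: one numeric score per word.
toy_lexicon <- data.frame(
  word  = c("war", "fight", "brave", "death"),
  score = c(-2, -1, 2, -2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: how many of the top `k` words have a score?
count_scored <- function(counts, lexicon, k) {
  top_k <- head(counts, k)
  sum(top_k$word %in% lexicon$word)
}

# Try several cutoffs and see how many scored words each one yields.
cutoffs <- c(2, 4, 6, 8)
scored  <- sapply(cutoffs, function(k) count_scored(toy_counts, toy_lexicon, k))
data.frame(cutoff = cutoffs, scored_words = scored)
```

With real data, the same loop would be run over cutoffs such as 40, 60, 80, and 100, stopping at the first cutoff where both poems reach the desired minimum.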
Getting the Books and Pulling Out their Words
First, I downloaded the Plain Text UTF-8 versions of The Iliad and The Odyssey by Homer and saved them to the same directory as my RStudio project.
Next, I imported both files and converted them into data frames.
    # Read each book as a vector of lines, then wrap it in a data frame
    illiad <- readLines("illiad.txt")
    illiadDF <- data.frame(text = illiad)

    od <- readLines("odyssey.txt")
    odDF <- data.frame(text = od)
Next, the data frames were organized into individual sentences.
    library(tidytext)  # provides unnest_tokens()

    # Split the text into sentences and number them
    illiadDocumentLines <- unnest_tokens(illiadDF, input = text, output = line,
                                         token = "sentences", to_lower = FALSE)
    illiadDocumentLines$lineNo <- seq_along(illiadDocumentLines$line)

    odDocLines <- unnest_tokens(odDF, input = text, output = line,
                                token = "sentences", to_lower = FALSE)
    odDocLines$lineNo <- seq_along(odDocLines$line)
Then to individual words.
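    library(dplyr)  # provides %>%, anti_join(), count(), filter()

    # Split each sentence into lowercase words
    illiadWords <- illiadDocumentLines %>%
      unnest_tokens(word, line, token = "words")

    odWords <- odDocLines %>%
      unnest_tokens(word, line, token = "words")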
Filtering the Results
Next, I removed the stop words listed in the stop_words data set, which is provided by the tidytext package.
    # Drop stop words, then count how often each remaining word occurs
    illiadWords2 <- illiadWords %>%
      anti_join(stop_words, by = "word") %>%
      count(word, sort = TRUE)

    odWords2 <- odWords %>%
      anti_join(stop_words, by = "word") %>%
      count(word, sort = TRUE)
After checking the most frequent remaining words, I had to manually exclude additional words: some were archaic forms of stop words not covered by the stop_words list, and some were names.
    # Manually exclude archaic stop words and names
    illiadWords2 %>%
      filter(!word %in% c("thy", "thou", "thee", "ye", "tis", "hector", "jove",
                          "achilles", "juno", "greece", "nestor")) -> illiadWordsFiltered

    odWords2 %>%
      filter(!word %in% c("thy", "thou", "thee", "ye", "tis", "ulysses",
                          "telemachus", "jove", "till", "minerva", "penelope",
                          "eumaeus", "achilles", "helen")) -> odWordsFiltered
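As a rough illustration of the two filtering steps above, here is a small base R sketch on made-up data. The `toy_stop_words` and `extra_exclusions` vectors and the sample words are assumptions for the example; the report itself used `anti_join()` with the `stop_words` data set followed by `filter()`.

```r
# Made-up tokens standing in for the words of a poem.
toy_words <- c("the", "son", "of", "jove", "thou",
               "ships", "son", "the", "ships", "son")

# Assumed short lists; a real stop word list is far longer.
toy_stop_words   <- c("the", "of", "and")
extra_exclusions <- c("thou", "jove")

# Step 1: drop stop words. Step 2: drop archaic words and names.
kept <- toy_words[!toy_words %in% toy_stop_words]
kept <- kept[!kept %in% extra_exclusions]

# Count what is left and sort from most to least frequent.
word_counts <- sort(table(kept), decreasing = TRUE)
word_counts
```

Keeping the manual exclusions in their own vector makes it easy to see, and later revisit, which words were removed by judgment rather than by the standard list.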
At this point, the data was ready to be put into word clouds. I wanted these clouds to take the shape of objects relevant to the stories they’re meant to represent.
I downloaded simplified silhouettes of the Trojan horse and an ancient Greek ship and saved them to the same directory as my R project.
I then stored the image file paths in variables so that they could be passed as the figPath argument of the wordcloud2 function.
    library(wordcloud2)

    # Paths to the silhouette images used as word cloud masks
    figPath <- "horse.png"
    figPath2 <- "boat.png"

    horseWords <- wordcloud2(illiadWordsFiltered, size = 0.5, figPath = figPath)
    boatWords <- wordcloud2(odWordsFiltered, size = 0.5, figPath = figPath2)
Most Frequent Words and Their Sentiment Scores
First, I created new datasets containing only the 40 most-used words in each poem.
Then I reordered these datasets so that they would be graphed in the correct order.
    # Keep the 40 most frequent words in each poem
    illiadWordsFiltered %>% head(40) -> illiadWordsTop40
    odWordsFiltered %>% head(40) -> odWordsTop40

    # Turn `word` into a factor ordered by frequency so bars plot in that order
    illiadWordsTop40 %>% mutate(word = reorder(word, n)) -> illiadWordsTop40Sorted
    odWordsTop40 %>% mutate(word = reorder(word, n)) -> odWordsTop40Sorted
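The effect of `reorder()` may be easier to see on a tiny example. The sketch below uses made-up counts, not values from the poems, to show that `reorder(word, n)` turns `word` into a factor whose levels run from the lowest to the highest count. ggplot2 draws bars in level order, which is why the bars follow frequency rather than the alphabet.

```r
# Made-up counts for illustration only.
toy <- data.frame(
  word = c("war", "son", "ships"),
  n    = c(30, 10, 20),
  stringsAsFactors = FALSE
)

# Without reordering, a character column would be laid out alphabetically.
alphabetical <- sort(toy$word)

# reorder() makes a factor with levels sorted by increasing n.
toy$word <- reorder(toy$word, toy$n)
by_frequency <- levels(toy$word)

alphabetical
by_frequency
```

Because the first level is drawn at the bottom after `coord_flip()`, the most frequent word ends up at the top of the chart.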
This is the code I used to generate the frequency graphs:
    library(ggplot2)

    # Horizontal bar chart of the 40 most frequent words in the Iliad
    ggplot(illiadWordsTop40Sorted, aes(word, n, fill = n)) +
      geom_col() +
      ggtitle("Frequency of the Top 40 Words in the Iliad") +
      theme(legend.position = "none",
            axis.title.x = element_blank(),
            axis.title.y = element_blank()) +
      coord_flip() -> illiadFrequencyGraph

    # The same chart for the Odyssey
    ggplot(odWordsTop40Sorted, aes(word, n, fill = n)) +
      geom_col() +
      ggtitle("Frequency of the Top 40 Words in the Odyssey") +
      theme(legend.position = "none",
            axis.title.x = element_blank(),
            axis.title.y = element_blank()) +
      coord_flip() -> odFrequencyGraph
Because of the need for a larger dataset to analyze the sentiment of
the poems, as is noted in the methodology section, I repeated the
process I used at the start of the frequency graphing process to create a
dataset with the top 100 words rather than the top 40.
Following this, I added the sentiments from the AFINN lexicon to my narrowed dataset.
    # Keep the 100 most frequent words in each poem
    illiadWordsFiltered %>% head(100) -> illiadWordsTop100
    odWordsFiltered %>% head(100) -> odWordsTop100

    # Keep only the words that have an AFINN score
    illiadSentiment100 <- illiadWordsTop100 %>%
      inner_join(get_sentiments("afinn"), by = "word")
    odSentiment100 <- odWordsTop100 %>%
      inner_join(get_sentiments("afinn"), by = "word")
I also repeated the process of rearranging them in order of most-to-least frequent to ensure that the graph would be output in this order.
    illiadSentiment100 %>% arrange(desc(n)) -> illiadSentiment100Arranged
    odSentiment100 %>% arrange(desc(n)) -> odSentiment100Arranged
This is the code I used to generate the sentiment graphs:
    # Note: `score` is the AFINN column name used here. Some newer releases of
    # the lexicon name this column `value`; if so, substitute that name below.
    ggplot(illiadSentiment100Arranged, aes(x = reorder(word, n), score, fill = score)) +
      geom_col(show.legend = FALSE) +
      ggtitle("Sentiment of the top 100 Words in the Iliad") +
      theme(legend.position = "none",
            axis.title.x = element_blank(),
            axis.title.y = element_blank()) +
      coord_flip() -> illiadSentimentGraph100

    ggplot(odSentiment100Arranged, aes(x = reorder(word, n), score, fill = score)) +
      geom_col(show.legend = FALSE) +
      ggtitle("Sentiment of the top 100 Words in the Odyssey") +
      theme(legend.position = "none",
            axis.title.x = element_blank(),
            axis.title.y = element_blank()) +
      coord_flip() -> odSentimentGraph100
Only 20 of the 100 most frequent words in the Iliad had sentiments.
Only 12 of the 100 most frequent words in the Odyssey had sentiments.
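One way to compare two sentiment tables like these is to tally positive and negative words and total their scores. The sketch below is an assumption about how such a summary could be computed, not the method behind the conclusions in this report; `toy_sentiment` is made-up data and `summarize_sentiment` is a hypothetical helper.

```r
# Made-up table with the same columns as the sentiment data: word, n, score.
toy_sentiment <- data.frame(
  word  = c("war", "brave", "fight", "great"),
  n     = c(80, 40, 60, 50),
  score = c(-2, 2, -1, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper returning a few simple summaries.
summarize_sentiment <- function(sentiment) {
  list(
    positive_words = sum(sentiment$score > 0),
    negative_words = sum(sentiment$score < 0),
    # Unweighted: every scored word counts once.
    total_score    = sum(sentiment$score),
    # Weighted: each word's score counts once per occurrence.
    weighted_mean  = sum(sentiment$score * sentiment$n) / sum(sentiment$n)
  )
}

toy_summary <- summarize_sentiment(toy_sentiment)
toy_summary
```

The unweighted and frequency-weighted figures can point in different directions, so which one better represents a poem's tone is a judgment call.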
My hypotheses regarding the most common words in both works were supported.
With words like “arms”, “war”, and “fight” among the top 10 words, in addition to many other violence-based words in the frequency graph, the Iliad lived up to my expectations of war-based rhetoric being the most used.
The Odyssey’s most frequent words also matched my expectations, reflecting themes of travel, seafaring, and homesickness. The words “house”, “son”, “home”, “ship”, “father”, and “sea” all fit these themes. The only exception among the words of similar frequency is “suitors”, which is important to the plot but does not necessarily reflect these themes.
My hypothesis about a secondary theme of Greek mythology also held for both texts. The words “god”, “gods”, and “heaven” appeared among the top 40 words of both poems, and the word “fate” was used heavily in the Iliad.
My hypothesis on the sentiment of the poems was only partially correct. While the Odyssey matched my expectation of a sentiment that is more negative than positive, the Iliad proved to be more positive in tone.