Text Analysis of the Iliad and the Odyssey
View the original report here.
Data analysis and visualization were done in R using RStudio.
Report made with HTML and CSS.
The Most Used Words in the Iliad and the Odyssey and their Sentiments
The Iliad
The Iliad, sometimes referred to as the Song of Ilion or Song
of Ilium, is an ancient Greek epic poem traditionally attributed to
Homer. Set during the Trojan War, the ten-year siege of the city of Troy
(Ilium) by a coalition of Greek states, it tells of the battles and
events during the weeks of a quarrel between King Agamemnon and the
warrior Achilles.
Although the story covers only a few weeks in the final year
of the war, the Iliad mentions or alludes to many of the Greek legends
about the siege; the earlier events, such as the gathering of warriors
for the siege, the cause of the war, and related concerns tend to appear
near the beginning. Then the epic narrative takes up events prophesied
for the future, such as Achilles’ imminent death and the fall of Troy,
although the narrative ends before these events take place. However, as
these events are prefigured and alluded to more and more vividly, when
it reaches an end the poem has told a more or less complete tale of the
Trojan War.
Summary from Wikipedia
Image from the Museum of Fine Arts, Boston
The Odyssey
The Odyssey is one of two major ancient Greek epic poems
attributed to Homer. It is, in part, a sequel to the Iliad. The Odyssey
is fundamental to the modern Western canon; it is the second-oldest
extant work of Western literature, while the Iliad is the oldest.
The poem mainly focuses on the Greek hero Odysseus (known as
Ulysses in Roman myths), king of Ithaca, and his journey home after the
fall of Troy. It takes Odysseus ten years to reach Ithaca after the
ten-year Trojan War. In his absence, it is assumed Odysseus has died,
and his wife Penelope and son Telemachus must deal with a group of
unruly suitors, the Mnesteres or Proci, who compete for Penelope’s hand
in marriage.
Summary from Wikipedia
My Hypothesis
I hypothesized that, because the plot of the Iliad centers on
the Trojan War, its most frequent words would largely be
associated with war: soldiers, fighting, weapons, etc.
I expected the Odyssey’s most frequent words to, in contrast, be
more associated with travel, seafaring, and homesickness, reflecting
the plot’s focus on Odysseus’ long journey home.
Because the Iliad and Odyssey are Greek epic poems, I further
hypothesized that a secondary subset of frequent words would be
associated with Greek mythology.
With regard to sentiment, I hypothesized that both poems
would have an overall negative sentiment, as they deal with themes of
war, battle, and turmoil.
Disclaimer:
These hypotheses were influenced by my experience
translating the Aeneid and the Iliad from Latin to English, as well as
reading the Odyssey during high school. I’ve also taken a number of
college courses regarding art history, Greek art, and Roman art,
mythology, and culture. My attempts to remember details from any of
these individual experiences are inevitably muddled with details from
the others.
Methodology
To establish the most frequently used words in both texts, I chose
to exclude stop words and character or god names. Stop words show up
more often than other words for no other reason than how the English
language is structured, so they say little about the content of the
poems.
I chose to leave in the names of locations, such as “Troy”, and
references to groups of people belonging to those locations, such as
“Greeks”. Including these was important, in my opinion, to establish
where Homer’s focus was and how it was distributed.
For the frequency graphs, I chose to display only the 40 most-used
words in each poem, simply because I found graphs containing 50 or more
words too dense and difficult to read. The word clouds I generated
present the same data with all words included in an easier-to-digest
form, so I consider the frequency graphs to be an aid to the word
clouds, giving context and clearer values for the most frequent words,
rather than a substitute.
I chose to use the AFINN sentiment lexicon because, although it
can evaluate the fewest words, I found that its simple
positive/negative association was easier to read, and having the
degree to which a word is positive or negative was better for
comparison.
AFINN’s limited coverage became apparent when I first tried to obtain
values for the top 40 words in each poem. I knew that I would need a
larger set of values to find meaningful trends and draw comparisons, but
due to the nature of the poems, taking the same number of most
frequent words from each poem yielded a different number of words with
sentiments per poem. I decided that it would be manipulating the data
too much to analyze a different total number of words per poem just to
get an equal number of words in each sentiment graph.
With these things in mind, I tested total-word values at
reasonable increments until each poem had at least 10 words with
sentiment scores, enough for patterns to be found and conclusions to be
drawn. Using the top 100 words in each poem achieved this goal.
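The search for a suitable word count can be sketched as follows. This is only an illustration on toy data: the data frames, the helper `scored_in_top`, the candidate values, and the threshold are all hypothetical stand-ins, not the values or code used for the report.

```r
library(dplyr)

# Hypothetical toy data standing in for the filtered word counts
# (already sorted by frequency) and for the sentiment lexicon.
word_counts <- tibble::tibble(
  word = c("war", "son", "fight", "ships", "death", "love", "spear", "god"),
  n    = c(80, 70, 60, 50, 40, 30, 20, 10)
)
toy_lexicon <- tibble::tibble(
  word  = c("war", "fight", "death", "love"),
  score = c(-2, -1, -2, 3)
)

# Count how many of the `top_n` most frequent words have a lexicon entry.
scored_in_top <- function(counts, lexicon, top_n) {
  counts %>%
    head(top_n) %>%
    inner_join(lexicon, by = "word") %>%
    nrow()
}

# Try each candidate size and record how many scored words it yields.
candidates <- c(2, 4, 6, 8)
coverage <- sapply(candidates, function(k) scored_in_top(word_counts, toy_lexicon, k))
names(coverage) <- candidates

# Pick the smallest candidate that reaches the required number of scored words.
min_scored <- 3
chosen <- candidates[which(coverage >= min_scored)[1]]
```

Applying the same candidate sizes to both poems keeps the total number of words analyzed equal, even though the number of scored words differs.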
Process
Getting the Books and Pulling Out their Words
First, I downloaded the Plain Text UTF-8 versions of The Iliad and The Odyssey by Homer and saved them to the same directory as my RStudio project.
Next, I imported both files and converted them into data frames.
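library(dplyr)      # %>%, anti_join, count, filter, mutate, arrange
library(tidytext)   # unnest_tokens, stop_words, get_sentiments
library(ggplot2)    # frequency and sentiment graphs
library(wordcloud2) # shaped word clouds
illiad <- readLines("illiad.txt")
illiadDF <- data.frame(text = illiad)
od <- readLines("odyssey.txt")
odDF <- data.frame(text = od)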
Next, the data frames were organized into individual sentences.
illiadDocumentLines <- unnest_tokens(illiadDF, input = text, output = line, token = "sentences", to_lower = F)
illiadDocumentLines$lineNo <- seq_along(illiadDocumentLines$line)
odDocLines <- unnest_tokens(odDF, input = text, output = line, token = "sentences", to_lower = F)
odDocLines$lineNo <- seq_along(odDocLines$line)
Then to individual words.
illiadWords <- illiadDocumentLines %>% unnest_tokens(word, line, token = "words")
odWords <- odDocLines %>% unnest_tokens(word, line, token = "words")
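To show what the word-level tokenization produces, here is a small sketch on two made-up lines of text (the `toy` data frame is hypothetical and not taken from the downloaded files). With `token = "words"`, `unnest_tokens` lowercases the text and strips punctuation by default, giving one row per word.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line input standing in for the poem text.
toy <- data.frame(
  text = c("Sing, O goddess, the anger of Achilles.",
           "Many a brave soul did it send to Hades."),
  stringsAsFactors = FALSE
)

# One row per word; the output column is named `word`.
toy_words <- toy %>% unnest_tokens(word, text, token = "words")
```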
Filtering the Results
Next, I removed the stop words listed in the stop_words dataset from the tidytext package.
illiadWords2 <- illiadWords %>% anti_join(stop_words) %>% count(word, sort = TRUE)
odWords2 <- odWords %>% anti_join(stop_words) %>% count(word, sort = TRUE)
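As a minimal sketch of this filtering step on toy data (the `toy_words` tibble is hypothetical): `anti_join` drops every row whose `word` appears in `stop_words`, and `count` then tallies what remains. Naming the join column with `by = "word"` is optional, but it avoids the message dplyr prints when it has to guess the column.

```r
library(dplyr)
library(tidytext)

# Hypothetical tokenized words, one per row.
toy_words <- tibble::tibble(
  word = c("the", "ships", "of", "the", "greeks", "and", "the", "ships")
)

# Remove stop words, then count the remaining words from most to least frequent.
toy_counts <- toy_words %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
```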
Exemptions
After checking the most frequent remaining words, I had to manually exclude additional words: some were archaic forms of stop words not covered by tidytext's stop word list, and some were names.
illiadWords2 %>% filter(!word %in% c("thy", "thou", "thee", "ye", "tis", "hector", "jove", "achilles", "juno","greece", "nestor")) -> illiadWordsFiltered
odWords2 %>% filter(!word %in% c("thy", "thou", "thee", "ye", "tis", "ulysses", "telemachus", "jove", "till", "minerva", "penelope", "eumaeus", "achilles", "helen")) -> odWordsFiltered
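Since the two exclusion lists share their archaic stop words, one possible variation is to keep the shared words in a single vector and combine it with a per-poem list of names. The sketch below uses toy data; the names `archaic_stop_words`, `toy_names`, and `toy_counts` are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Archaic stop words common to both exclusion lists.
archaic_stop_words <- c("thy", "thou", "thee", "ye", "tis")

# Hypothetical per-poem list of names to exclude.
toy_names <- c("hector", "achilles")

# Hypothetical word counts.
toy_counts <- tibble::tibble(
  word = c("thou", "hector", "ships", "thy", "war"),
  n    = c(9, 8, 7, 6, 5)
)

# Keep only the words that appear in neither list.
toy_filtered <- toy_counts %>%
  filter(!word %in% c(archaic_stop_words, toy_names))
```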
Word Clouds
At this point, the data was ready to be put into word
clouds. I wanted these clouds to take the shape of objects relevant
to the stories they represent.
I downloaded simplified silhouettes of the Trojan horse and an ancient Greek ship and saved them to the same directory as my R project.
I then stored the image file paths in variables so that they could be passed as the figPath argument of the wordcloud2 function.
figPath <- "horse.png"
figPath2 <- "boat.png"
horseWords <- wordcloud2(illiadWordsFiltered, size = 0.5, figPath=figPath)
boatWords <- wordcloud2(odWordsFiltered, size = 0.5, figPath=figPath2)
Most Frequent Words and Their Sentiment Scores
Frequency
First, I created a new dataset for each poem containing only its 40 most-used words.
Then I reordered these datasets so that they would be graphed in the correct order.
illiadWordsFiltered %>% head(40) -> illiadWordsTop40
odWordsFiltered %>% head(40) -> odWordsTop40
illiadWordsTop40 %>% mutate(word = reorder(word, n)) -> illiadWordsTop40Sorted
odWordsTop40 %>% mutate(word = reorder(word, n)) -> odWordsTop40Sorted
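To illustrate why the reordering matters, here is a small sketch on a hypothetical three-word data frame. `reorder` turns `word` into a factor whose levels are sorted by `n` instead of alphabetically, and ggplot2 lays out bars in level order.

```r
# Hypothetical counts, deliberately not in frequency order.
toy <- data.frame(word = c("son", "ships", "war"), n = c(30, 10, 20))

# Without reorder(), a factor's levels would be alphabetical: ships, son, war.
# With reorder(), levels are sorted by ascending n.
toy$word <- reorder(toy$word, toy$n)
word_levels <- levels(toy$word)
```

With `coord_flip()`, the first level is drawn at the bottom of the chart, so the most frequent word ends up at the top.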
This is the code I used to generate the frequency graphs:
ggplot(illiadWordsTop40Sorted, aes(word, n, fill = n)) +
geom_col() +
ggtitle("Frequency of the Top 40 Words in the Iliad") +
theme(legend.position = "none", axis.title.x=element_blank(), axis.title.y=element_blank()) +
coord_flip() -> illiadFrequencyGraph
ggplot(odWordsTop40Sorted, aes(word, n, fill = n)) +
geom_col() +
ggtitle("Frequency of the Top 40 Words in the Odyssey") +
theme(legend.position = "none", axis.title.x=element_blank(), axis.title.y=element_blank()) +
coord_flip() -> odFrequencyGraph
Sentiment
Because a larger dataset was needed to analyze the sentiment of
the poems, as noted in the methodology section, I repeated the
first step of the frequency graphing process to create
datasets with the top 100 words rather than the top 40.
Following this, I added the sentiments from the AFINN lexicon to my narrowed dataset.
illiadWordsFiltered %>% head(100) -> illiadWordsTop100
odWordsFiltered %>% head(100) -> odWordsTop100
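# Note: newer tidytext releases name the AFINN column `value` rather than `score`;
# with those versions, use `value` wherever `score` appears in the graphs below.
illiadSentiment100 <- illiadWordsTop100 %>% inner_join(get_sentiments("afinn"))
odSentiment100 <- odWordsTop100 %>% inner_join(get_sentiments("afinn"))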
I also repeated the process of rearranging them in order of most-to-least frequent to ensure that the graph would be output in this order.
illiadSentiment100 %>% arrange(desc(n)) -> illiadSentiment100Arranged
odSentiment100 %>% arrange(desc(n)) -> odSentiment100Arranged
This is the code I used to generate the sentiment graphs:
ggplot(illiadSentiment100Arranged, aes(x = reorder(word, n), score, fill = score)) +
geom_col(show.legend = FALSE) +
ggtitle("Sentiment of the top 100 Words in the Iliad") +
theme(legend.position = "none", axis.title.x=element_blank(), axis.title.y=element_blank()) +
coord_flip() ->illiadSentimentGraph100
ggplot(odSentiment100Arranged, aes(x = reorder(word, n), score, fill = score)) +
geom_col(show.legend = FALSE) +
ggtitle("Sentiment of the top 100 Words in the Odyssey") +
theme(legend.position = "none", axis.title.x=element_blank(), axis.title.y=element_blank()) +
coord_flip() ->odSentimentGraph100
Only 20 of the 100 most frequent words in the Iliad had sentiments.
Only 12 of the 100 most frequent words in the Odyssey had sentiments.
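The graphs show a score per word, but they do not by themselves give a single figure for overall tone. One possible way to summarize a poem's tone, offered here only as an assumption and not as the method behind this report, is to weight each word's score by its frequency. The toy data below is hypothetical.

```r
# Hypothetical scored words: frequency `n` and sentiment `score`.
toy_sentiment <- data.frame(
  word  = c("war", "fight", "love", "death"),
  n     = c(80, 60, 30, 40),
  score = c(-2, -1, 3, -2)
)

# Number of words that received a score.
scored_words <- nrow(toy_sentiment)

# Unweighted sum: every scored word counts once.
net_unweighted <- sum(toy_sentiment$score)

# Frequency-weighted mean: words used more often count for more.
mean_weighted <- weighted.mean(toy_sentiment$score, toy_sentiment$n)
```

A negative weighted mean would indicate a more negative tone among the scored words, a positive one a more positive tone. The two summaries can disagree when a few very frequent words dominate.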
Conclusion
My hypotheses regarding the most common words in both works were supported.
With words like “arms”, “war”, and “fight” among the top 10
words, in addition to many other violence-based words in the frequency
graph, the Iliad lived up to my expectations of war-based rhetoric being
the most used.
The Odyssey’s most frequent words also matched the themes I
expected: travel, seafaring, and homesickness. The words
“house”, “son”, “home”, “ship”, “father”, and “sea” all fit these
themes. The only exception among words of similar frequency is
“suitors”, which is important to the plot but does not necessarily
reflect these themes.
The secondary themes of both texts also matched my hypothesis
that they would relate to Greek mythology. The words “god”,
“gods”, and “heaven” appeared among the top 40 words of both poems, and
the word “fate” was used heavily in the Iliad.
My hypothesis on the sentiment of the poems was only partially
correct. While the Odyssey matched my expectations of having a
sentiment which is more negative than positive, the Iliad proved to be
more positive in tone.