Frequency analysis of Pyu corpus text
The simplest quantitative analysis of the Pyu corpus is a frequency analysis of its features, that is, the tokens that form the columns of the document-feature matrix (dfm). We will use the dfm created in the previous analysis.
library(quanteda)
library(quanteda.textstats)
library(quanteda.textplots)
library(RColorBrewer)
# first, replace the default docnames (text1, text2, etc.) with the inscription numbers of the Pyu inscriptions
docnames(pyu_dfm) <- x1_df.1$InscriptionNumber
print(pyu_dfm)
Document-feature matrix of: 196 documents, 1,893 features (99.01% sparse) and 0 docvars.
features
docs @|| ḅay·ṁḥ dak·ṃ viy·ṃṁ tim·ṁ mlik· °o saḥ tgaṃ knon·
001 1 2 1 1 1 1 3 1 1 1
002 0 0 0 0 0 0 0 0 0 0
003 1 0 0 0 0 0 4 0 0 0
004 1 0 0 0 0 0 3 0 0 0
005 1 0 0 0 0 0 3 0 0 0
006 1 0 0 0 0 0 3 0 0 0
[ reached max_ndoc ... 190 more documents, reached max_nfeat ... 1,883 more features ]
# get frequencies of features
tstat_freq <- textstat_frequency(pyu_dfm)
# view the 30 most frequent features
library(kableExtra)
tstat_freq[1:30, ] %>%
  kbl() %>%
  kable_styling(bootstrap_options = "condensed", full_width = FALSE, font_size = 10) %>%
  column_spec(1, width = "2cm") %>%  # width needs a unit such as "cm" or "em"; "2cm" is an assumption
  row_spec(0, background = "lightgrey") %>%
  row_spec(1:30, background = "lightblue")
feature | frequency | rank | docfreq | group |
---|---|---|---|---|
°o | 320 | 1 | 48 | all |
tiṁ | 190 | 2 | 32 | all |
ḅaṁḥ | 93 | 3 | 27 | all |
ta | 93 | 3 | 24 | all |
yaṁ | 85 | 5 | 35 | all |
ḅiṁḥ | 81 | 6 | 16 | all |
tin·ṁ | 73 | 7 | 11 | all |
ḅay·ṁḥ | 61 | 8 | 11 | all |
// | 58 | 9 | 3 | all |
|| | 55 | 10 | 35 | all |
tar· | 55 | 10 | 10 | all |
/// | 53 | 12 | 18 | all |
ḅin·ṁḥ | 47 | 13 | 10 | all |
ḅa | 45 | 14 | 19 | all |
saḥ | 44 | 15 | 19 | all |
gi | 42 | 16 | 6 | all |
tim·ṁ | 41 | 17 | 11 | all |
ma | 39 | 18 | 13 | all |
pau | 37 | 19 | 6 | all |
tdav·ṃḥ | 34 | 20 | 10 | all |
ḅaḥ | 33 | 21 | 11 | all |
kdaṅ· | 33 | 21 | 12 | all |
dav·ṃḥ | 29 | 23 | 8 | all |
mra | 28 | 24 | 16 | all |
ḅiṁ | 28 | 24 | 8 | all |
tir·ṁ | 27 | 26 | 6 | all |
priṅ·ḥ | 26 | 27 | 8 | all |
traḥ | 25 | 28 | 3 | all |
pay·ṁḥ | 25 | 28 | 9 | all |
/ | 24 | 30 | 2 | all |
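To make the columns of the table above concrete, here is a minimal sketch on a toy dfm. The tokens `a`, `b`, `c` and the names `toy_toks`, `toy_dfm` and `toy_freq` are made up for illustration and are not Pyu data. It compares the output of `textstat_frequency()` with `featfreq()` and `docfreq()`, on the assumption that `frequency` is the total count of a feature across all documents and `docfreq` is the number of documents that contain it. Tied features should share the lowest rank, which would explain the 3, 3, 5 sequence in the table.

```r
library(quanteda)
library(quanteda.textstats)

# Toy corpus (hypothetical, not Pyu data): three pre-tokenised documents
toy_toks <- as.tokens(list(
  d1 = c("a", "b", "a"),
  d2 = c("b", "c"),
  d3 = c("a", "b")
))
toy_dfm <- dfm(toy_toks)

# Frequency table in the same format as tstat_freq
toy_freq <- textstat_frequency(toy_dfm)

# The same quantities computed directly from the dfm
manual_frequency <- featfreq(toy_dfm)  # total occurrences of each feature
manual_docfreq <- docfreq(toy_dfm)     # number of documents containing each feature

# Side-by-side comparison, aligned on feature name
data.frame(
  feature = toy_freq$feature,
  frequency = toy_freq$frequency,
  manual_frequency = as.numeric(manual_frequency[toy_freq$feature]),
  docfreq = toy_freq$docfreq,
  manual_docfreq = as.numeric(manual_docfreq[toy_freq$feature]),
  rank = toy_freq$rank
)
```

A high `frequency` paired with a low `docfreq`, as with `//` (58 occurrences in only 3 inscriptions), indicates a feature concentrated in a few documents rather than one spread across the corpus.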
# create wordcloud of features
set.seed(132)
textplot_wordcloud(
  pyu_dfm,
  max_size = 14,
  max_words = 200,
  color = rev(RColorBrewer::brewer.pal(10, "Spectral"))
)
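Several of the most frequent features in the table (`||`, `/`, `//`, `///`) appear to be punctuation or editorial marks rather than words, and they will also show up in the wordcloud. If that is not wanted, one option is to drop them with `dfm_remove()` before plotting. The sketch below uses a toy dfm so that it runs on its own. The list in `marks` is an assumption based only on the output shown above, and `toy_dfm` and `toy_words` are hypothetical names.

```r
library(quanteda)

# Toy dfm (hypothetical); use pyu_dfm in place of toy_dfm for the real corpus
toy_dfm <- dfm(as.tokens(list(
  d1 = c("@||", "ta", "//", "ma"),
  d2 = c("ta", "||", "///", "/", "gi")
)))

# Marks to exclude: an assumed list, taken from the frequency output above
marks <- c("@||", "||", "/", "//", "///")

# valuetype = "fixed" matches whole features literally, so "|" and "/" are
# not interpreted as glob or regex characters
toy_words <- dfm_remove(toy_dfm, pattern = marks, valuetype = "fixed")

featnames(toy_words)
```

The filtered dfm can then be passed to `textplot_wordcloud()` in the same way as `pyu_dfm`.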