Bayanathi Technology

Frequency analysis of Pyu corpus text

The simplest quantitative analysis of the Pyu corpus is a frequency analysis of its features, that is, a count of how often each token occurs. We will use the document-feature matrix (dfm) pyu_dfm created in the previous analysis.

library(quanteda)
library(quanteda.textstats)
library(quanteda.textplots)
library(RColorBrewer)

# First, replace the default docnames (text1, text2, ...) with the inscription numbers of the Pyu inscriptions
docnames(pyu_dfm) <- x1_df.1$InscriptionNumber
print(pyu_dfm)

Document-feature matrix of: 196 documents, 1,893 features (99.01% sparse) and 0 docvars.
     features
docs  @|| ḅay·ṁḥ dak·ṃ viy·ṃṁ tim·ṁ mlik· °o saḥ tgaṃ knon·
  001   1      2     1      1     1     1  3   1    1     1
  002   0      0     0      0     0     0  0   0    0     0
  003   1      0     0      0     0     0  4   0    0     0
  004   1      0     0      0     0     0  3   0    0     0
  005   1      0     0      0     0     0  3   0    0     0
  006   1      0     0      0     0     0  3   0    0     0
[ reached max_ndoc ... 190 more documents, reached max_nfeat ... 1,883 more features ]
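
Before replacing the document names, it may be worth confirming that the vector of inscription numbers has exactly one entry per document and contains no duplicates. The following is a minimal sketch on a toy corpus; toy_dfm and ids are hypothetical stand-ins for pyu_dfm and x1_df.1$InscriptionNumber, and the tokens are invented for illustration.

```r
library(quanteda)

# Toy stand-in for the Pyu dfm (hypothetical data, not taken from the corpus)
toy_dfm <- dfm(tokens(c("ba ta ta", "ta ma", "ba ba ma")))

# Hypothetical replacement labels, one per document
ids <- c("001", "002", "003")

# Stop early if the labels do not line up with the documents
stopifnot(length(ids) == ndoc(toy_dfm), !anyDuplicated(ids))

docnames(toy_dfm) <- ids
docnames(toy_dfm)
```

The same two checks could be run against pyu_dfm and the InscriptionNumber column before the assignment shown above.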

# get frequencies of features  
tstat_freq <- textstat_frequency(pyu_dfm) 
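
To make the columns of the result easier to read, here is a small sketch on invented data (toy_dfm is hypothetical, not part of the Pyu corpus). It checks that frequency is the total count of a feature across all documents and that docfreq is the number of documents in which the feature occurs at least once.

```r
library(quanteda)
library(quanteda.textstats)

# Toy dfm with three short documents (hypothetical data)
toy_dfm <- dfm(tokens(c(d1 = "ba ta ta", d2 = "ta ma", d3 = "ba ba ma")))

toy_freq <- textstat_frequency(toy_dfm)

# Total occurrences of each feature over the whole dfm
total <- featfreq(toy_dfm)

# Number of documents that contain each feature
n_docs <- docfreq(toy_dfm)

# Both should agree with the corresponding columns of textstat_frequency()
stopifnot(all(toy_freq$frequency == total[toy_freq$feature]))
stopifnot(all(toy_freq$docfreq == n_docs[toy_freq$feature]))

toy_freq
```

Features with equal frequency share a rank under the default ties_method = "min", which is why tied features in a frequency table show the same rank and the following rank is skipped.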

# View the 30 most frequent features as a styled table
library(kableExtra)
tstat_freq[1:30, ] %>%
  kbl() %>%
  kable_styling(bootstrap_options = "condensed", full_width = FALSE, font_size = 10) %>%
  column_spec(1, width = "2cm") %>%  # width needs a CSS unit; "2cm" is an assumed value
  row_spec(0, background = "lightgrey") %>%
  row_spec(1:30, background = "lightblue")

feature	frequency	rank	docfreq	group
°o	320	1	48	all
tiṁ	190	2	32	all
ḅaṁḥ	93	3	27	all
ta	93	3	24	all
yaṁ	85	5	35	all
ḅiṁḥ	81	6	16	all
tin·ṁ	73	7	11	all
ḅay·ṁḥ	61	8	11	all
//	58	9	3	all
||	55	10	35	all
tar·	55	10	10	all
///	53	12	18	all
ḅin·ṁḥ	47	13	10	all
ḅa	45	14	19	all
saḥ	44	15	19	all
gi	42	16	6	all
tim·ṁ	41	17	11	all
ma	39	18	13	all
pau	37	19	6	all
tdav·ṃḥ	34	20	10	all
ḅaḥ	33	21	11	all
kdaṅ·	33	21	12	all
dav·ṃḥ	29	23	8	all
mra	28	24	16	all
ḅiṁ	28	24	8	all
tir·ṁ	27	26	6	all
priṅ·ḥ	26	27	8	all
traḥ	25	28	3	all
pay·ṁḥ	25	28	9	all
/	24	30	2	all
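
Several of the top-ranked entries (/, //, ///, ||) look like punctuation or editorial marks rather than words. If one wanted a ranking of word-like features only, one option would be to drop them with dfm_remove() before counting. The sketch below uses invented data; the marks vector is an assumed list, and whether these symbols should be excluded is a decision for the analyst.

```r
library(quanteda)
library(quanteda.textstats)

# Split on whitespace only so that the marks survive as tokens (hypothetical data)
toy_toks <- tokens(
  c(d1 = "ba // ta ta", d2 = "ta /// ma", d3 = "|| ba ba ma"),
  what = "fastestword"
)
toy_dfm <- dfm(toy_toks)

# Assumed list of features to treat as marks rather than words
marks <- c("/", "//", "///", "||")

# valuetype = "fixed" matches each pattern literally against whole features
toy_words <- dfm_remove(toy_dfm, pattern = marks, valuetype = "fixed")

textstat_frequency(toy_words)
```

Applied to pyu_dfm, the same call would leave the word-like features in place and shift their ranks upward.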

 
# Create a wordcloud of the features (the seed makes the layout reproducible)
set.seed(132)
textplot_wordcloud(
  pyu_dfm,
  max_size = 14,
  max_words = 200,
  color = rev(RColorBrewer::brewer.pal(10, "Spectral"))
)

Saturday, October 4, 2025
