Package 'malaytextr' reference manual

Title:	Text Mining for Bahasa Malaysia
Description:	Designed for working with text written in Bahasa Malaysia. The package provides functions and data sets that make working with Bahasa Malaysia text easier. For word stemming in particular, it looks up Malay words in a dictionary and then removes the "extra suffix" as explained in Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi (2017) <https://ijrest.net/vol-4-issue-12.html>. The package includes a dictionary of Malay words that may be used for word stemming, a data set of Malay stop words, a data set of sentiment words and a data set of normalized words.
Authors:	Zahier Nasrudin [aut, cre]
Maintainer:	Zahier Nasrudin <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.3
Built:	2025-02-13 04:58:26 UTC
Source:	https://github.com/zahiernasrudin/malaytextr

Data of Malay root words

Description

Data of Malay root words

Usage

malayrootwords

Format

A tibble with 4310 rows and 2 variables:

Col Word: dbl Malay Word
Root Word: dbl Malay Root Word

Malaysia Politic Tweets Sentiment Dataset (Positive, Negative or Neutral)

Description

Malaysia Politic Tweets Sentiment Dataset (Positive, Negative or Neutral)

Usage

malaysia_politic_sentiment

Format

A tibble with 71 rows and 3 variables:

id: dbl Represents a unique identifier assigned to each tweet
text: dbl Tweet related to Malaysia politics
Sentiment: dbl The sentiment classification assigned to each tweet

Data of Malay stop words

Description

Data of Malay stop words

Usage

malaystopwords

Format

A tibble with 512 rows and 1 variable:

stopwords: dbl Malay stop words
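
A minimal sketch of how a stop word list like this one is typically applied. The `stop_words` vector is a toy stand-in with assumed values; in practice `malaystopwords$stopwords` would take its place.

```r
# Toy stand-in for malaystopwords$stopwords (assumed values, not the real data)
stop_words <- c("yang", "dan", "di", "itu")

# Tokens from an already split sentence
tokens <- c("buku", "yang", "baru", "itu", "dan", "pen")

# Keep only the tokens that do not appear in the stop word list
kept <- tokens[!tokens %in% stop_words]
kept
```

The `%in%` filter is exact and case-sensitive, so text would normally be lower-cased before filtering.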

Data of Malay normalized words

Description

Data of Malay normalized words

Usage

normalized

Format

A tibble with 153 rows and 2 variables:

Col Word: dbl Word
Normalized Word: dbl Normalized Word
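
One way a two-column table like this can be used is as a lookup that replaces informal spellings with their normalized forms. The sketch below assumes a hypothetical `lookup` data frame with the same column names; its rows are made up for illustration and are not taken from the package.

```r
# Hypothetical rows shaped like the `normalized` data set
# (values are illustrative assumptions, not the package's data)
lookup <- data.frame(
  `Col Word` = c("x", "yg", "dgn"),
  `Normalized Word` = c("tidak", "yang", "dengan"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

tokens <- c("saya", "x", "pergi", "dgn", "dia")

# match() gives the lookup row for each token, or NA when the token is absent
idx <- match(tokens, lookup[["Col Word"]])

# Replace matched tokens and leave the others unchanged
result <- ifelse(is.na(idx), tokens, lookup[["Normalized Word"]][idx])
result
```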

Remove URL links

Description

Remove URL links

Usage

remove_url(string)

Arguments

string

String to change

Details

remove_url() removes URL link(s) from a string.
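
To illustrate the general idea, here is a base-R sketch of URL stripping with a regular expression. The helper name `strip_urls` and the pattern are assumptions made for this example; they are not the package's implementation, and `remove_url()` may treat edge cases and leftover whitespace differently.

```r
# Base-R sketch; the pattern is an assumption, not the package's own regex
strip_urls <- function(string) {
  # Drop anything starting with http:// or https:// up to the next whitespace
  no_url <- gsub("https?://\\S+", "", string, perl = TRUE)
  # Collapse runs of whitespace left behind and trim both ends
  trimws(gsub("\\s+", " ", no_url, perl = TRUE))
}

x <- c("test https://t.co/fkQC2dXwnc", "another one https://www.google.com/ to try")
strip_urls(x)
```

This pattern only catches links that begin with `http://` or `https://`; bare domains such as `www.google.com` would pass through untouched.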

Value

Returns a string with URL links removed

Examples

x <- c("test https://t.co/fkQC2dXwnc", "another one https://www.google.com/ to try")
remove_url(x)

Data of Sentiment Words (Positive or Negative)

Description

Data of Sentiment Words (Positive or Negative)

Usage

sentiment_general

Format

A tibble with 1428 rows and 2 variables:

Word: dbl Sentiment Word
Sentiment: dbl Sentiment
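
A word-level sentiment lexicon of this shape is commonly used to count positive and negative words in a text. The sketch below uses a toy `lexicon` with the same two column names. Its entries and the label spellings "Positive" and "Negative" are assumptions, so check the actual values in `sentiment_general` before adapting it.

```r
# Toy lexicon shaped like `sentiment_general` (entries are illustrative assumptions)
lexicon <- data.frame(
  Word = c("baik", "gembira", "buruk", "sedih"),
  Sentiment = c("Positive", "Positive", "Negative", "Negative"),
  stringsAsFactors = FALSE
)

tokens <- c("hari", "ini", "baik", "tetapi", "saya", "sedih", "dan", "buruk")

# Look up each token; tokens missing from the lexicon get NA
label <- lexicon$Sentiment[match(tokens, lexicon$Word)]

# table() ignores NA by default, so only matched tokens are tallied
counts <- table(label)
counts
```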

Stemming Malay words

Description

Malaytextr function to stem Malay words

Usage

stem_malay(word,
  dictionary,
  col_feature1,
  col_dict1,
  col_dict2,
  Word)

Arguments

`word`	A data frame, or a character vector
`dictionary`	A data frame with a column of words to be stemmed and a column of root words
`col_feature1`	Column that contains words to be stemmed from `word`
`col_dict1`	Column that will be used to match with `col_feature1` from `word`
`col_dict2`	Column that contains the root words from `dictionary`
`Word`	Deprecated. Please use `word` instead

Format

An object of class function of length 1.

Details

stem_malay() first looks up the Malay words in a dictionary, then removes the "extra suffix" as explained by Khan et al. (2017), then the "prefix" and, lastly, the "suffix".
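
The order of operations described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the function `stem_sketch`, the one-row dictionary, and the three affix patterns are hypothetical and much smaller than anything a real stemmer would use. It is not the code behind stem_malay(), and it assumes that words found in the dictionary take the dictionary root while only the remaining words are stripped.

```r
# Hypothetical dictionary with the same two columns as malayrootwords
dict <- data.frame(
  `Col Word` = "pengetahuan",
  `Root Word` = "tahu",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Illustrative affix patterns (assumptions, not the package's actual lists)
extra_suffix <- "(lah|kah|nya)$"
prefix <- "^(ber|ter|di)"
suffix <- "(kan|an)$"

stem_sketch <- function(words, dictionary) {
  # Step 1: dictionary lookup (NA when the word is not listed)
  idx <- match(words, dictionary[["Col Word"]])
  # Steps 2 to 4: strip extra suffix, then prefix, then suffix
  stripped <- sub(extra_suffix, "", words)
  stripped <- sub(prefix, "", stripped)
  stripped <- sub(suffix, "", stripped)
  # Prefer the dictionary root; fall back to the stripped form
  root <- ifelse(is.na(idx), stripped, dictionary[["Root Word"]][idx])
  data.frame(`Col Word` = words, `Root Word` = root,
             check.names = FALSE, stringsAsFactors = FALSE)
}

res <- stem_sketch(c("banyaknya", "pengetahuan", "terkedu"), dict)
res
```

Rule-based stripping like this over-stems short words because it has no minimum-length guard, which is one reason the dictionary lookup comes first.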

Value

Returns a data frame with the following properties:

Col Word: Renamed input from word
Root Word: An additional column which contains the word(s) after being stemmed.

References

Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi. 2017. "Malay Language Stemmer."

Examples


# Stem a character vector using the dictionary from the malaytextr package
stem_malay(word = "banyaknya", dictionary = malayrootwords)

# Stem a data frame using the dictionary from the malaytextr package.
# With a data frame, you need to specify the column to be stemmed.
x <- data.frame(text = c("banyaknya", "sangat", "terkedu", "pengetahuan"))

stem_malay(word = x, dictionary = malayrootwords, col_feature1 = "text")

