Syntactic Analyzer

One of the analyses that helps in understanding the meaning of a text is syntactic analysis, which is performed by the Syntactic Analyzer module. In the process, the text is broken down into smaller parts, and each part is given a label. The module performs three analyses, carried out by three components: the Tokenizer, the Stemmer, and the Part-of-Speech tagger (POS tagger).

To use the module, submit text in the form of paragraphs or sentences; the module returns the results of the analysis.

Tokenizer

The Tokenizer is the component of the Syntactic Analyzer module that splits text into tokens (words) and/or sentences. This splitting process is called tokenization, and it is the first step in language processing. The Tokenizer can analyze text under the following conditions:

  • Handles the special cases contained in the General Guidelines for Indonesian Spelling (PUEBI), such as writing money (Rp50), writing titles (Dr.), and separating two or more different words not separated by spaces (Jokowi-JK, Anyer-Panarukan).
  • Supports various types of text formats such as news or social media.

Illustration

  • Tokenize paragraph into sentences [figure: Tokenizer paragraphs]

  • Tokenize sentence into words [figure: Tokenizer sentence]

Stemmer

Understanding the meaning of a sentence requires identifying the base word of each word, because words that share a base word share the same core meaning. The Stemmer can be used to:

  • Change passive sentences into active ones by changing their affixes. Example: "di" becomes "me", so "dibunuh" becomes "membunuh".
  • Increase the relevance of search engine results on a site by matching search keywords that have been processed by the Stemmer API.
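The search use case can be sketched as follows. This is a minimal illustration, not the API's own method: it assumes each token's lemma has already been obtained from the API, and the helper names `lemma_index` and `matches` are hypothetical.

```python
# Hypothetical token data: each entry pairs a surface form with the
# lemma (base word) that the API is assumed to have returned for it.
tokens = [
    {"token": "Aku", "lemma": "aku"},
    {"token": "mencintai", "lemma": "cinta"},
    {"token": "bahasa", "lemma": "bahasa"},
]


def lemma_index(tokens):
    """Map each lemma to the set of surface forms seen for it."""
    index = {}
    for tok in tokens:
        index.setdefault(tok["lemma"], set()).add(tok["token"])
    return index


def matches(query_lemma, index):
    """Return the surface forms whose lemma equals the (lowercased) query."""
    return sorted(index.get(query_lemma.lower(), set()))


index = lemma_index(tokens)
print(matches("cinta", index))  # ['mencintai']
```

Matching on the lemma rather than the surface form lets a query for "cinta" find text that contains only the affixed form "mencintai".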

Illustration

[figure: Stemmer]

Part-of-Speech Tagger

In Indonesian, each word belongs to a word class that has a particular function in sentence structure. Word classes help automatic language processing at a higher level, such as determining the structure and meaning of a sentence. The POS Tagger identifies word classes, including nouns, pronouns, verbs, adjectives, prepositions, numbers, adverbs, punctuation, and so on. Its output is a word class for each word in the input text, whether the input is an article or a sentence.

Illustration

[figure: POS Tagger]

Request Method

POST

Request URL

https://api.prosa.ai/v2/text/basic-nlp

Request Header

Key | Data Type | Description | Value
Content-Type | string | Media type of the body sent to the API. Only 'application/json' is supported. | application/json
x-api-key | string | API key acquired from the Prosa API Console. | [YOUR_API_KEY]

Request Body

The request body accepts the following parameter(s) in JSON format.

Parameter | Data Type | Description | Auto (default value) | Required
text | string | Text to be processed. | | True
granularity | enum | Granularity of the part-of-speech tags. | 'fine_grained' | False
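As a rough sketch, the request described above could be assembled in Python with the standard library alone. The helper name `build_request` is hypothetical, and the code only constructs the request; sending it requires a valid API key and network access.

```python
import json
import urllib.request

API_URL = "https://api.prosa.ai/v2/text/basic-nlp"


def build_request(text, api_key, granularity=None):
    """Build a POST request for the endpoint (hypothetical helper)."""
    payload = {"text": text}
    # "granularity" is optional, so it is included only when given.
    if granularity is not None:
        payload["granularity"] = granularity
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
        method="POST",
    )


req = build_request("Aku mencintai bahasa Indonesia.", "[YOUR_API_KEY]")

# To actually send the request (not executed here):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```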

Granularity Enums

Granularity of the part-of-speech tags.

Granularity | Description
coarse | 16 classes based on the Universal Dependencies POS tags.
fine_grained | 28 classes based on the INACL (Indonesia Association of Computational Linguistics) convention.
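On the client side, the two granularity values can be validated before a request is sent. The sketch below is an assumption about how a caller might do this, not part of the API; the names `Granularity` and `make_payload` are hypothetical, and the spelling 'fine_grained' follows the default value shown in the request body parameters.

```python
from enum import Enum


class Granularity(str, Enum):
    """Allowed values for the "granularity" request parameter."""
    COARSE = "coarse"
    FINE_GRAINED = "fine_grained"


def make_payload(text, granularity=Granularity.FINE_GRAINED):
    """Build the JSON-serializable request body (hypothetical helper)."""
    # Granularity(...) raises ValueError for any value outside the enum,
    # so a misspelled granularity is caught before the request is sent.
    granularity = Granularity(granularity)
    return {"text": text, "granularity": granularity.value}


payload = make_payload("Aku orang Indonesia.", "coarse")
```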

Example

Sample Request (JSON)

{
    "text":"Aku mencintai bahasa Indonesia karena aku orang Indonesia."
}

Sample Response (JSON)

{
    "sentences": [
        {
            "sentence": "Aku mencintai bahasa Indonesia karena aku orang Indonesia.",
            "begin_offset": 0,
            "length": 58,
            "lemmatized_length": 54,
            "lemmatized_sentence": "aku cinta bahasa indonesia karena aku orang indonesia.",
            "tokens": [
                {
                    "token": "Aku",
                    "length": 3,
                    "pos_tag": "PRN",
                    "begin_offset": 0,
                    "lemma": "aku",
                    "affixes": {
                        "derivational_prefix": [],
                        "derivational_suffix": "",
                        "particle_suffix": "",
                        "possesive_suffix": ""
                    },
                    "lemmatized_begin_offset": 0
                },
                {
                    "token": "mencintai",
                    "length": 9,
                    "pos_tag": "VBT",
                    "begin_offset": 4,
                    "lemma": "cinta",
                    "affixes": {
                        "derivational_prefix": [
                            "me"
                        ],
                        "derivational_suffix": "i",
                        "particle_suffix": "",
                        "possesive_suffix": ""
                    },
                    "lemmatized_begin_offset": 4
                },
                {
                    "token": "bahasa",
                    "length": 6,
                    "pos_tag": "NNO",
                    "begin_offset": 14,
                    "lemma": "bahasa",
                    "affixes": {
                        "derivational_prefix": [],
                        "derivational_suffix": "",
                        "particle_suffix": "",
                        "possesive_suffix": ""
                    },
                    "lemmatized_begin_offset": 10
                },
                {
                    "token": "Indonesia",
                    "length": 9,
                    "pos_tag": "NNP",
                    "begin_offset": 21,
                    "lemma": "indonesia",
                    "affixes": {
                        "derivational_prefix": [],
                        "derivational_suffix": "",
                        "particle_suffix": "",
                        "possesive_suffix": ""
                    },
                    "lemmatized_begin_offset": 17
                },
                {
                    "token": "karena",
                    "length": 6,
                    "pos_tag": "CSN",
                    "begin_offset": 31,
                    "lemma": "karena",
                    "affixes": {
                        "derivational_prefix": [],
                        "derivational_suffix": "",
                        "particle_suffix": "",
                        "possesive_suffix": ""
                    },
                    "lemmatized_begin_offset": 27
                },
                {
                    "token": "aku",
                    "length": 3,
                    "pos_tag": "PRN",
                    "begin_offset": 38,
                    "lemma": "aku",
                    "affixes": {
                        "derivational_prefix": [],
                        "derivational_suffix": "",
                        "particle_suffix": "",
                        "possesive_suffix": ""
                    },
                    "lemmatized_begin_offset": 34
                },
                {
                    "token": "orang",
                    "length": 5,
                    "pos_tag": "NNO",
                    "begin_offset": 42,
                    "lemma": "orang",
                    "affixes": {
                        "derivational_prefix": [],
                        "derivational_suffix": "",
                        "particle_suffix": "",
                        "possesive_suffix": ""
                    },
                    "lemmatized_begin_offset": 38
                },
                {
                    "token": "Indonesia",
                    "length": 9,
                    "pos_tag": "NNP",
                    "begin_offset": 48,
                    "lemma": "indonesia",
                    "affixes": {
                        "derivational_prefix": [],
                        "derivational_suffix": "",
                        "particle_suffix": "",
                        "possesive_suffix": ""
                    },
                    "lemmatized_begin_offset": 44
                },
                {
                    "token": ".",
                    "length": 1,
                    "pos_tag": "SYM",
                    "begin_offset": 57,
                    "lemma": ".",
                    "affixes": {
                        "derivational_prefix": [],
                        "derivational_suffix": "",
                        "particle_suffix": "",
                        "possesive_suffix": ""
                    },
                    "lemmatized_begin_offset": 53
                }
            ],
            "length": 58,
            "begin_offset": 0
        }
    ]
}
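A response of this shape can be walked with a few lines of Python. The sketch below uses a shortened, hand-written response with the same field names as the sample; the helper name `token_rows` is hypothetical, and treating token offsets as positions in the full input text is an assumption that the sample (a single sentence starting at offset 0) cannot confirm.

```python
import json

# Shortened response with the same field names as the sample above.
response_text = """
{"sentences": [{
  "sentence": "Aku mencintai bahasa",
  "begin_offset": 0,
  "length": 20,
  "tokens": [
    {"token": "Aku", "length": 3, "pos_tag": "PRN",
     "begin_offset": 0, "lemma": "aku"},
    {"token": "mencintai", "length": 9, "pos_tag": "VBT",
     "begin_offset": 4, "lemma": "cinta"},
    {"token": "bahasa", "length": 6, "pos_tag": "NNO",
     "begin_offset": 14, "lemma": "bahasa"}
  ]
}]}
"""


def token_rows(response):
    """Return (surface form, lemma, POS tag) for every token."""
    rows = []
    for sentence in response["sentences"]:
        text = sentence["sentence"]
        base = sentence["begin_offset"]
        for tok in sentence["tokens"]:
            # Assumption: token offsets count from the start of the
            # whole input, so the sentence offset is subtracted first.
            start = tok["begin_offset"] - base
            surface = text[start:start + tok["length"]]
            rows.append((surface, tok["lemma"], tok["pos_tag"]))
    return rows


rows = token_rows(json.loads(response_text))
for surface, lemma, tag in rows:
    print(surface, lemma, tag)
```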

Fine Grained POS Tags

Based on the INACL (Indonesia Association of Computational Linguistics) convention.

Name | Code | Example | Description
Adjective | ADJ | biru, sakit, gelisah, cerdas, cantik, panjang; non-quantitative ordinals: pertama, kedua, ketiga (example: juara pertama, abad ke-17) | Adjectives are words that typically modify nouns and specify their properties or attributes.
Adverb | ADV | sangat, enggan, harus, mesti, agak, hanya, semakin | Adverbs are words that typically modify verbs for such categories as time, place, direction or manner.
Articles | ART | para, si, sang, hang, ini, itu | Articles are words that modify nouns or noun phrases and express the indefinite reference of the noun phrase in context.
Coordinative Conjunction | CCN | dan, atau, tetapi | A coordinating conjunction is a word that links words or larger constituents without syntactically subordinating one to the other and expresses a semantic relationship between them.
Subordinative Conjunction | CSN | jika, meskipun, walaupun | A subordinating conjunction is a conjunction that links constructions by making one of them a constituent of the other.
Interjection | INT | aduh, astaga, wah, aduhai, nah | An interjection is a word that is used most often as an exclamation or part of an exclamation.
Quantifiers | KUA | sesuatu, semua, beberapa, berbagai, sebagian, segenap, masing-masing, segala | Quantifiers are words that modify nouns or noun phrases and express the definite reference of the noun phrase in context.
Negation | NEG | tidak, bukan, tak, enggak | Negation words are words used to express the contradiction or denial of something.
Noun | NNO | buku, mobil, malaikat, pikiran | Nouns are a part of speech typically denoting a person, place, thing, animal or idea.
Proper Noun | NNP | Jakarta, Indonesia, Burhan Silalahi | A proper noun is a noun (or nominal content word) that is the name (or part of the name) of a specific individual, place, or object.
Numeral | NUM | satu, dua, sebuah, pertama | A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.
Preposition | PPO | di, ke, dari, tentang | Adposition is a cover term for prepositions and postpositions.
Interrogative Pronoun | PRI | apa, siapa, bagaimana | Interrogative pronouns are pronouns that stand in for the thing being asked about in a question.
Cliticized Pronoun | PRK | mu, ku, nya | A cliticized pronoun is a pronoun realized as a morpheme attached to another word.
Pronoun | PRN | saya, anda, kamu, sesuatu, seseorang, itu, ini | Pronouns are words that substitute for nouns or noun phrases, whose meaning is recoverable from the linguistic or extralinguistic context.
Relative Pronoun | PRR | yang, tempat | Relative pronouns are words that can be used to replace the main part and/or link it with its explanation.
Particle | PAR | pun, per, lah, toh, kah | Particles are function words that must be associated with another word or phrase to impart meaning and that do not satisfy definitions of other universal parts of speech (e.g. adpositions, coordinating conjunctions, subordinating conjunctions or auxiliary verbs).
Punctuation | PUN | ., ?, !, ", /, -, ; | A sign used to separate various parts of a written language unit (words, phrases, and sentences) and that in some cases can affect the meaning of the language unit.
Character Symbols | SYM | Rp, $, Ω, ° | A symbol is a word-like entity that differs from ordinary words by form, function, or both.
Tense, Aspect, Modality, Evidentiality | TAME | telah, akan, bakal, sudah, sedang, lagi, masih, pernah, ingin, harus, perlu, boleh, pasti, tampaknya | A group of words that change the context of the tense, aspect, modality, or evidentiality of a predicate.
Intransitive Verb | VBI | duduk, menangis, bergembira, percaya | An intransitive verb does not have an object.
Transitive Verb | VBT | membaca, menyirami, membelikan | A transitive verb is one that is used with an object: a noun, phrase, or pronoun that refers to the person or thing that is affected by the action of the verb.
Linking Verb | VBL | adalah, ialah, merupakan, menjadi | An auxiliary is a function word that accompanies the lexical verb of a verb phrase and expresses grammatical distinctions not carried by the lexical verb, such as person, number, tense, mood, aspect, voice or evidentiality.
Passive Verb | VBP | dipukul, dipenuhi, disembuhkan | A passive verb is a verb whose subject is the patient.
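For readable output, the fine-grained codes listed above can be kept in a lookup table. The sketch below copies the codes and names from this section; the dictionary name `FINE_GRAINED_TAGS` and the helper `describe` are hypothetical and not part of the API.

```python
# Codes and names copied from the fine-grained tag list in this section.
FINE_GRAINED_TAGS = {
    "ADJ": "Adjective",
    "ADV": "Adverb",
    "ART": "Articles",
    "CCN": "Coordinative Conjunction",
    "CSN": "Subordinative Conjunction",
    "INT": "Interjection",
    "KUA": "Quantifiers",
    "NEG": "Negation",
    "NNO": "Noun",
    "NNP": "Proper Noun",
    "NUM": "Numeral",
    "PPO": "Preposition",
    "PRI": "Interrogative Pronoun",
    "PRK": "Cliticized Pronoun",
    "PRN": "Pronoun",
    "PRR": "Relative Pronoun",
    "PAR": "Particle",
    "PUN": "Punctuation",
    "SYM": "Character Symbols",
    "TAME": "Tense, Aspect, Modality, Evidentiality",
    "VBI": "Intransitive Verb",
    "VBT": "Transitive Verb",
    "VBL": "Linking Verb",
    "VBP": "Passive Verb",
}


def describe(tagged_tokens):
    """Replace each tag code with its name; unknown codes pass through."""
    return [(tok, FINE_GRAINED_TAGS.get(tag, tag)) for tok, tag in tagged_tokens]


described = describe([("Aku", "PRN"), ("mencintai", "VBT"), ("bahasa", "NNO")])
for token, name in described:
    print(token, name)
```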

Coarse POS Tags

Based on the Universal Dependencies POS tags.

Name | Code | Example | Description
Adjective | ADJ | biru, sakit, gelisah, cerdas | Adjectives are words that typically modify nouns and specify their properties or attributes.
Adposition | ADP | di, ke, dari, tentang, untuk | Adposition is a cover term for prepositions and postpositions.
Adverb | ADV | enggan, agaknya, masih | Adverbs are words that typically modify verbs for such categories as time, place, direction or manner.
Coordinating Conjunction | CCONJ | dan, atau, tetapi, baik | A coordinating conjunction is a word that links words or larger constituents without syntactically subordinating one to the other and expresses a semantic relationship between them.
Determiner | DET | sesuatu, semua, beberapa | Determiners are words that modify nouns or noun phrases and express the reference of the noun phrase in context.
Interjection | INTJ | aduh, astaga, wah, maaf | An interjection is a word that is used most often as an exclamation or part of an exclamation.
Noun | NOUN | buku, mobil, malaikat, pikiran | Nouns are a part of speech typically denoting a person, place, thing, animal or idea.
Numeral | NUM | satu, dua, sebuah, kedua | A numeral is a word, functioning most typically as a determiner, adjective or pronoun, that expresses a number and a relation to the number, such as quantity, sequence, frequency or fraction.
Particle | PART | pun, per | Particles are function words that must be associated with another word or phrase to impart meaning and that do not satisfy definitions of other universal parts of speech (e.g. adpositions, coordinating conjunctions, subordinating conjunctions or auxiliary verbs).
Pronoun | PRON | saya, anda, seseorang | Pronouns are words that substitute for nouns or noun phrases, whose meaning is recoverable from the linguistic or extralinguistic context.
Proper Noun | PROPN | Jakarta, Indonesia, Burhan Silalahi | A proper noun is a noun (or nominal content word) that is the name (or part of the name) of a specific individual, place, or object.
Subordinating Conjunction | SCONJ | jika, sejak, meskipun | A subordinating conjunction is a conjunction that links constructions by making one of them a constituent of the other.
Symbol | SYM | ? . ! - | A symbol is a word-like entity that differs from ordinary words by punctuation, form, function, or both.
Verb | VERB | duduk, membaca, dipenuhi, ada | A verb is a member of the syntactic class of words that typically signal events and actions, can constitute a minimal predicate in a clause, and govern the number and types of other constituents which may occur in the clause.

Version History

Below is the version history of our Syntactic Analyzer API.

Version | F1 | Test Data
1.0 | 96.13% | 10,890 tokens

Questions?

We do our best to make this documentation clear and user-friendly, but if you have unanswered questions, please send an email to support@prosa.ai.