Named Entity Recognizer

A named entity is a span of text in a document that refers to something such as a person, location, product, event, organization, or telephone number. These entities carry relevant information that can be used for further analysis. The Named Entity Recognizer (NER) finds and labels entities of this kind.

Users only need to enter sentences into the Named Entity Recognizer module; the named entities are then extracted and typed automatically, as seen in the illustration below.

illustration

NER

Request Method

POST

Request URL

https://api.prosa.ai/v2/text/ner

Request Header

| Key | Data Type | Description | Value |
| --- | --- | --- | --- |
| Content-Type | string | Media type of the body sent to the API. Only 'application/json' is supported. | application/json |
| x-api-key | string | API key acquired from the Prosa API Console. | [YOUR_API_KEY] |

Request Body

The request body accepts the following parameter(s) in JSON format.

| Parameter | Data Type | Description | Default | Required |
| --- | --- | --- | --- | --- |
| version | string | NER version to be used | v1 | False |
| text | string | Text to be classified | | True |
| flat | boolean | Enable this parameter to retrieve the result as a flat list of NE, without "begin_offset" nor "length" | false | False |
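
As a minimal sketch of how the method, URL, headers, and body described above fit together, the following Python snippet builds (but does not send) a request using only the standard library. The helper name `build_ner_request` and the placeholder key are illustrative assumptions, not part of the Prosa API.

```python
import json
import urllib.request

NER_URL = "https://api.prosa.ai/v2/text/ner"  # Request URL from this page


def build_ner_request(text, api_key, version="v1", flat=False):
    """Build a POST request for the NER endpoint (hypothetical helper)."""
    payload = {"version": version, "text": text, "flat": flat}
    return urllib.request.Request(
        NER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder; use the key from the Prosa API Console.
request = build_ner_request("Jokowi berkunjung ke Kota Bandung.", "YOUR_API_KEY")
```

Passing `request` to `urllib.request.urlopen` would send it; the response is expected to be a JSON body like the samples on this page.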

NER V1.0

Example

Sample Request (JSON)

{
    "version": "v1",
    "text" : "JAKARTA, SITUSBERITA.com - Nasib berkas perkara yang ditangani Kejaksaan Agung menjadi perhatian setelah kebakaran melalap Gedung Utama Korps Adhyaksa di Jakarta Selatan, pada Sabtu (22/8/2020) malam."
} 

Sample Response (JSON)

{
    "entities": [
        {
            "name": "JAKARTA",
            "type": "GPE",
            "begin_offset": 0,
            "length": 7
        },
        {
            "name": "SITUSBERITA.com",
            "type": "ORG",
            "begin_offset": 9,
            "length": 15
        },
        {
            "name": "Kejaksaan Agung",
            "type": "NOR",
            "begin_offset": 63,
            "length": 15
        },
        {
            "name": "Gedung Utama Korps Adhyaksa",
            "type": "FAC",
            "begin_offset": 123,
            "length": 27
        },
        {
            "name": "Jakarta Selatan",
            "type": "GPE",
            "begin_offset": 154,
            "length": 15
        },
        {
            "name": "Sabtu (22/8/2020)",
            "type": "DTE",
            "begin_offset": 176,
            "length": 17
        },
        {
            "name": "malam",
            "type": "TME",
            "begin_offset": 194,
            "length": 5
        }
    ]
}
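
In the sample above, `begin_offset` and `length` behave like a zero-based character position and a character count into the request text. That reading is inferred from the sample values rather than stated by the API, so treat the sketch below as an illustration under that assumption. The helper `entity_span` is a hypothetical name.

```python
text = (
    "JAKARTA, SITUSBERITA.com - Nasib berkas perkara yang ditangani "
    "Kejaksaan Agung menjadi perhatian setelah kebakaran melalap Gedung "
    "Utama Korps Adhyaksa di Jakarta Selatan, pada Sabtu (22/8/2020) malam."
)

# Entities copied from the sample response above.
entities = [
    {"name": "JAKARTA", "type": "GPE", "begin_offset": 0, "length": 7},
    {"name": "SITUSBERITA.com", "type": "ORG", "begin_offset": 9, "length": 15},
    {"name": "Kejaksaan Agung", "type": "NOR", "begin_offset": 63, "length": 15},
    {"name": "Gedung Utama Korps Adhyaksa", "type": "FAC", "begin_offset": 123, "length": 27},
    {"name": "Jakarta Selatan", "type": "GPE", "begin_offset": 154, "length": 15},
    {"name": "Sabtu (22/8/2020)", "type": "DTE", "begin_offset": 176, "length": 17},
    {"name": "malam", "type": "TME", "begin_offset": 194, "length": 5},
]


def entity_span(source_text, entity):
    """Slice the entity out of the original text using its offset and length."""
    start = entity["begin_offset"]
    return source_text[start:start + entity["length"]]


# Names whose slice does not match the reported entity name.
mismatches = [e["name"] for e in entities if entity_span(text, e) != e["name"]]
```

For this sample, every slice equals the reported `name`, so `mismatches` is empty.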

Type

Based on the OntoNotes 5 entity types.

| Name | Code | Example | Description |
| --- | --- | --- | --- |
| Person | PER | Aditya Satrya, Susilo Bambang Yudhoyono, Jokowi, Batman | People, including fictional. |
| NORP | NOR | KPK, DPR, Sunda, Islam, Kabinet Indonesia Bersatu, TNI AU, Kopassus, Komisi X | Nationalities or religious or political groups. |
| Facility | FAC | Institut Teknologi Bandung, Bandara Soekarno Hatta, Museum Perjuangan Bandung, Grand Indonesia, Mesjid Istiqlal, hanamasa, hotel Holiday Inn, kantor KPK | Buildings, airports, highways, bridges, etc. |
| Organization | ORG | PT Telkom Tbk, Apple Corporations, Pertamina, IBM, Kaskus, Manchester United, The Chainsmoker, One Piece Fans Club, BEM UI | Companies, agencies, institutions, etc. |
| Geo-Political Entity | GPE | Kota Bandung, Kabupaten Bogor, California, Afrika Selatan, Provinsi Kalimantan Tengah, Pulau Jawa | Countries, cities, states. |
| Location | LOC | Jl. Dr.Otten 10, Danau Toba, Sungai Musi, Pantai Kuta, Blok A, Gunung Prau, lantai 2 | Non-GPE locations, mountain ranges, bodies of water. |
| Product | PRO | Yamaha Mio, Toyota Camry, Airbus A380, Big Mac, Nasi goreng, Bakso Pak Mien, KFC Chizza | Objects, vehicles, foods, etc. (Not services.) |
| Event | EVT | Java Jazz Festival 2017, Djakarta Warehouse Project 2016, The Internationals 7 | Named hurricanes, battles, wars, sports events, etc. |
| Work of Art | WOA | Dilan 1990, Stairway to Heaven, Tari Pendet, Batik Keris, Patung Pancoran, Patung Selamat Datang | Titles of books, songs, etc. |
| Law | LAW | UU 1945, RUU, Pancasila, Piagam Jakarta, Perjanjian Linggarjati | Named documents made into laws. |
| Language | LNG | bahasa Indonesia, bahasa Inggris, bahasa Sunda, bahasa Jawa | Any named language. |
| Date | DTE | 21 Februari 1998, 2017/6/30, 11-11-11, besok, kemarin, lusa, pekan depan | Absolute or relative dates or periods. |
| Time | TME | 10.00 WIB, 21:24 WITA, 6 sore WIT, shubuh, pagi hari, siang hari | Times smaller than a day. |
| Percent | PCT | 100%, 54.32%, 9,60 persen, 8,02 persen | Percentage, including "%". |
| Money | MON | US$ 1000, Rp50 juta, 100 miliar peso | Monetary values, including unit. |
| Quantity | QUA | Ukuran S, Size M, 300 W, 500 kg, 1000 km, 5 sen, satu hari, satu tahun, sepekan, sejam, seminggu, sedetik, setahun | Measurements, as of weight or distance. |
| Ordinal | ORD | pertama, kesatu, kedua, keseratus, kesejuta, juara 1, kelas 2, S1, S2, S3 | "first", "second", etc. |
| Cardinal | CRD | 10, 5, seorang, sebuah, sepasang, ½, seperlima | Numerals that do not fall under another type. |
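
The three-letter codes in the `type` field can be mapped back to the names in the table above. The following is a small sketch that counts the entities of a V1 response per type name; `V1_TYPE_NAMES` and `count_by_type` are illustrative names, not part of the API.

```python
from collections import Counter

# Code-to-name mapping copied from the V1 type table.
V1_TYPE_NAMES = {
    "PER": "Person", "NOR": "NORP", "FAC": "Facility", "ORG": "Organization",
    "GPE": "Geo-Political Entity", "LOC": "Location", "PRO": "Product",
    "EVT": "Event", "WOA": "Work of Art", "LAW": "Law", "LNG": "Language",
    "DTE": "Date", "TME": "Time", "PCT": "Percent", "MON": "Money",
    "QUA": "Quantity", "ORD": "Ordinal", "CRD": "Cardinal",
}


def count_by_type(entities):
    """Count entities per readable type name; unknown codes are kept as-is."""
    return Counter(V1_TYPE_NAMES.get(e["type"], e["type"]) for e in entities)


# Names and types taken from the V1 sample response.
sample_entities = [
    {"name": "JAKARTA", "type": "GPE"},
    {"name": "SITUSBERITA.com", "type": "ORG"},
    {"name": "Kejaksaan Agung", "type": "NOR"},
    {"name": "Gedung Utama Korps Adhyaksa", "type": "FAC"},
    {"name": "Jakarta Selatan", "type": "GPE"},
    {"name": "Sabtu (22/8/2020)", "type": "DTE"},
    {"name": "malam", "type": "TME"},
]

counts = count_by_type(sample_entities)
```

Falling back to the raw code for unknown types keeps the summary usable if a response contains a code that is not in the table.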

NER V2.0

Example

Sample Request (JSON)

{
    "version": "v2",
    "text" : "JAKARTA, SITUSBERITA.com - Nasib berkas perkara yang ditangani Kejaksaan Agung menjadi perhatian setelah kebakaran melalap Gedung Utama Korps Adhyaksa di Jakarta Selatan, pada Sabtu (22/8/2020) malam."
} 

Sample Response (JSON)

{
    "entities": [
        {
            "name": "JAKARTA",
            "type": "LOC",
            "begin_offset": 0,
            "length": 7
        },
        {
            "name": "SITUSBERITA.com",
            "type": "LOC",
            "begin_offset": 9,
            "length": 15
        },
        {
            "name": "Kejaksaan Agung",
            "type": "ORG",
            "begin_offset": 63,
            "length": 15
        },
        {
            "name": "Gedung Utama Korps Adhyaksa",
            "type": "FAC",
            "begin_offset": 123,
            "length": 27
        },
        {
            "name": "Jakarta Selatan",
            "type": "LOC",
            "begin_offset": 154,
            "length": 15
        },
        {
            "name": "Sabtu (22/8/2020)",
            "type": "DTE",
            "begin_offset": 176,
            "length": 17
        },
        {
            "name": "malam",
            "type": "TME",
            "begin_offset": 194,
            "length": 5
        }
    ]
}
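
For the same input text, the V1 and V2 samples return identical spans but label some of them differently. The sketch below compares two responses span by span and lists the labels that changed; the function names are hypothetical, and the data is copied from the two sample responses on this page.

```python
def types_by_span(entities):
    """Index entities by (begin_offset, length) so spans can be compared."""
    return {(e["begin_offset"], e["length"]): (e["name"], e["type"]) for e in entities}


def label_changes(old_entities, new_entities):
    """Return (name, old_type, new_type) for spans present in both responses."""
    old_spans, new_spans = types_by_span(old_entities), types_by_span(new_entities)
    changes = []
    for span in sorted(old_spans.keys() & new_spans.keys()):
        name, old_type = old_spans[span]
        new_type = new_spans[span][1]
        if old_type != new_type:
            changes.append((name, old_type, new_type))
    return changes


def make(rows):
    """Expand compact (name, type, begin_offset, length) rows into entity dicts."""
    return [dict(zip(("name", "type", "begin_offset", "length"), r)) for r in rows]


v1 = make([("JAKARTA", "GPE", 0, 7), ("SITUSBERITA.com", "ORG", 9, 15),
           ("Kejaksaan Agung", "NOR", 63, 15), ("Gedung Utama Korps Adhyaksa", "FAC", 123, 27),
           ("Jakarta Selatan", "GPE", 154, 15), ("Sabtu (22/8/2020)", "DTE", 176, 17),
           ("malam", "TME", 194, 5)])
v2 = make([("JAKARTA", "LOC", 0, 7), ("SITUSBERITA.com", "LOC", 9, 15),
           ("Kejaksaan Agung", "ORG", 63, 15), ("Gedung Utama Korps Adhyaksa", "FAC", 123, 27),
           ("Jakarta Selatan", "LOC", 154, 15), ("Sabtu (22/8/2020)", "DTE", 176, 17),
           ("malam", "TME", 194, 5)])

changes = label_changes(v1, v2)
```

With the sample data, four of the seven entities change type (for example, "JAKARTA" moves from GPE to LOC), which is worth checking before switching an existing integration from `v1` to `v2`.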

Type

| Name | Code | Example | Description |
| --- | --- | --- | --- |
| Other Name | OTN | Harry Potter, Spiderman, Si Manis Jembatan Ancol | Names given to non-human entities, including fictional characters. |
| Person | PER | Jokowi, Prabowo, Budi | Names given to humans. |
| God | GOD | Buddha, Wisnu, Jesus, Nyi Roro Kidul, Dewi Sri, Jibril | Names of gods and mystical creatures. |
| Organization | ORG | PBB, Kangen Band, keluarga manurung, persib, pssi, BCA, PDI, PPP, KPK, Yahudi, Muslim, Kristian | Names of organizations (international organizations, performer organizations, families, ethnicities, sport organizations, cooperations, political organizations, governments, study programs). |
| Location | LOC | Kota Bandung, Jalan Dr. Otten No. 10 | Names of places (roads, cities, districts, countries, states, mountains, islands, rivers, lakes, oceans, bays, troughs, planets, addresses, postal codes, phone numbers, emails, URLs). |
| Facility | FAC | istana negara, lantai 7, lapangan parkir timur senayan | Names of buildings that provide services (historical sites, institutions, schools, universities, markets, parks, sport facilities, museums, zoos, amusement parks, theaters, places of worship, rest areas, stations, airports, docks, oil refineries, mines, dams, paths, tunnels, i-th floors, rooms, parking lots, alleys). |
| Product | PRO | ayam geprek, K-Pop, sastra Indonesia | Names of man-made products (trademarks, food, vehicles, electronics, medicines, weapons, market stock, awards, service names, identity numbers, license plates, works of art, drawings, paintings, broadcasts, films, performances, songs, music, books, magazines, newspapers, religions, cultures, agreements, laws, legal documents, languages). |
| Event | EVT | Bencana Chernobyl, Gempa Aceh, Perang Bubat | Names of activities or events (religious events, matches, conferences, epidemics, incidents, wars, natural phenomena). |
| Time | TME | 11 malam, 20:00, siang hari, malam hari | Time expressions within a single day. |
| Date | DTE | Hari Ibu, Hari Valentine, Tahun Baru, Hari Pancasila, Halloween, beberapa hari yang lalu | Time expressions spanning more than one day, and commemoration days. |
| Number | NUM | semester kedua, kelas tiga, tingkat satu | Any number expression (money, stock indexes, cardinals, ordinals, percentages, ranks, orders or sequences). |
| Measurement | MEA | 20 cm, 500 ha, 300 kalori | Any measurement (physical size, volume, speed, area, mass, intensity, temperature, calories). |

## Additional feature: Whitelist
A whitelist lets you define your own entity types for words that the provided model does not recognize.

To use the whitelist feature, define each word or phrase and its entity type in the request body. The following example labels "omicron" with type "DIS" and "kasus omicron" with type "CAS", which stand for disaster and case respectively.

Sample Request (JSON)

{
    "version": "v2",
    "text": "kasus omicron di indonesia terus melonjak, okupansi rumah sakit pmi mendekati 80 persen, presiden joko widodo menganjurkan warga bekerja dari rumah untuk mereka yang bisa.",
    "white_list": {
        "omicron": "DIS",
        "kasus omicron": "CAS"
    }
} 

Sample Response (JSON)

{
    "entities": [
        {
            "name": "kasus omicron",
            "type": "CAS",
            "begin_offset": 0,
            "length": 13
        },
        {
            "name": "indonesia",
            "type": "LOC",
            "begin_offset": 17,
            "length": 9
        },
        {
            "name": "80 persen",
            "type": "NUM",
            "begin_offset": 78,
            "length": 9
        },
        {
            "name": "joko widodo",
            "type": "PER",
            "begin_offset": 98,
            "length": 11
        }
    ]
}

In the sample response above, only "kasus omicron" is extracted and labelled. This is the expected behaviour: when one whitelisted phrase contains another, only the longer phrase is extracted.
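
The "longer phrase wins" rule can be illustrated with a small client-side sketch. This is not the service's implementation: it uses plain, case-sensitive substring matching, and `whitelist_matches` is a hypothetical helper written only to show the overlap rule described here.

```python
def whitelist_matches(text, white_list):
    """Find whitelist phrases in text, keeping the longer phrase on overlap."""
    found = []
    # Try longer phrases first so they claim their span before shorter ones.
    for phrase in sorted(white_list, key=len, reverse=True):
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            overlaps = any(
                start < e["begin_offset"] + e["length"] and e["begin_offset"] < end
                for e in found
            )
            if not overlaps:
                found.append({
                    "name": phrase,
                    "type": white_list[phrase],
                    "begin_offset": start,
                    "length": len(phrase),
                })
            start = text.find(phrase, start + 1)
    return sorted(found, key=lambda e: e["begin_offset"])


text = (
    "kasus omicron di indonesia terus melonjak, okupansi rumah sakit pmi "
    "mendekati 80 persen, presiden joko widodo menganjurkan warga bekerja "
    "dari rumah untuk mereka yang bisa."
)
white_list = {"omicron": "DIS", "kasus omicron": "CAS"}

matches = whitelist_matches(text, white_list)
```

Here `matches` contains a single "kasus omicron" entry of type "CAS"; the shorter "omicron" is skipped because its span lies inside the longer phrase.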

Version History

Below is the version history of our Named Entity Recognizer API.

| Version | F1 | Test Data |
| --- | --- | --- |
| 1.0 | 91.1% | 36,680 tokens |
| 2.0 | 97.63% | 300,000+ tokens |

Questions?

We do our best to make this documentation clear and user-friendly. If you still have unanswered questions, please send an email to support@prosa.ai.