{"guid":"3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8","title":"NLMaps Web: A Natural Language Interface to OpenStreetMap","subtitle":null,"slug":"sotm2021-academic-10416-nlmaps-web-a-natural-language-interface-to-openstreetmap","link":"https://2021.stateofthemap.org/sessions/GDMBWS/","description":"NLMaps Web is a web interface for querying OSM with natural language questions such as “Show me where I can find drinking water within 500m of the Louvre in Paris”. Questions are first parsed into a custom query language, which is then used to retrieve the answer via queries to Nominatim and Overpass.\n\nNominatim and Overpass are powerful ways of querying OSM, but the Overpass Query\nLanguage is somewhat impractical for quick queries by users unfamiliar with it. In\norder to query OSM using natural language (NL) queries such as “Show me where I\ncan find drinking water within 500m of the Louvre in Paris”, Lawrence and\nRiezler [1] created the first NLMaps dataset mapping NL queries to a custom\nmachine-readable language (MRL), which can then be used to retrieve the answer\nfrom OSM via a combination of queries to Nominatim and Overpass. They extended\ntheir dataset in a subsequent work by auto-generating synthetic queries from a\ntable mapping NL terms to OSM tags – calling the combined dataset NLMaps v2. [2]\nThe proposed purpose of these datasets is training a parser that can parse NL\nqueries into their MRL representation, as done in [2-5].\n\nThe main aim of my Master’s thesis was building a web-based NLMaps interface\nthat can be used to issue queries and to view the result. In addition, the web\ninterface should enable the user to give feedback on the returned parse, either by\nsimply marking the parser-produced MRL query as correct or incorrect, or by\nexplicitly correcting it with the help of a web form. 
This feedback should be\ndirectly used to improve the parser by training it in an asynchronous online\nlearning procedure.\n\nAfter observing that parsers trained on NLMaps v2 perform poorly on new queries,\nan investigation into the causes revealed several shortcomings in\nNLMaps v2, mainly: (1) The train and test splits are extremely similar, limiting the\ninformativeness of evaluating on the test split. (2) Various inconsistencies\nexist in the mapping from NL terms to OSM tags (e.g. “forest” sometimes mapping to\nnatural=wood, sometimes to landuse=forest). (3) The NL queries’ linguistic\ndiversity is limited since most of them were generated with a very simple\ntemplating procedure, which leads to parsers trained on the data not being very\nrobust to new wordings of a query. (4) In a similar vein, there is only a small\nnumber of different area names in NLMaps v2, with the names “Paris”, “Heidelberg”\nand “Edinburgh” being so dominant that parsers are biased towards producing\nthem. (5) Some generated NL queries are worded very unnaturally, making them\ncounter-productive learning examples. (6) Usage of OSM tags is sometimes\nincorrect, which affects the usefulness of produced parses.\n\nThe detailed analysis is used to eliminate some of the shortcomings – such as\nincorrect tag usage – from NLMaps v2. Additionally, a new approach of\nauto-generating NL-MRL pairs with probabilistic templates is used to create a\ndataset of synthetic queries that features a significantly higher linguistic\ndiversity and a large set of different area names. The combination of the\nimproved NLMaps v2 and the new synthetic queries is called NLMaps v3.\n\nA character-based GRU encoder-decoder model with attention [6] is used for\nparsing NL queries into MRL queries, using the configuration that performed best\nin previous work [5]. This model is trained on NLMaps v3 and used as the parser\nin the newly developed web interface. 
Mainly through advertising on the OSM talk\nlist and the OSM subreddit, 12 annotators are hired from all over the world to\nuse the web interface to issue new NL queries and to correct the parser-produced\nMRL query if it is incorrect. They are assisted by a tutorial completed before\nthe annotation job and by help compiled from taginfo [7], TagFinder [8] and\ncustom suggestions for difficult tag combinations. The collected dataset\ncontains 3773 NL-MRL pairs and is called NLMaps v4.\n\nWith the help of NLMaps v4, an informative evaluation can be performed, revealing\nthat a parser trained on NLMaps v2 achieves an exact match accuracy of\n5.2 % on the MRL queries of the test split of NLMaps v4, while a parser trained\non NLMaps v3 performs significantly better with 28.9 %. Pre-training on\nNLMaps v3 and fine-tuning on NLMaps v4 achieves an accuracy of 58.8 %.\n\nSince the thesis’s goal is an online learning system – i.e. a system that\nupdates the parser directly after receiving feedback in the form of an NL-MRL\npair –, various online learning simulations are conducted in order to find the\nbest setup. In all cases, the parser is pre-trained on NLMaps v3 and then\nreceives the NL-MRL pairs in NLMaps v4 one by one, updating the model after each\nstep. The simplest variant of the experiment uses only the one NL-MRL pair\nfor the update, another variant adds NL-MRL pairs from NLMaps v3 to the\nminibatch, and a third variant additionally adds further “memorized” NL-MRL pairs\nfrom previously given feedback to the minibatch. The main findings of the\nsimulation are that all variants improve performance on NLMaps v4 with respect\nto the pre-trained parser, but with some of them the performance on NLMaps v3\ndegrades. The simple variant that updates only on the one NL-MRL pair is\nparticularly unstable, while adding NLMaps v3 instances stabilizes the\nperformance on NLMaps v3 and improves the performance on NLMaps v4. 
Adding the\ninstances from memorized feedback further improves the performance to an\naccuracy of 53.0 %, which is still lower than the offline batch learning\nfine-tuning mentioned in the previous paragraph.\n\nIn conclusion, the thesis improves the existing NLMaps dataset and contributes\ntwo new datasets – one of which is especially valuable since it consists of real\nuser queries – laying the groundwork necessary for further enhancing NLMaps\nparsers. The current parser – achieving an accuracy of 58.8 % – can be used by\nOSM users via the new web interface currently available at\nhttps://nlmaps.gorgor.de/ for issuing queries and also for correcting incorrect\nones. Future work will concentrate on improving the web interface’s UX and\nenhancing the parser’s performance in terms of speed and accuracy.","original_language":"eng","persons":["Simon Will"],"view_count":137,"promoted":false,"date":"2021-07-11T12:00:00.000+02:00","release_date":"2021-11-09T00:00:00.000+01:00","updated_at":"2025-12-25T20:15:04.157+01:00","tags":["sotm2021","10416","2021","OSM","OpenStreetMap"],"length":2165,"duration":2165,"thumb_url":"https://static.media.ccc.de/media/events/sotm/2021/10416-3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8.jpg","poster_url":"https://static.media.ccc.de/media/events/sotm/2021/10416-3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8_preview.jpg","timeline_url":"https://static.media.ccc.de/media/events/sotm/2021/10416-3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8.timeline.jpg","thumbnails_url":"https://static.media.ccc.de/media/events/sotm/2021/10416-3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8.thumbnails.vtt","frontend_link":"https://media.ccc.de/v/sotm2021-academic-10416-nlmaps-web-a-natural-language-interface-to-openstreetmap","url":"https://api.media.ccc.de/public/events/3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8","conference_title":"State of the Map 
2021","conference_url":"https://api.media.ccc.de/public/conferences/sotm2021","related":[],"recordings":[{"size":206,"length":2165,"mime_type":"video/webm","language":"eng-rus","filename":"sotm2021-10416-eng-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_webm-hd.webm","state":"new","folder":"webm-hd","high_quality":true,"width":1920,"height":1080,"updated_at":"2021-11-09T23:42:41.883+01:00","recording_url":"https://cdn.media.ccc.de/events/sotm/2021/webm-hd/sotm2021-10416-eng-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_webm-hd.webm","url":"https://api.media.ccc.de/public/recordings/55273","event_url":"https://api.media.ccc.de/public/events/3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8","conference_url":"https://api.media.ccc.de/public/conferences/sotm2021"},{"size":119,"length":2165,"mime_type":"video/webm","language":"eng-rus","filename":"sotm2021-10416-eng-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_webm-sd.webm","state":"new","folder":"webm-sd","high_quality":false,"width":720,"height":576,"updated_at":"2021-11-09T23:37:14.673+01:00","recording_url":"https://cdn.media.ccc.de/events/sotm/2021/webm-sd/sotm2021-10416-eng-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_webm-sd.webm","url":"https://api.media.ccc.de/public/recordings/55272","event_url":"https://api.media.ccc.de/public/events/3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8","conference_url":"https://api.media.ccc.de/public/conferences/sotm2021"},{"size":89,"length":2165,"mime_type":"video/mp4","language":"eng-rus","filename":"sotm2021-10416-eng-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_sd.mp4","state":"new","folder":"h264-sd","high_quality":false,"width":720,"height":576,"updated_at":"2021-11-09T22:59:32.928+01:00","recording_url":"https://cdn.media.ccc.de/events/sotm/2021/h264-sd/sotm2021-10416-eng-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_sd.mp4","url":"https://api.media.ccc.de/public/recordings/55270","event_url":"https://a
pi.media.ccc.de/public/events/3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8","conference_url":"https://api.media.ccc.de/public/conferences/sotm2021"},{"size":152,"length":2165,"mime_type":"video/mp4","language":"eng-rus","filename":"sotm2021-10416-eng-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_hd.mp4","state":"new","folder":"h264-hd","high_quality":true,"width":1920,"height":1080,"updated_at":"2021-11-09T22:40:53.577+01:00","recording_url":"https://cdn.media.ccc.de/events/sotm/2021/h264-hd/sotm2021-10416-eng-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_hd.mp4","url":"https://api.media.ccc.de/public/recordings/55269","event_url":"https://api.media.ccc.de/public/events/3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8","conference_url":"https://api.media.ccc.de/public/conferences/sotm2021"},{"size":118,"length":2165,"mime_type":"video/mp4","language":"rus","filename":"sotm2021-10416-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap.mp4","state":"new","folder":"h264-hd","high_quality":true,"width":1920,"height":1080,"updated_at":"2021-11-09T22:40:47.192+01:00","recording_url":"https://cdn.media.ccc.de/events/sotm/2021/h264-hd/sotm2021-10416-rus-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap.mp4","url":"https://api.media.ccc.de/public/recordings/55585","event_url":"https://api.media.ccc.de/public/events/3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8","conference_url":"https://api.media.ccc.de/public/conferences/sotm2021"},{"size":118,"length":2165,"mime_type":"video/mp4","language":"eng","filename":"sotm2021-10416-eng-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap.mp4","state":"new","folder":"h264-hd","high_quality":true,"width":1920,"height":1080,"updated_at":"2021-11-09T22:40:40.444+01:00","recording_url":"https://cdn.media.ccc.de/events/sotm/2021/h264-hd/sotm2021-10416-eng-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap.mp4","url":"https://api.media.ccc.de/public/recordings/55584","event_url":"https://api.media.ccc.de/public/
events/3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8","conference_url":"https://api.media.ccc.de/public/conferences/sotm2021"},{"size":33,"length":2165,"mime_type":"audio/mpeg","language":"eng","filename":"sotm2021-10416-eng-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_mp3.mp3","state":"new","folder":"mp3","high_quality":false,"width":0,"height":0,"updated_at":"2021-11-09T22:51:05.181+01:00","recording_url":"https://cdn.media.ccc.de/events/sotm/2021/mp3/sotm2021-10416-eng-NLMaps_Web_A_Natural_Language_Interface_to_OpenStreetMap_mp3.mp3","url":"https://api.media.ccc.de/public/recordings/55271","event_url":"https://api.media.ccc.de/public/events/3bdf5c7e-c32b-592b-bc41-ae882c7b9ec8","conference_url":"https://api.media.ccc.de/public/conferences/sotm2021"}]}