metadata
license: mit

Whereabouts

Whereabouts is a Python geocoding package that implements record linkage algorithms in SQL using DuckDB. The package itself is available at whereabouts and can be installed with pip:

pip install whereabouts

Installation of reference databases

Once the package is installed, you will need to install a geocoding database built from a country's or region's address data. This repo contains a collection of these databases for different countries and regions. It currently has files for:

  • Australia (whole of country)
  • Victoria, Australia
  • New South Wales, Australia

More are being added as I get around to cleaning the data and creating the corresponding databases. File names follow the format <country_abbreviation>_<states>_<size>, where <size> is either sm or lg: sm databases have an inverted index built from pairs of consecutive tokens, and lg databases have one built from trigrams. The large databases can handle lower-quality address data, at the expense of speed.
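To make the idea of an inverted index over pairs of consecutive tokens more concrete, here is a minimal Python sketch. It is an illustration only, not the package's implementation (which runs in SQL on DuckDB); the function names `token_pairs`, `build_inverted_index` and `candidates`, and the three sample reference addresses, are made up for this example.

```python
from collections import defaultdict


def token_pairs(address):
    """Return the pairs of consecutive tokens in a lower-cased address."""
    tokens = address.lower().split()
    return list(zip(tokens, tokens[1:]))


def build_inverted_index(addresses):
    """Map each token pair to the set of address ids that contain it."""
    index = defaultdict(set)
    for address_id, address in enumerate(addresses):
        for pair in token_pairs(address):
            index[pair].add(address_id)
    return index


def candidates(index, query):
    """Rank address ids by the number of token pairs shared with the query."""
    counts = defaultdict(int)
    for pair in token_pairs(query):
        for address_id in index.get(pair, ()):
            counts[address_id] += 1
    # Most shared pairs first; ties broken by address id.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical reference data, for illustration only.
reference = [
    "122 station st fairfield vic",
    "504 sydney rd brunswick vic",
    "643 sydney rd brunswick vic",
]
index = build_inverted_index(reference)
ranked = candidates(index, "504 sydney rd brunswick")
print(ranked)
```

Each query only touches the index entries for its own token pairs, so candidate addresses can be found without comparing the query against every reference address.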

Example (install the small Australian geocoding database)

python -m whereabouts download au_all_sm
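The database name passed to the download command follows the <country_abbreviation>_<states>_<size> format described above. As a rough sketch of how such a name breaks down, the helper below splits a name into its parts. `parse_db_name` is a hypothetical function written for this README, not part of the whereabouts package.

```python
# Index type for each size suffix, as described in this README.
SIZES = {
    "sm": "pairs of consecutive tokens",
    "lg": "trigrams",
}


def parse_db_name(name):
    """Split a database name of the form <country>_<states>_<size>."""
    parts = name.split("_")
    if len(parts) < 3:
        raise ValueError(f"expected <country>_<states>_<size>, got {name!r}")
    country, size = parts[0], parts[-1]
    # Everything between the first and last part is treated as the states field.
    states = "_".join(parts[1:-1])
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}, expected 'sm' or 'lg'")
    return {
        "country": country,
        "states": states,
        "size": size,
        "index": SIZES[size],
    }


print(parse_db_name("au_all_sm"))
```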

Start geocoding

Once you have installed the package and a database, you can start geocoding your data.

from whereabouts.Matcher import Matcher

# Addresses to geocode
addresslist = ['122 station st fairfield vic', '643-645 sydney road brsunwick', '504 sydney rd brunswick']

# Load the reference database downloaded earlier, then geocode the addresses
matcher = Matcher(db_name='au_all_sm')
matcher.geocode(addresslist, how='standard')

References

The algorithm is based on the following paper: https://arxiv.org/abs/1708.01402