Title: | Travel Smoothly Between Country Name and Code Formats |
---|---|
Description: | Smooths the process of working with country names and codes via powerful parsing, standardization, and conversion utilities arranged in a simple, consistent API. Country name formats come from multiple sources, including the common-sense standardized names of the Unicode Common Locale Data Repository (CLDR, <http://cldr.unicode.org/>) in hundreds of languages. |
Authors: | Edward Visel [aut, cre] |
Maintainer: | Edward Visel <[email protected]> |
License: | GPL-3 \| file LICENSE |
Version: | 0.3.0 |
Built: | 2024-11-06 03:11:33 UTC |
Source: | https://github.com/alistaire47/passport |
as_country_code converts a vector of standardized country names or codes to country codes.
as_country_code(x, from, to = "iso2c", factor = is.factor(x))
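Argument | Description |
---|---|
x | A character, factor, or numeric vector of country names or codes. |
from | Format from which to convert. See Details for more options. |
to | Code format to which to convert. Defaults to "iso2c". |
factor | If TRUE, returns a factor instead of a character vector. Defaults to is.factor(x). |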
as_country_code takes a character, factor, or numeric vector of country names or codes to translate into the specified code format. The default for to is "iso2c", the ISO 3166-1 Alpha-2 character codes, but many alternatives are available.
Several non-unique codes are available as well, including "continent", "is_independent", ISO 4217 currency codes, etc. Backwards conversion will not work for such cases, since a single value of such a code can correspond to many countries.
See codes for all options, or run DT::datatable(codes) for a searchable widget.
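The forward/backward asymmetry can be illustrated with a minimal base-R sketch of lookup-table conversion. It assumes a hypothetical four-row table, toy_countries, and a hypothetical helper, toy_convert(); neither is part of passport, and this is not how the package is implemented internally.

```r
# Hypothetical miniature lookup table (illustration only; not passport's data)
toy_countries <- data.frame(
  iso2c     = c("FR", "DE", "JP", "KR"),
  iso3c     = c("FRA", "DEU", "JPN", "KOR"),
  continent = c("Europe", "Europe", "Asia", "Asia"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: find each input in the `from` column and return
# the value in the same row of the `to` column (NA when not found)
toy_convert <- function(x, from, to, table = toy_countries) {
  table[[to]][match(as.character(x), table[[from]])]
}

toy_convert(c("JPN", "FRA"), from = "iso3c", to = "iso2c")    # "JP" "FR"
toy_convert(c("FR", "KR"), from = "iso2c", to = "continent")  # "Europe" "Asia"

# Backwards from a non-unique column: match() takes the first hit, so
# "Europe" silently becomes "FR" even though "DE" is European too
toy_convert("Europe", from = "continent", to = "iso2c")       # "FR"
```

Because match() returns only the first matching row, a non-unique column such as continent cannot be inverted reliably.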
A vector of country codes. Warns if new NA values are added.
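A sketch of the "warns if new NA values are added" check, assuming a hypothetical helper warn_new_na() that compares missing values before and after conversion. passport's actual check and warning text may differ. Only NAs introduced by the conversion trigger the warning; NAs already present in the input pass through silently.

```r
# Hypothetical helper: warn only about NAs that the conversion introduced
warn_new_na <- function(x, result) {
  new_na <- is.na(result) & !is.na(x)
  if (any(new_na)) {
    warning("NAs created: ", paste(unique(x[new_na]), collapse = ", "),
            call. = FALSE)
  }
  result
}

# Hypothetical iso2c -> iso3c lookup
lookup <- c(FR = "FRA", JP = "JPN")
input  <- c("FR", NA, "ZZ", "JP")

# The NA already in `input` is ignored; the unknown "ZZ" triggers a warning
out <- warn_new_na(input, unname(lookup[input]))
out  # "FRA" NA NA "JPN"
```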
For converting to country names, use as_country_name(), which offers control of short and variant forms. For parsing non-standardized country names to codes, use parse_country().
# Codifies standardized names
as_country_code(c("US", "Taiwan", "Myanmar", "Kosovo", "South Korea"), from = "en")

# Translates codes; if passed a factor, returns a releveled one
as_country_code(factor(c("SAH", "PCN", "OMA", "JPN")), from = "fifa", to = "iso4217_3c")
as_country_name converts a vector of standardized country codes to country names.
as_country_name(x, to = "en", from = "iso2c", short = TRUE, variant = FALSE, factor = is.factor(x))
Argument | Description |
---|---|
x | A character, factor, or numeric vector of country codes or names. |
to | Language code of country names desired. Defaults to "en" (English). |
from | Code format from which to convert. Defaults to "iso2c". |
short | Whether to use the short alternative name when available. Can be length 1 or the same length as x. |
variant | Whether to use the variant alternative name when available. Can be length 1 or the same length as x. |
factor | If TRUE, returns a factor instead of a character vector. Defaults to is.factor(x). |
as_country_name takes a character, factor, or numeric vector of country codes (or names in another standardized format) and converts them to country names in the specified format. If you are trying to standardize an existing set of names, see parse_country().
The default, "en", comes from the Unicode Common Locale Data Repository (CLDR), which aims to use the most customary name, e.g. "Switzerland", instead of the official one, e.g. "Swiss Confederation", which is frequently awkward for common usage. CLDR also supplies names in a huge variety of languages, allowing for easy translation. Short and variant alternates are available for some countries; where they are not, the function falls back to the standard form. See the LICENSE file for terms of use.
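The fallback rule can be sketched in base R, assuming a hypothetical three-row table, names_tbl, in which NA marks a missing short form, and a hypothetical helper, toy_country_name(). The values are illustrative and this is not passport's internal code.

```r
# Hypothetical name table; NA in en_short means "no short form available".
# Values are illustrative, not copied from CLDR.
names_tbl <- data.frame(
  iso2c    = c("US", "CH", "KR"),
  en       = c("United States", "Switzerland", "South Korea"),
  en_short = c("US", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: use the short form where requested and available,
# otherwise fall back to the standard name
toy_country_name <- function(x, short = TRUE, table = names_tbl) {
  i <- match(x, table$iso2c)
  standard  <- table$en[i]
  alternate <- table$en_short[i]
  # `short` may be length 1 or length(x); rep_len() recycles either case
  use_alt <- rep_len(short, length(x)) & !is.na(alternate)
  ifelse(use_alt, alternate, standard)
}

toy_country_name(c("US", "CH"))                  # "US" "Switzerland"
toy_country_name(c("US", "CH"), short = FALSE)   # "United States" "Switzerland"
```

Since the short form is used only where it exists, "CH" falls back to "Switzerland" even when short = TRUE.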
Other name sets are available from:
- the UN Statistics Division (UNSD), which maintains standardized names in English, Chinese, Russian, French, Spanish, and Arabic, here named "en_un" etc.;
- the ISO, as "en_iso" and "fr_iso"; and
- the CIA World Factbook, as "en_cia", which includes many longer official forms and shorter practical forms, "en_cia_local", which includes transliterations, and "en_cia_abbreviation", which includes commonly used abbreviations.
See codes for all options, or run DT::datatable(codes) for a searchable widget.
A character or factor vector of country names. Warns if new NA values are added.
For converting standardized names to codes, use as_country_code(). For standardizing irregular names to codes, use parse_country().
# Usable names for tough-to-standardize places
as_country_name(c("US", "TW", "MM", "XK", "KR"))

# If passed a factor, will return a releveled one
as_country_name(factor(c("US", "NF", "CD", "SJ")), short = FALSE, variant = TRUE)

# Speaks a lot of languages, knows a lot of codes
as_country_name(c("SAH", "PCN", "OMA", "JPN"), from = "fifa", to = "cy")  # to Welsh
A codebook data.frame of codes and details for the country code and name conversions available. Contains Internet Engineering Task Force (IETF) language tags (e.g. "en-nz" for New Zealand English) for Unicode Common Locale Data Repository (CLDR) names, similar approximations for institutional names (e.g. "en-iso"), and short names (e.g. "iso2c") for country codes.
codes
A data.frame of 427 rows and 9 variables.
Variables:

Variable | Description |
---|---|
column | The column name in the internal passport:::countries data.frame. Valid for use in from and to parameters. |
code | column with hyphens in place of underscores, which is a valid IANA language tag for Unicode CLDR country names. Valid for use in from and to parameters. |
name | Full name or code name for non-CLDR options. |
notes | Things to note, including deprecations, oddities, etc. |
language | Full language name parsed from code. |
region | Full country or region name parsed from code. |
script | Full language script name parsed from code. |
variant | Full variant parsed from code. Also used for organization-standardized names. |
extension | Further specification of name type. |
All functions can accept codes separated with underscores (_), hyphens (-), or periods (.).
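The separator equivalence amounts to a simple normalization. In this base-R sketch, normalize_code() and to_language_tag() are hypothetical helpers, not passport functions; they map between the underscore form used in column and the hyphenated form used in code.

```r
# Hypothetical helpers (not passport functions):
# normalize_code() maps hyphens and periods to underscores (the `column` form);
# to_language_tag() maps underscores to hyphens (the `code` form)
normalize_code  <- function(code) gsub("[-.]", "_", code)
to_language_tag <- function(column) gsub("_", "-", column, fixed = TRUE)

normalize_code(c("en-cia-local", "en.cia.local", "en_cia_local"))
# all three become "en_cia_local"
to_language_tag("en_nz")  # "en-nz"
```

Inside the character class [-.], a leading hyphen and a period are both literal, so no escaping is needed.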
# A searchable widget to find a code or name
if (requireNamespace("DT", quietly = TRUE)) {
  DT::datatable(codes)
}
country_format is a constructor function that returns a function to format country codes as country names, suitable for passing to the labels parameter of ggplot2's scale functions.
country_format(from = "iso2c", to = "en", short = TRUE, variant = FALSE, factor)
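Argument | Description |
---|---|
from | Code format from which to convert. Defaults to "iso2c". |
to | Language code of country names desired. Defaults to "en" (English). |
short | Whether to use the short alternative name when available. Can be length 1 or the same length as the vector of codes being formatted. |
variant | Whether to use the variant alternative name when available. Can be length 1 or the same length as the vector of codes being formatted. |
factor | If TRUE, returns a factor instead of a character vector. |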
A frequent reason to convert country codes back to country names is to make data visualizations more readable. While both a code and a name could be stored in a data frame, the extra computation and storage can be avoided by transforming codes to names directly within the visualization via a formatter function. as_country_name() could be passed without parentheses to format ISO 2-character codes as English names, but country_format allows greater flexibility, returning a formatter function with the specified parameters set.
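The constructor pattern is an ordinary R function factory. This minimal sketch assumes a hypothetical lookup vector, iso3c_to_en, standing in for as_country_name(), and a hypothetical make_formatter(); it shows the shape of the idea, not country_format's source.

```r
# Hypothetical iso3c -> English lookup standing in for as_country_name()
iso3c_to_en <- c(KOR = "South Korea", MMR = "Myanmar", TWN = "Taiwan")

# Function factory: fixes the lookup now and returns a one-argument
# formatter of the shape ggplot2 expects for `labels`
make_formatter <- function(lookup) {
  force(lookup)  # evaluate the argument when the formatter is created
  function(x) {
    x <- as.character(x)
    out <- unname(lookup[x])
    # Design choice for this sketch: keep the raw code when no name is known
    ifelse(is.na(out), x, out)
  }
}

fmt <- make_formatter(iso3c_to_en)
fmt(c("KOR", "TWN", "COG"))  # "South Korea" "Taiwan" "COG"
```

Because fmt is a plain one-argument function, it could be supplied as scale_x_discrete(labels = fmt); ggplot2 scales accept a function for labels.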
A function that accepts a vector of country codes and returns them as country names.
For controlling the order of a discrete scale, pass the results of order_countries() to limits.
if (require(ggplot2, quietly = TRUE)) {
  ggplot(data.frame(country = c("KOR", "MMR", "TWN", "COG"), y = 1:4),
         aes(x = country, y = y)) +
    geom_col() +
    scale_x_discrete(labels = country_format(from = "iso3c"))
}
A sample dataset of NATO/OTAN member defense expenditures.
nato
A data.frame of 232 rows and 14 variables.
Variables:

Variable | Description |
---|---|
country_stanag | Country code in NATO STANAG format. |
year | Year, from 2012 to 2019. 2018-2019 numbers may be estimates. |
Defense expenditure (USD, current prices) | Defense expenditures in US dollars, using current prices and exchange rates. |
Defense expenditure (USD, 2015 prices) | Defense expenditures in US dollars, using 2015 prices and exchange rates. |
Defense expenditure (% real GDP) | Defense expenditure as a percentage of real gross domestic product. Based on 2015 prices. |
Defense expenditure annual real change (% GDP) | Annual change in defense expenditure as a percentage of real gross domestic product. Based on 2015 prices. |
Real GDP (2015 prices) | Real gross domestic product in 2015 US dollars and at 2015 exchange rates. |
GDP per capita (USD) | Gross domestic product per capita in 2015 US dollars and at 2015 exchange rates. |
Defense expenditure per capita (USD) | Defense expenditure per capita in 2015 US dollars. |
Military personnel | Number of military personnel. |
Equipment expenditure (%) | Percentage of defense expenditure spent on equipment. Includes major equipment expenditure and R&D devoted to major equipment. |
Personnel expenditure (%) | Percentage of defense expenditure spent on personnel. Includes both military and civilian expenditure and pensions. |
Infrastructure expenditure (%) | Percentage of defense expenditure spent on infrastructure. Includes NATO common infrastructure and national military construction. |
Other expenditure (%) | Percentage of defense expenditure spent on other categories besides equipment, personnel, and infrastructure. Includes operations and maintenance expenditure, other R&D expenditure, and other expenditure not otherwise captured. |
https://www.nato.int/cps/en/natohq/news_167080.htm
as_country_name(nato$country_stanag, from = 'stanag')
order_countries reorders a vector of countries, returning a result useful for passing to the limits parameter of ggplot2's scale functions.
order_countries(x, by, ..., from = "iso2c", short = TRUE, variant = FALSE, factor = is.factor(x))
Argument | Description |
---|---|
x | A character, factor, or numeric vector of country codes or names. |
by | Either a length-one country code from codes, or a vector the same length as x by which to order. |
... | Parameters passed on to order(). |
from | Code format from which to convert. Defaults to "iso2c". |
short | Whether to use the short alternative name when available. Can be length 1 or the same length as x. |
variant | Whether to use the variant alternative name when available. Can be length 1 or the same length as x. |
factor | If TRUE, returns a factor instead of a character vector. Defaults to is.factor(x). |
order_countries orders a vector of countries by:
- itself converted to a country code or name, if by is a code from codes to which to convert;
- a sortable vector, if by is a vector of the same length as x;
- x itself, if neither is supplied.
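The three cases can be sketched with base R's order(). This assumes a hypothetical lookup vector, en, in place of a real name conversion and an illustrative score vector; it mirrors the behavior described above rather than passport's source.

```r
countries <- c("DE", "CH", "ES", "AT")

# 1. Nothing supplied: sort the vector itself
by_self <- countries[order(countries)]

# 2. `by` names a format: convert first, then sort by the converted values.
#    Hypothetical iso2c -> English lookup standing in for a real conversion.
en <- c(DE = "Germany", CH = "Switzerland", ES = "Spain", AT = "Austria")
by_name <- countries[order(en[countries])]

# 3. `by` is a same-length sortable vector; extra arguments go to order()
score <- c(3, 1, 4, 2)  # illustrative values
by_vector <- countries[order(score, decreasing = TRUE)]

by_self    # "AT" "CH" "DE" "ES"
by_name    # "AT" "DE" "ES" "CH"
by_vector  # "ES" "DE" "AT" "CH"
```

Cases 1 and 2 differ because "Switzerland" sorts after "Spain" by name even though "CH" sorts first by code.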
The original vector of countries, ordered according to the parameters passed. Note that factors are not releveled, but are reordered. To relevel, pass the results to levels<-().
To change labels of a discrete scale, pass the results of country_format() to the labels parameter.
countries <- c("FR", "CP", "UZ", "BH", "BR")

order_countries(countries)
order_countries(countries, "ja")
order_countries(countries, rnorm(5))
order_countries(countries, grepl("F", countries), 1:5, decreasing = TRUE)

if (require(ggplot2, quietly = TRUE)) {
  df_countries <- data.frame(country = countries, y = exp(1:5))
  ggplot(df_countries, aes(country, y)) +
    geom_col() +
    scale_x_discrete(
      limits = order_countries(df_countries$country, df_countries$y)[df_countries$y > 5],
      labels = country_format(to = "en-cia-local")
    )
}
parse_country parses irregular country names to the ISO 3166-1 Alpha-2 code or other standardized code or name format.
parse_country(x, to = "iso2c", how = c("regex", "google"), language = c("en", "de"), factor = is.factor(x))
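Argument | Description |
---|---|
x | A character or factor vector of country names to standardize. |
to | Format to which to convert. Defaults to "iso2c". |
how | How to parse; defaults to "regex". |
language | If |
factor | If TRUE, returns a factor instead of a character vector. Defaults to is.factor(x). |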
parse_country tries to parse a character or factor vector of country names to a standardized form: by default, ISO 3166-1 Alpha-2 codes.

When how = "regex" (the default), parse_country uses regular expressions to match irregular forms.
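A minimal sketch of regex-based parsing, assuming a hypothetical three-entry pattern table and a hypothetical helper, toy_parse(). passport's own expressions cover far more countries and spellings; the point here is that an unmatched input becomes NA with a warning rather than a guess.

```r
# Hypothetical, deliberately tiny pattern table; matching is done on
# lowercased, trimmed input
patterns <- c(
  US = "^(u\\.?s\\.?(a\\.?)?|(the )?united states( of america)?)$",
  GB = "^(u\\.?k\\.?|(great )?britain|united kingdom)$",
  JP = "^(japan|nippon)$"
)

toy_parse <- function(x) {
  x_clean <- trimws(tolower(as.character(x)))
  parsed <- vapply(x_clean, function(s) {
    hits <- names(patterns)[vapply(patterns, grepl, logical(1), x = s, perl = TRUE)]
    # Design choice: no match or an ambiguous match both yield NA
    if (length(hits) == 1) hits else NA_character_
  }, character(1), USE.NAMES = FALSE)
  if (anyNA(parsed)) {
    warning("Unable to parse: ", paste(x[is.na(parsed)], collapse = ", "),
            call. = FALSE)
  }
  parsed
}

toy_parse(c("United States", "USA", "U.S.", "us", "United States of America"))
# all five become "US"
```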
If regular expressions are insufficient, how = "google" will use the Google Maps geocoding API instead, which permits a much broader range of input formats and languages. The API allows 2500 calls per day and should thus be called judiciously; parse_country makes one call per unique input. For more calls, consider functions that allow passing an API key, such as ggmap::geocode() with output = "all" or googleway::google_geocode().
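The "one call per unique input" strategy is a deduplicate-then-expand pattern. In this sketch, fake_geocode() is a hypothetical stand-in for a real API request that only counts its calls, and lookup_unique() is a hypothetical helper; neither is part of passport.

```r
# Hypothetical stand-in for an expensive remote lookup; it only counts calls
n_calls <- 0
fake_geocode <- function(place) {
  n_calls <<- n_calls + 1
  paste0("result for ", place)
}

# Call `f` once per distinct input, then expand back to the input's length
lookup_unique <- function(x, f) {
  u <- unique(x)
  res <- vapply(u, f, character(1), USE.NAMES = FALSE)
  res[match(x, u)]
}

out <- lookup_unique(c("Japan", "Japon", "Japan", "Japan"), fake_geocode)
n_calls  # 2
```

Four inputs cost only two calls, and match() restores the original order and length.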
Note that due to their flexibility, the APIs may fail unpredictably: e.g. parse_country("foo", how = "google") returns "CH", whereas how = "regex" fails gracefully with NA and a warning.
A character vector or factor of ISO 2-character country codes or other specified codes or names. Warns of any parsing failure.
parse_country(c("United States", "USA", "U.S.", "us", "United States of America"))

## Not run:
# Unicode support for parsing accented or non-Latin scripts
parse_country(c("\u65e5\u672c", "Japon", "\u0698\u0627\u067e\u0646"), how = "google")
#> [1] "JP" "JP" "JP" "JP"

# Parse distinct place names via geocoding APIs
parse_country(c("1600 Pennsylvania Ave, DC", "Eiffel Tower"), how = "google")
#> [1] "US" "FR"
## End(Not run)