This module defines a standard interface to break URL strings up into components (addressing scheme, network location, path etc.), to combine the components back into a URL string, and to convert a "relative URL" to an absolute URL given a "base URL".
The module has been designed to match the Internet RFC on Relative Uniform Resource Locators (and discovered a bug in an earlier draft!). Refer to RFC 1808 for details on relative URLs and RFC 1738 for information on basic URL syntax.
It defines the following functions:
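urlparse() parses a URL into six components and returns them as a 6-tuple. This corresponds to the general structure of a URL:
scheme://netloc/path;parameters?query#fragment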
Each tuple item is a string, possibly empty.
The components are not broken up into smaller parts (e.g. the network
location is a single string), and % escapes are not expanded.
The delimiters as shown above are not part of the tuple items,
except for a leading slash in the path component, which is
retained if present.
Example:
urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '')
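The example above leaves the last three fields empty. The following is a minimal sketch of a URL that fills all six, assuming a Python 3 interpreter, where urlparse() is imported from urllib.parse rather than from this module. The parameter, query, and fragment values appended to the URL are invented for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URL: the ";v=1", "?lang=en" and "#top" parts are made up
# so that every field of the 6-tuple is non-empty.
url = 'http://www.cwi.nl:80/%7Eguido/Python.html;v=1?lang=en#top'

parts = urlparse(url)
scheme, netloc, path, params, query, fragment = parts

# The delimiters (://, ;, ?, #) are stripped, the leading slash of the
# path is kept, and the %7E escape is left unexpanded.
print(tuple(parts))
```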
If the allow_fragments argument is zero, fragment identifiers
are not allowed, even if the URL's addressing scheme normally does
support them. The default value for this argument is 1.
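As a rough illustration of the allow_fragments argument, the sketch below parses the same URL twice. It assumes Python 3, where the function is imported from urllib.parse, and the "#intro" fragment is an invented example value.

```python
from urllib.parse import urlparse

# Hypothetical URL with a fragment identifier added for illustration.
url = 'http://www.cwi.nl/%7Eguido/Python.html#intro'

with_fragment = urlparse(url)                          # default: fragments allowed
without_fragment = urlparse(url, allow_fragments=0)    # fragments not allowed

# Index 2 is the path, index 5 is the fragment.
print(with_fragment[2], with_fragment[5])
print(without_fragment[2], without_fragment[5])
```

With fragments disallowed, the text after the # is not split off. In this example it stays attached to the path and the fragment item is empty.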
urlunparse() constructs a URL string from a tuple as returned by urlparse().
This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had redundant delimiters, e.g. a ? with
an empty query (the draft states that these are equivalent).
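The round trip described here can be sketched as follows, assuming Python 3, where both functions are imported from urllib.parse. The URL with the trailing ? is an invented example of a redundant delimiter.

```python
from urllib.parse import urlparse, urlunparse

# A URL without redundant delimiters survives the round trip unchanged.
clean = 'http://www.cwi.nl:80/%7Eguido/Python.html'
clean_again = urlunparse(urlparse(clean))

# Hypothetical URL with a ? followed by an empty query.
redundant = 'http://www.cwi.nl/%7Eguido/Python.html?'
rebuilt = urlunparse(urlparse(redundant))

print(clean_again)
print(rebuilt)
```

Because the parsed tuple holds an empty string for the query, the rebuilt URL has no way to record that a ? was present, so the delimiter is dropped.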
urljoin() constructs an absolute URL by combining a base URL with a relative URL.
Example:
urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'
The allow_fragments argument has the same meaning as for urlparse().
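A few more joins against the same base URL may help show which parts of the base are reused. This is a sketch assuming Python 3, where urljoin() is imported from urllib.parse. The relative URLs other than 'FAQ.html' are invented for illustration.

```python
from urllib.parse import urljoin

base = 'http://www.cwi.nl/%7Eguido/Python.html'

# Sibling document: replaces the last path segment of the base.
sibling = urljoin(base, 'FAQ.html')

# Hypothetical relative URLs, not taken from the text above.
parent = urljoin(base, '../index.html')   # climbs one directory level
rooted = urljoin(base, '/docs/')          # absolute path, keeps scheme and host

for result in (sibling, parent, rooted):
    print(result)
```

In each case the addressing scheme and network location come from the base URL. Only the path is taken, wholly or partly, from the relative URL.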