RFC compliant uri and display-name encode/decode

1. URI Encoding This patch changes ast_uri_encode()'s behavior when doreserved is enabled. Previously when doreserved was enabled only a small set of reserved characters were encoded. This set was comprised primarily of the reserved characters defined in RFC3261 section 25.1, but contained other characters as well. Rather than only escaping the reserved set, doreserved now escapes all characters not within the unreserved set as defined by RFC 3261 and RFC 2396. Also, the 'doreserved' variable has been renamed to 'do_special_char' in attempts to avoid confusion. When doreserve is not enabled, the previous logic of only encoding the characters <= 0X1F and > 0X7f remains, except for the '%' character, which must always be encoded as it signifies a HEX escaped character during the decode process. 2. URI Decoding: Break up URI before decode. In chan_sip.c ast_uri_decode is called on the entire URI instead of it's individual parts after it is parsed. This is not good as ast_uri_decode can introduce special characters back into the URI which can mess up parsing. This patch resolves this by not decoding a URI until parsing is completely done. There are many instances where we check to see if pedantic checking is enabled before we decode a URI. In these cases a new macro, SIP_PEDANTIC_DECODE, is used on the individual parsed segments of the URI rather than constantly putting if (pedantic) { decode() } checks everywhere in the code. In the areas where ast_uri_decode is not dependent upon pedantic checking this macro is not used, but decoding is still moved to each individual part of the URI. The only behavior that should change from this patch is the time at which decoding occurs. Since I had to look over every place URI parsing occurs to create this patch, I found several places where we use duplicate code for parsing. To consolidate the code, those areas have updated to use the parse_uri() function where possible. 3. SIP display-name decoding according to RFC3261 section 25. To properly decode the display-name portion of a FROM header, chan_sip's get_calleridname() function required a complete re-write. More information about this change can be found in the comments at the beginning of this function. 4. Unit Tests. Unit tests for ast_uri_encode, ast_uri_decode, and get_calleridname() have been written. This involved the addition of the test_utils.c file for testing the utils api. (closes issue #16299) Reported by: wdoekes Patches: astsvn-16299-get_calleridname.diff uploaded by wdoekes (license 717) get_calleridname_rewrite.diff uploaded by dvossel (license 671) Tested by: wdoekes, dvossel, Nick_Lewis Review: https://reviewboard.asterisk.org/r/469/ git-svn-id: https://origsvn.digium.com/svn/asterisk/trunk@243200 65c4cc65-6c06-0410-ace0-fbb531ad65f3
author: David Vossel <dvossel@digium.com> 2010-01-26 16:30:08 +0000
committer: David Vossel <dvossel@digium.com> 2010-01-26 16:30:08 +0000
commit: d16b89be1783acedb9f3ee0d4f1f5f64ba012675 (patch)
tree: b4f37193aa43a729fdd7ab30df84eaffd4fbbf3b /include/asterisk/utils.h
parent: 2ce1ffc6644ea4d5b9173179cbf2aff7d159d167 (diff)
1 files changed, 23 insertions, 16 deletions
diff --git a/include/asterisk/utils.h b/include/asterisk/utils.h
index 051748510..1ea9a6085 100644
--- a/include/asterisk/utils.h
+++ b/include/asterisk/utils.h
@@ -248,22 +248,25 @@ int ast_base64encode(char *dst, const unsigned char *src, int srclen, int max);
  */
 int ast_base64decode(unsigned char *dst, const char *src, int max);
 
-/*!  \brief Turn text string to URI-encoded %XX version 
-
-\note 	At this point, we're converting from ISO-8859-x (8-bit), not UTF8
-	as in the SIP protocol spec 
-	If doreserved == 1 we will convert reserved characters also.
-	RFC 2396, section 2.4
-	outbuf needs to have more memory allocated than the instring
-	to have room for the expansion. Every char that is converted
-	is replaced by three ASCII characters.
-	\param string	String to be converted
-	\param outbuf	Resulting encoded string
-	\param buflen	Size of output buffer
-	\param doreserved	Convert reserved characters
-*/
-
-char *ast_uri_encode(const char *string, char *outbuf, int buflen, int doreserved);
+/*! \brief Turn text string to URI-encoded %XX version 
+ *
+ * \note 
+ *  At this point, this function is encoding agnostic; it does not
+ *  check whether it is fed legal UTF-8. We escape control
+ *  characters (\x00-\x1F\x7F), '%', and all characters above 0x7F.
+ *  If do_special_char == 1 we will convert all characters except alnum
+ *  and the mark set.
+ *  Outbuf needs to have more memory allocated than the instring
+ *  to have room for the expansion. Every char that is converted
+ *  is replaced by three ASCII characters.
+ *
+ *  \param string	String to be converted
+ *  \param outbuf	Resulting encoded string
+ *  \param buflen	Size of output buffer
+ *  \param do_special_char	Convert all non alphanum characters execept
+ *         those in the mark set as defined by rfc 3261 section 25.1
+ */
+char *ast_uri_encode(const char *string, char *outbuf, int buflen, int do_special_char);
 
 /*!	\brief Decode URI, URN, URL (overwrite string)
 	\param s	String to be decoded 
@@ -755,4 +758,8 @@ int ast_str_to_eid(struct ast_eid *eid, const char *s);
  */
 int ast_eid_cmp(const struct ast_eid *eid1, const struct ast_eid *eid2);
 
+/*!
+ * \brief Registers util api unit tests
+ */
+void ast_utils_register_tests(void);
 #endif /* _ASTERISK_UTILS_H */
author	David Vossel <dvossel@digium.com>	2010-01-26 16:30:08 +0000
committer	David Vossel <dvossel@digium.com>	2010-01-26 16:30:08 +0000
commit	d16b89be1783acedb9f3ee0d4f1f5f64ba012675 (patch)
tree	b4f37193aa43a729fdd7ab30df84eaffd4fbbf3b /include/asterisk/utils.h
parent	2ce1ffc6644ea4d5b9173179cbf2aff7d159d167 (diff)