Note that all binary data in the URL (particularly info_hash and peer_id) must be properly escaped. This means any byte not in the set 0-9, a-z, A-Z, '.', '-', '_' and '~', must be encoded using the "%nn" format, where nn is the hexadecimal value of the byte. (See RFC1738 for details.)
For a 20-byte hash of \x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a,
The right encoded form is %124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A
-[Excerpted from the BT protocol specification]
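To make the quoted rule concrete, here is a minimal sketch in JavaScript. The helper name `percentEncodeBytes` is my own, not something from the spec or a library; it keeps the unreserved characters listed in the quote and writes every other byte as %nn.

```javascript
// Bytes that may appear literally: 0-9, a-z, A-Z, '.', '-', '_', '~'.
const UNRESERVED = /^[0-9A-Za-z._~-]$/;

// Percent-encode raw bytes using the rule quoted above.
// `percentEncodeBytes` is a hypothetical helper name, not a library function.
function percentEncodeBytes(bytes) {
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    out += UNRESERVED.test(ch)
      ? ch
      : "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// The 20-byte example hash from the quoted text.
const infoHash = Uint8Array.from([
  0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf1, 0x23, 0x45,
  0x67, 0x89, 0xab, 0xcd, 0xef, 0x12, 0x34, 0x56, 0x78, 0x9a,
]);

console.log(percentEncodeBytes(infoHash));
// %124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A
```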
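Following the logic of the protocol,
for the info_hash \x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a, every byte whose decimal value is at most 127 (that is, within the ASCII range) is first converted to its ASCII character and then encoded with encodeURIComponent; bytes greater than 127 can be written directly as % followed by the two-digit hexadecimal value, for example %ff.
For example, the first byte, 0x12, is 18 in decimal. In the ASCII table, 18 is DC2 (device control 2), which is not a printable character, so encodeURIComponent returns %12.
Likewise, 0x34 is 52 in decimal, which the ASCII table maps to the character "4".
So after conversion the first two bytes become %124.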
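The per-byte walk above can be sketched roughly like this. `encodeByteViaURIComponent` is a name I made up for illustration. One caveat, as far as I can tell: encodeURIComponent leaves `!`, `'`, `(`, `)` and `*` unescaped, and those are not in the protocol's allowed set, so this approach matches the spec only for bytes outside those five.

```javascript
// Encode one byte the way the walkthrough above describes.
// `encodeByteViaURIComponent` is a hypothetical name for illustration.
function encodeByteViaURIComponent(b) {
  if (b > 127) {
    // Outside ASCII: emit %XX directly. Passing the character to
    // encodeURIComponent would produce its multi-byte UTF-8 escape instead.
    return "%" + b.toString(16).toUpperCase();
  }
  return encodeURIComponent(String.fromCharCode(b));
}

console.log(encodeByteViaURIComponent(0x12)); // %12
console.log(encodeByteViaURIComponent(0x34)); // 4
console.log(encodeByteViaURIComponent(0x9a)); // %9A

// Caveat: 0x21 ('!') comes back unescaped, although the protocol's
// unreserved set does not include it, so it would need a manual %21.
console.log(encodeByteViaURIComponent(0x21)); // !
```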
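My question is: a hexadecimal string should be completely safe to use as a URL parameter.
Hexadecimal uses only 0-9 and a-f, with two characters per byte. The hash above could simply be written as 123456789abcdef123456789abcdef123456789a, which should be perfectly safe to send as a URL query parameter.
Is the current format just a convention that stuck for historical reasons, or is there some other reason?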
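For comparison, here is a small sketch of the hex form described above, next to the length of the percent-encoded form from the spec quote. `toHex` and `hashBytes` are names I picked for illustration.

```javascript
// Hex-encode raw bytes: two characters from 0-9 / a-f per byte.
// `toHex` is a hypothetical helper name.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// The same 20-byte example hash.
const hashBytes = Uint8Array.from([
  0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf1, 0x23, 0x45,
  0x67, 0x89, 0xab, 0xcd, 0xef, 0x12, 0x34, 0x56, 0x78, 0x9a,
]);

const hex = toHex(hashBytes);
console.log(hex);        // 123456789abcdef123456789abcdef123456789a
console.log(hex.length); // 40

// The percent-encoded form quoted from the spec, for comparison:
console.log("%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A".length); // 44
```

For this particular hash the hex form is 40 characters and the percent-encoded form is 44; the exact difference depends on how many bytes happen to fall in the unreserved set.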
I found a Stack Overflow answer. It looks like this is simply how the BT protocol was designed... in other words, it is too late to change.
https://stackoverflow.com/questions/4072234/bittorrent-tracker-request-format-of-info-hash
Q:When I want to send an initial request to a tracker all references I've seen says it needs to be url-encoded. If I transform the SHA-1 hash I have of the info key into a hex string, why would I need to url-encode the hash? It only contains allowed characters.
A:The info_hash parameter is not a hex string. It's a pure binary string, so yes, you will have to URL-encode many of the bytes in it. (This tends to make it longer in the end than just using a hex-encoded string, but that's the BitTorrent protocol for you, too late to do anything about it now!)
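In practice, if you only have the hash as a hex string, that means converting each hex pair back to a raw byte before encoding. A rough sketch follows; `hexToTrackerParam` is a hypothetical name, and it assumes the allowed character set is the one in the spec excerpt quoted earlier.

```javascript
// Turn a hex-encoded hash into the percent-encoded binary form.
// `hexToTrackerParam` is a hypothetical helper, not part of any library.
function hexToTrackerParam(hex) {
  if (!/^(?:[0-9a-fA-F]{2})*$/.test(hex)) {
    throw new Error("not a hex string");
  }
  // Each pair of hex digits is one raw byte.
  return hex.replace(/../g, (pair) => {
    const ch = String.fromCharCode(parseInt(pair, 16));
    // Keep unreserved bytes as-is, escape everything else as %NN.
    return /[0-9A-Za-z._~-]/.test(ch) ? ch : "%" + pair.toUpperCase();
  });
}

console.log(hexToTrackerParam("123456789abcdef123456789abcdef123456789a"));
// %124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A
```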