**Reverse Engineering the Peer to Peer Streaming Media System**

Chunxi Li¹ and Changjia Chen² *¹Beijing Jiaotong University, ²Lanzhou Jiaotong University, China*

#### **1. Introduction**

Peer-to-peer (P2P) content distribution networks such as BitTorrent (BT) are among the most popular Internet applications today. Their success relies heavily on the ability to pool the capacity of all individual peers. As the first prototype deployed on the real Internet, CoolStreaming (Zhang et al., 2005) demonstrated for the first time the application potential and business opportunity that arise when content is delivered not only at large scale but also in real time. On the way to such a system, researchers (Vlavianos et al., 2006) found that no protocol naturally combines mass data delivery with real-time distribution. This discovery stimulated studies of how to modify protocols like BT to meet real-time demands. Since 2004, a series of large-scale systems such as PPLive and PPStream have been deployed in China and around the world and have become world-class popular platforms. Many research reports on them also attest to their success.

However, most existing work is descriptive. It explains how such a system works and how to measure it, but makes little effort to explain why. In this chapter, we take a different route: we seek to understand the operation and dynamics of P2P systems at a deeper level of detail. We split this objective into the following sub-objectives: 1) understand the working principle by cracking the communication protocol; 2) comprehend the streaming content-delivery principle; 3) locate the measurable parameters that can be used to evaluate system performance; 4) understand the P2P network through models of the startup process and user behavior, and analyze the engineering design objectives. The requirements for reaching those goals are as follows: 1) the research must be driven by mass measured data from the real network; 2) the measuring platform must be suitable for a normal access situation such as a home line; 3) the datasets must be adequate in terms of scalability, quality and correctness of information; 4) the reverse engineering process should be well designed, so that it is easy to set up the analysis, interpret the results and draw conclusions from them.

However, the road towards reaching our research goals is full of challenges. On this road, many new findings are reported, many original problems are presented, and many design philosophies are discussed for the first time. Because all P2P streaming systems so far are proprietary, with no public technical documentation available, the fundamental "entry point" of the analysis is to crack the system protocol and then develop a measurement platform to access the system legally. Next, based on the mass raw data, we investigate and study user/peer behaviors, especially the startup behaviors, which are believed to involve many more systematic problems than the stable stage. Finally, the system's performance, scalability and stability are discussed, and the design models and philosophy are revealed based on the peer behavior models. These research steps are detailed in Sections 3 to 5. In addition, Section 2 presents related work.

A P2P VoD system is receiver-driven, and each peer controls its playback rate itself. Unlike a live peer, a VoD user has more flexibility to choose different playback patterns, such as skipping, fast forward and fast backward.

Fig. 1. The system structure

Fig. 2. Buffer and buffer map

**3.2 The communication protocol cracking**

In general, protocol cracking is an iterative procedure comprising the following steps:

**Network sniffer/measurement**: In the first step, using a client-side sniffer, we capture the packets exchanged between the local peer and other peers. From existing research reports and our own experience, we know that certain important protocol messages must be present, such as the *shake hand* message, the buffer map message (*BM*) and the peer list message (*peerlist*). By matching those message types against the sniffer trace, it is not difficult to distinguish all kinds of messages, even though the functions of some messages remain unknown.

**Protocol message guess**: Next, we observe each message along different dimensions, including time, channel and peer. To facilitate observation, we use a small program (developed by us) to extract the wanted messages from the traces under query conditions such as source IP/port, destination IP/port and message type. The extracted records show many regular patterns that help parse the detailed format of each message. Of course, this approach does not always work, and a minority of messages cannot be explained. We therefore do not neglect any available reference information; for example, we found the fields for total upload/download count and upload/download speed per peer contained in the BM by comparing against the information displayed in the PPStream client window. In general, we crack more than 80% of the messages for PPLive, PPStream and UUSee.

**Test and Confirmation:** In this stage, we analyze and validate the message exchange sequences. We guess and try different sequences until a normal peer or tracker gives the right response. In the end, nearly all the guesses are confirmed by our successful and legal access to the real network.
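The record extraction used in the protocol message guess step of Section 3.2 (selecting messages by source IP/port, destination IP/port and message type) can be illustrated with a minimal sketch. This is not the authors' tool: the `PacketRecord` layout, the `query` function and the sample trace are all hypothetical names and data introduced only to show the idea of filtering a sniffer trace by optional conditions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class PacketRecord:
    """One captured packet. Hypothetical layout, not the authors' trace format."""
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    msg_type: str  # label assigned during the sniffing step, e.g. "BM" or "peerlist"
    payload: bytes


def query(
    records: Iterable[PacketRecord],
    *,
    src_ip: Optional[str] = None,
    src_port: Optional[int] = None,
    dst_ip: Optional[str] = None,
    dst_port: Optional[int] = None,
    msg_type: Optional[str] = None,
) -> Iterator[PacketRecord]:
    """Yield the records that satisfy every condition that was supplied."""
    conditions = {
        "src_ip": src_ip,
        "src_port": src_port,
        "dst_ip": dst_ip,
        "dst_port": dst_port,
        "msg_type": msg_type,
    }
    # Conditions left as None are ignored, so any subset can be combined.
    active = {name: value for name, value in conditions.items() if value is not None}
    for record in records:
        if all(getattr(record, name) == value for name, value in active.items()):
            yield record


# Made-up trace for illustration only; addresses and payloads are not real data.
trace = [
    PacketRecord(0.00, "10.0.0.2", 4000, "10.0.0.9", 5000, "handshake", b"\x01"),
    PacketRecord(0.12, "10.0.0.9", 5000, "10.0.0.2", 4000, "BM", b"\x02\xff"),
    PacketRecord(0.40, "10.0.0.7", 5100, "10.0.0.2", 4000, "BM", b"\x02\x0f"),
    PacketRecord(0.55, "10.0.0.9", 5000, "10.0.0.2", 4000, "peerlist", b"\x03"),
]

# All buffer map messages sent by one remote peer, in time order.
bm_from_peer = sorted(
    query(trace, src_ip="10.0.0.9", msg_type="BM"),
    key=lambda record: record.timestamp,
)
```

Viewing the same message type from one peer over time, or from many peers at one instant, is what makes regular patterns in the payload visible. Keeping every condition optional lets the same function serve the time, channel and peer views with different arguments.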
