Thursday, July 9, 2009

Network Traces

If the slideshow is not showing, you can view it here.

Wednesday, July 8, 2009

Circumventing P2P blocks

Assumptions
  • outgoing port 22/SSH allowed
  • Squid proxy on ports 443 and 80
  • NAT support
  • outgoing VPN allowed

SOCKS4/5 proxy

  • using ssh -D 8080 root@remote-host.com
  • using proxifiers (HTTP/SOCKS), or stunnel to encrypt any TCP connection (a single-port service) over SSL
  • (then use it as the SOCKS/HTTP proxy in the BitTorrent client; see the sketch below)
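
A minimal sketch of pushing arbitrary TCP traffic through the SSH tunnel above, assuming the PySocks library is installed and the ssh -D 8080 tunnel is already up; the destination host is illustrative:

    import socket
    import socks  # PySocks (assumed installed: pip install PySocks)

    # Point all sockets at the local SOCKS5 tunnel opened by
    # "ssh -D 8080 root@remote-host.com".
    socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 8080)
    socket.socket = socks.socksocket  # monkey-patch: every new socket uses the proxy

    s = socket.socket()
    s.connect(("example.org", 80))  # traffic now exits via remote-host.com
    s.sendall(b"HEAD / HTTP/1.0\r\nHost: example.org\r\n\r\n")
    print(s.recv(200).decode(errors="replace"))
    s.close()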

Bypassing SQUID

  • HTTP CONNECT to peers specified by FQDN (to bypass filters that block CONNECT to raw IP addresses). The peers are HTTP proxies. (See the sketch below.)
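
A minimal sketch of the CONNECT trick, assuming a hypothetical peer proxy at peer.example.net:443 and a hypothetical tracker; real peer names and ports will differ:

    import socket

    # Tunnel to a BitTorrent tracker through a peer acting as an HTTP proxy.
    proxy = socket.create_connection(("peer.example.net", 443))
    proxy.sendall(b"CONNECT tracker.example.org:6969 HTTP/1.1\r\n"
                  b"Host: tracker.example.org:6969\r\n\r\n")
    reply = proxy.recv(4096)
    if reply.startswith(b"HTTP/1.1 200") or reply.startswith(b"HTTP/1.0 200"):
        # Tunnel is up; the proxy now relays raw bytes to the tracker.
        proxy.sendall(b"GET /announce HTTP/1.0\r\n\r\n")
        print(proxy.recv(4096))
    proxy.close()
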
P2P on VPN (OpenVPN, IPsec)

  • OpenVPN multiplexes on a single TCP/UDP port
  • IPsec: a security scheme at layer 3 (OSI network layer / Internet layer)

NAT on tcp/443
  • all browser sessions use the proxy

A measurement study on video acceleration service

P. Pan, Y. Cui, and B. Liu, "A measurement study on video acceleration service," in IEEE CCNC, 2009.

Relevance
  • Pipes getting bigger.
  • Bandwidth and storage getting cheaper.
  • Browsers getting smarter.
  • People getting closer / social media.
  • VoD: YouTube / Tudou / Hulu, etc. rely on streaming.
Performance

Buffer
- long time to buffer / mitigated by multi-connection download and P2P (see the sketch below)
  • multiple connections over the same data (e.g., a TV show)
  • caching at peering points
  • TCP/UDP data transfer
  • intelligent P2P routing between peering points
- buffering may stop / mitigated by auto-reconnecting the download session
- multiple instances of buffered data / mitigated by cache sharing
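
A sketch of the multi-connection download idea, assuming a hypothetical video URL, a server that honors HTTP Range requests, and a known content length:

    import concurrent.futures
    import urllib.request

    URL = "http://example.org/video.flv"  # hypothetical
    CHUNK = 1 << 20                       # 1 MiB per connection

    def fetch_range(rng):
        start, end = rng
        req = urllib.request.Request(URL, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            return start, resp.read()

    length = 4 * CHUNK  # assume the content length is known
    ranges = [(i, min(i + CHUNK, length) - 1) for i in range(0, length, CHUNK)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        parts = dict(pool.map(fetch_range, ranges))  # fetch chunks in parallel
    data = b"".join(parts[start] for start, _ in ranges)  # reassemble in order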

Result highlights


Conclusion
  • Accelerator in the browser.
  • ISP peering/caching technology
  • Partial Net neutrality?

Wednesday, July 1, 2009

Longitudinal Study of Internet traffic in 1998-2003

M. Fomenkov, K. Keys, D. Moore, and k. claffy, "Longitudinal study of internet traffic in 1998-2003," in WISICT, 2004.

Introduction

This research presents a longitudinal study of Internet traffic behavior at a number of institutions over a span of four and a half years (1998-2003).
It cites previous works such as:
McCreary and Claffy, who analyzed IP traffic at the NASA Ames Internet eXchange point (AIX) for 8 months.
Thompson, Miller, and Wilder, who discussed characteristics of MCI's commercial Internet backbone over periods ranging from one day to a week.
Fraleigh et al., who described the IPMON traffic monitoring system and reported observations from the Sprint E-Solutions backbone network over one day.
The WAND network research group at the University of Waikato, which conducted measurements on OC3 links between the University of Auckland and the public Internet.

Data
They obtained 4,000 traffic samples from various sites connected to High Performance Computing networks.
At each site, packet headers were captured one to eight times a day, every month. The average duration of each measurement ranged from 60 to 120 seconds.

Four metrics of traffic were measured:
1. number of bytes
2. number of packets (packets being the actual quanta of traffic)
3. number of flows
4. number of source-destination pairs (port numbers and protocols ignored)

A sequence of packets is grouped into a flow if the packets share the same flow key: source IP address, source port, destination IP address, destination port, and protocol.
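
As a quick illustration of that flow key, a sketch that groups packets by the 5-tuple (the packet fields here are assumed, for illustration only):

    from collections import defaultdict

    def group_flows(packets):
        """Group packets into flows by the 5-tuple flow key."""
        flows = defaultdict(list)
        for pkt in packets:
            key = (pkt["src_ip"], pkt["src_port"],
                   pkt["dst_ip"], pkt["dst_port"], pkt["protocol"])
            flows[key].append(pkt)
        return flows

    pkts = [{"src_ip": "10.0.0.1", "src_port": 1234,
             "dst_ip": "10.0.0.2", "dst_port": 80, "protocol": "tcp"}]
    print(len(group_flows(pkts)))  # -> 1 flow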

They discovered that traces captured with an FATM card often had problems with the accuracy of time measurements, such as apparent clock resets and delays. They addressed this by checking timestamps, properly converting absolute counts to rates, and averaging the rates.

Results and Conclusions

Variations in bit rate are large and mostly without trend, reflecting the burstiness of Internet traffic. No cycle or consistent long-term growth was observed.

The quality of the available data is often insufficient for other qualitative measurements (e.g., traffic fluctuations can be caused by any number of reasons).

Assuming the data are representative of overall traffic evolution, they conclude that the data do not support the claim that Internet traffic was universally and rapidly increasing, either before or after the Internet bubble burst.

TCP is the predominant transport protocol:
TCP traffic is between 60% and 90% of the total load,
UDP is between 10% and 40% of the total load,
and all other protocols combined amount to less than 5%.

On average, the ratio of TCP to UDP traffic is 5:1 by bytes and 3:1 by packets.

Packet rate is a sublinear function of bit rate: packet rate ~ bitrate^0.75, while the counts of flows and IP pairs behave as ~ bitrate^0.5.
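
To make the scaling concrete, a quick computation (the exponents are from the paper; the doubling factor is an arbitrary example):

    factor = 2.0             # suppose the bit rate doubles
    print(factor ** 0.75)    # packet rate grows ~1.68x
    print(factor ** 0.5)     # flow and IP-pair counts grow ~1.41x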

Analysis of Internet Backbone Traffic and Header Anomalies Observed

Authors: Wolfgang John and Sven Tafvelin

This is a comparison with their later paper.

Differences
  • length of study
    • this study collected data in spring 2006 (April): 7.5 TB of data
    • their later study included the spring 2006 data plus newer data from fall 2006 (September to November): 5 TB of data
  • focus
    • this study: headers used and anomalies
    • their later study: traffic classes, also observed some header anomalies
This study:
  • ECN deployment is still small: 0.2% of tested clients
  • far more fragmented incoming packets are UDP (97%) than TCP (3%); not surprising, since path MTU discovery applies only to TCP
Their later study:
  • P2P is more aggressive in using SACK
  • window scaling (WS) and timestamps (TS) are more established in HTTP
Common:
This study presents the initial results of their overall project, focusing on headers and their effect on the applications in use. Their later paper presents a more in-depth study and shows how the header anomalies can be used to improve the monitoring of applications on the network and the detection of malicious attacks.

Micro Transport Protocol

Micro Transport Protocol is basically BitTorrent over UDP.

Traditionally, when an application needs to communicate over a network, it chooses between TCP and UDP for its transport protocol. When reliability is much more important than speed, TCP is the right choice. Otherwise, the application can use UDP to take advantage of its strengths. BitTorrent, which deals with the reliable transfer of data, obviously should use TCP.

However, in recent years, BitTorrent has come to dominate Internet traffic. Regardless of the legality of the files being transferred, BitTorrent has become a bandwidth hog. This is a concern for ISPs, particularly in the US, where there are no download limits. To combat the BitTorrent onslaught, ISPs started shaping their traffic. This was the start of the net neutrality debate.

Two simple examples of traffic shaping are TCP reset and random packet discard. In TCP reset, the ISP looks for P2P traffic (long session between 2 peers involving large packets) and sends a TCP reset to one or both users. In random packet discard, the ISP simply drops random packets.

To defeat the ISPs' traffic shaping techniques, BitTorrent's designers turned to UDP, since UDP traffic is much harder to shape. ISPs would have to look inside the UDP packets to interfere with the traffic, but such deep packet inspection is akin to wiretapping, which is illegal in most countries. Besides, looking for long TCP sessions is much easier than inspecting UDP packets; it is analogous to counting truck trailers on the highway versus counting the occupants of every vehicle passing by.

Using UDP, however, has its own problems. Without TCP's congestion control, there is a danger of flooding, which can slow down other applications using the Internet. Furthermore, TCP features such as retransmission of lost packets must be reimplemented.

The Micro Transport Protocol addresses the congestion issue by controlling its transfer rate using information gathered from the transport. It aims to decrease the latency the protocol causes while maximizing bandwidth when latency is not excessive. This way, there is no need for the user to set upload/download rates, since the protocol automatically adjusts to the network.
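
A minimal sketch of this style of delay-based rate control, assuming a LEDBAT-like rule; the target delay, gain, and function shape are assumptions, not uTorrent's actual implementation:

    TARGET_DELAY_MS = 100.0  # assumed target one-way queuing delay
    GAIN = 1.0               # assumed proportional gain

    def adjust_window(cwnd, measured_delay_ms, bytes_acked, mss=1400):
        """Grow the send window when measured delay is below target, shrink when above."""
        off_target = (TARGET_DELAY_MS - measured_delay_ms) / TARGET_DELAY_MS
        cwnd += GAIN * off_target * bytes_acked * mss / cwnd
        return max(cwnd, mss)  # never shrink below one packet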

Since the protocol is still being implemented, its features are typically hidden or obscure. The lack of an open-source implementation of the protocol, or even a standard surrounding it, will, in my opinion, slow its adoption and may prolong the debate over whether UDP is right for BitTorrent.

Analysis of Internet Backbone Traffic and Header Anomalies Observed

Wolfgang John and Sven Tafvelin
Chalmers University of Technology

Introduction

In order to support research and further development, the Internet community needs to understand the nature of Internet traffic. In this paper, an analysis of IP and TCP traffic was performed using packet headers captured from two OC-192 links.

Methodology

Collection of Traces.
- April 7-26, 2006
- Optical splitters were used on two OC-192 links attached to Endace DAG6.2SE cards
- The first 120 bytes of each PoS (Packet over SONET) frame were captured by the DAG cards
- Four 20-minute traces each day (2 AM, 10 AM, 2 PM, 8 PM)

Processing and Analysis.
- Payloads beyond the transport layer were removed.
- Traces were sanitized and checked for inconsistencies.
- Traces were desensitized, i.e., stripped of all sensitive information to ensure privacy.

Results
- 148 traces
- 10.77 billion PoS frames
- 7.6 TB of data; 99.97% of the frames contain IPv4 packets

IP packet size distribution
- bimodal
- 44% of packets are between 40 and 100 bytes
- 37% are between 1400 and 1500 bytes

Transport Protocols
- TCP: 90-95% of the data volume
- the largest fraction of TCP and the lowest of UDP occur at 2 PM
- a potential UDP DoS attack was detected via high UDP traffic during April 16-17, and later confirmed

Analysis of IP properties
- IP options are virtually unused
- only 68 packets carrying IP options were observed
- only 0.06% of the traffic was IP-fragmented, contrary to previous reports of up to 0.67%

Analysis of TCP properties
- The MSS and SACK-permitted options are widely used during connection establishment (on average 99.2% and 89.9%, respectively)
- TCP option misbehavior was also observed, including undefined option types and inconsistencies between the option length field and the actual option header length (see the sketch below)
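
A minimal sketch of the kind of consistency check involved, assuming raw TCP option bytes as input; the cutoff used for "undefined" option kinds is an assumption:

    def check_tcp_options(opts: bytes):
        """Walk the TCP options field and flag length/kind inconsistencies."""
        anomalies = []
        i = 0
        while i < len(opts):
            kind = opts[i]
            if kind == 0:              # End of Option List
                break
            if kind == 1:              # NOP: single byte, no length field
                i += 1
                continue
            if i + 1 >= len(opts):
                anomalies.append("option truncated before its length byte")
                break
            length = opts[i + 1]
            if length < 2 or i + length > len(opts):
                anomalies.append("kind %d: bad length %d" % (kind, length))
                break
            if kind > 30:              # assumed cutoff for undefined kinds
                anomalies.append("undefined option kind %d" % kind)
            i += length
        return anomalies

    # MSS (kind 2, length 4) followed by a bogus option
    print(check_tcp_options(bytes([2, 4, 5, 180, 99, 2])))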

Conclusions
- Current trends in Internet backbone traffic are useful input for protocol and application design.
- The anomalies detected were caused by buggy and misbehaving applications and protocol stacks; active OS fingerprinting; and network attacks exploiting vulnerabilities.

Critique
The results of this paper apply only to the particular Internet backbone links used in the data collection. A much wider set of packet traces (say, from hundreds of OC links on different continents) would be needed to generalize the properties of Internet traffic.

A Survey of Techniques for Internet Traffic Classification using Machine Learning

Authors: Thuy T.T. Nguyen and Grenville Armitage

Summary:


The paper is a survey of works that involved machine learning techniques in classifying IP network traffic.

To facilitate the review of the papers, the authors grouped the surveyed works into the following categories:

a) Clustering approaches
In this section, the authors summarize the usage framework and results for the following algorithms:

  1. Expectation Maximization (for flow clustering)
  2. Unsupervised Bayesian classification (coupled with expectation maximization for automated application identification)
  3. Simple K-means (one work for TCP-based application identification and one for identifying Web and P2P traffic in the network core; see the sketch below)
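
A hedged sketch of K-means on flow features in the spirit of these works, using scikit-learn; the features and toy values are illustrative only:

    import numpy as np
    from sklearn.cluster import KMeans

    # toy flow features: [mean packet size (bytes), duration (s), packet count]
    flows = np.array([
        [1400.0, 120.0, 9000],   # bulk-transfer-like
        [  90.0,   0.5,   12],   # interactive-like
        [1350.0,  90.0, 7000],
        [  80.0,   0.3,   10],
    ])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(flows)
    print(labels)  # flows grouped into two behavioral clusters
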
b) Supervised Learning Approaches
The same treatment as in the review of the clustering algorithms is applied to the following ML techniques:

(The following three have been used for mapping network apps to predetermined QoS traffic classes.)
  1. Neural Networks
  2. Linear Discriminant Analysis
  3. Quadratic Discriminant Analysis

  4. Supervised Bayesian classification (one for classifying Net traffic based on application, one coupled with Multiple Sub-flows features for real-time traffic classification, and another coupled with Multiple Synthetic Sub-flow Pairs, also for real-time classification; see the sketch after this list)
  5. Genetic algorithms (for feature selection and flow classification)
  6. Statistical techniques (coupled with so-called "protocol fingerprints" for flow classification)
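
A hedged sketch of the supervised Bayesian idea using scikit-learn's Gaussian Naive Bayes; the features, labels, and values are illustrative only:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # training flows: [mean packet size (bytes), mean inter-arrival time (ms)]
    X_train = np.array([[1400.0, 5.0], [90.0, 200.0], [1300.0, 8.0], [70.0, 150.0]])
    y_train = ["bulk", "interactive", "bulk", "interactive"]

    clf = GaussianNB().fit(X_train, y_train)
    print(clf.predict([[1200.0, 10.0]]))  # -> ['bulk']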

c) Hybrid Approaches

Under this category, a proposed semi-supervised classification technique is reported. It is a two-step method that first uses maximum likelihood estimation (via a Bayes-like statistic) and then employs K-means clustering.

d) Comparison and Related Work

The last category covers works that compared the algorithms mentioned above. In summary, the papers compared clustering methods against other clustering methods, clustering against supervised methods, and statistical methods (particularly Pearson's chi-square test) against supervised methods (particularly Naive Bayes). This section also presents "novel" ML-based methods: ACAS (ML techniques applied to application signatures) and BLINC (application classification based on the behavior of the source host at the transport layer).

Finally, they assess the surveyed works against the following "challenges for operational deployment":

  1. Timely and Continuous Classification
    Some have explored the performance of ML classifiers that utilise only the first few packets of a flow, but they cannot cope with missing the flow’s initial packets. Others have explored techniques for continuous classification of flows using a small sliding window across time, without needing to see the initial packets of a flow.

  2. Directional Neutrality
    The assumption that application flows are bi-directional, and that the application’s direction may be inferred prior to classification, permeates many of the works published to date. Most works have assumed that they will see the first packet of each bi-directional flow and that this initial packet is sent from the client to the server. The classification model is trained under this assumption, and subsequent evaluations have presumed the ML classifier can compute features with the correct sense of forward and reverse direction.

  3. Efficient Use of Memory and Processors
    There are definite trade-offs to be made between the classification performance of a classifier and the resource consumption of the actual implementation... The overhead of computing complex features (such as effective bandwidth based upon entropy, or Fourier Transform of the packet inter-arrival time) must be considered against the potential loss of accuracy if one simply did without those features.

  4. Portability and Robustness
    None of the reviewed works has addressed and evaluated their model’s robustness, in terms of classification performance, under packet loss, packet fragmentation, delay, and jitter.


Critique:

Even though the paper's main purpose is to report on the status of ML usage for traffic classification, it also points to other opportunities toward which network-related research may be directed. One of the (obvious?) topics that merit some research is the wide array of network classification tasks (e.g., flow classification, application identification); a potential project would be a synthesis of the outputs of these different classification tasks into a unified profile of a network. Another is feature selection, i.e., the task of identifying the attributes needed as input. Although the "standard" set of features is the usual 5-tuple, given today's more complex network transactions, a study could look for another "optimal" feature set that carries out network traffic classification better.