Thursday, June 25, 2009

State of the Art in Traffic Classification: A Research Review

M. Zhang, W. John, k. claffy, and N. Brownlee, "State of the art in traffic classification: A research review," PAM Student Workshop, 2009.


The authors surveyed 64 papers, covering over 80 data sets, to create a structured taxonomy of traffic classification papers. The taxonomy is based on the following definition of traffic classification:

"Methods of classifying traffic data sets based on features passively observed in the traffic, according to specific classification goals."

They grouped the papers into 5 categories: analysis, surveys, tools, methodology and others. They used the 5 attributes (in bold) from the definition to categorize each paper.

Data sets:
- can be classified based on what type of traffic it is, where it was collected, etc.

Classification goals:
- can be coarse-grained (P2P, transaction-oriented) or fine-grained (traffic from a specific application)

Methods:
- exact method (via port numbers)
- heuristics (based on patterns)
- machine learning methods: supervised or unsupervised learning
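
As a rough illustration of the first two method types, here is a minimal sketch that combines an exact port lookup with a single payload-pattern heuristic. The function names, the tiny port table and the order in which the checks run are my own assumptions for illustration, not something taken from the paper. The pattern check uses the BitTorrent peer handshake prefix (a length byte of 19 followed by the string "BitTorrent protocol").

```python
from typing import Optional

# A small illustrative subset of well-known port assignments.
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}

# BitTorrent peer handshake prefix: length byte 19, then the protocol string.
BITTORRENT_HANDSHAKE = b"\x13BitTorrent protocol"


def classify_by_port(src_port: int, dst_port: int) -> Optional[str]:
    """Exact method: look up either port in the table (hypothetical helper)."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return None


def classify_by_pattern(payload: bytes) -> Optional[str]:
    """Heuristic method: match a known byte pattern at the start of the payload."""
    if payload.startswith(BITTORRENT_HANDSHAKE):
        return "bittorrent"
    return None


def classify(src_port: int, dst_port: int, payload: bytes = b"") -> str:
    """Try the payload pattern first, then the port table, else 'unknown'."""
    return (
        classify_by_pattern(payload)
        or classify_by_port(src_port, dst_port)
        or "unknown"
    )
```

Checking the payload before the port means a BitTorrent handshake on port 80 is labeled "bittorrent" rather than "http", which is the kind of case where port numbers alone mislead.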

Features:
- the choice of features for traffic classification depends on trends in application development. A good example given in the paper is the tendency of modern applications to use UDP instead of TCP and to change ports from time to time. Because of this, examining port numbers alone may not be enough, and we might need to look at payload, flows, etc.
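
To make "looking at flows" a bit more concrete, here is a minimal sketch that groups packets into unidirectional flows by their 5-tuple and derives a few simple per-flow features. The `Packet` record, the `flow_features` function and the particular features chosen are assumptions for illustration; the paper does not prescribe this feature set.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are illustrative."""
    timestamp: float  # seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str     # e.g. "tcp" or "udp"
    size: int         # bytes


FlowKey = Tuple[str, str, int, int, str]


def flow_features(packets: Iterable[Packet]) -> Dict[FlowKey, Dict[str, float]]:
    """Group packets by 5-tuple and compute simple per-flow statistics."""
    grouped = defaultdict(list)
    for pkt in packets:
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
        grouped[key].append(pkt)

    features = {}
    for key, pkts in grouped.items():
        sizes = [p.size for p in pkts]
        times = [p.timestamp for p in pkts]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "mean_size": sum(sizes) / len(sizes),
            "duration": max(times) - min(times),
        }
    return features
```

Features like these do not depend on port numbers or payload contents, so they can still be computed when an application hops ports.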

Using the taxonomy they developed, they tried to answer the following question: How much of modern Internet traffic is P2P?

These are the observations they gathered from the papers they surveyed:
- P2P file sharing accounts for 1.2% to 93% of traffic (range observed across 18 of the 64 papers)
- the fractions increased from 2002 to 2006
- P2P is more popular in Europe
- P2P traffic varies by time of day with higher percentages at night
- P2P is used more at home than in the office

Based on this, they could not make a conclusive claim that answers the question above. All they can say is,

"there is a wide range of P2P traffic on Internet links; see your specific link of interest and classification technique you trust for more details."

Shortcomings of current traffic classification:
- lack of shared current data sets
- lack of standardized measures and classifications
