Rate/Size Dependency Project
The original data sets are still available here. That page also documents the type of data available for this project.
TO DO: I am still working on the synthetic data sets. They should be ready for analysis in the next few days.
Guide: Scatter plots and EDA results cover only flows with nonzero duration.
Thresholding lines for L-L Correlation plots: 1KB, 10KB, and 100KB for Size; 0.01sec, 0.1sec, 1sec, 5sec, and 100sec for Duration.
Thresholding numbers for EDA: Top 100, 200, 500, 1000, 2000
(in the table), 5000, 10000, and 20000.
Thresholding percentages for EDA: Top 0.01, 0.02, 0.05, 0.1, 0.2, 0.5,
1, 2, 5, 10, 20 (%).
Bell Labs "All" and "Rest" data sets are split into two subsets to avoid computation problems in Matlab.
Note: Some plots do not show the thresholding lines properly, especially for large data sets. However, the printed plots look all right.
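As a rough illustration of how a thresholded log-log correlation could be computed, the sketch below drops zero-duration flows, takes rate as size divided by duration, and returns the Pearson correlation of log10(size) and log10(rate) for flows at or above a size threshold. This is only one plausible reading of the L-L Correlation plots: the rate definition, the choice of 1KB = 1000 bytes, and the names `SIZE_THRESHOLDS_BYTES` and `loglog_corr_above` are assumptions for illustration, not the Matlab code used for this project.

```python
import math

# Size thresholds from the guide; assumes 1KB = 1000 bytes.
SIZE_THRESHOLDS_BYTES = (1_000, 10_000, 100_000)


def loglog_corr_above(sizes, durations, threshold_bytes):
    """Pearson correlation of log10(size) and log10(rate) for flows
    with nonzero duration and size >= threshold_bytes.

    Assumption: rate = size / duration. Returns NaN when fewer than two
    flows qualify or when either variable is constant.
    """
    xs, ys = [], []
    for size, dur in zip(sizes, durations):
        # Skip zero-duration flows (rate undefined) and small flows.
        if dur > 0 and size > 0 and size >= threshold_bytes:
            xs.append(math.log10(size))
            ys.append(math.log10(size / dur))
    n = len(xs)
    if n < 2:
        return math.nan
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)
```

Calling `loglog_corr_above(sizes, durations, t)` for each `t` in `SIZE_THRESHOLDS_BYTES` gives one correlation per size threshold, which can then be compared across data sets.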
Connection
| Data File | # Data Points | L-L Correlation | EDA (2000) | EDA (0.5%) | EDA (S-IR, 0.5%) | Size-Duration |
| UNC All | 1,433,924 | plot | 0.03766 | 0.07705 | 0.00370 | plot |
| UNC All UNC Clients | | plot | | | | |
| UNC All non-UNC Clients | | | | | | |
| UNC HTTP | 1,055,823 | plot | 0.05640 | 0.10983 | 0.00449 | plot |
| UNC HTTP UNC Clients | | | | | | |
| UNC HTTP non-UNC Clients | | | | | | |
| UNC SMTP/NNTP | 35,338 | plot | 0.58212 | 0.48541 | 0.00328 | plot |
| UNC SMTP/NNTP UNC Clients | | | | | | |
| UNC SMTP/NNTP non-UNC Clients | | | | | | |
| UNC Rest | 342,763 | plot | 0.19787 | 0.18928 | 0.00327 | plot |
| UNC Rest UNC Clients | | | | | | |
| UNC Rest non-UNC Clients | | | | | | |
| Synthetic UNC All (No Bias) | 1,433,932 | plot, plot (S-IR) | 0.10825 | 0.28959 | 0.00351 | plot |
| Synthetic UNC All (Fully Biased) | | plot, plot (S-IR) | | 0.25958 | 0.00355 | plot |
| Synthetic UNC All (Half Biased) | | plot, plot (S-IR) | | 0.33847 | 0.00351 | plot |
| Abilene All | 1,318,661 | plot | 0.27151 | 0.40573 | 0.00326 | plot |
| Abilene HTTP | 1,003,817 | plot | 0.27916 | 0.25335 | 0.00350 | plot |
| Abilene SMTP/NNTP | 82,743 | plot | 0.53687 | 0.27891 | 0.00332 | plot |
| Abilene Rest | 232,101 | plot | 0.31945 | 0.31140 | 0.00325 | plot |
| Bell Labs All | 2,313,748 | plot | 0.04632 | 0.11236 | 0.00724 | plot |
| Bell Labs HTTP | 1,967,442 | plot | 0.04943 | 0.07343 | 0.00850 | plot |
| Bell Labs SMTP/NNTP | 262,514 | plot | 0.80155 | 0.78023 | 0.00524 | plot |
| Bell Labs Rest | 83,792 | plot | 0.25838 | 0.13797 | 0.00473 | plot |
Data Units
| Data File | # Data Points | L-L Correlation | EDA (2000) | EDA (0.5%) | EDA (S-IR, 0.5%) | Size-Duration |
| UNC All | 9,562,081 | plot | 0.00479 | 0.01715 | 0.00370 | plot |
| UNC All UNC Clients (DATA + ACKS) | | | | | | |
| UNC All non-UNC Clients (DATA + ACKS) | | | | | | |
| UNC HTTP | 4,331,676 | plot | 0.00676 | 0.01384 | 0.00449 | plot |
| UNC HTTP UNC Clients (DATA + ACKS) | | | | | | |
| UNC HTTP non-UNC Clients (DATA + ACKS) | | | | | | |
| UNC SMTP/NNTP | 406,687 | plot | 0.24549 | 0.02643 | 0.00385 | plot |
| UNC SMTP/NNTP UNC Clients (DATA + ACKS) | | | | | | |
| UNC SMTP/NNTP non-UNC Clients (DATA + ACKS) | | | | | | |
| UNC Rest | 4,823,718 | plot | 0.01273 | 0.02861 | 0.00349 | plot |
| UNC Rest UNC Clients (DATA + ACKS) | | | | | | |
| UNC Rest non-UNC Clients (DATA + ACKS) | | | | | | |
| Synthetic UNC All (No Bias) | 9,277,363 | plot, plot (S-IR) | 0.00480 | 0.00990 | 0.00392 | plot |
| Synthetic UNC All (Fully Biased) | | plot, plot (S-IR) | | 0.00912 | 0.00393 | plot |
| Synthetic UNC All (Half Biased) | | plot, plot (S-IR) | | 0.01027 | 0.00395 | plot |
| Synthetic All (Unbiased) | | plot, plot (S-IR) | | 0.03905 | 0.00373, 0.00507, 0.42961 | plot |
| Synthetic All (Modem bandwidth) | | plot, plot (S-IR) | | 0.05170 | 0.00490, 0.08678, 0.00845 | plot |
| Synthetic All (256Kbps Bandwidth) | | plot, plot (S-IR) | | 0.00794 | 0.00842, 0.13429, 0.00407 | plot |
| Synthetic All (Long RTT) | | plot, plot (S-IR) | | 0.04024 | 0.00355, 0.00400, 0.85880 | plot |
| Abilene All | 21,024,889 | plot | 0.00274 | 0.02537 | 0.00398 | plot |
| Abilene HTTP | 4,464,446 | plot | 0.00542 | 0.01284 | 0.00441 | plot |
| Abilene SMTP/NNTP | 2,788,102 | plot | 0.01844 | 0.01400 | 0.00445 | plot |
| Abilene Rest | 13,772,341 | plot | 0.00802 | 0.06374 | 0.00392 | plot |
| Bell Labs All | 37,302,355 | plot, plot | 0.02671, 0.00110 | 0.03646, 0.00730 | 0.00407, 0.00501 | plot, plot |
| Bell Labs HTTP | 7,462,332 | plot | 0.00223 | 0.00605 | 0.00890 | plot |
| Bell Labs SMTP/NNTP | 2,656,192 | plot | 0.02182 | 0.00891 | 0.00526 | plot |
| Bell Labs Rest | 27,183,831 | plot, plot | 0.03317, 0.00489 | 0.03406, 0.00769 | 0.00426, 0.00551 | plot, plot |
Gaussian Simulation
Experiment:
1. Generate 500,000 bivariate Gaussian samples with rho = 0 and rho = 0.9.
2. Take absolute values.
3. Create EDA movie without thresholding first.
4. Apply thresholding (>2) and create EDA movie again.
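Steps 1, 2, and 4 of the experiment can be sketched as follows. This is a minimal standard-library illustration, not the code used to produce the movies: it assumes standard (zero-mean, unit-variance) marginals, and it assumes that "thresholding (>2)" keeps a point only when both coordinates exceed 2, which is one possible reading. The function names `abs_bivariate_gaussian` and `threshold` are hypothetical, and the EDA movie itself is not reproduced.

```python
import math
import random


def abs_bivariate_gaussian(n, rho, seed=0):
    """Return n pairs (|X|, |Y|), where (X, Y) is standard bivariate
    Gaussian with correlation rho (assumption: zero mean, unit variance)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + scale * z2  # gives corr(X, Y) = rho
        pairs.append((abs(x), abs(y)))  # step 2: absolute values
    return pairs


def threshold(pairs, cutoff=2.0):
    """Assumption: keep a pair only when BOTH coordinates exceed cutoff."""
    return [(x, y) for x, y in pairs if x > cutoff and y > cutoff]


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        sample = abs_bivariate_gaussian(500_000, rho)
        kept = threshold(sample)
        print(f"rho={rho}: {len(kept)} of {len(sample)} points exceed 2")
```

Under this reading, far fewer points survive the threshold in the independent case than in the rho = 0.9 case, which is the contrast the thresholded and unthresholded EDA movies are meant to expose.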
| Indep. | No thresholding | EDA |
| Indep. | Thresholding > 2 | EDA |
| rho = 0.9 | No thresholding | EDA |
| rho = 0.9 | Thresholding > 2 | EDA |
Summary
EDA Summary (also includes Size-Duration and Duration-Inverse Rate)
EDA Summary based on percentage
EDA Summary plots: Size-Rate, Size-Duration, Duration-IRate (blue: All, magenta: HTTP, red: SMTP/NNTP, green: Rest, solid: CONN, dotted: DATA_UNIT)
EDA Summary plots (based on percentage): Size-Rate, Size-Duration, Duration-IRate, Size-IRate
Synthetic data
1. Log-log correlation plots of Size-Rate show the same story (contradiction in two regions). Roughly, CONN: 0.8 vs. 0.4; DATA_UNIT: 0.9 vs. 0.1.
2. Synthetic All (DATA_UNIT) looks different (strong linear relationship in Size-Duration and Duration-IRate).
3. The connection level is more dependent: DATA_UNIT < CONN.
4. Abilene data are the most dependent at the connection level: UNC, Bell < Abilene.
5. Abilene data are the most independent at the data unit level: Abilene < Bell < UNC.
6. SMTP/NNTP is the most dependent, and All and HTTP are the most independent: All, HTTP < Rest < SMTP/NNTP.
7. In Size-IRate, all of them are quite independent.
Felix Hernandez Campos
Last modified: Sun Aug 24 19:54:45 EDT 2003