Rate/Size Dependency Project

The original data sets are still available here. That page also documents the type of data available for this project.

TO DO: I am still working on the synthetic data sets. They should be ready for analysis in the next few days.

Guide: Scatter plots and EDA results cover only flows with nonzero duration.
       Thresholding lines for L-L Correlation plots:
           Size: 1 KB, 10 KB, and 100 KB.
           Duration: 0.01 sec, 0.1 sec, 1 sec, 5 sec, and 100 sec.
       Thresholding numbers for EDA: Top 100, 200, 500, 1000, 2000 (in the table), 5000, 10000, and 20000.
       Thresholding percentages for EDA: Top 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, and 20 (%).

       The Bell Labs "all" and "rest" data sets are split into two subsets to avoid computational problems in Matlab.

Note: Some plots do not show the thresholding lines properly on screen, especially for large data sets.
      The printed plots, however, look fine.
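
As a rough illustration of what a thresholded L-L (log-log) correlation could look like, here is a minimal Python sketch. It rests on assumptions this page does not confirm: each flow is a hypothetical (size_bytes, duration_sec) pair, rate means size divided by duration, and the correlation is the Pearson coefficient of log10(size) and log10(rate). The function name and data layout are made up for illustration; the project's own plots were produced in Matlab.

```python
import math

def log_log_correlation(flows, min_size=0.0):
    """Pearson correlation of log10(size) vs. log10(rate) for flows above a size threshold.

    flows: iterable of (size_bytes, duration_sec) pairs (hypothetical layout).
    Flows with zero duration or nonpositive size are dropped.
    """
    xs, ys = [], []
    for size, duration in flows:
        if duration <= 0 or size <= 0 or size < min_size:
            continue
        xs.append(math.log10(size))
        ys.append(math.log10(size / duration))  # assumed definition of rate
    if len(xs) < 2:
        raise ValueError("need at least two flows above the threshold")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)
```

Calling `log_log_correlation(flows, min_size=10_000)` would restrict the computation to flows at or above the 10 KB line, assuming 1 KB is taken as 1000 bytes.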

Connection


Data File                         # Data Points  L-L Correlation    EDA (2000)  EDA (0.5%)  EDA (S-IR, 0.5%)  Size-Duration
---------------------------------------------------------------------------------------------------------------------------
UNC All                           1,433,924      plot               0.03766     0.07705     0.00370           plot
UNC All UNC Clients               -              plot               -           -           -                 -
UNC All non-UNC Clients           -              plot               -           -           -                 -
UNC HTTP                          1,055,823      plot               0.05640     0.10983     0.00449           plot
UNC HTTP UNC Clients              -              plot               -           -           -                 -
UNC HTTP non-UNC Clients          -              plot               -           -           -                 -
UNC SMTP/NNTP                     35,338         plot               0.58212     0.48541     0.00328           plot
UNC SMTP/NNTP UNC Clients         -              plot               -           -           -                 -
UNC SMTP/NNTP non-UNC Clients     -              plot               -           -           -                 -
UNC Rest                          342,763        plot               0.19787     0.18928     0.00327           plot
UNC Rest UNC Clients              -              plot               -           -           -                 -
UNC Rest non-UNC Clients          -              plot               -           -           -                 -
Synthetic UNC All (No Bias)       1,433,932      plot, plot (S-IR)  0.10825     0.28959     0.00351           plot
Synthetic UNC All (Fully Biased)  -              plot, plot (S-IR)  -           0.25958     0.00355           plot
Synthetic UNC All (Half Biased)   -              plot, plot (S-IR)  -           0.33847     0.00351           plot
Abilene All                       1,318,661      plot               0.27151     0.40573     0.00326           plot
Abilene HTTP                      1,003,817      plot               0.27916     0.25335     0.00350           plot
Abilene SMTP/NNTP                 82,743         plot               0.53687     0.27891     0.00332           plot
Abilene Rest                      232,101        plot               0.31945     0.31140     0.00325           plot
Bell Labs All                     2,313,748      plot               0.04632     0.11236     0.00724           plot
Bell Labs HTTP                    1,967,442      plot               0.04943     0.07343     0.00850           plot
Bell Labs SMTP/NNTP               262,514        plot               0.80155     0.78023     0.00524           plot
Bell Labs Rest                    83,792         plot               0.25838     0.13797     0.00473           plot


Data Units


Data File                                    # Data Points  L-L Correlation    EDA (2000)  EDA (0.5%)  EDA (S-IR, 0.5%)  Size-Duration
--------------------------------------------------------------------------------------------------------------------------------------
UNC All                                      9,562,081      plot               0.00479     0.01715     0.00370           plot
UNC All UNC Clients (DATA + ACKS)            -              plot               -           -           -                 -
UNC All non-UNC Clients (DATA + ACKS)        -              plot               -           -           -                 -
UNC HTTP                                     4,331,676      plot               0.00676     0.01384     0.00449           plot
UNC HTTP UNC Clients (DATA + ACKS)           -              plot               -           -           -                 -
UNC HTTP non-UNC Clients (DATA + ACKS)       -              plot               -           -           -                 -
UNC SMTP/NNTP                                406,687        plot               0.24549     0.02643     0.00385           plot
UNC SMTP/NNTP UNC Clients (DATA + ACKS)      -              plot               -           -           -                 -
UNC SMTP/NNTP non-UNC Clients (DATA + ACKS)  -              plot               -           -           -                 -
UNC Rest                                     4,823,718      plot               0.01273     0.02861     0.00349           plot
UNC Rest UNC Clients (DATA + ACKS)           -              plot               -           -           -                 -
UNC Rest non-UNC Clients (DATA + ACKS)       -              plot               -           -           -                 -
Synthetic UNC All (No Bias)                  9,277,363      plot, plot (S-IR)  0.00480     0.00990     0.00392           plot
Synthetic UNC All (Fully Biased)             -              plot, plot (S-IR)  -           0.00912     0.00393           plot
Synthetic UNC All (Half Biased)              -              plot, plot (S-IR)  -           0.01027     0.00395           plot

Synthetic All (Unbiased): plot, plot (S-IR); 0.03905, 0.00373, 0.00507, 0.42961; plot
Synthetic All (Modem Bandwidth): plot, plot (S-IR); 0.05170, 0.00490, 0.08678, 0.00845; plot
Synthetic All (256Kbps Bandwidth): plot, plot (S-IR); 0.00794, 0.00842, 0.13429, 0.00407; plot
Synthetic All (Long RTT): plot, plot (S-IR); 0.04024, 0.00355, 0.00400, 0.85880; plot

Data File             # Data Points  L-L Correlation  EDA (2000)          EDA (0.5%)          EDA (S-IR, 0.5%)    Size-Duration
-------------------------------------------------------------------------------------------------------------------------------
Abilene All           21,024,889     plot             0.00274             0.02537             0.00398             plot
Abilene HTTP          4,464,446      plot             0.00542             0.01284             0.00441             plot
Abilene SMTP/NNTP     2,788,102      plot             0.01844             0.01400             0.00445             plot
Abilene Rest          13,772,341     plot             0.00802             0.06374             0.00392             plot
Bell Labs All         37,302,355     plot, plot       0.02671, 0.00110    0.03646, 0.00730    0.00407, 0.00501    plot, plot
Bell Labs HTTP        7,462,332      plot             0.00223             0.00605             0.00890             plot
Bell Labs SMTP/NNTP   2,656,192      plot             0.02182             0.00891             0.00526             plot
Bell Labs Rest        27,183,831     plot, plot       0.03317, 0.00489    0.03406, 0.00769    0.00426, 0.00551    plot, plot


Gaussian Simulation

Experiment:
1. Generate 500,000 bivariate Gaussian samples for each of rho = 0 and rho = 0.9.
2. Take absolute values.
3. Create the EDA movie without thresholding.
4. Apply thresholding (> 2) and create the EDA movie again.
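
Steps 1, 2, and 4 of the experiment can be sketched in Python along these lines. This is only an illustration under stated assumptions: the marginals are taken to be standard normal, "thresholding (> 2)" is read as keeping a pair only when both coordinates exceed 2, and the function names are hypothetical. The original experiment was presumably run in Matlab, and the EDA movie step is not reproduced here.

```python
import math
import random

def simulate_abs_bivariate_gaussian(n, rho, seed=0):
    """Draw n standard bivariate Gaussian pairs with correlation rho, then take absolute values."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + scale * z2  # standard construction giving corr(x, y) = rho
        pairs.append((abs(x), abs(y)))
    return pairs

def threshold_pairs(pairs, cutoff=2.0):
    """Assumption: keep a pair only when BOTH coordinates exceed the cutoff."""
    return [(x, y) for x, y in pairs if x > cutoff and y > cutoff]
```

For the sample size in the experiment, one would call `simulate_abs_bivariate_gaussian(500_000, 0.0)` and `simulate_abs_bivariate_gaussian(500_000, 0.9)`, then pass each result to `threshold_pairs`. A different reading of the threshold (for example, on either coordinate or on the radius) would only change the filter condition.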
 
Independent (rho = 0), no thresholding: EDA
Independent (rho = 0), thresholding > 2: EDA
rho = 0.9, no thresholding: EDA
rho = 0.9, thresholding > 2: EDA


Summary

EDA Summary (also covers Size-Duration and Duration-Inverse Rate)
EDA Summary based on percentage

EDA Summary plots: Size-Rate, Size-Duration, Duration-IRate
    (blue: All, magenta: HTTP, red: SMTP/NNTP, green: Rest; solid: CONN, dotted: DATA_UNIT)
EDA Summary plots (based on percentage): Size-Rate, Size-Duration, Duration-IRate, Size-IRate, Synthetic data

1. The log-log correlation plots of Size-Rate tell the same story in every data set: the two regions give contradictory correlations.
    Roughly, CONN: 0.8 vs. 0.4; DATA_UNIT: 0.9 vs. 0.1.
2. Synthetic All (DATA_UNIT) looks different: it shows a strong linear relationship in Size-Duration and Duration-IRate.
3. Dependence is stronger at the connection level than at the data-unit level: DATA_UNIT < CONN.
4. At the connection level, the Abilene data are the most dependent: UNC, Bell Labs < Abilene.
5. At the data-unit level, the Abilene data are the least dependent: Abilene < Bell Labs < UNC.
6. SMTP/NNTP is the most dependent, and All and HTTP are the least dependent: All, HTTP < Rest < SMTP/NNTP.
7. In Size-IRate, all data sets are nearly independent.


Felix Hernandez Campos

Last modified: Sun Aug 24 19:54:45 EDT 2003