Paris-Networking

Seminars on Modeling failures, Cloud Computing, Meta-routing and Social Networks  

Wednesday, February 17th 2010, 10h30 - 16h00

Location :

Paris Research Lab
Thomson Technicolor
1, rue Jeanne d'Arc,
92443 Issy-les-Moulineaux

How to get there?
- line 12 at station "Porte de Versailles" (connects to Montparnasse and the
west of the Saint Germain des Pres area)
- line 8 at station "Balard" (connects to the 15th and the west of the left
bank)
- Tram T3 at station "Desnouettes" (connects to the whole southern edge of Paris,
including Porte d'Orléans and Porte d'Italie)
- Tram T2 (connects to Issy-les-Moulineaux)


Please register in advance!
All are welcome to attend the sessions and stay for lunch, but we would like to know in advance to help organize the event.
Please contact augustin.chaintreau@thomson.net or venus.apovo@thomson.net to announce that you are coming. 
Thank you.

Program:

Coffee will be served starting from 10h

======== Session I: 10h30 - 12h30

Bianca Schroeder, University of Toronto

DRAM errors in the wild: A large-scale field study

Errors in dynamic random access memory (DRAM) are a common form of hardware
failure that can affect network routers, modern compute clusters as well as
personal computers.  Failures are costly both in terms of hardware replacement
costs and service disruption.  While a large body of work exists on DRAM in
laboratory conditions, little has been reported on real DRAM failures in large
production clusters. In this talk, we analyze measurements of memory errors in
a large fleet of commodity servers over a period of 2.5 years.  The collected
data covers multiple vendors, DRAM capacities and technologies, and comprises
many millions of DIMM days.

The goal of this talk is to answer questions such as the following: How common
are memory errors in practice?  What are their statistical properties?  How are
they affected by external factors, such as temperature and utilization, and by
chip-specific factors, such as chip density, memory technology and DIMM age?
As we will see, the answers to many of these questions are surprising and differ
in many key aspects from commonly held assumptions.

The talk is targeted at a general audience and does not require any background
in hardware architecture or memory technology.

=== 

Rade Stanojevic, Telefonica Research

Temporal Rate Limiting: cloud elasticity at a flat fee

Abstract:
In the current usage-based pricing scheme offered by most cloud computing
providers, customers are charged based on the capacity and the lease time of
the resources they capture (bandwidth, number of virtual machines, IOPS rate,
etc.). Taking advantage of this pricing scheme, customers implement
auto-scaling purchase policies by leasing (e.g., hourly) necessary amounts of
resources to satisfy a desired QoS threshold under their current demand.
Auto-scaling yields strict QoS and variable charges. Some customers, however,
would be willing to settle for a more relaxed statistical QoS in exchange for a
predictable flat charge. In this work we propose Temporal Rate Limiting (TRL),
a purchase policy that permits a customer to allocate optimally a predefined
purchase budget over a predefined period of time. By taking advantage of
non-linearities in the dependence of QoS on the demand and the available
resources, TRL can offer the same expected QoS as auto-scaling but at a lower,
predefined, flat charge.

======== Lunch: 12h30 - 14h00

======== Session II: 14h00 - 16h00

Steve Uhlig, TU Berlin-Deutsche Telekom Laboratories

Improving Internet-wide routing convergence with MRPC timers: bringing order to routing dynamics

Abstract: 
The behavior of routing protocols during convergence is critical as
it impacts end-to-end performance. Network convergence is particularly
important in BGP, the current interdomain routing protocol. In order to
decrease the amount of exchanged routing messages and transient routes, BGP
routers rely on MRAI timers and route flap damping. These timers are intended
to limit the exchange of transient routing messages. In practice, these timers
have been shown to be partly ineffective at improving convergence, making it
even slower in some situations.

In this paper, we propose to add a timer mechanism to routing protocols that
enforces an ordering of the routing messages such that path exploration is
drastically reduced while controlling convergence time. Our approach is based
on known results in generalized path algorithms and endomorphism semirings.
Our timers, called MRPC (metrics and routing policies compliant), are set
independently by each router and depend only on the metrics of the routes
received by the router as well as the routing policies of the router. No
sharing of information about routing policies between neighboring ASs is
required by our solution. Similarly to the case of routing policies that may
lead to BGP convergence problems, arbitrary routing policies can also make it
impossible to enforce an ordering of the messages that will prevent path
exploration from occurring. We explain under which conditions path exploration can be
avoided with our timers, and provide simulations to understand how they compare
to MRAI.
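
To give a feel for why ordering messages by metric can curb path exploration,
here is a toy simulation in Python. It is not the MRPC algorithm from the talk:
it assumes a single additive metric, no routing policies, and a hypothetical
four-router topology, and the name simulate is invented for illustration. A
router that delays each message in proportion to the link metric hears about
the best route first, so it never installs a transient one.

```python
import heapq

def simulate(links, origin, delay):
    """Flood a route from `origin`; return (best metric, number of route changes) per node.

    links: dict mapping (u, v) -> additive link metric, treated as undirected.
    delay: function(link_metric) -> time a message takes to cross that link.
    """
    neighbors = {}
    for (u, v), w in links.items():
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))

    best = {origin: 0}
    changes = {node: 0 for node in neighbors}
    queue = []  # entries: (arrival_time, sequence_number, receiver, advertised_metric)
    seq = 0
    for v, w in neighbors[origin]:
        heapq.heappush(queue, (delay(w), seq, v, w))
        seq += 1

    while queue:
        now, _, node, metric = heapq.heappop(queue)
        if metric >= best.get(node, float("inf")):
            continue  # not an improvement: no route change, nothing to re-announce
        best[node] = metric
        changes[node] += 1
        for v, w in neighbors[node]:
            heapq.heappush(queue, (now + delay(w), seq, v, metric + w))
            seq += 1
    return best, changes

# Hypothetical topology: a short three-hop path o-a-b-c and an expensive direct link o-c.
links = {("o", "a"): 1, ("a", "b"): 1, ("b", "c"): 1, ("o", "c"): 10}

# Fixed per-hop delay: c first installs the expensive direct route, then replaces it.
best_fixed, changes_fixed = simulate(links, "o", delay=lambda w: 1)

# Delay proportional to the metric: messages arrive in order of path metric.
best_ordered, changes_ordered = simulate(links, "o", delay=lambda w: w)
```

In this toy setting both runs end with the same routes, but router c changes
its route twice under the fixed delay and only once under the metric-ordered
delay.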

=== 

Vijay Erramilli, Telefonica Research

The little engine(s) that could: Scaling Online Social Networks

Scaling of Online Social Networks (OSNs) has introduced new system design
challenges because  social-network graphs cannot be partitioned easily.
Traditional partitioning algorithms that distribute social data randomly among
multiple servers (e.g., using DHT-based key-value stores), or those that
resort to full replication, suffer either from large delays and inter-server
traffic, or from high vertical scaling costs. Such challenges have often been
responsible for the continuous and costly re-architecting of Twitter and
Facebook. 

We propose a joint partitioning and replication scheme that leverages the
underlying social-network structure, minimizes memory requirements and network
update traffic, and ensures that users have their neighbors' data co-located on
the same machine.  The gains from this are multi-fold: developers can design
their applications assuming local data semantics, i.e., as if they were
developing for a single machine; scalability is achieved at a low cost by adding
commodity machines with low memory and network I/O requirements; and N+K
redundancy is achieved at a fraction of the cost.
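
The co-location requirement can be made concrete with a small Python sketch.
It is only an illustration under stated assumptions, not the scheme presented
in the talk: the graph, the two placements, and the function names
replicas_needed and total_replicas are hypothetical. Given a placement of each
user's master copy, it counts the replicas needed so that every user finds all
neighbors on the same server.

```python
from collections import defaultdict

def replicas_needed(edges, master):
    """Return, per server, the users that must be replicated there so that
    every master user on that server has all of its neighbors locally."""
    replicas = defaultdict(set)
    for u, v in edges:
        # An edge that crosses servers needs a copy of each endpoint
        # next to the other endpoint's master.
        if master[u] != master[v]:
            replicas[master[u]].add(v)
            replicas[master[v]].add(u)
    return {server: sorted(users) for server, users in replicas.items()}

def total_replicas(edges, master):
    """Total number of replica copies across all servers."""
    return sum(len(users) for users in replicas_needed(edges, master).values())

# Hypothetical friendship graph: a triangle a-b-c attached to a chain c-d-e.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]

# Placement that follows the community structure (server 0 and server 1).
by_community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

# Placement that ignores the structure, as random hashing might.
scattered = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0}

community_cost = total_replicas(edges, by_community)
scattered_cost = total_replicas(edges, scattered)
```

On this toy graph the structure-aware placement needs 2 replicas and the
scattered one needs 5, which is the kind of gap a joint partitioning and
replication scheme tries to exploit.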

To validate our OSN scaling scheme we provide a complete system design,
extensive measurement-driven evaluation, and a working implementation. We use
three large datasets from Twitter, Orkut, and Facebook to quantify the overhead
of our scheme and its feasibility. Based on a well-known Twitter clone, we
develop a system that can scale to Twitter levels without changing a line of
application code. We further compare our scheme against Cassandra, Facebook's
DHT database, and show substantial gains in throughput.

Host :

The team of the Thomson-Technical lab