Prof. Ken Duffy|
Eolas Building, Room 3.08
My primary research interests are in probability and statistics, and their application to science and engineering. More details can be found in my CV (September, 2017).
March 2017 - Principal inertia components and applications.While on sabbatical at MIT in 2015-2016, I had correlation my mind thanks to Muriel Médard's students Flávio du Pin Calmon (now at Harvard), Soheil Feizi (now at Stanford), and Ali Makhdoumi.
|All of statistics is about summary: given a set of data, how can one succinctly describe its important properties without having to relate the data set in full? One core measurement of interest is how correlated two variables are. If one measures pairs (x,y), such as (age, income), how can one efficiently report if there is a positive or negative relationship between them. One would intuitively say that the (x,y) pairs in the image to the right are positively correlated - as one gets larger, the other typically does - but you have a range of choices on how to quantify that relationship. How correlated two variables are has important ramifications for the ability to draw inferences. If two variables are correlated, perhaps you can estimate one having observed the other? This has ramifications for cases where you wish to be able to estimate one from the other such as when the pair is (what is said into the phone, what is heard over the phone), but also when you don't, such as (political preference, film you watched).|
In work led by Flavio, published in IEEE Transactions on Information Theory, he, Ali, Muriel, Mayank Varia, my student MarkChristiansen and I explore measurements of correlation between pairs of variables taking discrete values and relate them to mechanical notions, Principal Inertia Components (PICs). The PICs lie in the intersection of information and estimation theory, and provide a fine-grained decomposition of the dependence between two variables X and Y. The PICs describe which functions of X can or cannot be reliably inferred given an observation of Y. In the paper we demonstrate that the PICs play an important role in Information Theory, while in privacy settings, the paper proves that the PICs are related to fundamental limits of perfect privacy.
January 2017 - SEEIt, A Stochastic Model of the DNA Forensics Pipeline.
With my collaborator, the forensic scientist Catherine Grgicak (BU), Kelsey C. Peters and Genevieve Wellner in her lab, and my ex-graduate student at MIT, Neil Gurram, we recently published a stochastic model that produces simulated electropherograms, the core measurement output in DNA forensics. The model encodes all of the standard lab procedure, capturing sampling issues, allele drop-out, PCR amplification and stutter, and inherent instrument noise in the process. Its construction was informed and parameterized by a huge number of samples created in Catherine's lab. As those samples are costly and time-consuming to create, through its use, its main boon is in being able to quickly and cheaply create simulated mixtures with known contributors with which to explore modification of lab procedures to improve inference.
November 2016 - Clonal Cellular Programmes in the Immune System.
With my long-run immunology collaborator, Phil Hodgkin and members of his lab, Julia Marchingo, Andrey Kan, Susanne Heinzel at the Walter and Eliza Hall Institute of Medical Research, and my student Giulio Prevedello we have published a paper in Nature Communication. The article introduces a new technique that allowed us to determine the division state of offspring from individual cells, which has broad applicability, in a relatively high-throughput way that is significantly less labor intensive than microscopy methods.
The experiment design is based on a multiplex of division diluting dyes. In the early 1990s, a green fluorescent dye, CFSE, was identified by two Australian scientists, Lyons and Parish, that cells can imbibe without impacting their function. When a stained cell divides, its daughters fluoresce with half the intensity of their parent, and their daughters with half again, allowing division numbers to be determined. In the past few years, a new generation of dyes in a range of colours have been created. To enable determination of the division history of individual families, we designed a system where cells could be stained with a unique combination of colours. This enables observation of the lineages of hundreds of founding cells through multiple generations, as illustrated below with two dyes, CFSE and CTV.
In the paper, we use the approach to consider the initial clonal expansion of T cells in response to challenge and to address the question of how individual clones integrate distinct stimuli. Contrary to expectation, the finding is that that the initial cell is being programmed independently by the stimuli to execute a regimented family expansion programme. The significant population-level heterogeneity that had been previously described comes about as a consequence of heterogeneity in the depths of individual regular trees rather than frayed family trees.
In order to address these questions and interrogate the data from a newly developed method, we had to develop some pretty new mathematics and statistics related to the concatenation of directed trees.
June 2016 - The Design of a DNA Coded Randomized Algorithm For In Situ Lineage Tracing.
Tom Weber, our combinatorist colleague Mark Dukes from University College Dublin, and three wet lab collaborators from the Walter and Eliza Hall Institute of Medical Research, Denise Miles, Stefan Glaser and Shalin Naik, and I recently published a paper in BMC Systems Biology describing the optimal design of a randomized algorithm that can be encoded in the DNA of a cell. At an abstract level, the programme can be thought of as installing in the DNA a deck of cards, as well as mechanisms for shuffling and picking. Upon activation by the delivery of a drug, each cell executes the randomized algorithm, shuffling and removing cards from the deck until they are left with either one or three cards, whereupon the algorithm terminates.
Conceptually, the end effect is that each cell shuffles and picks either one or three cards from the deck, which are then left in that cell's DNA. As the offspring of that cell then get copies of those cards, they serve as labels them as having a shared ancestor, at least for rare combinations. Thus the algorithm's design is to maximize the diversity of the card-picks and to ensure that the selection is stable.
Starting with a deck of 13 cards, one execution of the algorithm would look as follows:
January 2016 - The Design of a DNA Coded Randomized Algorithm For Inferring Tree Depth.
How many cell division take place between birth and the construction of a brain? Are there more cell divisions between the development of skin than gut? Perhaps surprisingly, these are questions that one cannot answer at present. In a paper published in the Journal of Mathematical Biology by Tom Weber, Leïla Perié and myself, we have established a formula, which I personally think is remarkable, that may allow us to answer these questions:
The core of the mathematics is based on newly established properties of the fundamental model of growing trees, the Bellman-Harris process first studied in 1948, which assumes substantial statistical homogeneity. We do, however, also provide a derivation of a weaker form of the formula in broad generality and establish through simulations its accuracy and utility for trees that are far from samples of a Bellman-Harris process, such as the early development of C. elegans , as illustrated here:
December 2015 - Cellular Barcoding and Erythropoiesis.
All blood cells, red and white, originate from Hematopoietic Stem Cells (HSCs), which reside in bone marrow. The reason, for example, that a bone marrow transplant can be needed after extreme chemotherapy is that it sometimes the case that cancer treatments, as an unwanted side effect, destroys the blood system, and the transplant reseeds it with HSCs from the donor. Given that significance, how, exactly, the reseeded system reboots the blood system has been a significant topic of research in the medical sciences since the discovery of HSCs in the 1960s.
Before becoming a red or white blood cell, of which there are many types, a series of maturation steps takes place where offspring of HSCs progressively acquire specificities. That is, the cells increasingly commit to produce only restricted classes of cell type. When and where that commitment occurs is an important matter as, for example, it identifies where errors that lead to disease must occur and determines where intervention should take place to change outcomes.
The majority of studies determining those commitment steps have been in vitro (i.e. in test tubes), which is the primary distinction of work, "the branching point in erythro-myeloid differentiation" published in Cell. with my colleagues Leïla Perié, Lianne Kok, Rob J. de Boer and Ton N. Schumacher. Leïla designed and performed, with Lianne's aid, a series of experiments using a novel system called cellular barcoding that was developed within the Ton Schumacher's lab. at the Netherlands Cancer Institute. That methodology enables us to observe this commitment in vivo (i.e. within the body). The present work focuses on where in this chain of events the commitment to erythropoiesis, i.e. red blood cell production, takes place. Our analysis of the data from those experiments showed that the commitment is earlier than previously thought.
December 2015 - Quantifying the Computational Security of Multi-User Systems.
The security of many systems is predicated on the following logic: a user selects a string, for example a password, from a list; an inquisitor who knows the list can query each string in turn until gaining access by chancing upon the user's chosen string; the resulting system is deemed to be computationally secure so long as the list of strings is large.
Implicit in that definition is the assumption that the inquisitor knows nothing about the likely nature of the selected string. If instead one assumes that the inquisitor knows the probabilities with which strings are selected, then the number of queries required to identify a stochastically selected string, and the quantification of computational security, becomes substantially more involved. In the mid 1990s, James Massey and Erdal Arikan initiated a programme of research on the susceptibility of such systems to brute for interrogation, dubbed Guesswork, to which from we have been contributing in recent years with collaborators MIT and, more recently, Duke University.
With my my recently graduated Ph.D. student, Mark Christiansen, my colleague Muriel Médard (MIT) and her recently graduated student, Flávio du Pin Calmon (now IBM Research), our most recent contribution has just been published in IEEE Transactions on Information Theory. In it, we address a natural extension in this investigation of brute force searching: the quantification for multi-user systems. We were motivated by both classical systems, such as the brute force entry to a multi-user computer where the inquisitor need only compromise a single account, as well as modern distributed storage services where coded data is kept at distinct sites in a way where, due to coding redundancy, several, but not all, servers need to be compromised to access the content.
The results establish that from an inquisitor's point of view there is a law of diminishing returns as the number of potentially hacked accounts increases. From a designer's point of view, coming full circle to Massey's original observation that Shannon entropy has little quantitative relationship to how hard it is to guess a single string, Shannon entropy plays a fundamental role in the multi-user setting.
November 2015 - Noise in Forensic Analysis of DNAWhen cells are taken from a crime scene, a substantial amount of processing is necessary before human identification can be determined. This processing results in both noise and artifacts being added to the measured signal. In a paper published in Forensic Science International: Genetics, my colleagues Ullrich Mönich (now TU Munich), Muriel Médard (MIT), Viveck Cadambe (now Penn. State U.), Lauren Alfonse (BU) and Catherine Grgicak (BU) and I characterize the most fundamental element of this process, baseline noise, for circumstances in which there is little initial DNA, as often occurs with crime-scene samples. Historically, this noise has been modeled as Gaussian when present, but the data generated by Lauren and Catherine does not support that hypothesis and instead suggests that a heavy-tailed distribution is more appropriate. This has ramifications for how noise should be treated in continuous models or how filters should be calibrated by crime-labs.
June 2015 - Network Coding, Constraint Satisfaction, Guesswork, and Network Infusion.Three pieces of work on quite distinct topics will be presented at international conferences this month, one of which is a prize-winner.
Ying Cui will present one in London, which has won the best paper award of the Communication Theory Symposium and an overall best paper award, at the IEEE International Conference on Communications. It is a contribution to on Network Coding, relating it to constraint satisfaction problems and was written with collaborators Muriel Médard (MIT), Edmund Yeh (Northeastern) and Doug Leith (Trinity College Dublin),
Ahmad Beirami (Duke & MIT) will present another, written with Robert Calderbank (Duke), myself and Muriel Médard (MIT) at the IEEE Symposium on Information Theory in Hong Kong. The work is an investigation of computational security subject to source constraints.The last, but certainly not least, will be presented Soheil Feizi (MIT) at the International Conference on Computational Social Science in Finland. It is based on work in progress with myself, Manolis Kellis (MIT) and Muriel Médard (MIT). It forms our contribution the question of how you identify the source, or sources, of an epidemic, an idea or a social network application. Soheil presented an earlier version of the work at RECOMB/ISCB Conference on Regulatory and Systems Genomics in San Diego in November 2014.
May 2015 - Information Theory and Cryptography.In 1949, Claude Shannon published one of his opuses, Communication Theory of Secrecy Systems, in which he introduced a framework for a mathematical theory of cryptography. At a workshop of the 8th International Conference on Information-Theoretic Security (ICITS), Flávio du Pin Calmon presented a piece entitled Revisiting the Shannon Theory approach to cryptography written with Mayank Varia, Muriel Médard, Mark M. Christiansen, myself, Linda Zeger and Joao Barros, revisiting the topic.
November 2014 - An Algebra for Immune ResponsesLymphocytes, the key players in an adaptive immune response, have long since been known to need to receive multiple signals to mount an effective defense. That discovery led to the two signal theory of T cell activation. The principle being that two independent signals, antigen followed by costimulation, ensure that lymphocyte expansion is only initiated in response to genuine infection. Redundancy in this second signal has long since left a conundrum in the field.
September 2014 - Randomness, Determinism and Immune ResponsesOne of the difficult challenges in science is doing justice while framing other people's hypotheses. Steven L. Reiner, one of the main instigators behind investigating the role of asymmetric cell division in immunology, and William C. Adams have neatly laid out a deterministic description of adaptive immune responses in a fascinating opinion piece published in Nature Reviews Immunology. My Australian colleagues Philip D. Hodgkin, one of the main proponents of stochastic processes in immunology, Mark R. Dowling and I have written a brief response in defense of randomness in correspondence to the same journal. The community has not yet acquired the data required to answer how deterministic and stochastic processes interleave to build the complete immune response, but with so many different groups designing experiments and doing analysis based on distinct hypotheses, we're all looking forward to the time when that resolution occurs.
July 2014 - Forensic Analysis of DNAThe basis of a DNA fingerprint is the measurement of the number of repeats of microsatellites, short repeated sequenes of of DNA, at various locations in the genome. While the combination of given numbers of repeats are (largely) unique for individuals, in forensics applications often the number of repeats cannot be measured precisely. This occurs as the amount of DNA at a crime scene can be small, requiring amplification before the measurement takes place, which can create false signals or missing data, or the sample itself may be made up of a composite from several individuals in which case one gets a combined, mixed measurement.
February 2014 - Cellular Barcoding and Inferring Lineage PathwaysThere are many instances where one would wish to know the familial relationships of cells, to be able identify those that came from the same parent. For example, the purpose of a bone-marrow transplant is to place new hematopoietic stem cells into to the recipient so that their blood system can be rebuilt. Does each stem cell contribute equally to this rebuilding? Does chance play a role? Are some stem cells specialized? These are all questions that can only be tackled if one can identify cells from the same progenitor.
January 2014 - Integrated Random Walks and Large Busy PeriodsEver since an elegant paper by Venkat Anantharam in 1989, it's been known that the way a random walk becomes large, or how a big queue builds up, is by a piecewise linear path. The question of how a large integrated random walk, or how a large volume of work done during a busy period of a queue, occurs was first tackled by A. A. Borovkov , O. J. Boxma and Z. Palmowski in a paper in 2003.
November 2013 - Guesswork, Wiretap and Erasure ChannelsDevices such as door-card readers transmit passwords over wireless channels. Unintended receivers, eavesdroppers, are typically distant and observe the communication over a noisy channel that distorts it. How hard is it for the eavesdropper to fill in the erased gaps and how does it depend on the properties of the noisy channel?
October 2013 - Limits of Statistical InferenceA paper, which was driven by our collaborators, Flávio du Pin Calmon at MIT and Mayank Varia, at the Lincoln Lab., was presented by Muriel Médard at this year's Allerton conference Communication, Control, and Computing. The work establishes bounds on how much can be inferred about a function of a random variable's value, if only a noisy observation of the variable is available.
September 2013 - Constraint Satisfaction and Rate ControlIEEE/ACM Transactions on Networking papers are like buses in that you wait for a long time for one and then two come at once. The work on Decentralized Constraint Satisfaction that is mentioned below has appeared in the August issue of the journal. The final paper from my ex-student Kaidi Huang's doctoral thesis also appears in that issue. The work, performed in collaboration with my Hamilton Institute colleague David Malone, addresses a practical problem in wireless local area networks: rate control. When transmitting packets, WiFi cards must select a physical-layer rate at which to transmit. In principle, the faster the rate, the less robust the transmission is to noise. For the poorly engineered rates of standard WiFi, this is not strictly true as Kaidi, David and I demonstrated in a paper published in IEEE Transactions on Wireless Communications in 2011. In the present work, we proposed a rate adaption scheme based on opportunity cost and Bayesian decision making that was, demonstrably thanks to hard work of Kaidi and David, implementable on standard hardware and outperforms the standard algorithms.
July 2013 - Information Theory, Guesswork & Computational SecurityThe late Kai Lai Chung purportedly said that there's only one theorem in Information Theory: the Asymptotic Equipartition Property. At the 2013 IEEE International Symposium on Information Theory, Mark Christiansen presented a preliminary version of a surprising result that we established in collaboration with our friends from M.I.T., Flávio du Pin Calmon and Muriel Médard. Namely, despite the fact that everything within a Typical Set has, by construction, approximately equal probability, it is exponentially easier as a function of word length to guess a word chosen from a Typical Set than a naive application of the AEP would have you deduce. This has ramifications for physical layer security.
January 2013 - Guesswork & Information TheoryHow hard is it to guess someone's password? More importantly, how do you measure how hard it is to guess someone's password? It's a question that has recieved less attention than one would have expected. The recently deceased information theorist James Massey published a fascinating one page paper at ISIT on this question in 1994 and it is the topic of a paper that my student, Mark Christiansen, and I have had published in IEEE Transactions on Information Theory. Building on beautiful work of others, which began with Erdal Arikan and included a contribution from my Hamilton Institute colleague David Malone, that identified Rényi entropy as a key player in the quantification of guesswork, in the article we provide a direct estimate on the likelihood that it takes a given number of guesses to guess a randomly selected object.
January 2013 - Networking & Medium Access ControlThere are two fundamental paradigms for sharing a resource such as wireless medium. One can empower a centralized controller who gathers everyone's requirements and sets out a schedule of who gets access to the resource when. This system is used, for example, for speakers at every conference and in cell phone networks. The alternate system is to not be prescriptive, but to listen before speaking and when the medium becomes silent, randomly interject. This is used, for example, in human conversation as well forming the basis for the standard access method, IEEE 802.11, for WiFi networks.
October 2012 - Constraint Satisfaction ProblemsIs it possible to find a global solution to a problem, with everyone only having local information and being unable to see
September 2012 - Immunology & Cell Biology
My Australian immunology collaborator Philip Hodgkin and I had a paper published in Trends in Cell Biology. It expands on the hypothesis we employed in our earlier 2012 paper, published in Science, to explain cell fate diversity. Artwork based on the paper, illustrated by WEHI's Jie H. S. Zhou, was used for the edition's cover:
January 2012 - Immunology & Cellular BiologyWith colleagues from the Walter and Eliza Hall Institute of Medical Research led by Philip D. Hodgkin, I had a paper published in Science. The piece analyses data from an experiment that took my Australian colleagues, particularly Mark Dowling and John Markham, four years to perfect. It enabled us to directly observe the times at which an important class of cells in the immune system, the antibody secreting B lymphocytes, make decisions on when and how to combat a pathogen. Despite the data looking immensely complex, it is consistent with a remarkably simple, holistic hypothesis.
The group on sabbatical at MIT in 2016 (L-R: Gianfelice, Harry, Alex, Giulio, Neil, myself and Tom):