Scientific Program Information

For abstracts and links to the slides, click on the talk or poster title.

One-hour talks:

Short talks:


Marta Casanellas: Phylogenetic invariants versus classical phylogenetics [ slides ]

Although many mathematicians are working on obtaining phylogenetic invariants, biologists are not really using them. The main reason is that, up to now, there did not exist accurate methods based on invariants that could outperform classical reconstruction methods in a broad range of situations. But nowadays all theoretical results that could interest biologists are at our disposal (for the general Markov model, at least), so it is time to provide accurate phylogenetic reconstruction methods based on them. In this talk we will present an improvement of a method proposed by N. Eriksson that may finally convince biologists about the usefulness of phylogenetic invariants. Using data simulated under different settings, we will compare its accuracy against classical phylogenetic reconstruction methods.
This is a joint work with J. Fernández-Sánchez.

Luis David Garcia-Puente: Markov bases for noncommutative Fourier analysis of partially ranked data [ slides ]

Given the result v of a survey and a nested collection of summary statistics that could be used to describe that result, it is natural to ask which of these summary statistics best describe v. In 1998, Diaconis and Sturmfels presented an approach for determining the conditional significance of a higher order statistic, after sampling a space conditioned on the value of a lower order statistic. Their approach involves the computation of a Markov basis, followed by the use of a Markov process with stationary hypergeometric distribution to generate a sample. In 2005, Diaconis and Eriksson used these techniques to calibrate Fourier analysis of fully ranked data.
In this talk, we present an extension of this technique for data analysis to the study of partially ranked data, focusing on data from surveys in which participants are asked to identify their top k choices of n items. Furthermore, we introduce an R package for algebraic statistics. This package provides links to 4ti2, Macaulay2 and Bertini from R. In particular, it has the capabilities of running Markov chains using Markov bases.
This is joint work with Mike Hansen, Ann Johnston, David Khale, and Michael Orrison.

Hisayuki Hara: Running Markov chain with and without Markov basis [ slides ]

The methodology of Markov basis initiated by Diaconis and Sturmfels (1998) stimulated active research on Markov bases for more than a decade. It also motivated improvements of algorithms for Gr\""obner basis computation for toric ideals, such as those implemented in 4ti2.
However explicit forms of Markov bases are known only for some relatively simple models, such as the decomposable models. Furthermore general algorithms for Markov bases computation often fail to produce Markov bases even for moderate-sized models in a practical amount of time. Hence so far we could not perform exact tests based on Markov basis methodology for many important practical problems.
In this talk we introduce some alternative methods. The first one is to use a useful subset of Markov basis for connecting only practical fibers. The second one is to use a lattice basis which is an integer kernel of a design matrix. We also discuss connecting tables whose cell counts are restricted to zero or one by using Graver basis.

Hélène Massam: Distributed parameter estimation of discrete hierarchical models via marginal likelihoods

Maximum likelihood estimation of the parameter is an intrinsic part of learning in Markov random fields and much research has been devoted to finding efficient algorithms to obtain a reasonably accurate estimate of the parameter. We consider the class of discrete graphical models embedded in the larger class of discrete hierarchical models. A number $N$ of individuals are classified according to criteria indexed by a finite set $V$. Each variable $X_v, v\in V$ takes values in a finite set $\I_v$. The data is gathered in a contingency table and we assume that the cell counts follow a multinomial distribution Markov with respect to a given undirected graph $G$.
The cumulant generating function of the sampling distribution is usually intractable when the number of variables is large. Therefore, even though the likelihood function is a convex function of the parameter, the traditional convex optimization methods using the derivative of the likelihood cannot be used. We will address two aspects of the problem of maximum likelihood estimation for discrete graphical loglinear models when the number of variables is large.
1. When the maximum likelihood exists, we propose a distributed estimation approach based on composite likelihood. We will use ""one-hop"" or ""two-hop"" marginal models including a vertex $v$ of $G$ and its neighbours (one-hop) or the vertex $v$, its neighbours and the neighbours of the neighbours (two-hop). We show that this composite likelihood estimator is consistent and that its variance decreases with the number of hops.
2. Sometimes, as it is well known, the maximum likelihood estimate of the data does not exist because the data belongs to a face of the convex hull of the support of the measure generating the discrete graphical model. To identify this face, we propose an algorithm that exploits some marginal aspects of the problem.

Ezra Miller: Applying persistent homology to brain artery and vein imaging [ slides ]

Persistent homology measures geometric structures using topological invariants. When the structures are magnetic resonance images of branching arteries, for example, persistent homology records the connectedness of an increasing subset of the vessels. Although the theory of persistent homology is relatively well developed, and many aspects of its behavior are understood in synthetic examples, only recently have applications to genuine expermiental data begun. This talk explains what we have learned about the geometry of blood vessels in aging human brains, as well as lessons this exploration has taught us about applications of persistent homology in general. These lessons inform further potential applications of persistent homology in statistical problems from biological and medical imaging. The main results are joint with Paul Bendich, Steve Marron, and Sean Skwerer.

Jason Morton: When Does a Mixture of Products Contain a Product of Mixtures? [ slides ]

We derive relations between theoretical properties of restricted Boltzmann machines (RBMs), popular machine learning models which form the building blocks of deep learning models, and several natural notions from discrete mathematics and convex geometry. We give implications and equivalences relating RBM-representable probability distributions, perfectly reconstructibe inputs, Hamming modes, zonotopes and zonosets, point configurations in hyperplane arrangements, linear threshold codes, and multi-covering numbers of hypercubes. As a motivating application, we prove results on the relative representational power of mixtures of product distributions and products of mixtures of pairs of product distributions (RBMs) that formally justify widely held intuitions about distributed representations. In particular, we show that an exponentially larger mixture of products, requiring an exponentially larger number of parameters, is required to represent the probability distributions represented as products of mixtures.

Alessandro Rinaldo: Asymptotics and extremal properties of the edge-triangle exponential random graph model [ slides ]

We consider the asymptotic extremal properties of the edge-triangle exponential model. This is a two-parameters exponential model for simple random graphs whose sufficient statistics are rescaled edge and triangle density homomorphisms. We show that, as both the number of nodes and the magnitude of the model parameters grow unbounded, most of the mass of the distribution will concentrate on the isomorphic class of a Turan graph, as well as the empty and complete graph. Which Turan graph will essentially depend on the direction of the vector of parameters with respect to the origin. Furthermore, there is an array of extremal directions that will give the same isomorphic class of a Turan graph. We will categorize all such extremal properties based on the dual geometry of a certain convex set on the plane with infinitely many facets. In particular, our analysis reveals the emergence of phase transitions along distinguished critical directions. We will illustrate our results within the framework of graphons.
This is joint work with Mei Yin and Sukhada Fadnavis.

John Rhodes: Parameter identifiability of discrete DAG models with latent variables [ slides ]

Understanding parameter identifiability, or its failure, for a graphical model with latent variables is a basic, yet often difficult problem. With finite state spaces, this amounts to understanding fibers of certain highly-structured polynomial maps. While a computational algebra approach can be applied to a specific graph, more desirable would be graphical conditions that could be applied generally to imply identifiability statements. Complete results along these lines currently seem far out of reach. However, by focusing on small graphs with simple structures, some progress can be made. A methodical study of these graphs leads both to some surprising examples and several fairly general theorems that may provide initial steps toward a more complete theory. This is joint with with E. Allman, E. Stanghellini, and M. Valtorta.

Eduardo Sáenz de Cabezón: The algebraic method in system reliability analysis

The use of commutative algebra in reliabilty has a long tradition. Several approaches have used algebraic concepts for the evaluation odf system reliability. In the last years, the authors have used combinatorial commutative algebra, in particular free resolutions of monomial ideals to analyze the reliability of systems and networks. This talk reviews the main results of this method as well as the last advances in using monomial ideals to new problems like all-terminal reliability.
This is joint work with Henry Wynn.

Vishesh Karwa: Differentially Private Degree Sequences and Synthetic Graphs [ slides ]

In this talk, we give a brief overview of challenges associated with sharing sensitive network data under rigorous privacy guarantees. The focus will be on sharing degree sequences privately, while allowing for statistical utility in the form of estimation and hypothesis testing for the $\beta$ model of random graphs. To provide utility, we need to solve an optimization problem over the polytope of degree sequences for which we present an efficient algorithm. Time permitting, we we also describe asymptotic results and possible directions for future research. We will illustrate our results on simulated and real-life datasets.
The talk will assume no background on networks and privacy.
This is joint work with Aleksandra Slavkovic.

Peter Spirtes: Causal Inference from Algebraic Constraints [ slides ]

When scientists in many fields such as educational research, psychometrics, sociology, economics, etc. are interested in knowing the values of, or inferring causal relations between variables that they cannot directly measure, they typically record several survey or test “items” that are thought to be indicators of latent variables of interest, e.g. “mathematical ability” (educational research) or personality traits such as “impulsiveness” (psychometrics). A number of problems make it difficult to find a correctly specified measurement model: a) associations among items are often confounded by additional unknown latent common causes, b) there are often a plethora of alternative models that are consistent with the data and with the prior knowledge of domain experts, c) there may be non-linear dependencies among latent variables, or linear relationships among non-Gaussian latent variables, and d) there may be feedback relationships among latent variables. This work will use recent algebraic discoveries by Sullivant, Talaska, and Draisma that will allow provably reliable and efficient search procedures for models with several latent factors underlying each pair of items, that involve non-linearities and non-Gaussian variables, and that involve feedback between latent variables of interest.

Piotr Zwiernik: Geometry of binary latent tree models [ slides ]

I want to present some recent results on the geometry of binary tree models (also known as general Markov models). The first series of results requires embedding a latent tree model in the space of tree cumulants. We can show that in this embedding these models admit a monomial parametrization. This allows a detailed analysis of their semi-algebraic structure and model identifiability. However, from inferential point of view it is also important to obtain a tractable description directly in raw probabilities. Focusing on trees with one inner node (star trees) we obtain an exceptionally simple description: a probability tensor lies in the latent tree model on a star tree if and only if it is supermodular and has flattening rank at most 2.
This talk is based on joint projects with Elizabeth Allman, Robin Evans, Stefanie Jegelka, Mateusz Michałek, Luke Oeding, John Rhodes, Jim Q. Smith and Bernd Sturmfels.

Mitsunori Ogawa: Conditional maximal likelihood estimation via holonomic gradient method with A-hypergeometric equations

We discuss the calculation of conditional maximal likelihood estimation of some statistical models. In our setting, the normalizing constant of the conditional likelihood is of the form of summation over whole elements of the fiber, that is, all integral solutions of a linear constraint. Thus, the direct evaluation of the normalizing constant and its gradient vector is infeasible even for the moderate-sized problem. We utilize A-hypergeometric equations to the calculation based on the framework of holonomic gradient method. In our method, we need the Pfaffian system of the A-hypergeometrics system. After reviewing the previous result on the Pfaffian system for Aomoto--Gel'fand system, we discuss the relations on the singular points. We also discuss the calculation of initial value. For this purpose we use the A-hypergeometric differential-difference system.
This is joint work with N. Takayama and A. Takemura.

Kayvan Sadeghi: Hierarchical Models and Marginal Models for Independence Structures of Networks [ slides ]

In this talk we propose a method for creating two families of statistical network models. Members of one family, called hierarchical network models, induce the same sets of conditional independencies as those induced by undirected graphs in graphical models, and members of the other family, called marginal network modes, induce the same sets of conditional independencies as those induced by bidirected graphs. Every member of these families corresponds to a given undirected or bidirected graph, called the dependence graph. Every network model with the diadic independence assumption can be generalized to construct members of these new families of models. In this paper, we apply this method to Erdos-Renyi and Beta models to create hierarchical Erdos-Renyi and Beta models as well as marginal Erdos-Renyi and Beta models. We also propose several interesting problems in order to develop algebraic statistics machinery for these models. These include finding models polytopes and identifiability problems for these models.

Serkan Hosten: Analysis of Chromosome Aberration Frequencies Using Algebraic Statistics [ slides ]

Exchange type chromosome aberrations are rearrangements of the genome commonly observed in cancer cells and in cells exposed to radiation. The frequency of these large scale rearrangements have been used to infer the physical proximity of chromosome territories. However systematic approaches to determine statistically significant chromosome aberrations are few. We here propose a new approach to find significant exchange type chromosome aberrations using the theory of algebraic statistics. This is an application of "classical" algebraic statistics where we use Groebner bases of the second hypersimplex for an MCMC procedure to test the existence of chromosome clusters. Our techniques identify in a data set of irradiated cells two human chromosome pairs that are part of exchanges more frequently than one would expect from randomness.

Dennis Leung: Identifiability of directed Gaussian graphical models with one latent variable [ slides ]

Directed graphical models capture dependence structures that arise from causal relations between random variables of interest. We consider the problem of parameter identifiability in the Gaussian setting, in which we seek conditions that guaranteed that all model parameters can be recovered from the covariance matrix of the observed variables. Specializing to the case of precisely one latent/unobserved variable, we will discuss the results of algebraic computations for graphs up to 7 nodes and describe a new sufficient criterion for finite identifiability.

Ruth Davidson: Distance-based phylogenetic methods near a polytomy. [ slides ]

A phylogenetic tree models the common evolutionary history of a group of species. A tree metric is a distance function on a set of species realized by a tree with edge weights. Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and Neighbor-Joining are points in the maximal cones in the fan. Tree metrics with polytomies, or internal vertices of degree higher than three, lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. We use polyhedral geometry to compare the local nature of the subdivisions induced by least-squares phylogeny, UPGMA, and Neighbor-Joining. Our results suggest that in some circumstances, UPGMA and Neighbor-Joining poorly match least-squares phylogeny when the true tree has a polytomy.
This is joint work with Seth Sullivant.

Elina Robeva: Fixed Points of the EM Algorithm and Nonnegative Rank Boundaries [ slides ]

Matrices of nonnegative rank $r$ represent mixtures of $r$ independent distributions. Likelihood inference for this model leads to problems in real algebraic geometry that will be addressed in this talk. We characterize the fixed point locus of the Expectation Maximization algorithm, and we study the boundary of the space of matrices with nonnegative rank at most $3$. Both of these are algebraic varieties with many irreducible components. Moreover, we give a complete semialgebraic description of the space of matrices of nonnegative rank at most $3$.
This talk is based on our recent paper having the same title with Bernd Sturmfels and Kaie Kubjas.

Caroline Uhler: Between Sphere Packing and Sphere Covering [ slides ]

The classical sphere packing problem asks for the densest arrangement of non-overlapping unit balls, while the classical sphere covering problem asks for the thinnest arrangement of unit balls so that they cover the whole space. Although these problems have attracted a lot of attention by mathematicians, the arrangements of spheres encountered in biological models usually fall between sphere packing and sphere covering: limited overlap and some free space between the balls is allowed and even important. Examples are the spatial organization of chromosomes in the cell nucleus, the spatial organization of neurons, or the arrangement of receptive fields on the retina. This motivates defining and studying generalized versions of the sphere packing and sphere covering problem, where we allow a limited amount of overlap or free-space. We show that the optimal packing and covering lattices are in fact robust to the amount of allowed overlap / free-space and discuss the implications for biological modeling.

Nicolette Meshkat: Identifiability of Linear Compartment Models

Identifiability concerns finding which unknown parameters of a model can be quantified from given input-output data. Many linear ODE models, used primarily in Systems Biology, are unidentifiable, which means that parameters can take on an infinite number of values and yet yield the same input-output data. We study a particular class of unidentifiable models and find conditions to obtain identifiable reparameterizations of these models. In particular, we use a graph-theoretic approach to analyze the models and show that graphs with certain properties allow a monomial scaling reparameterization over identifiable functions of the parameters. We also examine conditions to obtain local identifiability for this particular class of models and show how identifiability can be determined simply by looking at the structure of these linear compartment models.

Jeremy Sumner: Markov invariants and phylogenetic modules [ slides ]

I will consider a continuous-time Markov chain as defining a matrix group G. With polynomial representations of G in mind, Markov invariants are defined as a one-dimensional G-modules, while phylogenetic modules are (possibly higher dimensional) G-modules with the special property that each polynomial in the module is a so-called phylogenetic invariant. I will argue that this perspective is profitable, as understanding how phylogenetic invariants organise into irreducible G-invariant modules may hold the key to constructing statistically robust measures useful in an applied setting. In particular, low-dimensional phylogenetic modules are preferred, with the ideal occurring in the one-dimensional case when the phylogenetic module is spanned by a Markov invariant. In this case, it is possible to show that any reasonable statistical measure that compares alternate tree fitness (such as least squares) is invariant to changes in the underlying stochastic process, a property that is difficult to establish when not dealing with Markov invariants.
By way of example I will discuss the four degree 5 Markov invariants, that exist in the case of quartet phylogenetic trees under the general Markov model of DNA substitution. For a given quartet tree, these Markov invariants can be organised such that two of them form phylogenetic invariants, and remarkably this property can be understood in two independent ways: (i) via analysis of quartet leaf permutations, and (ii) as ``tripod'' invariants on 4x4x16 tensors.

Fabio Rapallo: An algebraic characterization of saturated designs for factorial experiments [ slides ]

In this talk we present some properties of saturated fractions of factorial designs under the perspective of Algebraic Statistics. Exploiting the identification of a fraction with a binary contingency table, we define a criterion to check whether a fraction is saturated or not with respect to a given model. The proposed criterion is based on combinatorial algebraic objects, namely the circuit basis of the toric ideal associated to the design matrix of the model. Moreover, we show that the combinatorial description of a fraction with respect to the circuit basis is strictly related to the notion of D-optimal fraction and to other optimality criteria.
This work is joint with R. Fontana (Politecnico di Torino) and M.P. Rogantin (University of Genova).

Simon Bonner: Application of Algebraic Statistics to Mark-Recapture Models with Misidentification [ slides ]

This talk introduces a new application of algebraic statistics for modeling mark-recapture (MR) data with potential identification errors. Mark-recapture methods are widely used in monitoring populations of wild animals. In a standard MR study, individuals are captured on a sequence of occasions, identified using man-made marks or natural characteristics, and returned to the population. Data or an individual is summarized by its capture history (a string of 0s and 1s indicating when the individual was captured) and a sufficient statistic is the vector of counts for all possible observed histories, n.
Standard MR models assume that individuals are identified correctly whenever they are captured and ignoring potential errors may bias results. Link et al. (2010) introduced a Bayesian framework to account for misidentification under the assumption that that the error process is linear (n = Ax where x represents the unknown counts of the true capture histories and A is a known configuration matrix). Further assuming x to be multinomial with parameters θ, inference is obtained by MCMC sampling from the joint posterior distribution of both x and θ. In particular, Link et al. (2010) employ a Metropolis-Hastings step to update x conditional on θ generating proposals by adding elements from a lattice basis of ker(A) to the current value one at a time.
Link et al. (2010) apply this method to a specific model, and we show that their algorithm does produce an irreducible Markov chain in this case. However, lattice bases cannot connect the fibers for more complex MR models with misidentification. As an example, we present a band-read error model which allows for both false positive and false negative identifications. We show how a Markov basis and Markov subbases for specific data can be generated for this model and discuss further results for MR models with misidentification.

Fatemeh Mohammadi: Matroid ideals and reliability systems [ slides ]

We study various monomial ideals arising in the theory of orientations, and matroids on graphs. We use ideas from commutative algebra and from the theory of Delaunay decompositions for lattices to describe their minimal polyhedral cellular free resolutions. As corollaries, we give conceptual reliability bounds for their associated coherent systems.

Andrew Francis: A group-theoretic approach to inversion distance [ slides ]

The inversion process in bacteria is often used as a means to estimate evolutionary distance. In this talk I will explain how it can also be described using models that involve group theory because each inversion can be regarded as an invertible action on the genome. The use of group theory brings with it an understanding of the process as a random walk on a Cayley graph, and subsequent questions about possible alternatives to use of the minimal distance as a metric.

Anna Klimova: Parametric faithfulness of discrete data [ slides ]

In this talk, we discuss faithfulness and strong-faithfulness for discrete distributions. Graphs are not sufficient to describe the association structure in the discrete case, and there exist distributions that are not faithful to any DAG or undirected graphical model. We consider the class of hypergraphs instead, and introduce parametric (strong-) faithfulness with respect to a hypergraph. Since association in a discrete distribution may be described using different parameterizations and its strength can be quantified using various measures, strong-faithfulness for discrete distributions can be defined in many ways. We explore the consequences of different parameterizations and association measures by computing proportions of distributions that do not satisfy strong-faithfulness.
Joint work with Caroline Uhler and Tamas Rudas.

Johannes Rauh: Polytopes from Subgraph Statistics [ slides ]

For two graphs $H$ and $G$ let $s(H,G)$ be the subgraph density of $H$ in $G$. For a family of graphs $H_1,\dots,H_d$ we obtain a vector of subgraph densities for each graph~$G$. For fixed $N$ I want to study the convex hull of all these vectors for all graphs on $N$ nodes. This polytope plays an important role in the study of random graphs; for example, it equals the convex support of the corresponding exponential random graph model. In my talk I will present relations to the theory of graph limits (graphons), and I will discuss the example of the k-star polytope.

Despina Stasi: The beta model for hypergraphs [ slides ]

Often graphs are used to model relational data that describe k-ary relations for k larger than 2. We define and study the beta model for hypergraphs, an exponential family of probability distributions for hypergraphs, whose natural sufficient statistic vector is the degree sequence of the hypergraph.

Isaac Z. Burke: Universal Gr{\"o}bner Bases of Determinantal Ideals [ poster ]

Here we study the algebraic properties of log-linear independence models, considering in particular 2x2x…x2 independence models. The fiber polytopes of these models are a special class of n-way transportation polytopes, the general properties of which have been well documented for n=2,3.
Given a generic mxn matrix A over a field k whose entries are algebraically independent variables, the set of j-minors of A generates a determinantal ideal I. We associate such a determinantal ideal I with each independence model and seek to enumerate the elements of the universal Groebner basis of I, drawing on results of Sturmfels while making use of the software systems polymake and gfan. Implications are discussed in relation to the problem of describing graphs of n-way transportation polytopes for n > 3.

Abraham Martin del Campo: Goodness-of-fit testing in the Ising Model

Markov bases have been developed in algebraic statistics for exact goodness-of-fit testing. They connect all elements in a fiber (given by the sufficient statistics) and allow building a Markov chain to approximate the distribution of a test statistic by its posterior distribution. However, finding a Markov basis is often computationally intractable. In addition, the number of Markov steps required for converging to the stationary distribution depends on the connectivity of the sampling space. In this joint work with Caroline Uhler and Sarah Cepeda, we compare different test statistics and study the combinatorial structure of the finite lattice Ising model. We propose a new method for exact goodness-of-fit testing. Our technique avoids computing a Markov basis but builds a Markov chain consisting only of simple moves (i.e. swaps of two interior sites). These simple moves might not be sufficient to create a connected Markov chain. We prove that when a bounded change in the sufficient statistics is allowed, the resulting Markov chain is connected. The proposed algorithm not only overcomes the computational burden of finding a Markov basis, but it might also lead to a better connectivity of the sampling space and hence a faster convergence.

Jesus Fernández-Sánchez: Local description of phylogenetic group-based models [ poster ]

We consider phylogenetic varieties defined via group-based models and obtain a system of equations that define these varieties on an open set containing the biologically meaningful points. More precisely, for any finite abelian group $G$, we provide an explicit construction of $codim X$ polynomial equations of degree at most $|G|$ (the cardinality of the group) that define the variety $X$ on a Zariski open set $U$. In particular, the main result gives a positive answer to a conjecture by M. Michalek and, on the set $U$, confirms a conjecture by B. Sturmfels and S. Sullivant.
This talk is based on joint work with M. Casanellas and M. Michalek.

Rina Foygel Barber: Half-trek criterion for generic identifiability of linear structural equation models

We consider a linear structural equation model, which can be represented as a mixed graph, with a directed edge from node i to node j encoding a linear effect of variable i on variable j, while a bidirected edge between nodes i and j encodes possible correlation of the noise terms of the two variables. The half-trek criterion is a method for testing whether the parameters in the model are generically identifiable, meaning that they can be uniquely determined by the joint distribution of the variables (generically, i.e. for almost all chioces of parameter values). The criterion provides both a sufficient condition and a necessary condition for generic identifiability, although some graphs fall into the "gap" between the two conditions and cannot be classified with the method. We reformulate both conditions as max-flow problems, so that the conditions can be tested efficiently on large graphs. We provide a package in R that implements the method and is able to handle graphs with 100+ nodes.
(Joint work with Jan Draisma and Mathias Drton.)

Elizabeth Gross: Dynamic Markov bases using hypergraphs

Social networks and other large sparse data sets pose significant challenges for statistical inference, as many standard statistical methods for testing model/data fit are not applicable in such settings. Algebraic statistics offers a theoretically justified approach to goodness-of-fit testing that relies on the theory of Markov bases and is intimately connected with the geometry of the model as described by its fibers. Most current practices require the computation of the entire basis, which is infeasible in many practical settings. A recent contribution by A. Dobra proposes dynamic Markov bases for sampling tables with fixed marginals. In this talk, we present a dynamic approach to explore the fiber of a general log-linear model, based on the combinatorics of hypergraphs arising from the toric algebra structure of the model. We demonstrate the approach on the Holland-Leinhardt $p_1$ model for random directed graphs.

Kaie Kubjas: Phylogenetic semigroups and phylogenetic networks [ poster ]

Phylogenetic semigroups on trees are semigroups associated with the Cavender-Farris-Neyman model by the toric varieties and semigroups correspondence. Buczynska generalized phylogenetic semigroups from trees to graphs. Similar generalizations can be defined for other group-based models. We will review results about phylogenetic semigroups by Buczynska, Buczynski, Kubjas, Manon, Michalek, and we will explore connections between phylogenetic semigroups and evolutionary models on phylogenetic networks.

Colby Long: Identifiability of 3-Tree Jukes Cantor Mixture Models [ poster ]

If the statistical distribution arising from a phylogenetic mixture model uniquely determines the tree topologies, then we say the tree parameters are generically identifiable. The generic identifiability of the tree parameters has been established for both JC and K2P 2-class mixture models using various algebraic techniques. We extend these results to certain 3-class Jukes Cantor mixture models and provide counterexamples demonstrating the limitations of the previous methods in the 3-class setting. We also seek to resolve numerically any outstanding cases and establish the likelihood of generic identifiability for all 3-class Jukes Cantor mixture models.

Guido Montufar: Geometry of Hidden-Visible Products of Statistical Models [ poster ]

We study the dimension of probability models defined as marginals of exponential families with sufficient statistics matrices given by Kronecker products. An example are hierarchical models with interaction sets defined by Cartesian products of hidden-visible simplicial complexes, including discrete naive Bayes models, restricted Boltzmann machines, mixtures of interaction models, and their Hadamard products as special cases. We describe the tropicalized models in terms of polyhedral-fan slicings of point configurations, and estimate their dimension, showing in many cases that the probability models have the dimension expected from counting parameters.
This is joint work with Jason Morton.

Duy Nguyen: Sampling Properties of Sudoku-based Space-filling Designs

Sudoku is played by millions of people across the globe. It has simple rules and is very addictive. The game board is a nine-by-nine grid of numbers from one to nine. The entries must be filled in subject to no row, column, or three-by-three subsquare containing duplicate numbers. By exploring these three types of uniformity, Xu et al., 2011 proposed an approach to constructing a new type of design, called a Sudoku-based space-filling design. A stepping-stone of the construction is a set of doubly orthogonal Sudoku Latin hypercubes, where the whole designs are mutually orthogonal to each other. For the same position, the subsquares are mutually orthogonal after suitable level collapsing.
Applications of such designs include ensembles of computer models and computer experiments with qualitative and quantitative factors and cross-validation. We study sampling properties of Sudoku-based space-filling designs. For numerical integration, we show that both the whole design and each slice can filter out the main effects and two-way interactions in the functional ANOVA decomposition of variance.
Joint work with Peter Z. G. Qian.

Patrik Norén: Geometry meets Causal Inference

The sparsest permutation (SP) algorithm is an algorithm proposed by Raskutti and Uhler for learning a directed acyclic graph (DAG) based on conditional independence testing. The conditions for consistency of the SP algorithm are strictly weaker than the faithfulness assumptions that many other algorithms require. The drawback of the SP algorithm is that it is extremely slow, since it requires computing a DAG for every permutation of the vertices. Using geometric and combinatorial tools we develop a new computationally efficient version of the SP algorithm. It is based on studying a variant of the permutohedron polytope and understanding which permutations lead to the same DAG. Another advantage of our algorithm is that prior information on the DAG can be incorporated and in fact used to speed up the algorithm, which radically increases its power and applicability.

Jing Xi: Sequential Importance Sampling for Ising Models [ poster ]

The Ising models, which were invented by Lenz in 1920, find its applications in physics and neuroscience. In 1989, Besag and Clifford introduced how to carry out exact tests for 2-dimensional Ising models via Monte Carlo Markov chain (MCMC) sampling. However, their approach does not lead to a connected Markov chain, i.e. their Markov moves cannot connect all tables. Sequential importance sampling (SIS) procedure is another sampling method which have been well developed in sampling contingency tables with linear constraints. It was shown that compared with MCMC-based approach, SIS procedure does not require expensive or prohibitive pre-computations. In this talk, we apply SIS procedure to Ising models, which includes both linear constraints and quadratic constraints. Then we show by simulations that the acceptance rate of SIS procedure can be much more improved using combinatorial techniques. We end the talk with discussion and future questions on the SIS procedure for Ising models.