Our work has received several awards from the IEEE, the ACM, and the American Academy of Advertising, including: two best paper awards at ACM CSCW 2023 (on the value of activity traces, and on user-moderator alignment); the best article award from the Journal of Interactive Advertising (2020); an honorable mention award at ACM CSCW 2019; the best paper award at IEEE/ACM ASONAM 2019; the best student paper award at JCDL 2007; the best demo award at ACM Multimedia 2006; the best student paper award at ACM Multimedia 2002; best paper runner-up at ACM Multimedia 2007; best student paper runner-up at ICASSP 2006; and a best paper award on video retrieval from IEEE Transactions on Circuits and Systems for Video Technology (2000). In 2002, Prof. Sundaram received the Eliahu I. Jury Award for best Ph.D. dissertation.
Large Language Models (LLMs) have demonstrated remarkable performance in assisting humans with programming and facilitating programming automation. However, existing benchmarks for evaluating the code understanding and generation capacities of LLMs suffer from severe limitations. First, most benchmarks are insufficient as they focus on a narrow range of popular programming languages and specific tasks, whereas real-world software development scenarios show a critical need to implement systems with multilingual and multitask programming environments to satisfy diverse requirements. Second, most benchmarks fail to consider the actual executability and the consistency of execution results of the generated code. To bridge these gaps between existing benchmarks and expectations from practical applications, we introduce CodeScope, an execution-based, multilingual, multitask, multidimensional evaluation benchmark for comprehensively measuring LLM capabilities on coding tasks. CodeScope covers 43 programming languages and eight coding tasks. It evaluates the coding performance of LLMs from three dimensions (perspectives): length, difficulty, and efficiency. To facilitate execution-based evaluations of code generation, we develop MultiCodeEngine, an automated code execution engine that supports 14 programming languages. Finally, we systematically evaluate and analyze eight mainstream LLMs and demonstrate the superior breadth and challenges of CodeScope for evaluating LLMs on code understanding and generation tasks compared to other benchmarks. The CodeScope benchmark and code are publicly available at https://github.com/WeixiangYAN/CodeScope.
Refugees or immigrants who arrive in new countries often feel isolated. In this work, we examine how a resource-bounded public entity can make recommendations to increase integration of these new arrivals into a community. The community is made up of agents who engage in a strategic network formation process; agents join periodically — new arrivals are the refugees. The public entity meanwhile makes a limited number of edge-formation recommendations (according to its resource constraint) per iteration in order to increase integration of refugees. This work investigates the relationship between community trust and network fairness. First, we show that increasing the public entity's resource allocation will not compensate for low trust in the community. Then, we introduce two trust-increasing interventions by the public entity: a targeted advertising campaign, and an announcement to increase transparency. We find that diverting a fraction (20%) of the public entity's resources to a targeted advertising campaign can increase trust and fairness in the community, especially in low trust scenarios. We find that personalized, local announcements are more effective at increasing fairness than global announcements in low trust scenarios; they almost double our fairness metric in some cases. Importantly, the transparent announcement requires no extra resource expenditure on the part of the public entity. Our work underscores the importance of community trust — low trust cannot be compensated for with resources. This work provides theoretical support for these trust-increasing interventions, which we show can lead to increased integration of refugees in communities.
Position paper on how computing can help individuals address climate change.
As large-scale language models become the standard for text generation, there is a greater need to tailor the generations to be more or less concise, targeted, and informative, depending on the audience/application. Existing control approaches primarily adjust the semantic (e.g., emotion, topics), structural (e.g., syntax tree, parts-of-speech), and lexical (e.g., keyword/phrase inclusion) properties of text, but are insufficient to accomplish complex objectives such as pacing, which controls the complexity and readability of the text. In this paper, we introduce CEV-LM - a lightweight, semi-autoregressive language model that utilizes constrained edit vectors to control three complementary metrics (speed, volume, and circuitousness) that quantify the shape of text (e.g., pacing of content). We study an extensive set of state-of-the-art CTG models and find that CEV-LM provides significantly more targeted and precise control of these three metrics while preserving semantic content, using less training data, and containing fewer parameters.
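For intuition, the simplest of the three shape metrics, speed, is typically defined as the average distance traveled in embedding space between consecutive chunks of text. The sketch below illustrates that generic definition only; it is not CEV-LM's implementation, and the toy 2-D "embeddings" are stand-ins for real sentence embeddings:

```python
import numpy as np

def speed(chunk_embeddings):
    """'Speed' of a text: mean distance traveled in embedding space
    between consecutive chunks. Higher speed roughly corresponds to
    faster pacing. Generic sketch of the metric, not CEV-LM code."""
    e = np.asarray(chunk_embeddings, dtype=float)
    steps = np.linalg.norm(np.diff(e, axis=0), axis=1)
    return float(steps.mean())

# Two texts covering the same semantic ground at different pacing:
fast = speed([[0.0, 0.0], [3.0, 4.0]])               # one big jump
slow = speed([[0.0, 0.0], [1.5, 2.0], [3.0, 4.0]])   # same path, two steps
```

The second text travels the same total distance in smaller steps, so its speed is lower; a controllable generator can then be trained to hit a target value of this quantity.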
Existing works on outline-conditioned text generation typically aim to generate text using provided outlines as rough sketches, such as keywords and phrases. However, these approaches make it challenging to control the quality of text generation and assess consistency between outlines and generated texts due to the lack of clarity and rationality of the rough outlines. In this paper, we introduce a novel text generation task called Precise Outline-conditioned Generation, which requires generating stories based on specific, sentence-level outlines. To facilitate research on this task, we construct two new datasets, WPOG and CDM. We provide strong baselines based on fine-tuning models such as BART and GPT-2, and evaluate the zero-shot performance of models such as ChatGPT and Vicuna. Furthermore, we identify an issue of imbalanced utilization of the outline information in precise outline-conditioned generation, which is ubiquitously observed across fine-tuned models and zero-shot inference models. To address this issue, we propose an explicit outline utilization control approach and a novel framework that leverages the task duality between summarization and generation. Experimental results show that the proposed approaches effectively alleviate the issue of imbalanced outline utilization and enhance the quality of precise outline-conditioned text generation for both fine-tuning and zero-shot settings.
We show the effectiveness of automatic differentiation in efficiently and correctly computing and controlling the spectrum of implicitly linear operators, a rich family of layer types including all standard convolutional and dense layers. We provide the first clipping method which is correct for general convolution layers, and illuminate the representational limitation that caused correctness issues in prior work. We study the effect of the batch normalization layers when concatenated with convolutional layers and show how our clipping method can be applied to their composition. By comparing the accuracy and performance of our algorithms to the state-of-the-art methods, using various experiments, we show they are more precise and efficient and lead to better generalization and adversarial robustness. We provide the code for using our methods at https://github.com/Ali-E/FastClip.
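The underlying mechanism, estimating the top singular value of an operator that is only exposed through matrix-vector products (with the transpose product supplied in practice by automatic differentiation), can be sketched with plain power iteration on A^T A. A dense matrix stands in for a convolution below; this is a generic illustration, not the paper's FastClip algorithm:

```python
import numpy as np

def spectral_norm(matvec, rmatvec, dim, iters=200, seed=0):
    """Estimate the largest singular value (spectral norm) of an
    implicitly defined linear operator via power iteration on A^T A.
    Only matvec (x -> Ax) and rmatvec (y -> A^T y) are needed, so the
    same routine applies to any implicitly linear layer."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = rmatvec(matvec(v))        # one power-iteration step on A^T A
        v = w / np.linalg.norm(w)
    return float(np.linalg.norm(matvec(v)))   # sigma_max ~= ||A v||

# A dense matrix stands in for a convolution's implicit operator.
A = np.array([[3.0, 1.0], [0.0, 2.0]])
sigma = spectral_norm(lambda x: A @ x, lambda y: A.T @ y, dim=2)
```

Once sigma is known, clipping the spectrum to a bound c amounts to controlling the operator so that its top singular value does not exceed c (for a single linear layer, rescaling the weights by c/sigma suffices; the paper's contribution is doing this correctly for general convolution layers and compositions).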
Generating follow-up questions on the fly could significantly improve conversational survey quality and user experiences by enabling a more dynamic and personalized survey structure. In this paper, we propose a novel task for knowledge-driven follow-up question generation in conversational surveys. We construct a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge in the context of conversational surveys. Along with the dataset, we design and validate a set of reference-free Gricean-inspired (Grice, 1975) evaluation metrics to systematically evaluate the quality of generated follow-up questions. We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions by using knowledge to steer the generation process. The experiments demonstrate that compared to GPT-based baseline models, our two-staged model generates more informative, coherent, and clear follow-up questions.
We show the effectiveness of automatic differentiation in efficiently and correctly computing and controlling the spectrum of implicitly linear operators, a rich family of layer types including all standard convolutional and dense layers. We provide the first clipping method which is correct for general convolution layers, and illuminate the representational limitation that caused correctness issues in prior work. By comparing the accuracy and performance of our methods to existing methods, using various experiments, we show they lead to better generalization and adversarial robustness of the models. In addition to these advantages over the state-of-the-art methods, we show they are much faster than the alternatives.
Peer evaluations are a well-established tool for evaluating individual and team performance in collaborative contexts, but are susceptible to social and cognitive biases. Current peer evaluation tools have also yet to address the unique opportunities that online collaborative technologies provide for addressing these biases. In this work, we explore the potential of one such opportunity for peer evaluations: data traces automatically generated by collaborative tools, which we refer to as "activity traces". We conduct a between-subjects experiment with 101 students and MTurk workers, investigating the effects of reviewing activity traces on peer evaluations of team members in an online collaborative task. Our findings show that the usage of activity traces led participants to make more and greater revisions to their evaluations compared to a control condition. These revisions also increased the consistency and participants' perceived accuracy of the evaluations that they received. Our findings demonstrate the value of activity traces as an approach for performing more reliable and objective peer evaluations of teamwork. Based on our findings, as well as qualitative analysis of free-form responses in our study, we also identify and discuss key considerations and design recommendations for incorporating activity traces into real-world peer evaluation systems.
Social media sites like Reddit, Discord, and Clubhouse utilize a community-reliant approach to content moderation. Under this model, volunteer moderators are tasked with setting and enforcing content rules within the platforms' sub-communities. However, few mechanisms exist to ensure that the rules set by moderators reflect the values of their community. Misalignments between users and moderators can be detrimental to community health. Yet little quantitative work has been done to evaluate the prevalence or nature of user-moderator misalignment. Through a survey of 798 users on r/ChangeMyView, we evaluate user-moderator alignment at the level of policy-awareness (do users know what the rules are?), practice-awareness (do users know how the rules are applied?), and policy-/practice-support (do users agree with the rules and how they are applied?). We find that policy-support is high while practice-support is low: using a hierarchical Bayesian model, we estimate the correlation between community opinion and moderator decisions to range from .14 to .45 across subreddit rules. Surprisingly, these correlations were only slightly higher when users were asked to predict moderator actions, demonstrating low awareness of moderation practices. Our findings demonstrate the need for careful analysis of user-moderator alignment at multiple levels. We argue that future work should focus on building tools to empower communities to conduct these analyses themselves.
In many online markets we "shop alone" — there is no way for us to know the prices other consumers paid for the same goods. Could this lack of price transparency lead to differential pricing? To answer this question, we present a generalized framework to audit online markets for differential pricing using automated agents. Consensus is a key idea in our work: for a successful black-box audit, both the experimenter and seller must agree on the agents' attributes. We audit two competitive online travel markets on kayak.com (flight and hotel markets) and construct queries representative of the demand for goods. Crucially, we assume ignorance of the sellers' pricing mechanisms while conducting these audits. We conservatively implement consensus with nine distinct profiles based on behavior, not demographics. We use a structural causal model for price differences and estimate model parameters using Bayesian inference. We can unambiguously show that many sellers (but not all) demonstrate behavior-driven differential pricing. In the flight market, some profiles are nearly 90% more likely to see a worse price than the best performing profile, and nearly 60% more likely in the hotel market. While the control profile (with no browsing history) was on average offered the best prices in the flight market, surprisingly, other profiles outperformed the control in the hotel market. The price difference between any pair of profiles occurring by chance is $0.44 in the flight market and $0.09 for hotels. However, the expected loss of welfare for any profile when compared to the best profile can be as much as $6.00 for flights and $3.00 for hotels (i.e., 15x and 33x the price difference by chance respectively). This illustrates the need for new market designs or policies that encourage more transparent market design to overcome differential pricing practices.
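As a toy illustration of the quantities reported above (probability of seeing a worse price, and expected welfare loss versus the best profile), consider paired price observations with one row per query and one column per agent profile. The paper estimates these with a Bayesian structural causal model; the sketch below only computes their empirical analogues on made-up data:

```python
import numpy as np

def audit_summary(prices):
    """Empirical audit summary over paired price observations:
    rows = queries, columns = agent profiles (toy illustration;
    the paper fits a Bayesian structural causal model instead).
    Returns, per profile: P(worse price than the per-query best)
    and the mean welfare loss vs. that best price."""
    prices = np.asarray(prices, dtype=float)
    best = prices.min(axis=1, keepdims=True)   # best offer for each query
    p_worse = (prices > best).mean(axis=0)     # how often a profile overpays
    mean_loss = (prices - best).mean(axis=0)   # average overpayment
    return p_worse, mean_loss

# Hypothetical data: two queries, two profiles.
p_worse, mean_loss = audit_summary([[100.0, 106.0],
                                    [100.0, 100.0]])
```

Here the second profile sees a worse price on half the queries and loses $3.00 on average, even though half its offers match the best price.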
Informed consent is a cornerstone of ethics in human subject research. Through the informed consent process, participants learn about the study procedure, benefits, risks, and more to make an informed decision. However, recent studies showed that current practices might lead to uninformed decisions and expose participants to unknown risks, especially in online studies. Without the researcher's presence and guidance, online participants must read a lengthy form on their own with no answers to their questions. In this paper, we examined the role of an AI-powered chatbot in improving informed consent online. By comparing the chatbot with form-based interaction, we found the chatbot improved consent form reading, promoted participants' feelings of agency, and closed the power gap between the participant and the researcher. Our exploratory analysis further revealed that the altered power dynamic might eventually benefit study response quality. We discussed design implications for creating AI-powered consent chatbots.
We present InfoMotif, a new semi-supervised, motif-regularized learning framework over graphs. We overcome two key limitations of message passing in popular graph neural networks (GNNs): localization (a k-layer GNN cannot utilize features outside the k-hop neighborhood of the labeled training nodes) and over-smoothed (structurally indistinguishable) representations. We formulate attributed structural roles of nodes based on their occurrence in different network motifs, independent of network proximity. Network motifs are higher-order structures indicating connectivity patterns between nodes and are crucial to the organization of complex networks. Two nodes share attributed structural roles if they participate in topologically similar motif instances over covarying sets of attributes. InfoMotif achieves architecture-agnostic regularization of arbitrary GNNs through novel self-supervised learning objectives based on mutual information maximization. Our training curriculum dynamically prioritizes multiple motifs in the learning process without relying on distributional assumptions in the underlying graph or the learning task. We integrate three state-of-the-art GNNs in our framework, showing notable performance gains (3–10% accuracy) across nine diverse real-world datasets spanning homogeneous and heterogeneous networks. Notably, we see stronger gains for nodes with sparse training labels and diverse attributes in local neighborhood structures.
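Mutual information maximization objectives of this kind are commonly implemented as an InfoNCE-style contrastive loss: an anchor node's embedding is pulled toward a positive sharing its structural role and pushed away from negatives. The sketch below shows that generic objective only, not InfoMotif's exact discriminator or curriculum:

```python
import numpy as np

def infonce(anchor, positive, negatives, temp=0.1):
    """InfoNCE loss, a standard lower bound on mutual information.
    Low loss means the anchor is far more similar to its positive
    (same structural role) than to the negatives. Generic sketch,
    not InfoMotif's implementation."""
    anchor = np.asarray(anchor, dtype=float)
    logits = np.array([np.dot(anchor, np.asarray(c, dtype=float)) / temp
                       for c in [positive] + list(negatives)])
    logits -= logits.max()  # numerical stability
    # softmax cross-entropy with the positive as the target class
    return float(-logits[0] + np.log(np.exp(logits).sum()))

# Anchor aligned with its positive -> near-zero loss;
# anchor aligned with a negative -> large loss.
aligned = infonce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = infonce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this loss over many (anchor, positive, negatives) triples drives embeddings of same-role nodes together regardless of their network proximity.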
Adopting contextually appropriate, audience-tailored linguistic styles is critical to the success of user-centric language generation systems (e.g., chatbots, computer-aided writing, dialog systems). While existing approaches demonstrate textual style transfer with large volumes of parallel or non-parallel data, we argue that grounding style on audience-independent external factors is innately limiting for two reasons. First, it is difficult to collect large volumes of audience-specific stylistic data. Second, some stylistic objectives (e.g., persuasiveness, memorability, empathy) are hard to define without audience feedback. In this paper, we propose the novel task of style infusion - infusing the stylistic preferences of audiences in pretrained language generation models. Since humans are better at pairwise comparisons than direct scoring - i.e., is Sample-A more persuasive/polite/empathic than Sample-B - we leverage limited pairwise human judgments to bootstrap a style analysis model and augment our seed set of judgments. We then infuse the learned textual style in a GPT-2 based text generator while balancing fluency and style adoption. With quantitative and qualitative assessments, we show that our infusion approach can generate compelling stylized examples with generic text prompts. The code and data are accessible at https://github.com/CrowdDynamicsLab/StyleInfusion.
In this paper, we propose MuTATE, a Multi-Task Augmented approach to learn Transferable Embeddings of knowledge graphs. Previous knowledge graph representation techniques either employ task-agnostic geometric hypotheses to learn informative node embeddings or integrate task-specific learning objectives like attribute prediction. In contrast, our framework unifies multiple co-dependent learning objectives with knowledge graph enrichment. We define co-dependence as multiple tasks that extract covariant distributions of entities and their relationships for prediction or regression objectives. We facilitate knowledge transfer in this setting: tasks → graph, graph → tasks, and task-1 → task-2 via task-specific residual functions to specialize the node embeddings for each task, motivated by domain-shift theory. We show 5% relative gains over state-of-the-art knowledge graph embedding baselines on two public multi-task datasets and show significant potential for cross-task learning.
In recent times, deep learning methods have supplanted conventional collaborative filtering approaches as the backbone of modern recommender systems. However, their gains are skewed towards popular items with a drastic performance drop for the vast collection of long-tail items with sparse interactions. Moreover, we empirically show that prior neural recommenders lack the resolution power to accurately rank relevant items within the long-tail. In this paper, we formulate long-tail item recommendations as a few-shot learning problem of learning-to-recommend few-shot items with very few interactions. We propose a novel meta-learning framework ProtoCF that learns-to-compose robust prototype representations for few-shot items. ProtoCF utilizes episodic few-shot learning to extract meta-knowledge across a collection of diverse meta-training tasks designed to mimic item ranking within the tail. To further enhance discriminative power, we propose a novel architecture-agnostic technique based on knowledge distillation to extract, relate, and transfer knowledge from neural base recommenders. Our experimental results demonstrate that ProtoCF consistently outperforms state-of-the-art approaches on overall recommendation (by 5% Recall@50) while achieving significant gains (of 60-80% Recall@50) for tail items with less than 20 interactions.
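The prototype idea can be illustrated in the style of prototypical networks: a few-shot item's prototype is an aggregate of its few interaction embeddings (here simply their mean), and candidates are ranked by distance to that prototype. This is a generic sketch under that simplifying assumption; ProtoCF learns to compose prototypes rather than averaging:

```python
import numpy as np

def prototype_scores(support_embs, query_embs):
    """Prototypical-network-style scoring for a few-shot item:
    build the item prototype as the mean of its few interaction
    embeddings, then rank query users/contexts by negative squared
    distance to it. Illustrative sketch, not ProtoCF's learned
    prototype composition."""
    proto = np.mean(np.asarray(support_embs, dtype=float), axis=0)
    d = ((np.asarray(query_embs, dtype=float) - proto) ** 2).sum(axis=1)
    return -d  # higher score = closer to the prototype

# An item with only two interactions; two candidate user embeddings.
scores = prototype_scores(support_embs=[[0.0, 0.0], [2.0, 2.0]],
                          query_embs=[[1.0, 1.0], [5.0, 5.0]])
```

With only two observed interactions, the first candidate (near the prototype at [1, 1]) outranks the second, showing how even tail items with sparse data can be ranked against candidates.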
Purpose: Unhealthy eating is a major modifiable risk factor for noncommunicable diseases and obesity, and remote acculturation to U.S. culture is a recently identified cultural determinant of unhealthy eating among adolescents and families in low/middle-income countries. This small-scale randomized controlled trial evaluated the efficacy of the “JUS Media? Programme,” a food-focused media literacy intervention promoting healthier eating among remotely acculturating adolescents and mothers in Jamaica. Methods: Gender-stratified randomization of 184 eligible early adolescents and mothers in Kingston, Jamaica (i.e., 92 dyads: M_adolescent.age = 12.79 years, 51% girls) determined 31 “Workshops-Only” dyads, 30 “Workshops + SMS/texting” dyads, and 31 “No-Intervention-Control” dyads. Nutrition knowledge (food group knowledge), nutrition attitudes (stage of nutritional change), and nutrition behavior (24-hour recall) were primary outcomes assessed at four time points (T1/baseline, T2, T3, T4) across 5 months using repeated measures analysis of covariances. Results: Compared to control, families in one or both intervention groups demonstrated significantly higher nutrition knowledge (T3 adolescents, T4 mothers: mean differences .79–1.08 on a 0–6 scale, 95% confidence interval [CI] .12–1.95, Cohen’s ds = .438–.630); were more prepared to eat fruit daily (T3 adolescents and mothers: .36–.41 on a 1–5 scale, 95% CI .02–.77, ds = .431–.493); and were eating more cooked vegetables (T4 adolescents and T2 and T4 mothers: .20–.26 on a 0–1 scale, 95% CI -.03–.50, ds = .406-.607). Post-intervention focus groups (6-month-delay) revealed major positive impacts on participants’ health and lives more broadly. Conclusions: A food-focused media literacy intervention for remotely acculturating adolescents and mothers can improve nutrition. Replication in Jamaica and extension to the Jamaican diaspora would be useful.
Surveys are a common instrument to gauge self-reported opinions from the crowd for scholars in the CSCW community, the social sciences, and many other research areas. Researchers often use surveys to prioritize a subset of given options when there are resource constraints. Over the past century, researchers have developed a wide range of surveying techniques, including one of the most popular instruments, the Likert ordinal scale [49], to elicit individual preferences. However, the challenge of eliciting accurate and rich self-reported responses with surveys in a resource-constrained context still persists today. In this study, we examine Quadratic Voting (QV), a voting mechanism that is powered by the affordances of a modern computer and straddles ratings and rankings approaches [64], as an alternative online survey technique. We argue that QV could elicit more accurate self-reported responses compared to the Likert scale when the goal is to understand relative preferences under resource constraints. We conducted two randomized controlled experiments on Amazon Mechanical Turk, one in the context of public opinion polling and the other in a human-computer interaction user study. Based on our Bayesian analysis results, a QV survey with a sufficient amount of voice credits aligned significantly closer to participants' incentive-compatible behaviors than a Likert scale survey, with a medium to high effect size. In addition, we extended QV's application scenario from typical public policy and education research to a problem setting familiar to the CSCW community: a prototypical HCI user study. Our experiment results, QV survey design, and QV interface serve as a stepping stone for CSCW researchers to further explore this surveying methodology in their studies and encourage decision-makers from other communities to consider QV as a promising alternative.
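The mechanism behind QV is its quadratic cost rule: casting v votes on an option costs v^2 voice credits, so expressing a strong preference is disproportionately expensive and respondents must trade options off against each other. A minimal sketch of the rule (the study's interface and credit budget details differ):

```python
def qv_cost(ballot):
    """Total credit cost of a Quadratic Voting ballot: casting v
    votes on an option costs v**2 credits, making strong preferences
    quadratically expensive and forcing trade-offs under a budget."""
    return sum(v * v for v in ballot.values())

# Hypothetical ballot over three options with a 16-credit budget.
ballot = {"option_a": 3, "option_b": 2, "option_c": 1}
cost = qv_cost(ballot)      # 9 + 4 + 1 = 14 credits
within_budget = cost <= 16
```

Because a fourth vote on option_a alone would cost 16 credits, the budget pushes respondents to reveal relative preferences rather than rate everything at the maximum, which is the property the study compares against Likert scales.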
The rise of social media has changed the nature of the fashion industry. Influence is no longer concentrated in the hands of an elite few: social networks have distributed power across a broader set of tastemakers. To understand this new landscape of influence, we created FITNet—a network of the top 10k influencers of the larger Twitter fashion graph. To construct FITNet, we trained a content-based classifier to identify fashion-relevant Twitter accounts. Leveraging this classifier, we estimated the size of Twitter's fashion subgraph, snowball sampled more than 300k fashion-related accounts based on following relationships, and identified the top 10k influencers in the resulting subgraph. We use FITNet to perform a large-scale analysis of fashion influencers, and demonstrate how the network facilitates discovery, surfacing influencers relevant to specific fashion topics that may be of interest to brands, retailers, and media companies.
The quality variance in user-generated content is a major bottleneck to serving communities on online platforms. Current content ranking methods primarily evaluate textual/non-textual features of each user post in isolation. This paper demonstrates the utility of the implicit and explicit relational aspects across user content to assess their quality. First, we develop a modular platform-agnostic framework to represent the contrastive (or competing) and similarity-based relational aspects of user-generated content via independently induced content graphs. Second, we develop two complementary graph convolutional operators that enable feature contrast for competing content and feature smoothing/sharing for similar content. Depending on the edge semantics of each content graph, we embed its nodes via one of the above two mechanisms. We show that our contrastive operator creates discriminative magnification across the embeddings of competing posts. Third, we show a surprising result—applying classical boosting techniques to combine embeddings across the content graphs significantly outperforms the typical stacking, fusion, or neighborhood aggregation methods in graph convolutional architectures. We exhaustively validate our method via accepted answer prediction over fifty diverse Stack Exchange communities with consistent relative gains of ∼5% accuracy over state-of-the-art neural, multi-relational, and textual baselines.
Recommender systems can benefit from a plethora of signals influencing user behavior, such as a user's past interactions and social connections, as well as the similarity between different items. However, existing methods are challenged when taking all this data into account and often do not exploit all available information. This is primarily because it is non-trivial to combine the various signals, as they mutually influence each other. To address this shortcoming, we propose a ‘Fusion Recommender’ (FuseRec), which models each of these factors separately and later combines them in an interpretable manner. We find this general framework to yield compelling results on all three investigated datasets, Epinions, Ciao, and CiaoDVD, outperforming the state-of-the-art by more than 14% for Ciao and Epinions. In addition, we provide a detailed ablation study, showing that our combined model achieves accurate results, often better than any of its components individually. Our model also provides insights on the importance of each of the factors in different datasets.
This study presents the development of a coding system to examine food parenting topics presented in posts on social media, and compares topics between two social media platforms (Facebook, Reddit). Publicly available social media posts were gathered from Facebook (2 groups) and Reddit (3 subreddits), and a coding system was developed based on the concept map of food parenting proposed by Vaughn et al. (2016). Based on the developed coding system, we coded posts into overarching food parenting practice constructs (coercive control: attempts to dominate, pressure, or impose parents' will on the child; structure: organization of the child's environment to facilitate competence; autonomy support: supporting the child's ability to self-regulate through allowing food choices, conversations about food, and a positive emotional climate) and recipes. We also coded posts dichotomously as including a question or advice-seeking. Differences in frequencies of food parenting constructs presented in posts on Facebook and Reddit were assessed using chi-square tests of independence. Of the 2459 posts coded, 900 were related to food parenting (37%). In this subsample of 900, posts related to structure (43%) and recipes (40%) were the most frequent. Close to half of the posts (44%) included questions about food parenting. The frequency of food parenting topics in posts was related to social media platform, with coercive control and structure more frequently discussed on Reddit and recipes more commonly posted on Facebook. Results suggest that food parenting topics discussed on social media differ by platform, which can aid researchers and practitioners in targeting social media-based outreach to the topics of most interest to users. Findings give insight into the everyday food parenting topics and questions that parents and caregivers may be exposed to on social media. Taxonomy: Development of Feeding; Parenting; Online Information Services.
This paper introduces a novel task-independent sampler for attributed networks. The problem is important because while data mining tasks on network content are common, sampling on internet-scale networks is costly. Link-trace samplers such as Snowball sampling, Forest Fire, Random Walk, and Metropolis-Hastings Random Walk are widely used for sampling from networks. The design of these attribute-agnostic samplers focuses on preserving salient properties of network structure, and they are not optimized for tasks on node content. This paper makes three contributions. First, we propose a task-independent, attribute-aware link-trace sampler grounded in information theory. Our sampler greedily adds to the sample the node with the most informative (i.e., surprising) neighborhood. The sampler tends to rapidly explore the attribute space, maximally reducing the surprise of unseen nodes. Second, we prove that content sampling is an NP-hard problem. A well-known greedy algorithm approximates the optimal solution within a factor of 1 − 1/e, but requires full access to the entire graph. Third, we show through empirical counterfactual analysis that in many real-world datasets, network structure does not hinder the performance of surprise-based link-trace samplers. Experimental results over 18 real-world datasets reveal that surprise-based samplers are sample-efficient and outperform state-of-the-art attribute-agnostic samplers by a wide margin (e.g., 45% performance improvement on clustering tasks).
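A greedy surprise-based link-trace sampler can be sketched as follows: keep attribute counts over the current sample, and repeatedly add the frontier node whose attribute is least likely (most surprising, i.e., highest negative log-probability) under that empirical distribution. This is an illustrative simplification under stated assumptions (the paper scores whole neighborhoods, and its surprise estimator differs):

```python
import math
from collections import Counter

def surprise_sampler(graph, attrs, seed_node, budget):
    """Greedy surprise-based link-trace sampling sketch. graph is an
    adjacency dict, attrs maps node -> categorical attribute. At each
    step, add the frontier node whose attribute has the highest
    surprise (-log p) under the Laplace-smoothed attribute
    distribution of the sample so far."""
    sample = {seed_node}
    counts = Counter([attrs[seed_node]])
    vocab = len(set(attrs.values()))
    while len(sample) < budget:
        frontier = {v for u in sample for v in graph[u]} - sample
        if not frontier:
            break
        total = sum(counts.values())
        def surprise(v):
            # Laplace-smoothed probability of v's attribute in the sample
            p = (counts[attrs[v]] + 1) / (total + vocab)
            return -math.log(p)
        best = max(frontier, key=surprise)
        sample.add(best)
        counts[attrs[best]] += 1
    return sample

# Toy graph: two 'a'-attributed and two 'b'-attributed nodes.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
attrs = {0: "a", 1: "a", 2: "b", 3: "b"}
chosen = surprise_sampler(graph, attrs, seed_node=0, budget=3)
```

Starting from an 'a' node, the sampler first picks the 'b' neighbor (unseen attribute, maximal surprise), illustrating how it rapidly covers the attribute space rather than the densest region of the graph.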
Cause-related marketing (CRM) refers to the phenomenon where brands partner with causes, such as nonprofit organizations. Consumers may see some CRM partnerships as less compatible than others; however, the level of perceived compatibility differs from one consumer to another. We know a great deal about how perceptions of compatibility affect attitude and behavior toward CRM partnerships, but we know less about how to predict a consumer's perception of compatibility. Therefore, our purpose was to investigate the boundaries within which balance theory can be used to make predictions about consumers' responses to CRM partnerships. This is the first study to consider the construct of attitude strength (vs. attitude alone) when considering balance theory. We found that a consumer's attitude toward a brand, along with their attitude toward a cause, predicts their perceptions of CRM compatibility. We also found that CRM triadic balance could be predicted when attitude strength was included in the models, and that balance theory allowed us to observe preliminary evidence of attitude and attitude strength spillover effects in CRM triads. Practitioners can use these insights to determine which organizations to partner with, as well as how advertising these partnerships may affect their acceptance.
We present InfoMotif, a new semi-supervised, motif-regularized learning framework over graphs. We overcome two key limitations of message passing in popular graph neural networks (GNNs): localization (a k-layer GNN cannot utilize features outside the k-hop neighborhood of the labeled training nodes) and over-smoothed (structurally indistinguishable) representations. We propose the concept of attributed structural roles of nodes based on their occurrence in different network motifs, independent of network proximity. Two nodes share attributed structural roles if they participate in topologically similar motif instances over co-varying sets of attributes. Further, InfoMotif achieves architecture independence by regularizing the node representations of arbitrary GNNs via mutual information maximization. Our training curriculum dynamically prioritizes multiple motifs in the learning process without relying on distributional assumptions about the underlying graph or the learning task. We integrate three state-of-the-art GNNs in our framework, showing significant gains (3–10% accuracy) across six diverse, real-world datasets. We see stronger gains for nodes with sparse training labels and diverse attributes in local neighborhood structures.
As colleges and universities continue their commitment to increasing access to higher education by offering education online and at scale, attention to teaching open-ended subjects online and at scale, namely the arts, humanities, and the social sciences, remains limited. While existing work on scaling open-ended courses primarily focuses on the evaluation of and feedback on open-ended assignments, there is a lack of understanding of how to effectively teach open-ended, university-level courses at scale. To better understand the needs of effectively teaching large-scale, open-ended courses online in a university setting, we conducted a mixed-methods study with university instructors and students, using surveys and interviews, and identified five critical pedagogical elements that distinguish the teaching and learning experiences in an open-ended course from those in a non-open-ended course. An overarching theme across the five elements was the need to support students’ self-expression. We further uncovered open challenges and opportunities in incorporating the five critical pedagogical elements into large-scale, open-ended online courses in a university setting, and suggest six future research directions: (1) facilitate in-depth conversations, (2) create a studio-friendly environment, (3) adapt to open-ended assessment, (4) scale individual open-ended feedback, (5) establish trust for self-expression, and (6) personalize instruction and harness the benefits of student diversity.
To advance the emerging research field of computational advertising, this article describes the new computational advertising ecosystem, identifies key actors within it and interactions among them, and discusses future research agendas. Specifically, we propose a systematic conceptualization of the redefined advertising industry, consumers, government, and technological environmental factors, and discuss emerging and anticipated tensions that arise in the macro and exogenous factors surrounding the new computational advertising industry, leading to suggestions for future research directions. From multidisciplinary angles, areas of tension and related research questions are explored from advertising, business, computer science, and legal perspectives. The proposed research agendas include: exploring transparency of computational advertising practices and consumer education; understanding the trade-off between explainability and performance of algorithms; exploring the issues of new consumers as free data laborers, data as commodity, and related consumer agency challenges; understanding the relationship between algorithmic transparency and consumers’ literacy; evaluating the trade-off between algorithmic fairness and privacy protection; examining legal and regulatory issues regarding the power imbalance between actors in the computational advertising ecosystem; and studying the trade-off between technological innovation and consumer protection and empowerment.
The rapid proliferation of new users and items on the social web has aggravated the gray-sheep user/long-tail item challenge in recommender systems. Historically, cross-domain co-clustering methods have successfully leveraged shared users and items across dense and sparse domains to improve inference quality. However, they rely on shared rating data and cannot scale to multiple sparse target domains (i.e., the one-to-many transfer setting). This, combined with the increasing adoption of neural recommender architectures, motivates us to develop scalable neural layer-transfer approaches for cross-domain learning. Our key intuition is to guide neural collaborative filtering with domain-invariant components shared across the dense and sparse domains, improving the user and item representations learned in the sparse domains. We leverage contextual invariances across domains to develop these shared modules, and demonstrate that with user-item interaction context, we can learn-to-learn informative representation spaces even with sparse interaction data. We show the effectiveness and scalability of our approach on two public datasets and a massive transaction dataset from Visa, a global payments technology company (19% Item Recall, 3x faster vs. training separate models for each domain). Our approach is applicable to both implicit and explicit feedback settings.
We study the problem of making item recommendations to ephemeral groups, which comprise users with limited or no historical activities together. Existing studies target persistent groups with substantial activity history, while ephemeral groups lack historical interactions. To overcome group interaction sparsity, we propose data-driven regularization strategies to exploit both the preference covariance amongst users who are in the same group, as well as the contextual relevance of users’ individual preferences to each group. We make two contributions. First, we present a recommender architecture-agnostic framework GroupIM that can integrate arbitrary neural preference encoders and aggregators for ephemeral group recommendation. Second, we regularize the user-group latent space to overcome group interaction sparsity by: maximizing mutual information between representations of groups and group members; and dynamically prioritizing the preferences of highly informative members through contextual preference weighting. Our experimental results on several real-world datasets indicate significant performance improvements (31-62% relative NDCG@20) over state-of-the-art group recommendation techniques.
While researchers have developed rigorous practices for offline housing audits to enforce the US Fair Housing Act, the online world lacks similar practices. In this work we lay out principles for developing and performing online fairness audits. We demonstrate a controlled sock-puppet audit technique for building online profiles associated with a specific demographic profile or intersection of profiles, and describe the requirements to train and verify profiles of other demographics. We also present two audits using these sock-puppet profiles. The first audit explores the number and content of housing-related ads served to a user. The second compares the ordering of personalized recommendations on major housing and real-estate sites. We examine whether the results of each of these audits exhibit indirect discrimination: whether there is correlation between the content served and users' protected features, even if the system does not know or use these features explicitly. Our results show differential treatment in the number and type of housing ads served based on the user's race, as well as bias in property recommendations based on the user's gender. We believe this framework provides a compelling foundation for further exploration of housing fairness online.
Community discussion forums are increasingly used to seek advice; however, they often contain conflicting and unreliable information. Truth discovery models estimate source reliability and infer information trustworthiness simultaneously in a mutual reinforcement manner, and can be used to distinguish trustworthy comments with no supervision. However, they do not capture the diversity of word expressions and learn only a single reliability score per user. CrowdQM addresses these limitations by modeling the fine-grained, aspect-level reliability of users and incorporating semantic similarity between words to learn a latent trustworthy comment embedding. We apply our latent trustworthy comment embedding to comment ranking in three diverse communities on Reddit and show consistent improvement over non-aspect-based approaches. We also present qualitative results on the reliability scores and word embeddings learned by our model.
Some social networks provide explicit mechanisms to allocate social rewards, such as reputation, based on users’ actions, while in other networks the mechanism is more opaque. Nonetheless, there are always individuals who obtain greater rewards and reputation than their peers. An intuitive yet important question is whether these successful users employ strategic behaviors to become influential. It might appear that the influencers "have gamed the system." However, it remains difficult to draw conclusions about the rationality of their actions due to factors like the combinatorial strategy space, the inability to determine payoffs, and the resource limitations faced by individuals. The challenging nature of this question has drawn attention from both the theory and data mining communities. In this paper, we therefore investigate whether resource-limited individuals discover strategic behaviors associated with high payoffs when producing collaborative/interactive content in social networks. We propose a novel framework, Dynamic Dual Attention Networks (DDAN), which models individuals’ content production strategies through a generative process, under the influence of the social interactions involved in the process. Extensive experimental results illustrate the model’s effectiveness in user behavior modeling. We make three strong empirical findings: (1) different strategies give rise to different social payoffs; (2) the best performing individuals exhibit stability in their preference over the discovered strategies, which indicates the emergence of strategic behavior; and (3) the stability of a user’s preference is correlated with high payoffs.
This paper examines the use of the abstract comic form to persuade online charitable donations. Persuading individuals to contribute to charitable causes online is hard, and responses to appeals are typically low; charitable donations share the structure of public goods dilemmas, where the rewards are distant and non-exclusive. In this paper, we examine whether comics in abstract form are more persuasive than plain text. Drawing on a rich literature on comics, we synthesized a three-panel abstract comic to create our appeal. We conducted a between-subjects study with 307 participants from Amazon Mechanical Turk on the use of the abstract comic form to appeal for charitable donations. As part of our experimental procedure, we sought to persuade individuals to contribute, at monetary cost, to a real charity focused on autism research. We compared the average donation to the charity under three conditions: a plain text message, an abstract comic that includes the plain text, and an abstract comic that additionally includes social proof. We use Bayesian modeling to analyze the results, motivated by model transparency and its suitability for small-sized studies. Our experiments reveal that the message in abstract comic form elicited significantly more donations than the text form (medium to large effect size = 0.59). Incorporating social proof in the abstract comic message did not show a significant effect. Our studies have design implications: non-profits and governmental agencies interested in alleviating public goods dilemmas that share a similar structure to our experiment (single-shot task; distant, non-exclusive reward) ought to consider including messages in the abstract comic form as part of their online fund-raising campaigns.
In content-based online platforms, use of aggregate user feedback (say, the sum of votes) is commonplace as the “gold standard” for measuring content quality. Use of vote aggregates, however, is at odds with the existing empirical literature, which suggests that voters are susceptible to different biases—reputation (e.g., of the poster), social influence (e.g., votes thus far), and position (e.g., answer position). Our goal is to quantify, in an observational setting, the degree of these biases in online platforms. Specifically, what are the causal effects of different impression signals—such as the reputation of the contributing user, the aggregate vote thus far, and the position of content—on a participant’s vote on content? We adopt an instrumental variable (IV) framework to answer this question. We identify a set of candidate instruments, carefully analyze their validity, and then use the valid instruments to reveal the effects of the impression signals on votes. Our empirical study using log data from Stack Exchange websites shows that the bias estimates from our IV approach differ from the bias estimates from the ordinary least squares (OLS) method. In particular, OLS underestimates reputation bias (1.6–2.2x for gold badges) and position bias (up to 1.9x for the initial position) and overestimates social influence bias (1.8–2.3x for initial votes). The implications of our work include: redesigning user interfaces to avoid voter biases; making changes to platform policies to mitigate voter biases; and detecting other forms of bias in online platforms.
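To illustrate why IV estimates can differ from OLS, here is a minimal two-stage least squares sketch on synthetic data. The confounder, instrument, and coefficients are invented for illustration; this is not the paper's Stack Exchange analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (invented; not Stack Exchange logs). The unobserved
# confounder U drives both the impression signal X and the vote Y, so OLS
# is biased; the instrument Z shifts X but has no direct path to Y.
U = rng.normal(size=n)
Z = rng.normal(size=n)
X = Z + U + rng.normal(size=n)         # impression signal (e.g., shown score)
Y = 0.5 * X + U + rng.normal(size=n)   # vote outcome; true effect of X is 0.5

def slope(y, x):
    """OLS slope of y on x, with an intercept."""
    design = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

beta_ols = slope(Y, X)                 # biased upward by the confounder

# Two-stage least squares: regress X on Z, then Y on the fitted values.
design_z = np.column_stack([np.ones_like(Z), Z])
X_hat = design_z @ np.linalg.lstsq(design_z, X, rcond=None)[0]
beta_iv = slope(Y, X_hat)

print(round(beta_ols, 2), round(beta_iv, 2))
```

With these parameters, the OLS slope lands well above the true effect of 0.5 while the 2SLS estimate recovers it, which is the same qualitative pattern the abstract reports between OLS and IV bias estimates.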
This paper proposes a novel framework to incorporate social regularization into item recommendation. Social regularization, grounded in ideas of homophily and influence, appears to capture latent user preferences. However, there are two key challenges: first, the importance of a specific social link depends on context; second, a fundamental result states that we cannot disentangle homophily and influence from observational data to determine the effect of social influence. We thus view the attribution problem as inherently adversarial, examining two competing hypotheses, social influence and latent interests, to explain each purchase decision. We make two contributions. First, we propose a modular, adversarial framework for social regularization that decouples the architectural choices for the recommender and social representation models. Second, we overcome degenerate solutions through an intuitive contextual weighting strategy that supports expressive attribution, ensuring that informative social associations play a larger role in regularizing the learned user interest space. Our results indicate significant gains (5-10% relative Recall@K) over state-of-the-art baselines across multiple publicly available datasets.
This short paper summarizes the full report of the NSF Workshop on Multimedia Challenges, Opportunities and Research Roadmaps, held March 30-31, 2017 in Washington, DC. This material is based upon work supported by the National Science Foundation under Grant No. 1735591. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
This paper proposes a novel algorithm to discover hidden individuals in a social network. The problem is increasingly important for social scientists as the populations they study (e.g., individuals with mental illness) converse online. Since these populations do not use the category (e.g., mental illness) to self-describe, directly querying with text is non-trivial. To bypass the limitations of network and query re-writing frameworks, we focus on identifying hidden populations through attributed search. We propose a hierarchical multi-arm bandit sampler (DT-TMP) that uses a decision tree coupled with reinforcement learning to query the combinatorial attributed search space, exploring and expanding along high-yielding decision-tree branches. A comprehensive set of experiments over a suite of twelve sampling tasks on three online web platforms and three offline entity datasets reveals that DT-TMP outperforms all baseline samplers by up to a margin of 54% on Twitter and 48% on RateMDs. An extensive ablation study confirms DT-TMP’s superior performance under different sampling scenarios.
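The decision-tree-structured sampler itself is beyond a short sketch, but the underlying explore/exploit idea can be illustrated with plain UCB1 over a handful of hypothetical attribute queries. The query arms and their yields below are simulated and are a stand-in for DT-TMP, not real platform data.

```python
import math
import random

random.seed(0)

# Hypothetical setup: each "arm" is a conjunctive attribute query, and
# pulling an arm asks whether the query surfaced a hidden-population entity.
# Yield probabilities are invented for illustration.
true_yield = {("NY", "depression"): 0.60, ("NY", "anxiety"): 0.15,
              ("CA", "depression"): 0.08, ("CA", "anxiety"): 0.03}

def pull(arm):
    # Simulated query: 1 if a hidden entity was found, else 0.
    return 1 if random.random() < true_yield[arm] else 0

arms = list(true_yield)
counts = {a: 0 for a in arms}
rewards = {a: 0.0 for a in arms}
for t in range(1, 3001):
    if t <= len(arms):
        arm = arms[t - 1]                  # play each arm once to initialize
    else:
        # UCB1: empirical mean plus an exploration bonus.
        arm = max(arms, key=lambda a: rewards[a] / counts[a]
                  + math.sqrt(2 * math.log(t) / counts[a]))
    counts[arm] += 1
    rewards[arm] += pull(arm)

best = max(arms, key=lambda a: counts[a])
print(best, counts[best])
```

The bandit quickly concentrates its query budget on the highest-yield attribute combination, which is the behavior DT-TMP organizes hierarchically with its decision tree.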
This paper proposes an attributed network growth model. Despite the knowledge that individuals use limited resources to form connections to similar others, we lack an understanding of how local, resource-constrained mechanisms explain the emergence of structural properties found in real-world networks. We make three contributions. First, we propose a simple and accurate model of attributed network growth that jointly explains the emergence of in-degree distributions, local clustering, the clustering-degree relationship, and attribute mixing patterns. Second, we use biased random walks to develop a model that forms edges locally, without recourse to global information. Third, we account for multiple sociological phenomena: bounded rationality, structural constraints, triadic closure, attribute homophily, and preferential attachment. Our experiments show that the proposed Attributed Random Walk (ARW) model accurately preserves the network structure and attribute mixing patterns of real-world networks; it improves upon the performance of eight well-known models by a significant margin of 2.5-10x.
This paper proposes a generative model to identify roles in Community Question Answering (CQA) communities and shows how differences in role composition influence community health. CQA platforms are useful for people to share knowledge and to form ad-hoc social networks around specific topics. User experience across communities varies: whether a question will be answered, the extent of the delay in responding to a question, etc. While past research shows that participants play different roles in online communities, we investigate a complementary question: how does the distribution of roles differ across communities, and do these differences help explain the differences in user experience? We propose the use of a generative model for inferring action-based roles for users, both at the level of an individual browsing session and at the broader community level. Our model is specifically designed to produce descriptions of user behavior roles in the form of interpretable probability distributions over the atomic actions a user may take within a community, while also modeling the composition of those roles inside individual communities to facilitate cross-community analysis. A comprehensive experiment on all 161 non-meta communities on the StackExchange CQA platform reveals three empirical insights. First, we show interesting distinctions within CQA communities in question-asking behavior (where two distinct types of askers can be identified) and answering behavior (where two distinct roles surrounding answers emerge). Second, clustering communities with similar role compositions reveals that these clusters have interesting topical differences as well as statistically significant differences in mean health across different health metrics. Third, we show that each discovered cluster corresponds to a distinct evolution of role composition, suggesting that users who engage in discussion on answers eventually become the dominant role in most communities and remain so.
In this paper, we interpret the community question answering websites on the StackExchange platform as knowledge markets, and analyze how and why these markets can fail at scale. A knowledge market framing allows site operators to reason about market failures and to design policies to prevent them. Our goal is to provide insights on large-scale knowledge market failures through an interpretable model. We explore a set of interpretable economic production models on a large empirical dataset to analyze the dynamics of content generation in knowledge markets. Amongst these, the Cobb-Douglas model best explains the empirical data and provides an intuitive explanation for content generation through the concepts of elasticity and diminishing returns. Content generation depends on user participation and on how specific types of content (e.g., answers) depend on other types (e.g., questions). We show that these factors of content generation have constant elasticity: a percentage increase in any of the inputs leads to a constant percentage increase in the output. Furthermore, markets exhibit diminishing returns: the marginal output decreases as the input is incrementally increased. Knowledge markets also vary in their returns to scale: the increase in output resulting from a proportionate increase in all inputs. Importantly, many knowledge markets exhibit diseconomies of scale: measures of market health (e.g., the percentage of questions with an accepted answer) decrease as a function of the number of participants. The implications of our work are two-fold: site operators ought to design incentives as a function of system size (number of participants), and the market lens should shed insight into complex dependencies among different content types and participant actions in general social networks.
This paper addresses the question of identifying a concept dependency graph for a MOOC through unsupervised analysis of lecture transcripts. The problem is important: extracting a concept graph is the first step in helping students with varying preparation to understand course material. The problem is challenging: instructors are unaware of the diversity of student preparation and may be unable to identify the right resolution of concepts, necessitating costly updates; inferring concepts from groups suffers from polysemy; and the temporal order of concepts depends on the concepts in question. We propose innovative unsupervised methods to discover a directed concept dependency graph within and between lectures. Our main technical innovation lies in exploiting the temporal ordering among concepts to discover the graph. We propose two measures, the Bridge Ensemble Measure and the Global Direction Measure, to infer the existence and direction of dependency relations between concepts. The bridge ensemble measure identifies concept overlap between lectures, concept co-occurrence within short windows, and the lecture where each concept first occurs. The global direction measure incorporates time directly by analyzing concept time ordering both globally and within lectures. Experiments over real-world MOOC data show that our method outperforms the baseline in both AUC and precision/recall curves.
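The two measures are specific to the paper, but the general idea of combining within-window co-occurrence with first-occurrence ordering can be sketched as follows. This is a simplified stand-in with toy transcripts, not the actual Bridge Ensemble or Global Direction measures.

```python
from collections import defaultdict

# Toy transcript: each inner list is one lecture's sequence of concept
# mentions (invented data, not from a real MOOC).
lectures = [
    ["variable", "variable", "loop", "variable", "loop"],
    ["loop", "function", "loop", "function"],
    ["function", "recursion", "function", "recursion"],
]

def first_occurrence(lectures):
    """Map each concept to its first (lecture index, position) of mention."""
    seen = {}
    for li, lecture in enumerate(lectures):
        for pos, c in enumerate(lecture):
            seen.setdefault(c, (li, pos))
    return seen

def cooccurs(lectures, window=3):
    """Count unordered concept pairs appearing within a short window."""
    pairs = defaultdict(int)
    for lecture in lectures:
        for i, c in enumerate(lecture):
            for d in lecture[i + 1:i + window]:
                if d != c:
                    pairs[frozenset((c, d))] += 1
    return pairs

# Propose a directed edge u -> v when the pair co-occurs often enough and
# u first appears before v (a rough proxy for "u is prerequisite for v").
first = first_occurrence(lectures)
edges = sorted((min(p, key=lambda c: first[c]), max(p, key=lambda c: first[c]))
               for p, n in cooccurs(lectures).items() if n >= 2)
print(edges)
```

On this toy data the heuristic recovers the chain variable → loop → function → recursion; the paper's measures refine both the existence test and the direction test well beyond this.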
The rise of the ``big data'' era has created a pressing demand for educating many data scientists and engineers quickly at low cost. It is essential that they learn by working on assignments that involve real-world data sets to develop the skills needed to be successful in the workplace. However, enabling instructors to flexibly deliver all kinds of data science assignments using real-world data sets to large numbers of learners (both on-campus and off-campus) at low cost is a significant open challenge. To address this emerging challenge, we develop and deploy a novel Cloud-based Lab for Data Science (CLaDS) to enable many learners around the world to work on real-world data science problems without having to move or otherwise distribute prohibitively large data sets. Leveraging version control and continuous integration, CLaDS provides a general infrastructure that enables any instructor to conveniently deliver any hands-on data science assignment that uses large real-world data sets to as many learners as our cloud-computing infrastructure allows, at very low cost. In this paper, we present the design and implementation of CLaDS and discuss our experience using CLaDS to deploy seven major text data assignments, covering text retrieval and mining techniques, to students in both an on-campus course and an online course. This experience shows that CLaDS is a promising general infrastructure for efficiently delivering a wide range of hands-on data science assignments to large numbers of learners at very low cost.
Imagine a movie-trailer voice intoning, "In a world where AI has learned to partner with humans to peacefully advance society..." And then forget it! That movie, you see, is never getting made—no explosions, no antagonism, no killer robots. However, outside Hollywood, the co-adaptive development of humans and artificial intelligence (AI) may be worth a bit more consideration. There is little doubt at this point that the growth and maturation of AI will be a major influence on our economy and society overall. Significant work is underway both on advancing AI and on combining human and artificial intelligence to improve the functionality and user experience of AI-based methods, tools, and services. Advanced AI is successfully reshaping many transactional contexts such as image search and purchase recommendations, as well as contexts that involve repetitive activity, such as manufacturing. However, AI is progressing much more slowly in contexts that involve rich experiences aimed at advancing human intelligence and the overall human condition—for example, in education. A potentially unintended consequence of this is increased emphasis on the lower-hanging fruit of transactional and repetitive contexts, and less emphasis on the more complex human-development contexts that are critical for a healthy society. This article proposes a design approach for tackling the integration of AI into human-development contexts while promoting the development of new forms of cyber-human intelligence.
This paper proposes an approach to learn robust behavior representations in online platforms by addressing the challenges of user behavior skew and sparse participation. Latent behavior models are important in a wide variety of applications: recommender systems, prediction, user profiling, and community characterization. Our framework is the first to jointly address skew and sparsity across graphical behavior models. We propose a generalizable Bayesian approach to partition users in the presence of skew while simultaneously learning latent behavior profiles over these partitions to address user-level sparsity. Our behavior profiles incorporate the temporal activity and links between participants, although the proposed framework is flexible enough to accommodate other definitions of participant behavior. Our approach explicitly discounts frequent behaviors and learns variable-size partitions capturing diverse behavior trends. The partitioning approach is data-driven with no rigid assumptions, adapting to varying degrees of skew and sparsity. A qualitative analysis indicates our ability to discover niche and informative user groups on large online platforms. Results on user characterization (+6-22% AUC), content recommendation (+6-43% AUC), and future activity prediction (+12-25% RMSE) indicate significant gains over state-of-the-art baselines. Furthermore, user cluster quality is validated with magnified gains in the characterization of users with sparse activity.
In recent years, deep neural networks have found success in Collaborative Filtering (CF) based recommendation tasks. By parametrizing latent factor interactions of users and items with neural architectures, they achieve significant gains in scalability and performance over matrix factorization. However, the long-tail phenomenon in recommender performance persists on the massive inventories of online media and retail platforms. Given the diversity of neural architectures and applications, there is a need for a generalizable and principled strategy to enhance long-tail item coverage. In this paper, we propose a novel adversarial training strategy to enhance long-tail recommendations for users with Neural CF (NCF) models. The adversary network learns the implicit association structure of entities in the feedback data, while the NCF model is simultaneously trained to reproduce these associations and avoid the adversarial penalty, resulting in enhanced long-tail performance. Experimental results show that even without auxiliary data, adversarial training can boost long-tail recall of state-of-the-art NCF models by up to 25%, without trading off overall performance. We evaluate our approach on two diverse platforms: content tag recommendation in Q&A forums and movie recommendation.
Modern social platforms are characterized by the presence of rich user-behavior data associated with the publication, sharing, and consumption of textual content. Users interact with content and with each other in a complex and dynamic social environment while simultaneously evolving over time. In order to effectively characterize users and predict their future behavior in such a setting, it is necessary to overcome several challenges. Content heterogeneity and temporal inconsistency of behavior data result in severe sparsity at the user level. In this paper, we propose a novel mutual-enhancement framework to simultaneously partition and learn latent activity profiles of users. We propose a flexible user partitioning approach to effectively discover rare behaviors and tackle user-level sparsity. We extensively evaluate the proposed framework on massive datasets from real-world platforms, including Q&A networks and interactive online courses (MOOCs). Our results indicate significant gains over state-of-the-art behavior models (15% on average) across a varied range of tasks, and our gains are further magnified for users with limited interaction data. The proposed algorithms are amenable to parallelization, scale linearly in the size of datasets, and provide flexibility to model diverse facets of user behavior.
We propose a resource-constrained network growth model that explains the emergence of key structural properties of real-world directed networks: heavy-tailed in-degree distribution, high local clustering and degree-clustering relationship. In real-world networks, individuals form edges under constraints of limited network access and partial information. However, well-known growth models that preserve multiple structural properties do not incorporate these resource constraints. Conversely, existing resource-constrained models do not jointly preserve multiple structural properties of real-world networks. We propose a random walk growth model that explains how real-world network properties can jointly arise from edge formation under resource constraints. In our model, each node that joins the network selects a seed node from which it initiates a random walk. At each step of the walk, the new node either jumps back to the seed node or chooses an outgoing or incoming edge to visit another node. It links to each visited node with some probability and stops after forming a few edges. Our experimental results against four well-known growth models indicate improvement in accurately preserving structural properties of five citation networks. Our model also preserves two structural properties that most growth models cannot: skewed local clustering distribution and bivariate in-degree-clustering relationship.
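A minimal sketch of this walk-based growth process follows; the parameter values, step cap, and traversal details are our simplifications rather than the paper's calibrated model.

```python
import random

random.seed(7)

def random_walk_growth(n_nodes, p_link=0.3, p_jump=0.25, m_edges=3, max_steps=100):
    """Toy sketch of resource-constrained random-walk growth: each arriving
    node walks locally from a seed, linking to visited nodes with probability
    p_link and jumping back to the seed with probability p_jump."""
    out_nbrs = {0: set(), 1: {0}}
    in_nbrs = {0: {1}, 1: set()}
    for new in range(2, n_nodes):
        out_nbrs[new], in_nbrs[new] = set(), set()
        seed_node = random.randrange(new)      # start the walk at a random seed
        cur, formed, steps = seed_node, 0, 0
        while formed < m_edges and steps < max_steps:
            steps += 1
            if cur != new and cur not in out_nbrs[new] and random.random() < p_link:
                out_nbrs[new].add(cur)         # link to the currently visited node
                in_nbrs[cur].add(new)
                formed += 1
            if random.random() < p_jump:
                cur = seed_node                # jump back to the seed node
            else:                              # traverse an out- or in-edge
                nbrs = list(out_nbrs[cur] | in_nbrs[cur])
                cur = random.choice(nbrs) if nbrs else seed_node
    return in_nbrs

g = random_walk_growth(200)
indeg = sorted((len(v) for v in g.values()), reverse=True)
print(indeg[:5], indeg[-5:])
```

Because walks repeatedly revisit well-connected neighborhoods of early nodes, in-links concentrate on a few nodes and the in-degree distribution comes out skewed, using only local information and a bounded number of steps per arrival.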
In this paper, we model the community question answering (CQA) websites on the Stack Exchange platform as knowledge markets, and analyze how and why these markets can fail at scale. Analyzing CQA websites as markets allows site operators to reason about the failures in knowledge markets, and design policies to prevent these failures. Our main contribution is to provide insight on knowledge market failures. We explore a set of interpretable economic production models to capture content generation dynamics in knowledge markets. The best performing of these, well known in the economics literature as the Cobb-Douglas equation, provides an intuitive explanation for content generation in knowledge markets. Specifically, it shows that (1) factors of content generation such as user participation and content dependency have constant elasticity--a percentage increase in any of the inputs leads to a constant percentage increase in the output; (2) in many markets, factors exhibit diminishing returns--the incremental, marginal output decreases as the input is incrementally increased; (3) markets vary according to their returns to scale--the increase in output resulting from a proportionate increase in all inputs; and finally (4) many markets exhibit diseconomies of scale--measures of market health decrease as a function of overall system size (number of participants).
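For readers unfamiliar with the form, Cobb-Douglas production is Q = A * prod(x_i ** e_i), where each exponent e_i is the constant output elasticity of input x_i and the sum of exponents gives the returns to scale. The sketch below, with purely illustrative input names and numbers, shows how the properties listed above fall out of the functional form:

```python
def cobb_douglas(A, inputs, elasticities):
    """Cobb-Douglas production: Q = A * prod(x_i ** e_i).
    Each exponent e_i is the (constant) output elasticity of input i."""
    q = A
    for x, e in zip(inputs, elasticities):
        q *= x ** e
    return q

# Two inputs (e.g., user participation and content dependency); exponents
# are illustrative. e_i < 1 gives diminishing marginal returns per input;
# sum(elas) < 1 gives decreasing returns to scale.
A, elas = 2.0, [0.6, 0.3]
q1 = cobb_douglas(A, [100.0, 50.0], elas)
q2 = cobb_douglas(A, [200.0, 100.0], elas)  # double every input
# q2 / q1 == 2 ** (0.6 + 0.3) < 2: output less than doubles
```

A 1% increase in the first input raises output by roughly its elasticity, 0.6%, regardless of the current input level, which is the constant-elasticity property in point (1).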
We propose a probabilistic packet reception model for Bluetooth Low Energy (BLE) packets in indoor spaces, and we validate the model by using it for indoor localization. We expect indoor localization to play an important role in indoor public spaces in the future. We model the probability of reception of a packet as a generalized quadratic function of distance, beacon power, and advertising frequency. Then, we use a Bayesian formulation to determine the coefficients of the packet loss model using empirical observations from our testbed. We develop a new sequential Monte Carlo algorithm that uses our packet count model. The algorithm is general enough to accommodate different spatial configurations. Our indoor localization experiments show strong results: our approach has an average error of ~1.2 m, 53% lower than the baseline range-free Monte Carlo localization algorithm.
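A minimal sketch of such a packet-reception model is below. The logistic link that squashes the quadratic to a probability, and all coefficient names and values, are assumptions for illustration; the paper fits its coefficients with a Bayesian formulation against testbed observations:

```python
import math

def reception_prob(dist_m, tx_power_dbm, adv_hz, coef):
    """Probability that a BLE advertisement packet is received, modeled as
    a generalized quadratic in distance, beacon power, and advertising
    frequency, squashed to [0, 1] with a logistic link (the link function
    and coefficient names are hypothetical)."""
    z = (coef["c0"]
         + coef["cd"] * dist_m + coef["cdd"] * dist_m ** 2
         + coef["cp"] * tx_power_dbm + coef["cpp"] * tx_power_dbm ** 2
         + coef["cf"] * adv_hz + coef["cff"] * adv_hz ** 2)
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative coefficients; in practice these would be inferred from
# empirical packet counts observed on a testbed.
coef = {"c0": 2.0, "cd": -0.4, "cdd": -0.02,
        "cp": 0.05, "cpp": 0.0, "cf": 0.1, "cff": 0.0}
```

With negative distance coefficients, reception probability decays with range, which is what the particle filter would exploit when weighting location hypotheses against observed packet counts.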
This paper introduces new techniques for sampling attributed networks to support standard data mining tasks. The problem is important for two reasons. First, it is commonplace to perform data mining tasks such as clustering and classification of network attributes (attributes of the nodes, including social media posts) on sampled graphs, since real-world networks can be very large. Second, the early work on network samplers (e.g., ForestFire, Re-weighted Random Walk, Metropolis-Hastings Random Walk) focused on preserving structural properties of the network (e.g., degree distribution, diameter) in the sample. However, it is unclear whether these data-agnostic samplers, tuned to preserve network structural properties, would preserve salient characteristics of network content; preserving salient data characteristics is critical for clustering and classification tasks. This paper makes three contributions. First, we introduce several data-aware samplers based on information-theoretic principles. Second, we carefully compare data-aware samplers against state-of-the-art data-agnostic samplers (which use only network structure to sample) on three different data mining tasks: data characterization, clustering, and classification. Finally, our experimental results over large real-world datasets and synthetic benchmarks suggest a surprising result: there is no single sampler that is consistently the best across all tasks. We show that data-aware samplers perform significantly better (p < 0.05) than data-agnostic samplers on data coverage, clustering, and classification tasks.
In traditional public good experiments, participants receive an endowment from the experimenter that can be invested in a public good or kept in a private account. In this paper we present an experimental environment where participants can invest time during five days to contribute to a public good. Participants can make contributions to a linear public good by logging into a web application and performing virtual actions. We compared four treatments, with different group sizes and information about the (relative) performance of other groups. We find that information feedback about the performance of other groups has a small positive effect if we control for various attributes of the groups. Moreover, we find a significant effect of the contributions of others in the group in the previous day on the number of points earned in the current day. Our results confirm that people participate more when participants in their group participate more, and are influenced by information about the relative performance of other groups.
We identify influential early adopters in a social network, where individuals are resource constrained, to maximize the spread of multiple, costly behaviors. A solution to this problem is especially important for viral marketing. The problem of maximizing influence in a social network is challenging since it is computationally intractable. We make three contributions. First, we propose a new model of collective behavior that incorporates individual intent, knowledge of neighbors' actions, and resource constraints. Second, we show that multiple-behavior influence maximization is NP-hard. Furthermore, we show that the objective function is submodular, implying the existence of a greedy algorithm that approximates the optimal solution to within a constant factor. However, since the greedy algorithm is expensive for large networks, we propose efficient heuristics to identify the influential individuals, including heuristics to assign behaviors to the different early adopters. We test our approach on synthetic and real-world topologies with excellent results. We evaluate effectiveness under three metrics: unique number of participants, total number of active behaviors, and network resource utilization. Our heuristics produce a 15-51% increase in expected resource utilization over the naïve approach.
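The greedy approximation referenced above is the standard hill-climbing procedure for monotone submodular objectives. The sketch below uses a toy coverage function as the spread oracle; in the paper's setting the spread estimates would instead come from simulating the behavior model:

```python
def greedy_seed_selection(candidates, spread, k):
    """Greedy selection for a monotone submodular spread function:
    repeatedly add the node with the largest marginal gain. For such
    functions this achieves a (1 - 1/e) approximation of the optimum."""
    selected = set()
    for _ in range(k):
        remaining = [n for n in candidates if n not in selected]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda n: spread(selected | {n}) - spread(selected))
        selected.add(best)
    return selected

# Toy submodular spread: each node "covers" a set of users, and spread is
# the size of the union of covered users (coverage functions are submodular).
coverage = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
spread = lambda S: len(set().union(*(coverage[n] for n in S))) if S else 0
seeds = greedy_seed_selection(["a", "b", "c"], spread, k=2)
```

Each greedy step calls the spread oracle once per remaining candidate, which is why the paper proposes cheaper heuristics for large networks.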
This paper presents Lamina, a system for providing security and privacy to users in a public IoT space. Public IoT spaces, such as an IoT-enabled retail store, have the potential to provide rich, targeted information and services to users within the environment. However, to fully realize such potential, users must be willing to share certain data regarding their habits and preferences with the public IoT spaces. To encourage users to share such information, Lamina ensures that the user's data will not be leaked to third parties. Lamina uses CryptoCoP-based encryption and a unique MAC address rotation mechanism to ensure that a user's privacy is maintained and their data is protected, while still allowing the public IoT space to collect sufficient information to effectively provide targeted services.
We study the problem of organizing a collection of objects—images, videos—into clusters, using crowdsourcing. This problem is notoriously hard for computers to do automatically, and even with crowd workers, is challenging to orchestrate: (a) workers may cluster based on different latent hierarchies or perspectives; (b) workers may cluster at different granularities even when clustering using the same perspective; and (c) workers may only see a small portion of the objects when deciding how to cluster them (and therefore have limited understanding of the “big picture”). We develop cost-efficient, accurate algorithms for identifying the consensus organization (i.e., the organizing perspective most workers prefer to employ), and incorporate these algorithms into a cost-effective workflow for organizing a collection of objects, termed ORCHESTRA. We compare our algorithms with other algorithms for clustering, on a variety of real-world datasets, and demonstrate that ORCHESTRA organizes items better and at significantly lower costs.
We have struck a Faustian bargain with major corporations---free information services in exchange for our web surfing behavioral data. Unfortunately, we have little control not only over what data is gathered about us, but also over how long the data is stored and how it is used. Indeed, with the web, it is hard, if not impossible, to be forgotten. As users interact with an Internet of Things (IoT) ecosystem, they leave behind traces of information about their presence, preferences, and behavior. While the ecosystem can track individuals' movements to provide enhanced recommendations, individuals, just as with the entities that track their web behavior, have little control over how this information is used or distributed. Must the bargain between individuals and the entities interested in tracking them in IoT environments be asymmetric?
In response, we present Incognito, a secure and privacy-preserving IoT framework where user information exposure is driven by the concept of identity. In particular, we advocate user-managed identities, leaving the control of the choice of identity in a given context, as well as the level of exposure, in the hands of the user. Using Incognito, users can create identities that work only within certain contexts and are meaningless outside of these contexts. Furthermore, Incognito allows for simple management of information exposure through contextual policies for sharing as well as querying of an IoT ecosystem. By giving individuals full control over the information traces that they leave behind in an IoT infrastructure, Incognito, in essence, puts individuals on equal footing with the entities that want to track their behavioral data. Incognito fosters a symbiotic relationship: users expose information in exchange for personalized recommendations, and IoT organizations that provide sophisticated user experiences see enhanced user engagement.
Real-world networks are often complex and large with millions of nodes, posing a great challenge for analysts to quickly see the big picture for more productive subsequent analysis. We aim at facilitating exploration of node-attributed networks by creating representations with conciseness, expressiveness, interpretability, and multi-resolution views. We develop such a representation as a map—among the first to explore principled network cartography for general networks. In parallel with common maps, ours has landmarks, which aggregate nodes homogeneous in their traits and interactions with nodes elsewhere, and roads, which represent the interactions between the landmarks. We capture such homogeneity by the similar roles the nodes played. Next, to concretely model the landmarks, we propose a probabilistic generative model of networks with roles as latent factors. Furthermore, to enable interactive zooming, we formulate novel model-based constrained optimization. Then, we design efficient linear-time algorithms for the optimizations. Experiments using real-world and synthetic networks show that our method produces more expressive maps than existing methods, with up to 10 times improvement in network reconstruction quality. We also show that our method extracts landmarks with more homogeneous nodes, with up to 90% improvement in the average attribute/link entropy among the nodes over each landmark. Sense-making of a real-world network using a map computed by our method qualitatively verifies the effectiveness of our method.
As users interact with an Internet of Things (IoT) ecosystem, they leave behind traces of information about their presence, preferences, and behavior. While the ecosystem can track individuals' movements to provide enhanced recommendations, individuals have little control over how this information is being used or distributed. Such tracking has led to increasing privacy concerns over the use of IoT. While it is possible to develop systems to enable anonymous interaction with IoT, anonymity results in limited benefits to both individuals and IoT ecosystems. In response, we present Incognito, a secure and privacy-preserving IoT framework where user information exposure is driven by the concept of identity. In particular, we advocate user-managed identities, leaving the control of the choice of identity in a given context, as well as the level of exposure, in the hands of the user. Using Incognito, users can create identities that work only within certain contexts and are meaningless outside of these contexts. Furthermore, Incognito allows for simple management of information exposure through contextual policies for sharing as well as querying of an IoT ecosystem. By giving individuals full control over the information traces that they leave behind in an IoT infrastructure, Incognito, in essence, puts individuals on equal footing with the entities that want to track their behavioral data. Incognito fosters a symbiotic relationship: users expose information in exchange for personalized recommendations, and IoT organizations that provide sophisticated user experiences see enhanced user engagement.
The rise of social media provides a great opportunity for people to reach out to their social connections to satisfy their information needs. However, generic social media platforms are not explicitly designed to assist users' information seeking. In this paper, we propose a novel framework to identify those social connections of a user who are able to satisfy the user's information needs. The information need of a social media user is subjective and personal, and we investigate the utility of the user's social context in identifying people able to satisfy it. We treat questions users post on Twitter as instances of information-seeking activities in social media. We infer soft community memberships of the asker and the asker's social connections by integrating network and content information. Drawing concepts from social foci theory, we identify answerers who share communities with the asker with respect to the question. Our experiments demonstrate that the framework is effective in identifying answerers to social media questions.
This paper discusses the role of computing in engendering cooperation in social dilemmas such as sustainability and public health. These cooperative dilemmas exist at a large scale, within heterogeneous populations. Motivated by analysis of cooperation from empirical field studies, we argue that an integrative computational framework that analyzes social signals and verifies behaviors through smartphone sensors can shape and mold individual decisions to cooperate. We discuss four interconnected technical challenges and example solutions. The challenges include community discovery algorithms for construction of small homogeneous groups, persuasion of individuals in resource-constrained networks, activity monitoring in the wild, and detection of large-scale social coordination. We briefly discuss new applications that arise from a computational infrastructure for cooperation, including fighting childhood obesity, cybersecurity, and improving public safety.
This article presents a personalized narrative on the early discussions within the Multimedia community and the subsequent research on experiential media systems. I discuss two different research initiatives—design of real-time, immersive multimedia feedback environments for stroke rehabilitation; exploratory environments for events that exploited the user's ability to make connections. I discuss the issue of foundations: the question of multisensory integration and superadditivity; the need for identification of “first-class” Multimedia problems; expanding the scope of Multimedia research.
Personal digital photo libraries embody a large amount of information useful for research into photo organization, photo layout, and development of novel photo browser features. Even when anonymity can be ensured, amassing a sizable dataset from these libraries is still difficult due to the visibility and cost that would be required from such a study. We explore using the Mac App Store to reach more users to collect data from such personal digital photo libraries. More specifically, we compare and discuss how it differs from common data collection methods, e.g., Amazon Mechanical Turk, in terms of time, cost, quantity, and design of the data collection application. We have collected a large, openly available photo feature dataset in this manner. We illustrate the types of data that can be collected. In 60 days, we collected data from 20,778 photo sets (473,772 photos). Our study with the Mac App Store suggests that popular application distribution channels are a viable means for researchers to acquire massive data collections.
Taskville is an interactive visualization that aims to increase awareness of tasks that occur in the workplace. It utilizes gameplay elements and playful interaction to motivate continued use. A preliminary study with 37 participants shows that Taskville succeeds at being a fun and enjoyable experience while also increasing awareness. A strong correlation was also found between two major study groups, demonstrating its potential to increase awareness and stimulate task-based activity across work groups.
This paper reviews the state of the art and some emerging issues in research areas related to pattern analysis and monitoring of web-based social communities. This research area is important for several reasons. The presence of near-ubiquitous low-cost computing and communication technologies has enabled people to access and share information at unprecedented scale, which necessitates new research for making sense of such content. Furthermore, popular websites with sophisticated media sharing and notification features allow users to stay in touch with friends and loved ones, and also help form explicit and implicit groups. These social structures are an important source of information for better organizing and managing multimedia. In this article, we study how media-rich social networks provide additional insight into familiar multimedia research problems, including tagging and video ranking. In particular, we advance the idea that the contextual and social aspects of media semantics are as important for successful multimedia applications as the media content itself. We examine the inter-relationship between content and social context through the prism of three key questions. First, how do we extract the context in which social interactions occur? Second, does social interaction provide value to the media object? Finally, how does social media facilitate the re-purposing of shared content, and engender cultural memes? We present three case studies to examine these questions in detail. In the first case study, we show how to discover structure latent in the data, and use the structure to organize Flickr photo streams. In the second case study, we discuss how to determine the interestingness of conversations—and of participants—around videos uploaded to YouTube. Finally, we show how analysis of visual content—tracing content remixes, in particular—can help us understand the relationship amongst YouTube participants.
For each case, we present an overview of recent work and review the state of the art. We also discuss two emerging issues related to the analysis of social networks—robust data sampling and scalable data analysis.
Presentation support tools, such as Microsoft PowerPoint, pose challenges both in terms of creating linear presentations from complex data and fluidly navigating such linear structures when presenting to diverse audiences. NextSlidePlease is a slideware application that addresses these challenges using a directed graph structure approach for authoring and delivering multimedia presentations. The application combines novel approaches for searching and analyzing presentation datasets, composing meaningfully structured presentations and efficiently delivering material under a variety of time constraints. We introduce and evaluate a presentation analysis algorithm intended to simplify the process of authoring dynamic presentations, and a time management and path selection algorithm that assists users in prioritizing content during the presentation process. Results from two comparative user studies indicate that the directed graph approach promotes the creation of hyperlinks, the consideration of connections between content items and a richer understanding of the time management consequences of including and selecting presentation material.
This paper focuses on detecting social, physical-world events from photos posted on social media sites. The problem is important: cheap media capture devices have significantly increased the number of photos shared on these sites. The main contribution of this paper is to incorporate online social interaction features in the detection of physical events. We believe that online social interactions reflect important signals among the participants on the “social affinity” of two photos, thereby helping event detection. We compute social affinity via a random walk on a social interaction graph to determine the similarity between two photos on the graph. We train a support vector machine classifier to combine the social affinity between photos and photo-centric metadata including time, location, tags, and description. Incremental clustering is then used to group photos into event clusters. We achieve strong results on two large-scale real-world datasets, Upcoming and MediaEval, with an improvement of 0.06–0.10 in F1 on these datasets.
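One common way to realize such a random-walk affinity is random walk with restart. The sketch below is an illustrative implementation over a toy interaction graph; the restart probability and iteration count are assumed values rather than the paper's settings:

```python
def rwr_affinity(adj, source, restart=0.15, iters=50):
    """Random walk with restart from `source` over an interaction graph.
    The resulting visit probabilities serve as affinity scores of every
    node with respect to `source` (a sketch; the paper's exact walk
    formulation may differ)."""
    nodes = list(adj)
    p = {n: 0.0 for n in nodes}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n, mass in p.items():
            nbrs = adj[n]
            if nbrs:  # spread (1 - restart) of the mass over neighbors
                share = (1 - restart) * mass / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:     # dangling node: send the walk mass back to the source
                nxt[source] += (1 - restart) * mass
            nxt[source] += restart * mass   # restart at the source
        p = nxt
    return p
```

Nodes reachable through many short interaction paths from the source accumulate more probability mass, so two photos whose owners and commenters are tightly interlinked score a higher affinity.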
A photo stream is a chronological sequence of photos. Most existing photo stream segmentation methods assume that a photo stream comprises photos from multiple events, and their goal is to produce groups of photos, each corresponding to an event, i.e., they perform automatic albuming. Even if these photos are grouped by event, sifting through the abundance of photos in each event is cumbersome. To help make photos of each event more manageable, we propose a photo stream segmentation method for an event photo stream—the chronological sequence of photos of a single event—to produce groups of photos, each corresponding to a photo-worthy moment in the event. Our method is based on a hidden Markov model with parameters learned from time, EXIF metadata, and visual information from 1) training data of unlabelled, unsegmented event photo streams and 2) the event photo stream we want to segment. In an experiment with over 5000 photos from 28 personal photo sets, our method outperformed all six baselines with statistical significance (p < 0.10 with the best baseline and p < 0.005 with the others).
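The decoding step of such an HMM-based segmenter can be sketched with generic Viterbi decoding. Here the states (e.g., "continue current moment" vs. "start new moment") and the emission log-likelihoods, which in the paper's method would come from time gaps, EXIF metadata, and visual features, are left abstract:

```python
def viterbi(obs_loglik, log_trans, log_init):
    """Generic Viterbi decoding: the most likely hidden-state sequence.
    obs_loglik[t][s] is the log-likelihood of observation t under state s;
    log_trans[p][s] and log_init[s] are log transition/initial probabilities."""
    n_states = len(log_init)
    path = [[s] for s in range(n_states)]
    score = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    for t in range(1, len(obs_loglik)):
        new_score, new_path = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + obs_loglik[t][s])
            new_path.append(path[best_prev] + [s])
        score, path = new_score, new_path
    return path[max(range(n_states), key=lambda s: score[s])]
```

Photos decoded into runs of the same state form one group; a transition into the "new moment" state marks a segment boundary.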
This paper explores photo organization within an event photo stream, i.e., the chronological sequence of photos from a single event. In our previous work, we proposed a method to segment an event photo stream to produce groups of photos, each corresponding to a photo-worthy moment in the event. Building upon this work, we have developed a photo browser that uses our method to automatically group photos from a single event into smaller groups of photos we call chapters. The photo browser also provides users with a drag-and-drop interface to refine the chapter groupings. With the photo browser, we conducted an exploratory study of 23 college students with their 8096 personal photos from 92 events. In this paper, we report novel insights on how the subjects organized photos in each event into smaller groups and contrast our observations with existing literature on photo organization. We also explore how chapter-based photo organization affects photo-related tasks such as storytelling, searching, and interpretation, through key aspects of the photo layouts. We found that subjects value the chronological order of the chapters more than maximizing screen space usage, and that they value chapter consistency more than the chronological order of the photos. For automatic chapter groupings, having few chapter boundary misses is more important than having few chapter boundary false alarms; the choice of chapter criteria and granularity for chapter groupings is very subjective; and subjects found that chapter-based photo organization helps in all three tasks of the user study.
This is a position paper on the role of content analysis in media-rich online communities. We highlight changes in the multimedia generation and consumption process that have occurred over the past decade, and identify the new angles these changes have brought to multimedia analysis research. We first examine the content production, dissemination, and consumption patterns in the recent social media studies literature. We derive an updated conceptual summary of the media lifecycle. We present an updated list of impact criteria and challenge areas for multimedia content analysis. Among the three criteria, two are existing but with new problems and solutions, and one is new as a result of the community-driven content lifecycle. We present three case studies that address the impact criteria, and conclude with an outlook on emerging problems. This work uses the general methodology of a previous research column [9], while the observations and conclusions are new.
In the 21st-century knowledge economy there is a growing need for the types of creative thinkers who can bridge the engineering mindset with the creative mindset, combining multiple types of skills. New economies will need workers who have "diagonal" skill sets, who can develop systems and content as an integrative process. This requires a new type of training and curriculum. In the newly formed "Digital Culture" undergraduate program at ASU, we attempt to support new types of curricula by restructuring the way students move through courses. With a constantly shifting and changing curriculum, structuring course enrollment using class prerequisites leads to fixed and rigid pathways through the curriculum. Instead, Digital Culture structures course sequences based on students' accumulation of abstract "Proficiencies," which are collected by students as they complete courses, and which act as keys to unlock access to higher-level courses. As a student accumulates more and more of these proficiencies, they are increasingly able to unlock new courses. This system leads to more flexible and adaptive pathways through courses while ensuring that students are prepared for entrance into more advanced classes. It is, however, more complicated, and requires that students strategically plan their route through the curriculum. To support this kind of strategic planning, we have designed and deployed a course planning system where students can simulate various possible paths through the curriculum. In this paper, we describe our design process for the "Digital Culture Visual Planner". The design process starts with a network analysis of how all the Digital Culture courses are interrelated, visualizing the relationships between proficiencies and courses. A number of possible design directions result from this analysis. Finally, we select a single design and refine it to be understandable, useful, and usable by new undergraduate Digital Culture majors.
In this paper, we present Wind Runners, which is a game designed for children with asthma. The goal of Wind Runners is to increase the likelihood of asthmatic children adhering to the NIH’s recommendation of measuring their peak expiratory flow (PEF) on a daily basis. We aim to accomplish this by incorporating both social gaming features and the actual medical regimen of measuring PEF into a mobile game.
Social network systems are significant scaffolds for political, economic and socio-cultural change. This is in part due to the widespread availability of sophisticated network technologies and the concurrent emergence of rich media websites. Social network sites provide new opportunities for social-technological research. Since we can inexpensively collect electronic records—over extended periods—of social data, spanning diverse populations, it is now possible to study social processes on a scale of tens of millions of individuals. To understand the large-scale dynamics of interpersonal interaction and its outcome, this article links the perspectives in the humanities for analysis of social networks to recent developments in data-intensive computational approaches. With special emphasis on social communities mediated by network technologies, we review the historical research arc of community analysis, as well as methods applicable to community discovery in social media.
In this article, we present a novel algorithm to discover multi-relational structures from social media streams. A media item such as a photograph exists as part of a meaningful inter-relationship amongst several attributes, including time, visual content, users, and actions. Discovery of such relational structures enables us to understand the semantics of human activity and has applications in content organization, recommendation algorithms, and exploratory social network analysis. We propose a novel non-negative matrix factorization framework to characterize relational structures of group photo streams. The factorization incorporates image content features and contextual information. The idea is to consider a cluster as having similar relational patterns: each cluster consists of photos relating to similar content or context. Relations represent different aspects of the photo stream data, including visual content, associated tags, photo owners, and post times. The extracted structures minimize the mutual information of the predicted joint distribution. We also introduce a relational modularity function to determine the structure cost penalty, and hence the number of clusters. Extensive experiments on a large Flickr dataset suggest that our approach is able to extract meaningful relational patterns from group photo streams. We evaluate the utility of the discovered structures through a tag prediction task and through a user study. Our results show that our method based on relational structures outperforms baseline methods, including feature- and tag-frequency-based techniques, by 35%–420%. We have conducted a qualitative user study to evaluate the benefits of our framework in exploring group photo streams.
The study indicates that users found the extracted clustering results clearly represent major themes in a group; the clustering results not only reflect how users describe the group data but often lead the users to discover the evolution of the group activity.
We propose SCENT, an innovative, scalable spectral analysis framework for internet scale monitoring of multi-relational social media data, encoded in the form of tensor streams. In particular, a significant challenge is to detect key changes in the social media data, which could reflect important events in the real world, sufficiently quickly. Social media data have three challenging characteristics. First, data sizes are enormous – recent technological advances allow hundreds of millions of users to create and share content within online social networks. Second, social data are often multi-faceted (i.e., have many dimensions of potential interest, from the textual content to user metadata). Finally, the data is dynamic – structural changes can occur at multiple time scales and be localized to a subset of users. Consequently, a framework for extracting useful information from social media data needs to scale with data volume, and also with the number and diversity of the facets of the data. In SCENT, we focus on the computational cost of structural change detection in tensor streams. We extend compressed sensing (CS) to tensor data. We show that, through the use of randomized tensor ensembles, SCENT is able to encode the observed tensor streams in the form of compact descriptors. We show that the descriptors allow very fast detection of significant spectral changes in the tensor stream, which also reduce data collection, storage, and processing costs. Experiments over synthetic and real data show that SCENT is faster (17.7x–159x for change detection) and more accurate (above 0.9 F-score) than baseline methods.
This article presents the principles of an adaptive mixed reality rehabilitation (AMRR) system, as well as the training process and results from two stroke survivors who received AMRR therapy, to illustrate how the system can be used in the clinic. The AMRR system integrates traditional rehabilitation practices with state-of-the-art computational and motion capture technologies to create an engaging environment to train reaching movements. The system provides real-time, intuitive, and integrated audio and visual feedback (based on detailed kinematic data) representative of goal accomplishment, activity performance, and body function during a reaching task. The AMRR system also provides a quantitative kinematic evaluation that measures the deviation of the stroke survivor's movement from an idealized, unimpaired movement. The therapist, using this quantitative measure together with clinical knowledge and observations, can adapt the feedback and physical environment of the AMRR system throughout therapy to address each participant's individual impairments and progress. Individualized training plans, kinematic improvements measured over the entire therapy period, and the changes in relevant clinical scales and kinematic movement attributes before and after the month-long therapy are presented for both participants. The substantial improvements made by both participants after AMRR therapy demonstrate that this system has the potential to considerably enhance the recovery of stroke survivors with varying impairments, in terms of both kinematic improvements and functional ability.
This work aims at discovering community structure in rich media social networks through analysis of time-varying, multi-relational data. Community structure represents the latent social context of user actions. It has important applications such as search and recommendation. The problem is particularly relevant in the enterprise domain, where extracting emergent community structure from enterprise social media can help in forming new collaborative teams, in expertise discovery, and in the long-term reorganization of enterprises based on collaboration patterns. There are several unique challenges: (a) in social media, the context of user actions is constantly changing and co-evolving, so the social context contains time-evolving multi-dimensional relations; (b) the social context is determined by the available system features and is unique to each social media platform, so the analysis of such data needs to flexibly incorporate various system features. In this article we propose MetaFac (MetaGraph Factorization), a framework that extracts community structures from dynamic, multi-dimensional social contexts and interactions. Our work has three key contributions: (1) metagraph, a novel relational hypergraph representation for modeling multi-relational and multi-dimensional social data; (2) an efficient multi-relational factorization method for community extraction on a given metagraph; (3) an on-line method to handle time-varying relations through incremental metagraph factorization. Extensive experiments on real-world social data collected from an enterprise and from the public Digg social media website suggest that our technique is scalable and is able to extract meaningful communities from social media contexts.
We illustrate the usefulness of our framework through two prediction tasks: (1) in the enterprise dataset, the task is to predict users' future interest in tag usage, and (2) in the Digg dataset, the task is to predict users' future interest in voting on and commenting on Digg stories. Our prediction significantly outperforms baseline methods (including the aspect model and tensor analysis), indicating the promise of using metagraphs to handle time-varying social relational contexts.
We are motivated in our work by the following question: what factors influence individual participation in social media conversations? Conversation around user-posted content is central to the user experience on social media sites, including Facebook, YouTube, and Flickr. Therefore, understanding why people participate can have significant bearing on fundamental research questions in social network and media analysis, such as network evolution and information diffusion.
Our approach is as follows. We first identify several key aspects of social media conversations that are distinct from both online forum discussions and other social networks. These aspects include intrinsic and extrinsic network factors. There are three factors intrinsic to the network: social awareness, community characteristics, and creator reputation. The factors extrinsic to the network include media context and conversational interestingness. Thereafter, we test the effectiveness of each factor type in accounting for the observed participation of individuals using a Support Vector Regression-based prediction framework. Our findings indicate that the factors that influence participation depend on the media type: participation on YouTube differs from that on a weblog such as Engadget. We further show that an optimal factor combination improves prediction accuracy of observed participation by ~9-13% and ~8-11% over using just the best hypothesis and all hypotheses, respectively. Implications of this work for understanding individual contributions in social media conversations, and in turn for the design of social sites, are discussed.
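A minimal sketch of the prediction setup, using ridge regression as a stand-in for Support Vector Regression (the factor-feature matrix, the closed-form solver, and the regularization weight here are illustrative assumptions):

```python
import numpy as np

def fit_participation_model(F, y, lam=1e-2):
    """Fit a regressor from per-conversation factor features F (e.g., social
    awareness, media context scores) to observed participation y.
    Ridge regression stands in for the paper's Support Vector Regression."""
    F1 = np.hstack([F, np.ones((F.shape[0], 1))])  # append a bias column
    # closed-form ridge solution: w = (F1^T F1 + lam*I)^-1 F1^T y
    w = np.linalg.solve(F1.T @ F1 + lam * np.eye(F1.shape[1]), F1.T @ y)
    return lambda Fnew: np.hstack([Fnew, np.ones((Fnew.shape[0], 1))]) @ w
```

Comparing models trained on individual factor types against one trained on the combined feature set mirrors the factor-combination comparison described above.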
This paper presents a novel, low-cost, real-time adaptive multimedia environment for home-based upper extremity rehabilitation of stroke survivors. The primary goal of this system is to provide an interactive tool with which the stroke survivor can sustain gains achieved within the clinical phase of therapy and increase the opportunity for functional recovery. This home-based mediated system has low-cost sensing, off-the-shelf components for auditory and visual feedback, and remote monitoring capability. The system is designed to continue active learning by reducing dependency on real-time feedback and focusing on summary feedback after a single task and after sequences of tasks. To increase system effectiveness through customization, we use data from the training strategy developed by the therapist at the clinic for each stroke survivor to drive automated system adaptation at home. The adaptation includes changing the training focus, selecting the proper feedback coupling both in real time and in summary, and constructing appropriate dialogues with the stroke survivor to promote more efficient use of the system. This system also allows the therapist to review each participant's progress and adjust the training strategy weekly.
New motion capture technologies are allowing detailed, precise, and complete monitoring of movement through real-time kinematic analysis. However, a clinically relevant understanding of movement impairment through kinematic analysis requires the development of computational models that integrate clinical expertise in the weighting of the kinematic parameters. The resulting kinematics-based measures of movement impairment would further need to be integrated with existing clinical measures of activity disability. This is a challenging process requiring computational solutions that can extract correlations within and between three diverse data sets: human-driven assessment of body function, kinematics-based assessment of movement impairment, and human-driven assessment of activity. We propose to identify and characterize the different sensorimotor control strategies used by unimpaired individuals and by hemiparetic stroke survivors acquiring a skilled motor task. We will use novel quantitative approaches to further our understanding of how human motor function is coupled to multiple and simultaneous modes of feedback. The experiments rely on a novel interactive task environment developed by our team in which subjects are provided with rich auditory and visual feedback of movement variables to drive motor learning. Our proposed research will result in a computational framework for applying virtual information to assist motor learning for complex tasks that require coupling of proprioception, vision, audio, and haptic cues. We shall use the framework to devise a computational tool to assist with the therapy of stroke survivors. This tool will utilize extracted relationships in a pre-clinical setting to generate effective and customized rehabilitation strategies.
This paper presents a novel generalized computational framework for quantitative kinematic evaluation of movement in a rehabilitation clinic setting. The framework integrates clinical knowledge and computational data-driven analysis in a systematic manner. The framework provides three key benefits to rehabilitation: (a) the resulting continuous normalized measure allows the clinician to monitor movement quality on a fine scale and easily compare impairments across participants, (b) the framework reveals the effect of individual movement components on the composite movement performance, helping the clinician decide the training foci, and (c) the evaluation runs in real time, which allows the clinician to constantly track a patient's progress and make appropriate adaptations to the therapy protocol. The creation of such an evaluation is difficult because of the sparse amount of recorded clinical observations, the high dimensionality of movement, and the high variation in subjects' performance. We address these issues by modeling the evaluation function as a linear combination of multiple normalized kinematic attributes, y = Σ_i w_i φ_i(x_i), and estimating each attribute normalization function φ_i(·) by integrating distributions of idealized movement and deviated movement. The weights w_i are derived from a therapist's pairwise comparisons using a modified RankSVM algorithm. We have applied this framework to evaluate upper limb movement for stroke survivors with excellent results: the evaluation is highly correlated with the therapist's observations.
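The evaluation function y = Σ_i w_i φ_i(x_i) can be sketched directly. The Gaussian form of the normalization function φ_i(·) below is an assumption (the paper estimates it from idealized and deviated movement distributions), and in practice the weights would come from the RankSVM step rather than being hand-set:

```python
import numpy as np

def make_attribute_normalizer(ideal_mean, ideal_std):
    """phi_i maps a raw kinematic attribute to [0, 1], where 1 means the
    attribute matches the idealized movement. Gaussian falloff is an
    assumed stand-in for the paper's estimated normalization function."""
    def phi(x):
        z = abs(x - ideal_mean) / ideal_std
        return float(np.exp(-0.5 * z * z))
    return phi

def movement_score(weights, phis, attributes):
    """Composite evaluation y = sum_i w_i * phi_i(x_i)."""
    return sum(w * phi(x) for w, phi, x in zip(weights, phis, attributes))
```

With weights summing to one, an ideal movement scores 1.0 and larger kinematic deviations push the score toward 0, giving the continuous normalized measure described above.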
Raising awareness and motivating workers in a large collaborative enterprise is a challenging endeavor. In this paper we briefly describe Taskville, a distributed social media workplace game played by teams on large, public displays. Taskville gamifies the process of routine task management, introducing light competitive play within and between teams. We present the design and implementation of the Taskville game and offer insights and recommendations gained from two pilot studies.
This article analyzes communication within a set of individuals to extract representative prototypical groups and provides a novel framework to establish the utility of such groups. Corporations may want to identify representative groups (which are indicative of the overall communication set) because it is easier to track the prototypical groups than the entire set. This can be useful for advertising and for identifying "hot" spots of resource consumption, as well as for mining the representative mood or temperature of a community. Our framework has three parts: extraction, characterization, and utility of prototypical groups. First, we extract groups by developing features representing the communication dynamics of the individuals. Second, to characterize the overall communication set, we identify a subset of groups within the community as the prototypical groups. Third, we justify the utility of these prototypical groups by using them as predictors of related external phenomena; specifically, the stock market movement of technology companies and the political polls of presidential candidates in the 2008 U.S. elections. We have conducted extensive experiments on two popular blogs, Engadget and Huffington Post. We observe that the prototypical groups can predict stock market movement/political polls satisfactorily, with a mean error rate of 20.32%. Further, our method outperforms baseline methods based on alternative group extraction and prototypical group identification methods. We evaluate the quality of the extracted groups based on their conductance and coverage measures, and we develop two metrics, predictivity and resilience, to evaluate their ability to predict a related external time-series variable (stock market movement/political polls). This implies that the communication dynamics of individuals are essential in extracting groups in a community, and that the prototypical groups extracted by our method are meaningful in characterizing the overall communication sets.
This paper presents a novel mixed reality rehabilitation system used to help improve the reaching movements of people who have hemiparesis from stroke. The system provides real-time, multimodal, customizable, and adaptive feedback generated from the movement patterns of the subject's affected arm and torso during reaching to grasp. The feedback is provided via innovative visual and musical forms that present a stimulating, enriched environment in which to train the subjects and promote multimodal sensory-motor integration. A pilot study was conducted to test the system function, adaptation protocol and its feasibility for stroke rehabilitation. Three chronic stroke survivors underwent training using our system for six 75-min sessions over two weeks. After this relatively short time, all three subjects showed significant improvements in the movement parameters that were targeted during training. Improvements included faster and smoother reaches, increased joint coordination and reduced compensatory use of the torso and shoulder. The system was accepted by the subjects and shows promise as a useful tool for physical and occupational therapists to enhance stroke rehabilitation.
This chapter deals with the analysis of interpersonal communication dynamics in online social networks and social media. Communication is central to the evolution of social systems. Today, different online social sites feature variegated interactional affordances, ranging from blogging, micro-blogging, and sharing media elements (e.g., images, videos) to a rich set of social actions such as tagging, voting, and commenting. Consequently, these communication tools have begun to redefine the ways in which we exchange information or concepts, and how the media channels impact our online interactional behavior. Our central hypothesis is that such communication dynamics between individuals manifest themselves via two key aspects: the information or concept that is the content of communication, and the channel, i.e., the media via which communication takes place. We present computational models and discuss large-scale quantitative observational studies for both of these organizing ideas. First, we develop a computational framework to determine the "interestingness" property of conversations centered around rich media. Second, we present user models of the diffusion of social actions and study the impact of homophily on the diffusion process. The outcome of this research is twofold. First, extensive empirical studies on datasets from YouTube have indicated that on rich media sites, the conversations that are deemed "interesting" appear to have a consequential impact on the properties of the social network they are associated with: in terms of the degree of participation of the individuals in future conversations, thematic diffusion, and emergent cohesiveness in activity among the concerned participants in the network. Second, observational and computational studies on large social media datasets such as Twitter have indicated that the diffusion of social actions in a network can be indicative of future information cascades.
Moreover, given a topic, these cascades are often a function of the attribute homophily existent among the participants. We believe that this chapter makes a significant contribution to a better understanding of how we communicate online and how that is redefining our collective sociological behavior.
The emergence of the mediated social web, a distributed network of participants creating rich media content and engaging in interactive conversations through Internet-based communication technologies, has contributed to powerful social, economic, and cultural change. Online social network sites and blogs, such as Facebook, Twitter, Flickr, and LiveJournal, thrive due to their fundamental sense of "community". The growth of online communities offers both opportunities and challenges for researchers and practitioners. Participation in online communities has been observed to influence people's behavior in diverse ways, ranging from financial decision-making to political choices, suggesting the rich potential for diverse applications. However, although studies on the social web have been extensive, discovering communities from online social media remains challenging due to the interdisciplinary nature of the subject. In this article, we present our recent work on the characterization of communities in online social media using computational approaches grounded in observations from social science.
This paper presents results from a clinical study of stroke survivors using an adaptive mixed-reality rehabilitation (AMRR) system for reach-and-grasp therapy. The AMRR therapy provides audio and visual feedback on the therapy task, based on detailed motion capture, that places the movement in an abstract, artistic context. This type of environment promotes the generalizability of movement strategies, demonstrated through kinematic improvements on an untrained reaching task and higher clinical scale scores, in addition to kinematic improvements in the trained task.
Platforms such as Twitter have provided researchers with ample opportunities to analytically study social phenomena. There are, however, significant computational challenges due to the enormous rate of production of new information: researchers are therefore often forced to analyze a judiciously selected "sample" of the data. Like other social media phenomena, information diffusion is a social process: it is affected by user context and topic, in addition to the graph topology. This paper studies the impact of different attribute- and topology-based sampling strategies on the discovery of an important social media phenomenon: information diffusion. We examine several widely adopted sampling methods that select nodes based on attribute (random, location, and activity) and topology (forest fire), and we also study the impact of attribute-based seed selection on topology-based sampling. We then develop a series of metrics for evaluating the quality of the sample, based on user activity (e.g., volume, number of seeds), topological (e.g., reach, spread), and temporal (e.g., rate) characteristics. We additionally correlate the diffusion volume metric with two external variables: search and news trends. Our experiments reveal that for small sample sizes (30%), a sample that incorporates both topology and user context (e.g., location, activity) can improve on naive methods by a significant margin of ~15-20%.
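The attribute-based strategies can be sketched as simple node-selection functions, together with a toy "diffusion volume" metric that counts diffusion edges fully captured inside the sample (the function names and the metric's exact form are illustrative, not the paper's definitions):

```python
import random

def sample_random(nodes, frac, seed=0):
    """Attribute-free baseline: keep a uniformly random fraction of nodes."""
    rng = random.Random(seed)
    k = max(1, int(frac * len(nodes)))
    return set(rng.sample(sorted(nodes), k))

def sample_by_activity(activity, frac):
    """Activity-based sampling: keep the most active fraction of users."""
    k = max(1, int(frac * len(activity)))
    ranked = sorted(activity, key=activity.get, reverse=True)
    return set(ranked[:k])

def diffusion_volume(edges, sample):
    """Count diffusion edges whose endpoints both fall inside the sample."""
    return sum(1 for u, v in edges if u in sample and v in sample)
```

Comparing `diffusion_volume` across samples produced by different strategies at the same sampling fraction mirrors the evaluation methodology described above.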
This paper presents a novel system architecture and evaluation metrics for an Adaptive Mixed Reality Rehabilitation (AMRR) system for stroke patients. The system provides a purposeful, engaging, hybrid (visual, auditory, and physical) scene that encourages patients to improve their performance of a reaching and grasping task and promotes learning of generalizable movement strategies. The system is adaptive in that it provides assistive adaptation tools to help the rehabilitation team customize the training strategy. Our key insight is to combine the patients, the rehabilitation team, multimodal hybrid environments, and adaptation tools together as an adaptive experiential mixed reality system. There are three major contributions in this paper: (a) developing a computational deficit index for evaluating the patient's kinematic performance and a deficit-training-improvement (DTI) correlation for evaluating the adaptive training strategy, (b) integrating assistive adaptation tools that help the rehabilitation team understand the relationship between the patient's performance and training and customize the training strategy, and (c) combining the interactive multimedia environment and the physical environment to encourage patients to transfer movement knowledge from media space to physical space. Our system has been used by two stroke patients for one month of mediated therapy. They showed significant improvements in their reaching and grasping performance (+48.84% and +39.29%) compared to two other stroke patients who received traditional therapy (-18.31% and -8.06%).
Wearable, mobile computing platforms are envisioned to be used in out-patient monitoring and care. These systems continuously perform signal filtering, transformations, and classification, which are quite compute-intensive and quickly drain the system's energy. The design space of these human activity sensors is large, including the choice of sampling frequency, the feature detection algorithm, the length of the transition-detection window, etc., and all of these choices fundamentally trade off power/performance against accuracy of detection. In this work, we explore this design space and make several interesting conclusions that can be used as rules of thumb for quick, yet power-efficient designs of such systems. For instance, we find that the x-axis of our signal, which was oriented to be parallel to the forearm, is the most important signal to monitor for our set of hand activities. Our experimental results show that by carefully choosing system design parameters, the performance/power of the system can be improved considerably (5X) for a minimal (5%) loss in accuracy.
We discover communities from social network data and analyze the community evolution. Communities are inherent characteristics of human interaction in online social networks, as well as in paper citation networks. Communities may also evolve over time, due to changes in individuals' roles and social status in the network as well as changes in individuals' research interests. We present an innovative algorithm that deviates from the traditional two-step approach to analyzing community evolution. In the traditional approach, communities are first detected for each time slice and then compared to determine correspondences. We argue that this approach is inappropriate in applications with noisy data. In this paper, we propose FacetNet for analyzing communities and their evolution through a robust unified process. This framework discovers communities and captures their evolution with temporal smoothness given by historic community structures. Our approach relies on formulating the problem in terms of maximum a posteriori (MAP) estimation, where the community structure is estimated both from the observed networked data and from the prior distribution given by historic community structures. We then develop an iterative algorithm, with proven low time complexity, that is guaranteed to converge to an optimal solution. We perform extensive experimental studies, on both synthetic and real datasets, to demonstrate that our method discovers meaningful communities and provides additional insights not directly obtainable from traditional methods.
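The MAP formulation can be sketched as a cost that blends snapshot fit with temporal smoothness toward the historic community structure; the Frobenius-norm costs and the `alpha` trade-off below are simplifying assumptions standing in for FacetNet's KL-divergence objective:

```python
import numpy as np

def snapshot_cost(A, H):
    """How well community matrix H (n x k) explains adjacency A at time t."""
    P = H @ H.T
    return float(np.linalg.norm(A / A.sum() - P / P.sum()) ** 2)

def temporal_cost(H, H_prev):
    """Deviation from the historic community structure H_prev."""
    return float(np.linalg.norm(H - H_prev) ** 2)

def map_cost(A, H, H_prev, alpha=0.3):
    """FacetNet-style objective (sketch): fit the current snapshot while
    staying smooth with respect to the previous community structure."""
    return (1 - alpha) * snapshot_cost(A, H) + alpha * temporal_cost(H, H_prev)
```

A community structure that both matches the current snapshot and stays close to the historic one minimizes this cost, which is the intuition behind the unified (rather than two-step) process.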
Transdisciplinary collaborations call for dynamic, responsive slide-ware presentations beyond the linear structure afforded by traditional tools. The NextSlidePlease application addresses this through a novel authoring and presentation interface. The application also features an innovative algorithm to enhance presentation time management. The cross-platform Java application is currently being evaluated in a variety of real-world presentation contexts.
In this position paper, we propose the idea that emergent and evolutionary aspects of semantics, which are complementary to the problem of semantic detection, are foundational to multimedia computing. We show that media-rich social networks reveal certain implicit assumptions in concept learning about semantics, including semantic stability, emergence, and stability of context. We study the problem of semantic evolution in the context of media-rich networks for two reasons. (a) Since meaning is an emergent artifact of human activity, it is crucial to study how human beings interact with, consume, and share media data. (b) The ready availability of large-scale social interaction datasets, from blogs to sites such as Flickr and YouTube, allows us to instrument the relationship between media and human activity at a scale not available to earlier researchers. We have identified three initial problem areas critical to evolutionary aspects of semantics: community discovery, information flow, and semantic diversity. We present examples of research problems addressed in each of the three areas.
Minimizing the number of computations a low-power device makes is important to achieve long battery life. In this paper we present a framework for a low-power device to minimize the number of calculations needed to detect and classify simple activities of daily living such as sitting, standing, walking, reaching, and eating. This technique uses wavelet analysis as part of the feature set extracted from accelerometer data. A log-likelihood ratio test and Hidden Markov Models (HMM) are used to detect transitions and classify different activities. A tradeoff is made between power and accuracy.
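The transition-detection step can be sketched as a log-likelihood ratio test between two single-Gaussian activity models (the wavelet feature extraction and HMM classification stages are omitted; the models and threshold are illustrative assumptions):

```python
import math

def gauss_loglik(x, mu, sigma):
    """Log-likelihood of one accelerometer sample under a Gaussian model."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def detect_transition(window, model_a, model_b, threshold=0.0):
    """Log-likelihood ratio test: does the window of samples fit activity B
    (e.g., walking) better than activity A (e.g., sitting)?"""
    llr = sum(gauss_loglik(x, *model_b) - gauss_loglik(x, *model_a)
              for x in window)
    return llr > threshold
```

Because the test only sums per-sample log-likelihood differences, it is cheap enough to run continuously and gate the more expensive HMM classifier, which is the power-saving structure described above.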
Online social networking sites such as Flickr and Facebook provide a diverse range of functionalities that foster online communities to create and share media content. In particular, Flickr groups are increasingly used to aggregate and share photos about a wide array of topics or themes. Unlike photo repositories where images are typically organized with respect to static topics, the photo sharing process in Flickr often results in complex time-evolving social and visual patterns. Characterizing such time-evolving patterns can enrich the media exploration experience in a social media repository. In this paper, we propose a novel framework that characterizes distinct time-evolving patterns of group photo streams. We use a nonnegative joint matrix factorization approach to incorporate image content features and contextual information, including associated tags, photo owners, and post times. In our framework, we consider a group as a mixture of themes: each theme exhibits similar patterns of image content and context. Theme extraction seeks to best explain the observed image content features and associations with tags, users, and times. Extensive experiments on a Flickr dataset suggest that our approach is able to extract meaningful evolutionary patterns from group photo streams. We evaluate our method through a tag prediction task. Our prediction results outperform baseline methods, indicating the utility of our theme-based joint analysis.
This paper presents a novel social media summarization framework. Summarizing media created and shared in large-scale online social networks poses challenging research problems: the networks exhibit heterogeneous social interactions and temporal dynamics. Our proposed framework relies on the co-presence of multiple important facets: who (users), what (concepts and media), how (actions), and when (time). First, we impose a syntactic structure on the social activity (relating users, media, and concepts via specific actions) in our temporal multi-graph mining algorithm. Second, important activities along each facet are extracted as activity themes over time. Experiments on Flickr datasets demonstrate that our technique captures the nontrivial evolution of media use in social networks.
We propose a computational framework to predict synchrony of action in online social media. Synchrony is a temporal social network phenomenon in which a large number of users are observed to mimic a certain action over a period of time with sustained participation from early users. Understanding social synchrony can be helpful in identifying suitable time periods for viral marketing. Our method consists of two parts: the learning framework and the evolution framework. In the learning framework, we develop a dynamic Bayesian network (DBN) based representation that includes an understanding of user context to predict the probability of user actions over a set of time slices into the future. In the evolution framework, we evolve the social network and the user models over a set of future time slices to predict social synchrony. Extensive experiments on a large dataset crawled from the popular social media site Digg (comprising ~7M diggs) show that our model yields low error (15.2±4.3%) in predicting user actions during periods with and without synchrony. Comparison with baseline methods indicates that our method shows significant improvement in predicting user actions.
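The evolution framework's forward simulation can be sketched as repeated application of a two-slice transition model, a much-simplified stand-in for the paper's DBN with user context (the action labels and transition table are illustrative):

```python
def predict_action_prob(prior, trans, steps):
    """Evolve a distribution over user actions across future time slices.

    prior: dict action -> probability at the current slice
    trans: dict prev_action -> {next_action -> probability} (two-slice model)
    """
    p = dict(prior)
    for _ in range(steps):
        p = {cur: sum(p[prev] * trans[prev][cur] for prev in p)
             for cur in prior}
    return p
```

Sustained high probability of the mimicked action across successive slices is the kind of signal the framework reads as emerging synchrony.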
This paper aims at discovering community structure in rich media social networks through analysis of time-varying, multi-relational data. Community structure represents the latent social context of user actions. It has important applications in information tasks such as search and recommendation. Social media poses several unique challenges: (a) the context of user actions is constantly changing and co-evolving, so the social context contains time-evolving multi-dimensional relations; (b) the social context is determined by the available system features and is unique to each social media website. In this paper we propose MetaFac (MetaGraph Factorization), a framework that extracts community structures from various social contexts and interactions. Our work has three key contributions: (1) metagraph, a novel relational hypergraph representation for modeling multi-relational and multi-dimensional social data; (2) an efficient factorization method for community extraction on a given metagraph; (3) an on-line method to handle time-varying relations through incremental metagraph factorization. Extensive experiments on real-world social data collected from the Digg social media website suggest that our technique is scalable and is able to extract meaningful communities based on the social media contexts. We illustrate the usefulness of our framework through prediction tasks, where we outperform baseline methods (including the aspect model and tensor analysis) by an order of magnitude.
In this paper we develop a recommendation framework to connect image content with communities in online social media. The problem is important because users are looking for useful feedback on their uploaded content, but finding the right community for feedback is challenging for the end user. Social media are characterized by both content and community. Hence, in our approach, we characterize images through three types of features: visual features, user-generated text tags, and social interaction (user communication history in the form of comments). A recommendation framework based on learning a latent space representation of the groups is developed to recommend the most likely groups for a given image. The model was tested on a corpus of 15,689 Flickr images. Our method outperforms the baseline method, with a mean precision of 0.62 and a mean recall of 0.69. Importantly, we show that fusing image content and text tags with social interaction features outperforms using image content or tags alone.
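The recommendation step can be sketched as ranking groups by similarity between an image's fused feature vector (visual + tags + interaction) and each group's latent representation; cosine similarity and the dictionary layout are assumptions for illustration, not the paper's learned latent-space model:

```python
import numpy as np

def recommend_groups(image_vec, group_latents, top_k=3):
    """Rank candidate groups for an image by cosine similarity between the
    image's fused feature vector and each group's latent vector."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scores = {g: cos(image_vec, v) for g, v in group_latents.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Concatenating the three feature types into `image_vec` before scoring is the fusion step; dropping the interaction block would reproduce the weaker content-or-tags-only baselines.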
This paper presents JAM (Joint Action Matrix Factorization), a novel framework to summarize social activity from rich media social networks. Summarizing social network activities requires an understanding of the relationships among concepts, users, and the context in which the concepts are used. Our work has three contributions. First, we propose a novel summarization method that extracts the co-evolution of multiple facets of social activity, who (users), what (concepts), how (actions), and when (time), and constructs a context-rich summary called an "activity theme". Second, we provide an efficient algorithm for mining activity themes over time. The algorithm extracts representative elements in each facet based on their co-occurrences with other facets through specific actions. Third, we propose new metrics for evaluating the summarization results based on the temporal and topological relationships among activity themes. Extensive experiments on real-world Flickr datasets demonstrate that our technique significantly outperforms several baseline algorithms. The results reveal nontrivial evolution in Flickr photo-sharing communities.
The tasks in the physical environments are mainly information-centric processes, such as search and exploration of physical objects. We have developed an informational environment, AURA, that supports object searches in the physical world. The goal of AURA is to enable individuals to use the environment in which they function as a living (short-term) memory of their activities and of the objects with which they interact in this environment. To support physical searches, the environment that the user is occupying must be transparently embedded with relevant information and made accessible by in-situ search mechanisms. We achieve this through innovative algorithms that re-imagine a collection of environmentally distributed RFID tags to act as a distributed storage cloud that encodes the required information for attribute-based object search. Since RFID tags lack radio transmitters and, thus, cannot communicate among each other, auraProp and auraSearch leverage the movements of the humans in the environment to propagate information: as they move in the environment, users not only leave traces (or auras) of their own activities, but also help further disseminate auras of prior activities in the same space. This scheme creates an information gradient in the physical environment, which AURA then leverages to direct the user toward the object of interest. auraSearch significantly reduces the number of steps that the user has to walk while searching for a given object.
Rich media social networks promote not only creation and consumption of media, but also communication about the posted media item. What makes a conversation interesting enough to prompt a user to participate in the discussion on a posted video? We conjecture that people participate in conversations when they find the conversation theme interesting, see comments by people with whom they are familiar, or observe an engaging dialogue between two or more people (an absorbing back-and-forth exchange of comments). Importantly, a conversation that is interesting must be consequential, i.e., it must impact the social network itself. Our framework has three parts: characterizing themes, characterizing participants for determining interestingness, and measuring the consequences of a conversation deemed to be interesting. First, we detect conversational themes using a mixture model approach. Second, we determine interestingness of participants and interestingness of conversations based on a random walk model. Third, we measure the consequence of a conversation by measuring how interestingness affects three variables: participation in related themes, participant cohesiveness, and theme diffusion. We have conducted extensive experiments using a dataset from the popular video sharing site YouTube. Our results show that our interestingness measure maximizes the mutual information, and is significantly better (twice as large) than three baseline methods (number of comments, number of new participants, and PageRank-based assessment).
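The random-walk step above can be sketched as a PageRank-style iteration over a weighted participant graph; this is a minimal illustration, not the paper's exact model, and the co-participation weights below are hypothetical.

```python
import numpy as np

def random_walk_scores(W, alpha=0.85, tol=1e-9, max_iter=200):
    """PageRank-style random walk over a weighted adjacency matrix W,
    returning stationary visit probabilities as interestingness scores."""
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalize into transition probabilities; dangling rows -> uniform.
    P = np.where(row_sums > 0, W / np.maximum(row_sums, 1e-12), 1.0 / n)
    s = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        s_next = alpha * s @ P + (1.0 - alpha) / n
        if np.abs(s_next - s).sum() < tol:
            break
        s = s_next
    return s

# Hypothetical comment-reply weights among three participants.
W = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [0.0, 1.0, 0.0]])
scores = random_walk_scores(W)
```

Participants that attract many weighted replies accumulate stationary probability mass, which serves as their interestingness score.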
Social media websites promote diverse user interaction on media objects as well as user actions with respect to other users. The goal of this work is to discover community structure in rich media social networks, and to observe how it evolves over time, through analysis of multi-relational data. The problem is important in the enterprise domain, where extracting emergent community structure on enterprise social media can help in forming new collaborative teams, aid in expertise discovery, and guide long-term enterprise reorganization. Our approach consists of three main parts: (1) a relational hypergraph model for modeling various social context and interactions; (2) a novel hypergraph factorization method for community extraction on multi-relational social data; (3) an online method to handle temporal evolution through incremental hypergraph factorization. Extensive experiments on real-world enterprise data suggest that our technique is scalable and can extract meaningful communities. To evaluate the quality of our mining results, we use our method to predict users' future interests. Our prediction outperforms baseline methods (frequency counts, pLSA) by 36-250% on average, indicating the utility of leveraging multi-relational social context.
In this paper, we introduce AURA, a novel framework for enriching the physical environment with information about objects and activities in order to support searches in the physical world. The goal is to enable individuals to use the environment in which they function as a living (short-term) memory of their activities and of the objects with which they interact in this environment. In order to act as a memory, the physical environment must be transparently embedded with relevant information and made accessible by in-situ search mechanisms. We achieve this embedding through innovative algorithms that leverage a collection of parasitic RFID tags distributed in the environment to act as a distributed storage cloud. Information about the activities of the users and the objects with which they interact is encoded and stored, in a decentralized way, on these RFID tags to support attribute-based search. A novel auraProp algorithm disseminates information in the environment and a complementary auraSearch algorithm implements spatial searches for physical objects in the environment. Parasitic RFID tags are not self-powered and thus cannot communicate among each other. AURA leverages human movement in the environment to propagate information: as they move in the environment, users not only leave traces (or auras) of their own activities, but also help further disseminate auras of prior activities in the same space. AURA relies on a novel signature-based information dissemination mechanism and a randomized information erasure scheme to ensure that the extremely limited storage spaces available on the RFID tags are used effectively. The erasure scheme also helps create an information gradient in the physical environment, which the auraSearch algorithm uses to direct the user towards the object of interest.
Experiential media systems refer to real-time, physically grounded multimedia systems in which the user is both the producer and consumer of meaning. These systems require embodied interaction on the part of the user to gain new knowledge. In this chapter we have presented our efforts to develop a real-time, multimodal biofeedback system for stroke patients. It is a highly specialized experiential media system in which the knowledge that is imparted refers to a functional task – the ability to reach and grasp an object. There are several key ideas in this chapter: we show how to derive critical motion features using a biomechanical model for the reaching functional task. Then we determine the formal progression of the feedback and its relationship to action. We show how to map movement parameters into auditory and visual parameters in real-time. We develop novel validation metrics for spatial accuracy, opening, flow and consistency. Our real-world experiments with unimpaired subjects show that we are able to communicate key aspects of motion through feedback. Importantly, they demonstrate that the messages encoded in the feedback can be parsed by the unimpaired subjects.
In this article, we present a media adaptation framework for an immersive biofeedback system for stroke patient rehabilitation. In our biofeedback system, media adaptation refers to changes in audio/visual feedback as well as changes in physical environment. Effective media adaptation frameworks help patients recover generative plans for arm movement with potential for significantly shortened therapeutic time. The media adaptation problem has significant challenges—(a) high dimensionality of adaptation parameter space; (b) variability in the patient performance across and within sessions; (c) the actual rehabilitation plan is typically a non-first-order Markov process, making the learning task hard. Our key insight is to understand media adaptation as a real-time feedback control problem. We use a mixture-of-experts based Dynamic Decision Network (DDN) for online media adaptation. We train DDN mixtures per patient, per session. The mixture models address two basic questions—(a) given a specific adaptation suggested by the domain experts, predict the patient performance, and (b) given the expected performance, determine the optimal adaptation decision. The questions are answered through an optimality criterion based search on DDN models trained in previous sessions. We have also developed new validation metrics and have very good results for both questions on actual stroke rehabilitation data.
We have developed a computational framework to characterize social network dynamics in the blogosphere at individual, group and community levels. Such characterization could be used by corporations to help drive targeted advertising and to track the moods and sentiments of consumers. We tested our model on a widely read technology blog called Engadget. Our results show that communities transition between states of high and low entropy, depending on sentiments (positive / negative) about external happenings. We also propose an innovative method to establish the utility of the extracted knowledge, by correlating the mined knowledge with external time series data (the stock market). Our validation results show that the characterized groups exhibit high stock market movement predictability (89%) and that removal of 'impactful' groups makes the community less resilient by lowering predictability (26%) and affecting the composition of the groups in the rest of the community.
We present a framework for automatically summarizing social group activity over time. The problem is important in understanding large-scale online social networks, which have diverse social interactions and exhibit temporal dynamics. In this work we construct the summarization by extracting activity themes. We propose a novel unified temporal multi-graph framework for extracting activity themes over time. We use a non-negative matrix factorization (NMF) approach to derive two interrelated latent spaces for users and concepts. Activity themes are extracted from the derived latent spaces to construct the group activity summary. Experiments on real-world Flickr datasets demonstrate that our technique outperforms baseline algorithms such as LSI, and is additionally able to extract temporally representative activities to construct a meaningful group activity summary.
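The latent-space derivation can be illustrated with plain Lee-Seung multiplicative updates; this is a minimal sketch, and the matrix V below is a toy stand-in for user-concept co-occurrence data, not the paper's temporal multi-graph formulation.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0):
    """Factor V ~= W @ H with W, H >= 0 via multiplicative updates.
    The k columns of W (users) and rows of H (concepts) form the
    latent 'theme' dimensions."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity by construction.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Toy user-by-concept matrix: users 0-1 share concepts 0-1; user 2 differs.
V = np.array([[3., 1., 0.],
              [2., 1., 0.],
              [0., 0., 4.]])
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H)
```

Each latent dimension groups the users and concepts that co-occur, which is what makes the factors usable as activity themes.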
In this paper, we develop a temporally evolving representation framework for context that can efficiently predict communication flow in social networks between a given pair of individuals. The problem is important because it facilitates determining social and market trends as well as efficient information paths among people. We describe communication flow by two parameters: the intent to communicate and the communication delay. To estimate these parameters, we design features to characterize communication and social context. Communication context refers to the attributes of the current communication. Social context refers to the patterns of participation in communication (information roles) and the degree of overlap of friends between two people (strength of ties). A subset of optimal features of the communication and social context is chosen at a given time instant using five different feature selection strategies. The features are thereafter used in a Support Vector Regression framework to predict the intent to communicate and the delay between a pair of individuals. We have excellent results on a real-world dataset from the most popular social networking site, www.myspace.com. Interestingly, we observe that while context can reasonably predict intent, delay seems to depend more on personal contextual changes and other latent factors characterizing communication, e.g. the 'age' of the information transmitted and the presence of cliques among people.
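As a rough illustration of the regression step, the sketch below fits a linear epsilon-insensitive (SVR-style) model by subgradient descent; the features and targets are synthetic stand-ins for the communication and social context descriptors, and the paper's actual framework uses full (kernelized) Support Vector Regression.

```python
import numpy as np

def linear_svr(X, y, C=1.0, eps=0.1, lr=0.01, n_iter=2000):
    """Linear epsilon-insensitive regression via subgradient descent:
    residuals within +/-eps incur no loss, larger ones a linear penalty."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        r = X @ w + b - y                       # residuals
        g = np.where(r > eps, 1.0, np.where(r < -eps, -1.0, 0.0))
        w -= lr * (w / (C * n) + X.T @ g / n)   # L2 term + loss subgradient
        b -= lr * g.mean()
    return w, b

# Hypothetical context features -> communication delay target.
X = np.array([[0., 1.], [1., 0.], [1., 1.], [2., 1.]])
y = X @ np.array([2.0, -1.0]) + 0.5
w, b = linear_svr(X, y, C=10.0, eps=0.01, lr=0.05, n_iter=5000)
pred = X @ w + b
```

In the paper's setting, one such regressor is trained per target (intent, delay) over the selected context features.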
In this paper, we develop a simple model to study and analyze communication dynamics in the blogosphere and use these dynamics to determine interesting correlations with stock market movement. This work can drive targeted advertising on the web as well as facilitate understanding community evolution in the blogosphere. We describe the communication dynamics by several simple contextual properties of communication, e.g. the number of posts, the number of comments, the length and response time of comments, strength of comments and the different information roles that can be acquired by people (early responders / late trailers, loyals / outliers). We study a "technology-savvy" community called Engadget (http://www.engadget.com). There are two key contributions in this paper: (a) we identify information roles and the contextual properties for four technology companies, and (b) we model them as a regression problem in a Support Vector Machine framework and train the model with stock movements of the companies. Interestingly, we observe that the communication activity on the blogosphere has considerable correlations with stock market movement. These correlation measures are further cross-validated against two baseline methods. Our results are promising, yielding about 78% accuracy in predicting the magnitude of movement and 87% for the direction of movement.
We discover communities from social network data, and analyze the community evolution. These communities are inherent characteristics of human interaction in online social networks, as well as paper citation networks. Communities may also evolve over time, due to changes to individuals' roles and social status in the network as well as changes to individuals' research interests. We present an innovative algorithm that deviates from the traditional two-step approach to analyzing community evolution. In the traditional approach, communities are first detected for each time slice, and then compared to determine correspondences. We argue that this approach is inappropriate in applications with noisy data. In this paper, we propose FacetNet for analyzing communities and their evolutions through a robust unified process. In this novel framework, communities not only generate evolutions, they are also regularized by the temporal smoothness of evolutions. As a result, this framework discovers communities that jointly maximize the fit to the observed data and the temporal evolution. Our approach relies on formulating the problem in terms of non-negative matrix factorization, where communities and their evolutions are factorized in a unified way. We then develop an iterative algorithm, with proven low time complexity, which is guaranteed to converge to an optimal solution. We perform extensive experimental studies, on both synthetic and real datasets, to demonstrate that our method discovers meaningful communities and provides additional insights not directly obtainable from traditional methods.
This paper aims to develop a generalized framework to systematically trade off computational complexity with output distortion in linear transforms such as the DCT, in an optimal manner. The problem is important in real-time systems where the computational resources available are time-dependent. Our approach is generic and applies to any linear transform; we use the DCT as a specific example. There are three key ideas: (a) a joint transform pruning and Haar basis projection-based approximation technique, which saves computations by factoring the DCT transform into signal-independent and signal-dependent parts; the signal-dependent calculation is done in real-time and combined with the stored signal-independent part, reducing the number of operations; (b) a complexity-distortion framework, with an algorithm to efficiently estimate the complexity-distortion function and search for the optimal transform approximation using several approximation candidate sets, along with a measure to select the optimal approximation candidate set; and (c) an adaptive approximation framework in which the operating points on the C-D curve are embedded in the metadata. We also present a framework to perform adaptive approximation in real time for changing computational resources by using the embedded metadata. Our results validate our theoretical approach by showing that we can reduce transform computational complexity significantly while minimizing distortion.
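The complexity-distortion trade-off can be illustrated with a toy pruned DCT: computing only the first k coefficients cuts the multiply count proportionally, and, because the basis is orthonormal, the distortion is exactly the energy of the dropped coefficients. This sketch shows only transform pruning, not the paper's full basis-projection and candidate-set machinery.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II basis as an N x N matrix (row k = frequency k)."""
    n = np.arange(N)
    k = n.reshape(-1, 1)
    M = np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    M[0] *= np.sqrt(1.0 / N)
    M[1:] *= np.sqrt(2.0 / N)
    return M

def pruned_dct(x, keep):
    """Approximate DCT computing only the first `keep` coefficients,
    i.e. roughly keep*N multiplies instead of N*N."""
    M = dct_matrix(len(x))
    y = np.zeros(len(x))
    y[:keep] = M[:keep] @ x
    return y

x = np.cos(np.linspace(0, np.pi, 16))      # smooth, low-frequency signal
full = dct_matrix(16) @ x
approx = pruned_dct(x, keep=4)
distortion = np.sum((full - approx) ** 2)  # energy of the pruned coefficients
```

Sweeping `keep` from 1 to N traces out one (complexity, distortion) curve, the kind of operating-point set the framework searches over.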
Events are real-world occurrences that unfold over space and time. Event mining from multimedia streams improves the access and reuse of large media collections, and it has been an active area of research with notable progress. This paper contains a survey on the problems and solutions in event mining, approached from three aspects: event description, event-modeling components, and current event mining systems. We present a general characterization of multimedia events, motivated by the maxim of five "W's" and one "H" for reporting real-world events in journalism: when, where, who, what, why, and how. We discuss the causes for semantic variability in real-world descriptions, including multilevel event semantics, implicit semantics facets, and the influence of context. We discuss five main aspects of an event detection system. These aspects are: the variants of tasks and event definitions that constrain system design, the media capture setup that collectively define the available data and necessary domain assumptions, the feature extraction step that converts the captured data into perceptually significant numeric or symbolic forms, statistical models that map the feature representations to richer semantic descriptions, and applications that use event metadata to help in different information-seeking tasks. We review current event-mining systems in detail, grouping them by the problem formulations and approaches. The review includes detection of events and actions in one or more continuous sequences, events in edited video streams, unsupervised event discovery, events in a collection of media objects, and a discussion on ongoing benchmark activities. These problems span a wide range of multimedia domains such as surveillance, meetings, broadcast news, sports, documentary, and films, as well as personal and online media collections. We conclude this survey with a brief outlook on open research directions.
This article addresses the problem of spam blog (splog) detection using temporal and structural regularity of content, post time and links. Splogs are undesirable blogs meant to attract search engine traffic, used solely for promoting affiliate sites. Blogs represent popular online media, and splogs not only degrade the quality of search engine results, but also waste network resources. The splog detection problem is made difficult due to the lack of stable content descriptors. We have developed a new technique for detecting splogs, based on the observation that a blog is a dynamic, growing sequence of entries (or posts) rather than a collection of individual pages. In our approach, splogs are recognized by their temporal characteristics and content. There are three key ideas in our splog detection framework. (a) We represent the blog temporal dynamics using self-similarity matrices defined on the histogram intersection similarity measure of the time, content, and link attributes of posts, to investigate the temporal changes of the post sequence. (b) We study the blog temporal characteristics using a visual representation derived from the self-similarity measures. The visual signature reveals correlation between attributes and posts, depending on the type of blogs (normal blogs and splogs). (c) We propose two types of novel temporal features to capture the splog temporal characteristics. In our splog detector, these novel features are combined with content based features. We extract a content based feature vector from blog home pages as well as from different parts of the blog. The dimensionality of the feature vector is reduced by Fisher linear discriminant analysis. We have tested an SVM-based splog detector using proposed features on real world datasets, with appreciable results (90% accuracy).
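The self-similarity idea above can be sketched as follows: for one attribute (say, post content histograms), S[i, j] is the histogram intersection of posts i and j. This is an illustrative sketch with hypothetical toy histograms, not the paper's full pipeline; a splog's near-identical posts produce a bright, uniform matrix while a normal blog's matrix is diffuse.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity of two (L1-normalized) histograms: sum of bin-wise minima."""
    return np.minimum(h1, h2).sum()

def self_similarity_matrix(hists):
    """S[i, j] = histogram intersection of posts i and j for one attribute
    (time, content, or links)."""
    n = len(hists)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = histogram_intersection(hists[i], hists[j])
    return S

# Toy word-count histograms for four posts; posts 0 and 2 are duplicates.
hists = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.4, 0.1],
                  [0.5, 0.5, 0.0],
                  [0.0, 0.1, 0.9]])
S = self_similarity_matrix(hists)
```

Temporal features are then derived from the structure of S (e.g. its off-diagonal regularity), rather than from any single post.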
The paper develops a novel computational framework for predicting communication flow in social networks based on several contextual features. The problem is important because prediction of communication flow can impact timely sharing of specific information across a wide array of communities. We determine the intent to communicate and the communication delay between users based on several contextual features in a social network corresponding to (a) neighborhood context, (b) topic context and (c) recipient context. The intent to communicate and the communication delay are modeled as regression problems which are efficiently estimated using Support Vector Regression. We predict the intent and the delay over an interval of time using past communication data. We have excellent prediction results on a real-world dataset from MySpace.com, with prediction errors of 13-16%. We show that the intent to communicate is more significantly influenced by contextual factors than the delay.
There are information needs involving costly decisions that cannot be efficiently satisfied through conventional Web search engines. Alternatively, community-centric search can provide multiple viewpoints to facilitate decision making. We propose to discover and model the temporal dynamics of thematic communities based on mutual awareness, where the awareness arises due to observable blogger actions and the expansion of mutual awareness leads to community formation. Given a query, we construct a directed action graph that is time-dependent and weighted with respect to the query. We model the process of mutual awareness expansion using a random walk process and extract communities based on the model. We propose an interaction-space-based representation to quantify community dynamics. Each community is represented as a vector in the interaction space and its evolution is determined by a novel interaction correlation method. We have conducted experiments with a real-world blog dataset and have promising results for detection as well as insightful results for community evolution.
In this paper, we present a media adaptation framework for an immersive biofeedback system for stroke patient rehabilitation. In our biofeedback system, media adaptation refers to changes in audio/visual feedback as well as changes in physical environment. Effective media adaptation frameworks help patients recover generative plans for arm movement with potential for significantly shortened therapeutic time. The media adaptation problem has significant challenges - (a) high dimensionality of adaptation parameter space; (b) variability in the patient performance across and within sessions; (c) the actual rehabilitation plan is typically a non-first-order Markov process, making the learning task hard. Our key insight is to understand media adaptation as a real-time feedback control problem. We use a mixture-of-experts based Dynamic Decision Network (DDN) for online media adaptation. We train DDN mixtures per patient, per session. The mixture models address two basic questions - (a) given a specific adaptation suggested by the domain expert, predict the patient performance, and (b) given an expected performance, determine the optimal adaptation decision. The questions are answered through an optimality criterion based search on DDN models trained in previous sessions. We have also developed new validation metrics and have very good results for both questions on actual stroke rehabilitation data.
In this paper, we present a novel visual design for information dense summaries of patient data with applications in biofeedback rehabilitation. The problem is important in review of large medical datasets where the clinicians require that both the summary and all the performance details be shown at the same time. There are two main ideas: (a) summarizing data along the conceptual facets (accuracy / flow / openness) and the temporal facets (session / set / trial) in the biofeedback therapy, where the conceptual facets represent key information needed by the experts to review patient performance; and (b) effectively presenting the data trends and the details in the context of the entire performance. The summary incorporates ideas from graphic design and reveals the performance data at two time scales.
This paper focuses on the development of an event-driven media sharing repository to facilitate community awareness. In this paper, an event refers to a real-world occurrence that unfolds over space and time. Our event model implementation supports creation of events using the standard facets of who, where, when and what. A key novelty in this research lies in the support of arbitrary event-event semantic relationships. We facilitate global as well as personalized event relationships. Each relationship can be unary or binary and can be at multiple granularities. The relationships can exist between events, between media, and between media and events. We have implemented a web-based media archive system that allows people to create, explore and manage events. We have implemented an RSS-based notification system that promotes awareness of actions. The initial user feedback has been positive and we are in the process of conducting a longitudinal study.
This work deals with the problem of event annotation in social networks. The problem is made difficult due to variability of semantics and scarcity of labeled data. Events refer to real-world phenomena that occur at a specific time and place, and media and text tags are treated as facets of the event metadata. We propose a novel mechanism for event annotation by leveraging related sources (other annotators) in a social network. Our approach exploits event concept similarity, concept co-occurrence and annotator trust. We compute concept similarity measures across all facets. These measures are then used to compute event-event and user-user activity correlation. We compute inter-facet concept co-occurrence statistics from the annotations by each user. The annotator trust is determined by first requesting the trusted annotators (seeds) from each user and then propagating the trust through the social network using the biased PageRank algorithm. For a specific media instance to be annotated, we start the process from an initial query vector, and the optimal recommendations are determined by using a coupling strategy between the global similarity matrix and the trust-weighted global co-occurrence matrix. The coupling links the common shared knowledge (similarity between concepts) that exists within the social network with trusted and personalized observations (concept co-occurrences). Our initial experiments on annotated everyday events are promising and show substantial gains over traditional SVM-based techniques.
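The trust-propagation step can be sketched as personalized (biased) PageRank: the random walk teleports only to the user's declared seed annotators, so trust diffuses outward from them. The annotator graph and seed set below are hypothetical.

```python
import numpy as np

def biased_pagerank(A, seeds, alpha=0.85, n_iter=100):
    """Personalized PageRank over an annotator interaction graph A.
    Teleportation is restricted to the trusted `seeds`, so the resulting
    scores propagate trust from the seeds through the network."""
    n = A.shape[0]
    row = A.sum(axis=1, keepdims=True)
    P = np.where(row > 0, A / np.maximum(row, 1e-12), 1.0 / n)
    v = np.zeros(n)
    v[list(seeds)] = 1.0 / len(seeds)      # teleport mass only to seeds
    t = v.copy()
    for _ in range(n_iter):
        t = alpha * t @ P + (1 - alpha) * v
    return t

# Four annotators; the user trusts annotator 0; edges = observed interactions.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
trust = biased_pagerank(A, seeds={0})
```

Annotators close to the seeds in the interaction graph receive more trust mass, which then weights their concept co-occurrence statistics in the recommendation step.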
This paper focuses on spam blog (splog) detection. Blogs are highly popular, new media social communication mechanisms. The presence of splogs degrades blog search results as well as wastes network resources. In our approach we exploit unique blog temporal dynamics to detect splogs. There are three key ideas in our splog detection framework. We first represent the blog temporal dynamics using self-similarity matrices defined on the histogram intersection similarity measure of the time, content, and link attributes of posts. Second, we show via a novel visualization that the blog temporal characteristics reveal attribute correlation, depending on the type of blog (normal blogs and splogs). Third, we propose the use of temporal structural properties computed from self-similarity matrices across different attributes. In a splog detector, these novel features are combined with content based features. We extract a content based feature vector from different parts of the blog -- URLs, post content, etc. The dimensionality of the feature vector is reduced by Fisher linear discriminant analysis. We have tested an SVM-based splog detector using the proposed features on real world datasets, with excellent results (90% accuracy).
This paper aims to develop a novel framework to systematically trade off computational complexity with output distortion in linear multimedia transforms, in an optimal manner. The problem is important in real-time systems where the computational resources available are time-dependent. We solve the real-time adaptation problem by developing an approximate transform framework. There are three key contributions of this paper—(a) a fast basis projection approximation framework that allows us to store signal independent partial transform results to be used in real-time, (b) estimating the complexity distortion curve for the linear transform approximation using a given basis projection approximation set and searching for the optimal transform approximation which satisfies the complexity constraint with minimum distortion and (c) determining optimal operating points on the complexity distortion function and a meta-data embedding algorithm for images that allows for real-time adaptation. We have applied this approach to the FFT approximation for images with excellent results.
This paper describes a framework to annotate images using personal and social network contexts. The problem is important as the correct context reduces the number of image annotation choices. Social network context is useful as real-world activities of members of the social network are often correlated within a specific context. The correlation can serve as a powerful resource to effectively increase the ground truth available for annotation. There are three main contributions of this paper: (a) development of an event context framework and definition of quantitative measures for contextual correlations based on concept similarity in each facet of event context; (b) recommendation algorithms based on spreading activations that exploit personal context as well as social network context; (c) experiments on real-world, everyday images that verified both the existence of inter-user semantic disagreement and the improvement in annotation when incorporating both the user and social network context. We have conducted two user studies, and our quantitative and qualitative results indicate that context (both personal and social) facilitates effective image annotation.
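The spreading-activation idea can be sketched as follows: activation starts at the concepts already known from the image's context and spreads, with decay, over a concept-similarity graph; highly activated concepts become annotation recommendations. This is an illustrative sketch of the general technique with a hypothetical concept graph, not the paper's exact algorithm.

```python
import numpy as np

def spreading_activation(W, seed_concepts, decay=0.5, n_steps=3):
    """Spread activation from seed concepts over similarity graph W.
    Activation attenuates by `decay` per hop; each node keeps its
    strongest activation seen so far."""
    n = W.shape[0]
    a = np.zeros(n)
    a[list(seed_concepts)] = 1.0
    for _ in range(n_steps):
        a = np.maximum(a, decay * (W @ a))
    return a

# Hypothetical concepts: 0 'birthday', 1 'cake', 2 'candles', 3 'beach'.
W = np.array([[0.0, 0.9, 0.7, 0.1],
              [0.9, 0.0, 0.8, 0.0],
              [0.7, 0.8, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.0]])
act = spreading_activation(W, seed_concepts={0})
ranked = np.argsort(-act)
```

In the paper's setting, the graph weights come from the contextual correlation measures, computed per event facet and per user or social network.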
This paper focuses on spam blog (splog) detection. Blogs are highly popular, new media social communication mechanisms and splogs corrupt blog search results as well as waste network resources. In our approach we exploit unique blog temporal dynamics to detect splogs. The key idea is that splogs exhibit high temporal regularity in content and post time, as well as consistent linking patterns. Temporal content regularity is detected using a novel autocorrelation of post content. Temporal structural regularity is determined using the entropy of the post time difference distribution, while the link regularity is computed using a HITS based hub score measure. Experiments based on the annotated ground truth on real world dataset show excellent results on splog detection tasks with 90% accuracy.
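The post-time regularity feature can be sketched with the entropy of the inter-post time distribution: machine-generated splogs post at near-fixed intervals, collapsing the distribution into few bins and driving entropy toward zero, while human blogs spread it out. This is an illustrative sketch with synthetic timestamps, not the paper's exact feature computation.

```python
import numpy as np

def timing_entropy(post_times, n_bins=8):
    """Shannon entropy (bits) of the inter-post time gap distribution."""
    gaps = np.diff(np.sort(np.asarray(post_times, dtype=float)))
    hist, _ = np.histogram(gaps, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

splog_times = np.arange(0, 240, 12)                    # a post every 12 hours
rng = np.random.default_rng(0)
human_times = np.cumsum(rng.exponential(12, size=20))  # irregular gaps

splog_entropy = timing_entropy(splog_times)
human_entropy = timing_entropy(human_times)
```

A low entropy value flags the regular posting schedule typical of automated splogs; the content autocorrelation and HITS-based link features capture the complementary regularities.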
In this paper, we present a framework to analyze and summarize the temporal dynamics within personal blogs. Blog temporal dynamics are difficult to capture using a few class descriptors. Our approach comprises (1) a representation of blog dynamics using self-similarity matrices, (2) theme extraction using non-negative self-similarity matrix factorization, and (3) a visualization representing blog theme evolution. Summaries based on large real-world blog datasets reveal interesting temporal characteristics for four blog types: personal, cooperative, power and spam blogs.
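The representation and factorization steps can be sketched as follows, assuming numpy; the cosine similarity kernel and a textbook multiplicative-update NMF stand in for the paper's factorization details.

```python
# Hypothetical sketch: self-similarity matrix over time slices plus a
# basic multiplicative-update NMF for theme extraction.
import numpy as np

def self_similarity(features):
    """features: (T, d) array, one feature vector per time slice.
    Returns the (T, T) cosine self-similarity matrix, clipped to >= 0."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return np.clip(unit @ unit.T, 0.0, None)

def nmf(S, rank, iters=200, seed=0):
    """Non-negative factorization S ~ W H via multiplicative updates;
    rows of H can be read as temporal 'themes'."""
    rng = np.random.default_rng(seed)
    T = S.shape[0]
    W = rng.random((T, rank)) + 1e-3
    H = rng.random((rank, T)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ S) / (W.T @ W @ H + 1e-12)
        W *= (S @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```

On a blog whose posts fall into two distinct topical phases, the similarity matrix is block-structured and the two recovered factors align with the phases.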
This paper describes our framework to annotate events using personal and social network contexts. The problem is important as the correct context is critical to effective annotation. Social network context is useful as real-world activities of members of the social network are often correlated within a specific context. There are two main contributions of this paper: (a) development of an event context framework and definition of quantitative measures for contextual correlations based on concept similarity; (b) recommendation algorithms based on spreading activation that exploit personal context as well as social network context. Our experimental results are very good: a user study with real-world personal images indicates that context (both personal and social) facilitates effective image annotation.
In this paper, we develop a theoretical understanding of multi-sensory knowledge and user context and their inter-relationships. This is used to develop a generic representation framework for multi-sensory knowledge and context. A representation framework for context can have a significant impact on media applications that dynamically adapt to user needs. There are three key contributions of this work: (a) theoretical analysis, (b) representation framework and (c) experimental validation. Knowledge is understood to be a dynamic set of multi-sensory facts with three key properties – multi-sensory, emergent and dynamic. Context is the dynamic subset of knowledge that affects the communication between entities. We develop a graph-based, multi-relational representation framework for knowledge, and model its temporal dynamics using a linear dynamical system. Our approach results in a stable and convergent system. We applied our representation framework to an image retrieval system with a large collection of photographs from everyday events. Our experimental validation, with retrieval evaluated against two reference algorithms, indicates that our context-based approach provides significant gains in real-world usage scenarios.
This paper describes a novel and functional application of data sonification as an element of an immersive stroke rehabilitation system. For two years, we have been developing a task-based experiential media biofeedback system that incorporates musical feedback as a means to maintain patient interest and impart movement information to the patient. This paper presents the project background, system goals, a description of our system including an in-depth look at our audio engine, and lastly an overview of proof-of-concept experiments with both unimpaired subjects and actual stroke patients suffering from right-arm impairment.
This paper presents a novel interdisciplinary approach for the training of media scientists and engineers. The approach is being implemented at the Arts, Media and Engineering program at Arizona State University. Our graduates will produce new paradigms for the integration of computation and media in the physical human experience. Their work will result in hybrid physical-digital environments that address significant challenges in key areas of the human condition. The training model has the following characteristics: it is driven by research problems rather than by disciplines; it is based on an extensive interdisciplinary network; it is organized under the dimensions of research and application and promotes work at their intersection; it provides multiple paths for combining different types of disciplinary training with interdisciplinary training; it also provides a transdisciplinary training path. This paper presents the reasoning and implementation structures of these key facets, discusses our evaluation approach and presents preliminary results.
In this paper we propose a framework for the computational extraction of spatial and time characteristics of a single choreographic work. Computational frameworks can aid in revealing non-salient compositional structures in modern dance. The computational extraction of such features allows for the creation of interactive works where the movement and the digital feedback (graphics, sound, etc.) are integrally connected at a deep level of structure. It also facilitates a better understanding of the choreographic process. There are two key contributions in this paper: (a) a systematic analysis of the observable and non-salient aspects of solo dance form, (b) computational analysis of spatio-temporal phrasing structures guided by critical understanding of observable form. Our analysis results are excellent, indicating the presence of rich, latent spatio-temporal organization in specific semi-improvisatory modern dance works that may provide rich structural material for interactivity.
A previous design of a biofeedback system for neurorehabilitation in an interactive multimodal environment demonstrated the potential of engaging stroke patients in task-oriented neuromotor rehabilitation. This report explores a new concept and alternative designs of multimedia-based biofeedback systems. In this system, the new interactive multimodal environment was constructed with an abstract presentation of movement parameters. Scenery images or pictures, and their clarity and orientation, are used to reflect the arm movement and relative position to the target instead of an animated arm. The multiple biofeedback parameters were classified into different hierarchical levels with respect to the importance of each movement parameter to performance. New quantified measurements for these parameters were developed to assess the patient's performance both in real time and offline. These parameters were represented by combined visual and auditory presentations with various distinct musical instruments. Overall, the objective of the newly designed system is to explore what information to feed back, and how, in an interactive virtual environment so as to enhance the sensorimotor integration that may facilitate the efficient design and application of virtual-environment-based therapeutic intervention.
This paper aims to develop a novel framework to systematically trade off computational complexity against output distortion in linear multimedia transforms, in an optimal manner. The problem is important in real-time systems where the available computational resources are time-dependent. We solve the real-time adaptation problem by developing an approximate transform framework. There are three key contributions of this paper—(a) a fast basis approximation framework that allows us to store signal-independent partial transform results for use in real time, (b) estimating the complexity distortion curve for the linear transform using a basis set and (c) determining optimal operating points and a meta-data embedding algorithm for images that allows for real-time adaptation. We have applied this approach to the FFT, with excellent results.
In this paper we present our work on a system to support real-time multimodal archiving, collaborative annotation and offline information visualization for a biofeedback stroke-rehabilitation application. Our archiving/annotation/visualization system can play a critical role in long-term biofeedback stroke therapy by supporting cooperative data analysis and media feedback as well as by providing the therapist with insight into computing-supported therapy. There are three contributions of this paper: (a) the design of a robust archiving system that archives, in real time, parametric model data (motion capture, motion analysis and audio/visual synthesis parameters) as well as audio/video from the biofeedback environment; (b) a web-based annotation tool designed for low cognitive load; and (c) a hierarchical information visualization tool that enables the therapist and other team members to examine quantitative motion analysis of subject performance within the context of media feedback, thus enabling collaborative insights. Our user studies indicate that the system performs well.
Today, media creators form a significant fraction of those who use signal processing tools. Environments such as Max/MSP [1] and Photoshop [2] play a key role in the creation of music and images. In these environments, authors frequently use advanced signal processing algorithms in the form of filters as well as effects such as sound reverberations and delays. However, most of the people involved in media production do not have a math background. The content creators are keen to develop a sophisticated understanding of their tools. This allows them to predict the effects of their creative decisions before experiencing them. It also helps foster an understanding of why certain choices (e.g., the size of the convolution kernel) can have a big impact on the final result. It is important to note that the computer music community [3], [4] has played a significant role in broadening the reach of signal processing. In this article, we describe our experiences with developing a signal processing course specifically designed for students with a background in the arts. The course was developed at the new Arts, Media and Engineering (AME) program at Arizona State University.
This book chapter provides a definition for experiential media systems, discusses motivating ideas and presents example applications.
This paper is focused on the development of serendipitous interfaces that promote casual and chance encounters within a geographically distributed community. The problem is particularly important for distributed workforces, where there is little opportunity for the chance encounters that are crucial to the formation of a sense of community. There are three contributions of this paper: (a) development of a robust communication architecture facilitating serendipitous casual interaction using online media repositories coupled to two multimodal interfaces; (b) development of multimodal interfaces that allow users to browse, leave audio comments, and asynchronously listen to other community members; and (c) a multimodal, gesture-driven control (vision and ultrasonic) of the audio-visual display. Our user studies reveal that the interfaces are well liked and promote social interaction.
In this demo, we present the use of the ARIA platform for modular design of media processing and retrieval applications. ARIA is a middleware for describing and executing media processing workflows to process, filter, and fuse sensory inputs and actuate responses in real time. ARIA is designed with the goal of maximum modularity and ease of integration of a diverse collection of media processing components and data sources. Moreover, ARIA is cognizant of the fact that various media operators and data structures are adaptable in nature; i.e., the delay, size, and quality/precision characteristics of these operators can be controlled via various parameters. In this demo, we present the ARIA design interface in different image processing and retrieval scenarios.
In this paper we propose a representation framework for dynamic multi-sensory knowledge and user context, and its application in media retrieval. We provide a definition of context, describe the relationship between context and knowledge, and discuss the importance of communication both as a means for building context and as an end achieved by it. We then propose a model of user context and demonstrate its application in a photo retrieval application. Our experiments demonstrate the advantages of context-aware media retrieval over other media retrieval approaches, especially those based on relevance feedback.
Spam blogs (splogs) have become a major problem in the increasingly popular blogosphere. Splogs are detrimental in that they corrupt the quality of information retrieved and they waste tremendous network and storage resources. We study several research issues in splog detection. First, in comparison to web spam and email spam, we identify some unique characteristics of splogs. Second, we propose a new online task that captures the unique characteristics of splogs, in addition to tasks based on the traditional IR evaluation framework. The new task introduces a novel time-sensitive detection evaluation to indicate how quickly a detector can identify splogs. Third, we propose a splog detection algorithm that combines traditional content features with temporal and link regularity features that are unique to blogs. Finally, we develop an annotation tool to generate ground truth on a sampled subset of the TREC-Blog dataset. We conducted experiments on both offline (traditional splog detection) and our proposed online splog detection task. Experiments based on the annotated ground truth set show excellent results on both offline and online splog detection tasks.
Blogs form many fast-growing communities on the Internet. Discovering such communities in the blogosphere is important for sustaining and encouraging new blogger participation. We focus on extracting communities based on two key insights—(a) communities form due to individual blogger actions that are mutually observable; (b) the semantics of the hyperlink structure are different from traditional web analysis problems. Our approach involves developing computational models for mutual awareness that incorporate the specific action type, frequency and time of occurrence. We use the mutual awareness feature with a ranking-based community extraction algorithm to discover communities. To validate our approach, four performance measures are used on the WWW2006 Blog Workshop dataset and the NEC focused blog dataset, with excellent quantitative results. The extracted communities also prove to be semantically cohesive with respect to their topics of interest.
In this paper we propose a framework for the computational extraction of time characteristics of a single choreographic work. Computational frameworks can aid in revealing non-salient compositional structures in modern dance. The computational extraction of such features allows for the creation of interactive works where the movement and the digital feedback (graphics, sound, etc.) are integrally connected at a deep level of structure. It also facilitates a better understanding of the choreographic process. There are two key contributions in this paper: (a) a systematic analysis of the observable and non-salient aspects of solo dance form, (b) computational analysis of temporal phrasing structures guided by critical understanding of observable form. Our analysis results are excellent, indicating the presence of rich, latent temporal organization in specific semi-improvisatory modern dance works that may provide rich structural material for interactivity.
This paper deals with the problem of estimating the effort required to maintain a static pose by human beings. The problem is important in developing dance summarization and rehabilitation applications. We estimate the human pose effort using two kinds of body constraints—skeletal constraints and gravitational constraints. The extracted features are combined together using SVM regression to estimate the pose effort. We tested our algorithm on 55 dance poses with different annotated efforts with excellent results. Our user studies additionally validate our approach.
This paper presents a novel real-time, multi-modal biofeedback system for stroke patient therapy. The problem is important as traditional mechanisms of rehabilitation are monotonous and do not incorporate detailed quantitative assessment of recovery beyond traditional clinical schemes. We have been working on developing an experiential media system that integrates task-dependent physical therapy and cognitive stimuli within an interactive, multimodal environment. The environment provides a purposeful, engaging, visual and auditory scene in which patients can practice functional therapeutic reaching tasks, while receiving different types of simultaneous feedback indicating measures of both performance and results. There are three contributions of this paper—(a) identification of features and goals for the functional task; (b) the development of sophisticated feedback (auditory and visual) mechanisms that match the semantics of action of the task, along with novel action-feedback coupling mechanisms; and (c) new metrics to validate the ability of the system to promote learnability, stylization and engagement. We have validated the system with nine subjects, with excellent results.
This paper deals with the problem of summarization and visualization of communication patterns in a large-scale corporate social network. The solution to the problem can have significant impact in understanding large-scale social network dynamics. There are three key aspects to our approach. First, we propose a ring-based network representation scheme—the insight is that visual displays of temporal dynamics of large-scale social networks can be accomplished without using graph-based layout mechanisms. Second, we detect three specific network activity patterns—periodic, isolated and widespread patterns—at multiple time scales. For each pattern we develop specific visualizations within the overall ring-based framework. Finally, we develop an activity pattern ranking scheme and a visualization that enables us to summarize key social network activities in a single snapshot. We have validated our approach by using the large Enron corpus—we have excellent activity detection results, and very good preliminary user study results for the visualization.
This paper deals with the problem of estimating 2D shape complexity. This has important applications in computer vision as well as in developing efficient shape classification algorithms. We define shape complexity using correlates of Kolmogorov complexity—entropy measures of global distance and local angle, and a measure of shape randomness. We tested our algorithm on synthetic and real world datasets with excellent results. We also conducted user studies that indicate that our measure is highly correlated with human perception. They also reveal an intuitive shape sensitivity curve—simple shapes are easily distinguished by small complexity variations, while complex shapes require significant complexity differences to be differentiated.
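One of the entropy correlates (global distance) can be sketched directly; the binning scheme and sample shapes are illustrative assumptions, not the paper's exact measures.

```python
# Hedged sketch: entropy of the global pairwise-distance distribution
# as one correlate of 2D shape complexity (binning is illustrative).
import math
from collections import Counter

def distance_entropy(points, bins=8):
    """Shannon entropy of pairwise boundary-point distances, binned
    relative to the maximum distance; higher for irregular shapes."""
    dists = [math.dist(p, q) for i, p in enumerate(points)
             for q in points[i + 1:]]
    dmax = max(dists) or 1.0
    hist = Counter(min(int(d / dmax * bins), bins - 1) for d in dists)
    n = len(dists)
    return -sum((c / n) * math.log2(c / n) for c in hist.values())
```

A regular shape such as a square concentrates its pairwise distances in a few bins, while an irregular polygon spreads them out and scores higher.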
This paper describes the design and preliminary implementation of a generative model for dynamic, real-time soundscape creation. Our model is based on the work of the Acoustic Ecology community and provides a framework for the automated creation of compelling sonic environments that are both real and imagined. We outline extensions to the model that include interaction paradigms, context modeling, sound acquisition, and sound synthesis. Our work is flexible and extensible to a variety of different applications.
In this paper, we develop a novel real-time, interactive, automatic multimodal exploratory environment that dynamically adapts the media presented to the user context. There are two key contributions of this paper—(a) development of a multimodal user-context model and (b) modeling the dynamics of the presentation to maximize coherence. We develop a novel user-context model comprising interests, media history, interaction behavior and tasks, which evolves based on the specific interaction. We also develop novel metrics between media elements and the user context. The presentation environment dynamically adapts to the current user context. We develop an optimal media selection and display framework that maximizes coherence, while constrained by the user context, user goals and the structure of the knowledge in the exploratory environment. The experimental results indicate that the system performs well. The results also show that user-context models significantly improve presentation coherence.
In this paper, we present a joint multimodal (audio, visual and text) framework to map the informational complexity of media elements to comprehension time. The problem is important for interactive multimodal presentations. We propose the joint comprehension time to be a function of the media Kolmogorov complexity. For audio and images, the complexity is estimated using a lossless universal coding scheme. The text complexity is derived by analyzing the sentence structure. For all three channels, we conduct user studies to map media complexity to comprehension time. For estimating the joint comprehension time, we assume channel independence, resulting in a conservative comprehension time estimate. The times for the visual channels (text and images) are deemed additive, and the joint time is then the maximum of the visual and the auditory comprehension times. The user studies indicate that the model works very well when compared with fixed-time multimodal presentations.
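A minimal sketch of this pipeline follows, assuming zlib's compressed size as the lossless-coding complexity proxy and an invented linear complexity-to-time calibration in place of the user-study mapping.

```python
# Sketch: compressed size approximates Kolmogorov complexity; visual
# channels add, and the joint time is the max of visual and auditory
# times. The secs_per_kbit calibration is an invented placeholder.
import zlib

def complexity_bits(data: bytes) -> int:
    """Lossless-coding proxy for Kolmogorov complexity."""
    return 8 * len(zlib.compress(data, 9))

def comprehension_time(image: bytes, text: bytes, audio: bytes,
                       secs_per_kbit=0.05):
    """Visual channels (image + text) are additive; the joint time is
    the max of the visual and auditory channel times (conservative,
    under the channel-independence assumption)."""
    visual = (complexity_bits(image) + complexity_bits(text)) * secs_per_kbit / 1000
    auditory = complexity_bits(audio) * secs_per_kbit / 1000
    return max(visual, auditory)
```

Highly repetitive media compress well and thus get short comprehension-time estimates, matching the intuition that simple content is absorbed quickly.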
In this paper, we present a novel image annotation approach with an emphasis on—(a) common sense based semantic propagation, (b) visual annotation interfaces and (c) novel evaluation schemes. The annotation system is interactive, intuitive and real-time. We attempt to propagate the semantics of annotations by using WordNet and ConceptNet, and low-level features extracted from the images. We introduce novel semantic dissimilarity measures and propagation frameworks. We develop a novel visual annotation interface that allows a user to group images by creating visual concepts using direct manipulation metaphors, without manual annotation. We also develop a new evaluation technique for annotation that is based on commonsensical relationships between concepts. Our experimental results on three different datasets indicate that the annotation system performs very well. The semantic propagation results are good—we converge close to the semantics of the image by annotating a small number (~16.8%) of database images.
This paper describes our system that enables members of a social network to collaboratively annotate a shared media collection. The problem is important since online social networks are emerging as conduits for exchange of everyday experiences. Our collaborative annotation system provides personalized recommendations to each user, based on (a) media features, (b) context, (c) commonsensical relationships and (d) linguistic relationships. We also develop novel concept specificity and abstractness / concreteness measures that further adapt the recommendations to the specific concept. Our preliminary user studies indicate that the system performs well and is more useful as compared to standard web browser recommendation schemes.
In this paper, we present an efficient 3D shape rejection algorithm for unlabeled 3D markers. The problem is important in domains such as rehabilitation and the performing arts. There are three key innovations in our approach—(a) a multi-resolution shape representation using Haar wavelets for unlabeled markers, (b) a multi-resolution shape metric and (c) a shape rejection algorithm that is predicated on the simple idea that we do not need to compute the entire distance to conclude that two shapes are dissimilar. We tested the approach on a real-world pose classification problem with excellent results. We achieved a classification accuracy of 98% with an order of magnitude improvement in terms of computational complexity over a baseline shape matching algorithm.
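The rejection idea (stop computing the distance as soon as dissimilarity is certain) can be sketched with Haar-style pairwise averages; the per-level metric, the threshold, and the power-of-two signal length are assumptions for illustration.

```python
# Illustrative sketch: compare shapes coarse-to-fine over Haar-style
# averages and stop as soon as the accumulated distance already
# exceeds the rejection threshold (details are assumptions).
def haar_levels(signal):
    """Successive pairwise averages, coarsest level first.
    Assumes a power-of-two signal length."""
    levels = [signal]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([(prev[i] + prev[i + 1]) / 2
                       for i in range(0, len(prev), 2)])
    return levels[::-1]

def reject_dissimilar(a, b, threshold):
    """Return True (reject: shapes differ) as soon as the partial
    coarse-to-fine distance exceeds the threshold; False = 'similar'.
    Averaging can only shrink distances, so a coarse-level exceedance
    guarantees the full-resolution distance also exceeds it."""
    for la, lb in zip(haar_levels(a), haar_levels(b)):
        dist = sum((x - y) ** 2 for x, y in zip(la, lb)) / len(la)
        if dist > threshold:
            return True
    return False
```

Very different shapes are rejected at the coarsest level after a single comparison, which is the source of the computational savings the abstract reports.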
The engineering, arts and science disciplines involved in media training, research and education at Arizona State University have come together to create the Arts, Media and Engineering (AME) graduate education and research program. The education component of this program consists of formalized graduate concentrations within existing degrees that allow faculty and students to combine extensive training in their chosen discipline, offered through their home department, with hybrid engineering-arts-sciences training offered through AME. This paper states a basic education and training problem in arts and media and presents a program formed at Arizona State University to address it. The structures, participation, and associated sub-areas of this program are also described.
Recently, we introduced a novel ARchitecture for Interactive Arts (ARIA) middleware that processes, filters, and fuses sensory inputs and actuates responses in real time while providing various Quality of Service (QoS) guarantees. The objective of ARIA is to incorporate real-time, sensed, and archived media and audience responses into live performances, on demand. An ARIA media workflow graph describes how the data sensed through media capture devices will be processed and what audio-visual responses will be actuated. Thus, each data object streamed between ARIA processing components is subject to transformations, as described by a media workflow graph. The media capture and processing components, such as media filters and fusion operators, are programmable and adaptable; i.e., the delay, size, frequency, and quality/precision characteristics of individual operators can be controlled via a number of parameters. In [1, 4, 5], we developed static and dynamic optimization algorithms which maximize the quality of the actuated responses while minimizing the corresponding delay and resource usage. In this demonstration, we present the ARIA GUI and the underlying kernel. More specifically, we describe how to design a media processing workflow, with adaptive operators, using the ARIA GUI and how to use the various optimization and adaptation alternatives provided by the ARIA kernel to execute media processing workflows.
This paper involves the description and discussion of a proposed dynamic presentation scheme for learning environments. The presentation scheme is an investigation into how high school-level content (i.e., geography) might address and adapt to the comprehension level, existing knowledge base and worldview of individual students, while being comprehended quickly and clearly. The spatio-temporal arrangement of content that is employed within the scheme is meant to respond to the relative preparedness of students to comprehend the content, as well as to the nature and complexity of the joint comprehension of multimodal information. The form of the scheme may be described as functioning similarly to a "chapter" within a media-rich, electronic "textbook." This richness is dependent upon the simultaneous presentation of associated textual, visual and auditory information. The relationships explored and addressed within the context models described in this paper include the semantic interrelationships among concepts, linguistic and statistical relationships and common sense rules. Also, the environment offers students some opportunity to process information (e.g., "the causes and effects of population density") in a thorough, holistic and active manner, through the analysis and synthesis of associated sets of content. The abstract framework of the learning environment consists of a content repository, an intelligent interaction interface and a dynamic presentation engine. This framework is based upon what have been determined to be the necessary components of an experiential learning environment: the efficient representation of content; the opportunity for an optimal selection of information; context-sensitive and dynamic presentation synthesis; and temporal content adaptation.
A model mechanism was developed and implemented for the objective evaluation of user experience with—as well as comprehension of content presented within—the proposed dynamic presentation scheme. A pilot study evaluation of the model elicited some encouraging results. It seems that a multimodal user context may be a valid basis on which to model optimal dynamic presentation for electronic media-based applications. For one, such a context may be applied to the temporal adaptation of information in order to create information summaries.
This paper describes the design and implementation of a generative model for the creation of soundscapes in real time. The model extends the work of the Acoustic Ecology community, and offers several extensions. In particular, the model is adaptive to individual user contexts and integrates audio techniques for creating immersive sonic environments. The authors outline tools and methods for the creation of annotated media databases that document daily experiences, and describe relevant applications that utilize this work.
In this paper, we present our efforts towards creating interfaces for networked media exploration and collaborative annotation. The problem is important since online social networks are emerging as conduits for the exchange of everyday experiences, yet these networks do not currently provide media-rich communication environments. Our approach has two parts—collaborative annotation, and a media exploration framework. The collaborative annotation takes place through a web-based interface, and provides to each user personalized recommendations based on media features and a common sense inference toolkit. We develop three media exploration interfaces that allow for two-way interaction amongst the participants—(a) spatio-temporal evolution, (b) event cones and (c) viewpoint-centric interaction. We also analyze user activity to determine important people and events for each user, and develop subtle visual interface cues for activity feedback. Preliminary user studies indicate that the system performs well and is well liked by the users.
We are developing an adaptive and programmable media-flow ARchitecture for Interactive Arts (ARIA) to enable real-time control of audio, video, and lighting on an intelligent stage. The intelligent stage is being equipped with a matrix of floor sensors for object localization, microphone arrays for sound localization and beamforming, and a motion capture system. The ARIA system provides an interface for specifying intended mappings of the sensory inputs to audio-visual responses. Based on the specifications, the sensory inputs are streamed, filtered and fused to actuate a controllable projection system, surround sound and lighting. The actuated responses take place in real time and satisfy QoS requirements for live performance. In this paper, we present the ARIA quality-adaptive architecture. We model the basic information unit as a data object with a meta-data header and object payload streamed between nodes in the system, and use a directed acyclic network to model media stream processing. We define performance metrics for the output precision, resource consumption, and end-to-end delay. The filters and fusion operators are being implemented with quality-aware signal processing algorithms. The proper node behavior is chosen at runtime to achieve the QoS requirements and adapt to input object properties. For this purpose, ARIA utilizes a two-phase approach: static pre-optimization and dynamic run-time adaptation.
This is a position paper that frames a networked home as a situated, user-centric multimedia system. The problem is important for two reasons—(a) the emergence of high-speed networked connections alters media consumption and interaction practices and (b) ordinary consumers currently communicate everyday experiences through limited means (e.g., e-mail attachments). We need new mechanisms for networked creation and consumption of media, as well as new interaction paradigms that will allow us to utilize the full potential of the networked, multimedia environment. We envision an augmented, user-context-adaptive home that enables the user to rest, reflect, interact and communicate everyday experiences through multimedia. A key insight is that the practice of consumption, communication and interaction with media, across different devices and interaction modalities, affects the user context, and is in turn affected by it. The result is a highly personalized media practice for each user. We discuss three focal areas of our current research—(a) models for user context, (b) communication of meaning and (c) situated interaction. Modeling user context is challenging, and we present a novel multimodal context framework. In media communication, we examine research issues in media acquisition, media presentation and networked sharing. Situated multimedia frameworks are physically grounded systems that require new analytical models and interaction paradigms, and additionally raise new real-time concerns. Our framework is promising, and we believe it will lead to a rich collection of multimedia problems that incorporate networked interaction.
This paper deals with phrase structure detection in contemporary western dance. A phrase is a sequence of movements that exists at a higher semantic abstraction than gestures. The problem is important since phrasal structure in dance plays a key role in communicating meaning. We detect two fundamental dance structures—ABA and the Rondo—as they form the basis for more complex movement sequences. There are two key ideas in our work—(a) the use of a topological framework for deterministic structure detection and (b) novel phrasal distance metrics. The topological graph formulation succinctly captures the domain knowledge about the structure. We show how an objective function can be constructed given the topology. The minimization of this function yields the phrasal structure and phrase boundaries. The distance metric incorporates both movement and hierarchical body structure. The results are excellent, with low median errors of 7% (ABA) and 15% (Rondo).
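The boundary-finding idea can be illustrated with a deliberately simplified sketch. The fragment below treats a dance as a sequence of scalar movement features and brute-forces the two phrase boundaries of an ABA form by minimizing an objective that rewards similarity between the two A sections and penalizes similarity between A and B. The scalar features, the mean-based objective, and the function names are illustrative stand-ins, not the paper's topological formulation or its hierarchical phrasal distance.

```python
def mean(xs):
    return sum(xs) / len(xs)

def detect_aba(seq, min_len=2):
    """Brute-force the two phrase boundaries (i, j) of an ABA form over a
    scalar feature sequence: minimize |mean(A1) - mean(A2)| (the A sections
    should match) minus |mean(A1) - mean(B)| (B should differ from A)."""
    best = None
    for i in range(min_len, len(seq) - 2 * min_len + 1):
        for j in range(i + min_len, len(seq) - min_len + 1):
            a1, b, a2 = seq[:i], seq[i:j], seq[j:]
            cost = abs(mean(a1) - mean(a2)) - abs(mean(a1) - mean(b))
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best[1], best[2]
```

On a toy feature sequence such as [1, 1, 1, 5, 5, 5, 1, 1, 1], the sketch recovers the boundaries (3, 6) separating the contrasting middle section.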
In this paper, we present our approach to the problem of communicating everyday experiences. This is a challenging problem, since media from everyday events are unstructured and often poorly annotated. We first attempt to communicate everyday experiences using a dramatic framework, by categorizing media and by introducing causal relations. Based on our experience with the dramatic framework for everyday media, we introduce an event-based framework as well as a viewpoint-centric visualization that allows the viewer to have agency, in a highly interactive, non-linear manner. Our approach focuses on structured interaction for consumption of everyday experiences, in contrast to non-interactive consumption of structured communication. Our results indicate that dramatic structures do not work well with everyday media, and that novel interactions and visualizations are needed. Experimental results indicate that the viewpoint-centric visualization works well. We are in the process of creating a large database of everyday events, along with the necessary recording and annotation tools.
This paper describes a novel, interactive multimodal framework that enables a network of friends to effectively visualize and browse a shared image collection. The framework is very useful for geographically separated friends who wish to share experiences. Our solution involves three components—(a) an event model, (b) three new spatio-temporal event exploration schemes, and (c) a novel technique for summarizing the user interaction. We develop a simple multimedia event model that additionally incorporates the idea of user viewpoints. We also develop new dissimilarity measures between events that additionally incorporate user context. We develop three task-driven event exploration environments—(a) spatio-temporal evolution, (b) event cones and (c) viewpoint-centric interaction. An original contribution of this paper is to summarize the user interaction using an interactive framework. We conjecture that an interactive summary serves to recall the original content better than a static image-based summary. Our user studies indicate that the exploratory environment performs very well.
In this book chapter, we discuss research issues and promising techniques related to three important aspects of audio-visual content analysis—(a) segmentation, (b) event analysis and (c) summarization. Each component plays an important role in the greater semantic understanding of audio-visual data.
In this paper, we develop formal computational models for three aspects of experiential systems for browsing media—(a) context, (b) interactivity through hyper-mediation and (c) context evolution using a memory model. Experiential systems deal with the problem of developing context-adaptive mechanisms for knowledge acquisition and insight. Context is modeled as a union of graphs whose nodes represent concepts and whose edges represent semantic relationships. The system context is the union of the contexts of the user, the environment, and the media being accessed. We also develop a novel concept dissimilarity measure. We then develop algorithms to determine the optimal hyperlink for each media element by determining the relationship between the user context and the media. As the user navigates through the hyper-linked sources, the memory model captures the interaction of the user with those sources and updates the user context. This in turn results in new hyperlinks for the media. Our pilot user studies show excellent results, validating our framework.
In this paper, we present a novel annotation paradigm with an emphasis on two facets—(a) the end-user experience and (b) semantic propagation. The annotation problem is important since media semantics play a key role in new multimedia applications. However, there is currently very little incentive for end users to annotate. The annotation system is interactive and experiential. We attempt to propagate the semantics of the annotations by using WordNet, a lexicographic arrangement of words, and low-level features extracted from the images. We introduce novel semantic dissimilarity measures and propagation frameworks. The system provides insight to the user by providing her with knowledge sources that are constrained by the user and media context. The knowledge sources are presented using context-aware hyper-mediation. Our experimental results indicate that the system performs well. We tested the new annotation experience in a pilot user study; the users agreed that the new framework was more useful than a traditional annotation interface. The semantic propagation results are good as well—we converge close to the semantics of the image by annotating a small number (~15%) of database images.
In this paper, we present a novel algorithm for generating audio-visual skims from computable scenes. Skims are useful for browsing digital libraries, and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting, and sound. There are three key aspects to our approach: (a) visual complexity and grammar, (b) robust audio segmentation and (c) a utility model for skim generation. We define a measure of visual complexity of a shot, and map complexity to the minimum time needed for comprehending the shot. Then, we analyze the underlying visual grammar, since it makes the shot sequence meaningful. We segment the audio data into four classes, and then detect significant phrases in the speech segments. The utility functions are defined in terms of the complexity and duration of each segment. The target skim is created using a general constrained utility maximization procedure that maximizes the information content and the coherence of the resulting skim. The objective function is constrained by multimedia synchronization constraints, visual syntax, and penalty functions on audio and video segments. The user study results indicate that the optimal skims show statistically significant differences from other skims at compression rates up to 90%.
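The constrained utility maximization can be sketched at a toy scale. The fragment below is a hypothetical simplification: it keeps only the duration budget and drops the synchronization, syntax, and penalty terms, selecting segments so as to maximize total utility within a target skim duration via a 0/1 knapsack over integer-second durations.

```python
def best_skim(segments, budget):
    """Pick a subset of (duration, utility) segments maximizing total
    utility within a duration budget; a simplified stand-in for the
    paper's constrained utility maximization."""
    # dp[t] = (best utility, chosen segment indices) using at most t seconds
    dp = [(0.0, [])] * (budget + 1)
    for i, (dur, util) in enumerate(segments):
        # iterate capacities downward so each segment is used at most once
        for t in range(budget, dur - 1, -1):
            cand = dp[t - dur][0] + util
            if cand > dp[t][0]:
                dp[t] = (cand, dp[t - dur][1] + [i])
    return dp[budget]
```

For example, with segments of (3 s, utility 5), (4 s, utility 6), and (2 s, utility 3) and a 5-second budget, the sketch keeps the first and third segments for a total utility of 8.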
This paper presents a new conceptual framework for summarization that considers the relationship between entities, device properties and user information needs. We summarize using a skim—an audio-visual clip that is a drastically condensed version of the original video. An entity is defined to be a sequence of elements that are related to each other by a certain property. In this paper we discuss the causes, the different entity types and also present a skim taxonomy. Each entity is associated with a utility. The skim is generated by a constrained utility maximization over those entity-utilities that satisfy the user information needs as well as the device rendering capabilities. We construct an optimal skim within this framework that retains a particular subset of entities. These entities have been chosen since they can be automatically computed in a robust manner. The user studies show that the optimal skims perform well in a statistically significant sense, at compression rates as high as 90%.
In this paper, we present a computational scene model and derive novel algorithms for computing audio and visual scenes and within-scene structures in films. We use constraints derived from film-making rules and from experimental results in the psychology of audition in our computational scene model. Central to the computational model is the notion of a causal, finite-memory viewer model. We segment the audio and video data separately. In each case, we determine the degree of correlation of the most recent data in the memory with the past. The audio and video scene boundaries are determined using local maxima and minima, respectively. We derive four types of computable scenes that arise due to different kinds of audio and video scene boundary synchronizations. We show how to exploit the local topology of an image sequence, in conjunction with statistical tests, to determine dialogs. We also derive a simple algorithm to detect silences in audio. An important feature of our work is the introduction of semantic constraints based on structure and silence into our computational model. This results in computable scenes that are more consistent with human observations. The algorithms were tested on a difficult data set: the first hour of each of three commercial films. The best results: computable scene detection accuracy of 94%; dialogue detection with 91% recall and 100% precision.
In this paper, we present a novel algorithm to condense computable scenes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting, and sound. We attempt to condense such scenes in two ways. First, we define the visual complexity of a shot to be its Kolmogorov complexity. Then, we conduct experiments that help us map the complexity of a shot into the minimum time required for its comprehension. Second, we analyze the grammar of the film language, since it makes the shot sequence meaningful. These grammatical rules are used to condense scenes, in parallel with the shot-level condensation. We have implemented a system that generates a skim given a time budget. Our user studies show good results on skims with compression rates between 60% and 80%.
In this paper, we present a novel algorithm to generate visual skims (which do not contain audio) from computable scenes. Visual skims are useful for browsing digital libraries, and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting, and sound. First, we define the visual complexity of a shot to be its Kolmogorov complexity. Then, we conduct experiments that help us map the complexity of a shot into the minimum time required for its comprehension. Second, we analyze the grammar of the film language, since it makes the shot sequence meaningful. We achieve a target skim time by minimizing a sequence utility function, subject to shot duration constraints and penalty functions based on sequence rhythm and information loss. This helps us determine individual shot durations as well as the shots to drop. Our user studies show good results on skims with compression rates up to 80%.
In this paper we present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment characterized by a few dominant sources of sound. A scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: (a) a definition of an audio scene, (b) multiple feature models that characterize the dominant sources and (c) a simple, causal listener model, which mimics human audition using multiple time-scales. We define a correlation function between recent and past data to determine segmentation boundaries. The algorithm was tested on a difficult data set, a one-hour audio segment of a film, with impressive results: it achieves an audio scene change detection accuracy of 97%.
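The listener-model idea (compare the most recent data against what came just before, and mark boundaries where the two disagree most) can be sketched as follows. This is a toy version: it uses a Euclidean window dissimilarity on a one-dimensional feature sequence in place of the paper's multi-feature correlation functions, and the window size and threshold are illustrative.

```python
def window_dissimilarity(past, recent):
    # Euclidean distance between equal-length feature windows; a stand-in
    # for the paper's per-feature correlation functions.
    return sum((p - r) ** 2 for p, r in zip(past, recent)) ** 0.5

def detect_boundaries(features, win=8, min_gap=16):
    """Causal sketch: slide two adjacent windows over the feature sequence;
    scene boundaries are local peaks of the past-vs-recent dissimilarity."""
    curve = []
    for t in range(2 * win, len(features) + 1):
        past = features[t - 2 * win:t - win]
        recent = features[t - win:t]
        curve.append((t - win, window_dissimilarity(past, recent)))
    mean_d = sum(d for _, d in curve) / len(curve)
    boundaries = []
    for i in range(1, len(curve) - 1):
        t, d = curve[i]
        # keep strict local maxima that stand out above the average level
        if d > curve[i - 1][1] and d > curve[i + 1][1] and d > mean_d:
            if not boundaries or t - boundaries[-1] >= min_gap:
                boundaries.append(t)
    return boundaries
```

On a synthetic feature envelope that switches level mid-stream, such as [0, 1] * 16 followed by [10, 11] * 16, the sketch reports the switch point at sample 32.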
In this paper we present a novel algorithm for video scene segmentation. We model a scene as a semantically consistent chunk of audio-visual data. Central to the segmentation framework is the idea of a finite-memory model. We separately segment the audio and video data into scenes, using the data in the memory. The audio segmentation algorithm determines the correlations amongst the envelopes of audio features. The video segmentation algorithm determines the correlations amongst shot key-frames. The scene boundaries in both cases are determined using local correlation minima. Then, we fuse the resulting segments using a nearest-neighbor algorithm that is further refined using a time-alignment distribution derived from the ground truth. The algorithm was tested on a difficult data set, the first hour of a commercial film, with good results: it achieves a scene segmentation accuracy of 84%.
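The nearest-neighbor fusion step can be sketched in a few lines. This is a minimal illustration, not the paper's method: the tolerance value and the averaging rule are assumptions, and it omits the ground-truth time-alignment refinement. Each video scene boundary is kept only if some audio boundary lies within a small tolerance of it.

```python
def fuse_boundaries(audio, video, tol=2.0):
    """Nearest-neighbor fusion sketch: keep a video scene boundary when an
    audio boundary falls within `tol` seconds of it, and report the average
    of the matched pair (tolerance and averaging are illustrative)."""
    fused = []
    for v in video:
        nearest = min(audio, key=lambda a: abs(a - v))
        if abs(nearest - v) <= tol:
            fused.append((v + nearest) / 2)
    return fused
```

For instance, audio boundaries at 10.0, 31.5, and 60.0 seconds fused with video boundaries at 9.0, 30.0, and 45.0 seconds keep the first two pairs and discard the unmatched boundary at 45.0.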
In this paper we discuss our recent research and open issues in the structural and semantic analysis of digital videos. Specifically, we focus on segmentation, summarization and classification of digital video. In each area, we also emphasize the importance of understanding domain-specific characteristics. In scene segmentation, we introduce the idea of a computable scene as a chunk of audio-visual data that exhibits long-term consistency with regard to several audio-visual properties. In summarization, we discuss shot- and program-level summaries. We describe classification schemes based on Bayesian networks, which model the interaction of multiple classes at different levels across multiple media. We also discuss classification techniques that exploit domain-specific spatial structural constraints as well as temporal transitional models.
In this paper we present novel algorithms for computing scenes and within-scene structures in films. We begin by mapping insights from film-making rules and experimental results from the psychology of audition into a computational scene model. We define a computable scene to be a chunk of audio-visual data that exhibits long-term consistency with regard to three properties: (a) chromaticity, (b) lighting and (c) ambient sound. Central to the computational model is the notion of a causal, finite-memory model. We segment the audio and video data separately. In each case we determine the degree of correlation of the most recent data in the memory with the past. The respective scene boundaries are determined using local minima and aligned using a nearest-neighbor algorithm. We introduce the idea of a discrete object series to automatically determine the structure within a scene. We then use statistical tests on the series to determine the presence of dialogue. The algorithms were tested on a difficult data set: the first hour of each of five commercial films. The best results: scene detection with 88% recall and 72% precision; dialogue detection with 91% recall and 100% precision.
This paper presents algorithms that address the problems associated with indexing the high-dimensional feature vectors that characterize video data. Indexing high-dimensional vectors is well known to be computationally expensive. Our solution is to optimally split the high-dimensional vector into a few low-dimensional feature vectors and query the system with each of them. This involves solving an important subproblem: developing a retrieval model that enables us to query the system efficiently. Once we formulate the retrieval problem in terms of this retrieval model, we present an optimality criterion to maximize the number of results under the model. The criterion is based on the novel idea of using the underlying probability distribution of the feature vectors. A branch-and-prune strategy, optimized for each query, is developed using the set of features derived from the optimality criterion. Our results show that the algorithm performs well, giving a speedup by a factor of 25 with respect to a linear search while retaining the same level of recall.
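The split-and-refine idea behind the indexing scheme can be sketched as follows. This toy version uses brute-force scans in place of real low-dimensional indices, a fixed even split rather than the optimal one, and an arbitrary per-subspace candidate count; it shows only the two-phase shape: gather candidates cheaply in each low-dimensional subspace, then re-rank them with the full high-dimensional distance.

```python
import random

def split(vec, parts):
    # Fixed even split of a high-dimensional vector into `parts` subvectors.
    n = len(vec) // parts
    return [vec[i * n:(i + 1) * n] for i in range(parts)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(db, query, parts=4, per_part=10):
    """Two-phase search: collect candidates from each low-dimensional
    subspace, then refine with the full distance (a brute-force scan
    stands in for a real index on each subspace)."""
    qs = split(query, parts)
    candidates = set()
    for p in range(parts):
        ranked = sorted(range(len(db)),
                        key=lambda i: sq_dist(split(db[i], parts)[p], qs[p]))
        candidates.update(ranked[:per_part])
    # Phase 2: full high-dimensional distance over the candidate set only.
    return min(candidates, key=lambda i: sq_dist(db[i], query))
```

In a typical usage, only the candidate union is scored with the full distance, which is where the speedup over a linear scan comes from.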
The rapid growth of visual data over the last few years has led to many schemes for retrieving such data. With today's content-based systems, there exists a significant gap between the user's information needs and what the systems can deliver. We propose to bridge this gap by introducing the novel idea of Semantic Visual Templates (SVTs). Each template represents a personalized view of a concept (e.g., slalom, meetings, sunsets). An SVT is represented using a set of successful queries, which are generated by a two-way interaction between the user and the system. We have developed algorithms that interact with the user and converge upon a small set of exemplar queries that maximize recall. SVTs emphasize intuitive models that allow queries to be easily manipulated and composited. The resulting system performs well; for example, with a small number of queries in the “sunset” template, we are able to achieve 50% recall and 24% precision over a large unannotated database.
The rapidity with which digital information, particularly video, is being generated has necessitated the development of tools for efficient search of these media. Content-based visual queries have been primarily focused on still image retrieval. In this paper, we propose a novel, interactive system on the Web, based on the visual paradigm, with spatio-temporal attributes playing a key role in video retrieval. The resulting system, VideoQ, is the first on-line video search engine supporting automatic object-based indexing and spatio-temporal queries.
The rapidity with which digital information, particularly video, is being generated has necessitated the development of tools for efficient search of these media. Content-based visual queries have been primarily focused on still image retrieval. In this paper, we propose a novel, interactive system on the Web, based on the visual paradigm, with spatiotemporal attributes playing a key role in video retrieval. We have developed innovative algorithms for automated video object segmentation and tracking, and use real-time video editing techniques while responding to user queries. The resulting system, called VideoQ (demo available at http://www.ctr.columbia.edu/VideoQ/), is the first on-line video search engine supporting automatic object-based indexing and spatiotemporal queries. The system performs well, with the user being able to retrieve complex video clips such as those of skiers and baseball players with ease.
The scale transform is a new representation for signals, offering a perspective that is different from that of the Fourier transform. In this correspondence, we introduce the notion of a scale-periodic function. These functions are then represented through the discrete scale series. We also define the notion of a strictly scale-limited signal. Analogous to the Shannon interpolation formula, we show that such signals can be exactly reconstructed from exponentially spaced samples of the signal in the time domain. As an interesting practical application, we show how properties unique to the scale transform make it very useful in computing depth maps of a scene.
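For context, the scale transform referred to here is usually written (following Cohen's convention; the correspondence's exact normalization may differ) as

$$ D(c) = \frac{1}{\sqrt{2\pi}} \int_0^{\infty} f(t)\, t^{-jc - \frac{1}{2}}\, dt, $$

so that a dilation $f(t) \mapsto \sqrt{a}\, f(at)$ changes only the phase of $D(c)$ and leaves its magnitude invariant. The exponentially spaced samples in the reconstruction result are of the form $t_n = t_0\, e^{nT}$ for some spacing $T > 0$.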
The rapidity with which digital information, particularly video, is being generated has necessitated the development of tools for efficient search of these media. Content-based visual queries have been primarily focused on still image retrieval. In this paper, we propose a novel, real-time, interactive system on the Web, based on the visual paradigm, with spatio-temporal attributes playing a key role in video retrieval. We have developed algorithms for automated video object segmentation and tracking, and use real-time video editing techniques while responding to user queries. The resulting system performs well, with the user being able to retrieve complex video clips, such as those of skiers and baseball players, with ease.
It is widely accepted that textureless surfaces cannot be recovered using passive sensing techniques. The problem is approached by viewing image formation as a fully three-dimensional mapping. It is shown that the lens encodes structural information of the scene within a compact three-dimensional space behind it. After analyzing the information content of this space and by using its properties, we derive necessary and sufficient conditions for the recovery of textureless scenes. Based on these conditions, a simple procedure for recovering textureless scenes is described. We experimentally demonstrate the recovery of three textureless surfaces, namely, a line, a plane, and a paraboloid. Since textureless surfaces represent the worst-case recovery scenario, all the results and the recovery procedure are naturally applicable to scenes with texture.