IBM Research Internship Roles
A. Future of Computing
Our team invents, prototypes, and evaluates novel solutions for future computing systems across hardware, system software, and middleware. We have two focus areas: composable systems, in which custom nodes are created from pools of disaggregated resources, and systems for accelerated discovery.
Example projects interns can contribute to include: design and evaluation of Kubernetes control-plane extensions for disaggregated systems; orchestration and composition of large-scale, complex workflows; and system self-adaptation based on observed workload characteristics. We offer a stimulating research environment with cross-disciplinary expertise in a world-class research organization, access to cutting-edge experimental and product-staging infrastructure, and strong ties with industrial research labs and academia around the globe.
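As a toy illustration of the composable-systems idea, a node can be assembled by reserving resources from shared pools. The pool contents and the first-fit policy below are assumptions for the sketch, not the team's actual design:

```python
# Toy sketch: compose a custom node from pools of disaggregated
# resources. Pool sizes and the first-fit, all-or-nothing policy
# are illustrative assumptions only.
pools = {"cpu": 64, "memory_gb": 512, "gpu": 8}

def compose_node(request, pools):
    """Reserve resources for a node request, or return None if any
    pool cannot satisfy it (no partial allocation)."""
    if any(pools.get(r, 0) < qty for r, qty in request.items()):
        return None
    for r, qty in request.items():
        pools[r] -= qty
    return dict(request)

node = compose_node({"cpu": 16, "memory_gb": 128, "gpu": 2}, pools)
```

A control-plane extension would additionally track which physical pool each reservation came from and release resources when the node is decomposed.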
B. NLP for Information Extraction and Explainable Argumentation Mining
Natural Language Processing (NLP) is one of the key areas of AI research.
At the Dublin Research Lab, we apply NLP in several projects and are interested in exploring novel, competitive solutions to NLP tasks. From research to product, we are interested in scientific contributions to the field as well as innovative real-world solutions.
Our researchers contribute to and participate in the main NLP conferences, such as ACL, NAACL, EMNLP, and COLING, as well as related conferences such as AAAI, on topics ranging from information extraction, argumentation mining, sentiment analysis, and information retrieval to knowledge graphs and many others.
Several varied internship opportunities are open in this area:
1. Transfer learning for extracting complex rules from text:
- Statutes and regulations are primary sources of legal rules. However, sentences in these documents tend to be complex and difficult to understand, even for legal professionals. In this internship project, we are building discourse parsing models to extract and interpret rules in regulatory text. The end goal of the internship is a research prototype and submissions to top NLP venues (ACL, TACL, EMNLP, NAACL, COLING).
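A minimal, pattern-based sketch of the target structure: splitting a regulatory sentence into a condition, a subject, and an obligation. This is a toy illustration with hand-written patterns; the project itself would rely on trained discourse parsing models:

```python
import re

# Toy pattern for conditional obligations of the form
# "If <condition>, <subject> shall <obligation>."
# A real system would use a trained discourse parser instead.
RULE_PATTERN = re.compile(
    r"^(?:If|Where|When)\s+(?P<condition>.+?),\s*"
    r"(?P<subject>.+?)\s+(?:shall|must)\s+(?P<obligation>.+?)\.?$",
    re.IGNORECASE,
)

def extract_rule(sentence):
    """Return {condition, subject, obligation} if the sentence matches."""
    m = RULE_PATTERN.match(sentence.strip())
    return m.groupdict() if m else None

rule = extract_rule(
    "If a data breach occurs, the controller shall notify the "
    "supervisory authority within 72 hours."
)
```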
2. Leaderboard construction beyond accuracy
- Previous work (Hou et al., 2019) extracted task/dataset/metric entities from the NLP literature to build leaderboards for various tasks automatically. In this project, we go one step further and extract information about computational resources (e.g., model size, training time) associated with each entry in a leaderboard. The goal is to provide more dimensions along which to evaluate different models. The end goal of the internship is a research prototype and submissions to top NLP venues (ACL, TACL, EMNLP, NAACL, COLING).
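An illustrative sketch of the extraction target (not the project's actual pipeline): pulling model-size and training-time mentions out of a sentence describing a leaderboard entry. The surface patterns and unit scaling below are assumptions:

```python
import re

# Illustrative patterns for compute mentions; real extraction would
# need far more robust models than these hand-written regexes.
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([MB])\b\s*parameters", re.I)
TIME_RE = re.compile(r"trained\s+for\s+(\d+(?:\.\d+)?)\s*(hour|day)s?", re.I)
SCALE = {"M": 1e6, "B": 1e9}

def extract_compute(text):
    """Return a dict of compute dimensions found in the text."""
    info = {}
    if m := SIZE_RE.search(text):
        info["parameters"] = float(m.group(1)) * SCALE[m.group(2).upper()]
    if m := TIME_RE.search(text):
        hours = float(m.group(1)) * (24 if m.group(2).lower() == "day" else 1)
        info["training_hours"] = hours
    return info

entry = extract_compute(
    "BERT-large (340M parameters) was trained for 4 days on 16 TPUs."
)
```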
3. Knowledge graph construction from biomedical literature:
- Biomedical literature holds our knowledge of medicine and healthcare. A corpus of articles such as PubMed is highly valuable to process and represent as a knowledge graph that can answer queries about the field. In this internship, we investigate methods to index PubMed articles and concepts and to build a knowledge graph able to answer real-world queries directly.
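A minimal sketch of the underlying data structure: a triple-based knowledge graph that answers relation queries. The entities and relations below are hand-written stand-ins for what an extraction pipeline over PubMed would emit:

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny triple store indexed by relation; a sketch, not a product."""

    def __init__(self):
        self.by_relation = defaultdict(set)

    def add(self, subj, rel, obj):
        self.by_relation[rel].add((subj, obj))

    def query(self, rel, subj=None, obj=None):
        """Return matching (subject, object) pairs for a relation."""
        return {(s, o) for s, o in self.by_relation[rel]
                if (subj is None or s == subj) and (obj is None or o == obj)}

# Hypothetical facts standing in for extraction output.
kg = KnowledgeGraph()
kg.add("metformin", "treats", "type 2 diabetes")
kg.add("aspirin", "treats", "fever")
kg.add("metformin", "interacts_with", "alcohol")

answers = kg.query("treats", subj="metformin")
```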
4. Health and social care: Information extraction from PubMed articles
- Health and Social Care (HSC) is a complex domain where human skills, technology, and legislation meet and have a profound impact on people's lives and wellbeing. HSC is also a very significant spending area for governments throughout the world. At IBM Research Europe, we aim to support governments and HSC agencies in delivering HSC services effectively at population scale and in responding rapidly to evolving situations and policy changes in the social and health benefit domain. In this context, we are developing a system that extracts health and social information from randomised controlled trials, in order to enhance governments' and agencies' ability to match population needs and address problems such as food poverty and housing issues. Specifically, we aim to extract population characteristics and intervention features from published studies and to create a knowledge base that serves as a starting point for subsequent analyses.
- The aim of the internship is to design and implement strategies for information extraction from PubMed articles, keeping in mind both scientific advancement and the business needs of related IBM products. The results of the internship will mainly target AI/NLP conferences (e.g., ACL, EMNLP), but could extend to health informatics conferences (such as AMIA or MIE).
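As a hedged example of one such extraction target, the sketch below pulls a study's sample size and age range from a trial abstract. The patterns and field names are illustrative assumptions, not the project's actual extractors:

```python
import re

# Illustrative patterns for two common population characteristics
# in trial abstracts: sample size ("n = 250") and age range.
N_RE = re.compile(r"\b[nN]\s*=\s*(\d+)")
AGE_RE = re.compile(r"aged\s+(\d+)\s*(?:to|-)\s*(\d+)")

def extract_population(abstract):
    """Return whichever population characteristics are found."""
    info = {}
    if m := N_RE.search(abstract):
        info["sample_size"] = int(m.group(1))
    if m := AGE_RE.search(abstract):
        info["age_range"] = (int(m.group(1)), int(m.group(2)))
    return info

study = extract_population(
    "We randomised n = 250 adults aged 60 to 75 to a home-delivered "
    "meals intervention or usual care."
)
```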
- Required Skills:
- Research expertise or experience in NLP, Machine Learning.
- Strong programming skills in Python/Java
- Good communication skills
C. Risk Analysis and Probabilistic Graphical Models in Healthcare
Probabilistic graphical models allow intuitive exploration of a dataset representing a population through various characteristics, e.g., demographics, socio-economic features, and health status. Understanding the conditional probabilities linking social determinants of health, behaviours, and chronic conditions can be of great help in tailoring healthcare programs, allocating resources, and identifying the segments of the population that would benefit most from social and health-related programs. It can also help identify the best approaches to use, e.g., behavioural change, self-management of multimorbidity, or informal care advice.
1. Accuracy Evaluation of a Population Healthcare Bayesian Network model
Literature review, formalisation, and implementation of an evaluation pipeline for existing healthcare BN models, e.g., cross-validation using metrics such as the area under the ROC curve (AUC).
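The metric mentioned above can be sketched directly: AUC is the probability that a randomly chosen positive example is ranked above a randomly chosen negative one (the Mann-Whitney formulation). A real pipeline would wrap this in k-fold cross-validation over the BN's predicted probabilities; this is only a minimal illustration:

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney formulation.
    labels: 0/1 ground truth; scores: model probabilities.
    Ties between a positive and a negative count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranker scores every positive above every negative.
perfect = auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```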
2. Model building based on a health-related dataset
Internship in the framework of the SEURO EU project (http://seuro2020.eu/) on evaluating the effectiveness of digital health solutions on healthcare systems.
3. Dynamically create and update models over patient live data
Using Bayesian Networks to analyse live patient data presents new challenges in terms of format (punctual events rather than tabular records) and volatility (data change over time). Using the Synthea population simulator, we want to explore Bayesian Networks over transitional patient data.
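The format challenge above can be sketched concretely: Synthea-style records arrive as timestamped events, while a Bayesian Network expects one tabular row per patient. The event fields and condition codes below are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical punctual events, as a Synthea-style export might emit.
events = [
    {"patient": "p1", "date": "2023-01-05", "code": "hypertension"},
    {"patient": "p1", "date": "2023-03-12", "code": "diabetes"},
    {"patient": "p2", "date": "2023-02-01", "code": "hypertension"},
]

def to_tabular(events, conditions):
    """Pivot punctual events into one binary-feature row per patient,
    the shape a BN learner expects."""
    seen = defaultdict(set)
    for e in events:
        seen[e["patient"]].add(e["code"])
    return {p: {c: int(c in codes) for c in conditions}
            for p, codes in seen.items()}

rows = to_tabular(events, ["hypertension", "diabetes"])
```

Handling volatility would additionally require re-running this pivot (or updating it incrementally) as new events arrive.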
- Required Skills:
- Strong programming skills in Python (pandas, scikit-learn, NumPy) and experience handling tabular data (SQL a plus)
- Knowledge of cloud deployments (Kubernetes, OpenShift, Travis) and UI development (React, TypeScript) a plus
- Good communication skills