Data Scientist, Portfolio Optimization
Formation Bio is a tech- and AI-driven pharma company focused on accelerating drug development. As a Data Scientist on the platform prediction team, you will translate probability-of-success predictions into portfolio-level outcomes and architect core systems for portfolio management and risk monitoring.
Responsibilities
- Work with the team to implement and maintain the core portfolio engine: order management system, execution simulation layer, portfolio construction service, and performance tracking
- Design risk frameworks that quantify exposure across a portfolio of drug development bets with radically different risk profiles, timelines, and failure modes
- Run rigorous backtesting experiments with strict temporal constraints to evaluate Formation strategies against baseline approaches and measure marginal signal from new evidence sources
- Coordinate across the organization to integrate internal Formation data sources (clinical trial data, genomic evidence, real-world data) and proprietary tooling into portfolio analytics pipelines
- Work with product and engineering teams to build dashboards and reporting that communicate portfolio performance, risk metrics, and strategy comparisons to both technical and executive stakeholders
- Collaborate with the broader data science team to ensure portfolio-level evaluation feeds back into model improvement and evidence prioritization
Skills
- MS or PhD in a quantitative field (statistics, finance, physics, computational science, engineering, or related)
- 1-3 years in a quantitative research, data science, or analytics role (finance, healthcare, academic research, or consulting all count; substantive internships qualify)
- Strong Python programming skills with experience in data-intensive workflows (pandas, numpy, scipy)
- Solid grasp of core portfolio construction and risk concepts: position sizing, rebalancing, Sharpe ratio, drawdown, volatility, benchmark comparison
- Demonstrated ability to work with messy, real-world datasets — comfortable with data wrangling, deduplication, and quality assessment
- Clear communicator who can present quantitative results to both technical peers and business stakeholders
- Experience with backtesting frameworks or portfolio simulation (vectorbt, Backtrader, or custom implementations)
- Exposure to healthcare, pharma, or biotech data (clinical trials, claims data, -omics, real-world evidence)
- Familiarity with alternative data in a research or investment context
- Experience with probability-of-success modeling, drug development decision analysis, or health economics
- Comfort with LLMs or AI/ML pipelines in a production or research setting
- Familiarity with dashboard/visualization tools (Streamlit, Plotly, Dash) and pipeline orchestration (Dagster, Airflow)
- Domain knowledge in either healthcare or finance is valued; you do not need both
Benefits
- Equity
- Comprehensive benefits
- Generous perks
- Hybrid model requiring 3 days per week in office
Company Overview