Minjie Hu

Postdoctoral Associate

Abstract
Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ~10% of relevant evidence sentences and 30% of distinct GO terms, while the Results/Experiment section has nearly 60% of relevant sentences and >70% of GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need to use full-text articles when text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community.
View Full Publication
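To make the strict and relaxed agreement figures in the abstract above more concrete, here is a minimal Python sketch of a pairwise F1-measure between two annotators. The abstract does not define the two modes, so the definitions here are assumptions: "strict" is taken to mean identical spans and "relaxed" to mean any overlap. The function names and the character offsets are hypothetical and do not come from the corpus.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f1_agreement(ann_a: List[Span], ann_b: List[Span], relaxed: bool = False) -> float:
    """Pairwise F1 between two annotators' evidence spans.

    Assumption: strict = identical span, relaxed = any overlap.
    """
    if not ann_a or not ann_b:
        return 0.0
    match = overlaps if relaxed else (lambda x, y: x == y)
    # Treat annotator A as the reference and annotator B as the prediction.
    hits_b = sum(1 for b in ann_b if any(match(a, b) for a in ann_a))
    hits_a = sum(1 for a in ann_a if any(match(a, b) for b in ann_b))
    precision = hits_b / len(ann_b)
    recall = hits_a / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans marked by two curators in the same article.
annotator_a = [(0, 50), (120, 200), (300, 360)]
annotator_b = [(0, 50), (130, 210), (400, 450)]

strict = f1_agreement(annotator_a, annotator_b)
relaxed = f1_agreement(annotator_a, annotator_b, relaxed=True)
print(f"strict F1 = {strict:.3f}, relaxed F1 = {relaxed:.3f}")
```

In this toy case the relaxed score is higher than the strict one because the second pair of spans overlaps without being identical, which is the same direction of difference as the two figures quoted in the abstract.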
Abstract
Gene Ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and of opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade, although much progress is still needed for computer-assisted GO curation.
Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation.
View Full Publication
Abstract
We measure how the properties of star-forming central galaxies correlate with large-scale environment, $\delta$, measured on $10\,h^{-1}$ Mpc scales. We use galaxy group catalogues to isolate a robust sample of central galaxies with high purity and completeness. The galaxy properties we investigate are star formation rate (SFR), exponential disc scale length $R_{\rm exp}$, and Sérsic index of the galaxy light profile, $n_S$. We find that, at all stellar masses, there is an inverse correlation between SFR and $\delta$, meaning that above-average star-forming centrals live in underdense regions. For $n_S$ and $R_{\rm exp}$, there is no correlation with $\delta$ at $M_* \lesssim 10^{10.5}\,M_\odot$, but at higher masses there are positive correlations: a weak correlation with $R_{\rm exp}$ and a strong correlation with $n_S$. These data are evidence of assembly bias within the star-forming population. The results for SFR are consistent with a model in which SFR correlates with present-day halo accretion rate, $\dot{M}_h$. In this model, galaxies are assigned to haloes using the abundance-matching ansatz, which maps galaxy stellar mass onto halo mass. At fixed halo mass, SFR is then assigned to galaxies using the same approach, but $\dot{M}_h$ is used to map onto SFR. The best-fitting model requires some scatter in the $\dot{M}_h$-SFR relation. The $R_{\rm exp}$ and $n_S$ measurements are consistent with a model in which both of these quantities are correlated with the spin parameter of the halo, $\lambda$. Halo spin does not correlate with $\delta$ at low halo masses, but for higher mass haloes, high-spin haloes live in higher density environments at fixed $M_h$. Put together with the earlier instalments of this series, these data demonstrate that quenching processes have limited correlation with halo formation history, but the growth of active galaxies, as well as other detailed galaxy properties, is influenced by the details of halo assembly.
View Full Publication
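The two-step assignment described in the abstract above (stellar mass onto halo mass, then SFR onto halo accretion rate at fixed halo mass) can be illustrated with a simple rank-ordering sketch in Python. This is only a scatter-free toy version under stated assumptions: equal numbers of haloes and galaxies, a single halo-mass bin for the second step, and made-up values in arbitrary units. The function name `abundance_match` is hypothetical, and the abstract notes that the best-fitting model additionally requires scatter, which is not modelled here.

```python
from typing import Dict, List


def abundance_match(halo_values: List[float], galaxy_values: List[float]) -> Dict[int, float]:
    """Map halo index -> galaxy value by matching rank order (no scatter).

    The halo with the largest value receives the largest galaxy value,
    the second largest receives the second largest, and so on.
    """
    if len(halo_values) != len(galaxy_values):
        raise ValueError("this sketch assumes equal numbers of haloes and galaxies")
    halo_order = sorted(range(len(halo_values)), key=lambda i: halo_values[i], reverse=True)
    ranked_galaxy = sorted(galaxy_values, reverse=True)
    return dict(zip(halo_order, ranked_galaxy))


# Step 1 (hypothetical numbers): rank stellar mass onto halo mass.
halo_masses = [1e12, 5e13, 3e11, 8e12]
stellar_masses = [2e10, 1e11, 5e9, 6e10]
assigned = abundance_match(halo_masses, stellar_masses)

# Step 2, within one narrow halo-mass bin: rank SFR onto halo accretion rate.
accretion_rates = [30.0, 5.0, 12.0]
sfrs = [0.5, 4.0, 1.5]
sfr_assigned = abundance_match(accretion_rates, sfrs)

print(assigned)
print(sfr_assigned)
```

Reusing one function for both steps mirrors the abstract's statement that SFR is assigned "using the same approach"; a more faithful version would perturb the ranks to introduce scatter in the second relation.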
Abstract
We identify subhalos in dark matter-only (DMO) zoom-in simulations that are likely to be disrupted due to baryonic effects by using a random forest classifier trained on two hydrodynamic simulations of Milky Way (MW)-mass host halos from the Latte suite of the Feedback in Realistic Environments (FIRE) project. We train our classifier using five properties of each disrupted and surviving subhalo: pericentric distance and scale factor at first pericentric passage after accretion; and scale factor, virial mass, and maximum circular velocity at accretion. Our five-property classifier identifies disrupted subhalos in the FIRE simulations with an 85% out-of-bag classification score. We predict surviving subhalo populations in DMO simulations of the FIRE host halos, finding excellent agreement with the hydrodynamic results; in particular, our classifier outperforms DMO zoom-in simulations that include the gravitational potential of the central galactic disk in each hydrodynamic simulation, indicating that it captures both the dynamical effects of a central disk and additional baryonic physics. We also predict surviving subhalo populations for a suite of DMO zoom-in simulations of MW-mass host halos, finding that baryons impact each system consistently and that the predicted amount of subhalo disruption is larger than the host-to-host scatter among the subhalo populations. Although the small size and specific baryonic physics prescription of our training set limit the generality of our results, our work suggests that machine-learning classification algorithms trained on hydrodynamic zoom-in simulations can efficiently predict realistic subhalo populations.
View Full Publication
Abstract
Exploring the structural and physical properties of new vanadium dioxide (VO2) allotropes has attracted considerable interest because of the structural diversity and unique physical properties of VO2. Here, we demonstrate a reversible pressure-induced structural transition and metallization of the novel metastable polymorph VO2(Mx') and a thermally driven structural transition from VO2(Mx') to the monoclinic phase VO2(M1) at relatively low temperature, based on X-ray diffraction (XRD) and Raman and infrared (IR) spectroscopy. It is shown that the metastable phase VO2(Mx') undergoes the structural transitions VO2(Mx') → VO2(Mx'') at 12 GPa and VO2(Mx'') → VO2(X) at 30-80 GPa upon compression, clearly different from the pressure-induced amorphization observed in the other metastable phases VO2(A) and VO2(B). Moreover, the IR data demonstrate that pressure-induced metallization (PIM) occurs in the VO2(Mx'') phase at about 40 GPa, which is mainly associated with electron-electron correlations. Further analysis suggests that the transformation of all samples into the same high-pressure VO2(X) phase could mainly result from the VO6 octahedra, and the empty spaces between them, following similar variations under pressure in the intermediate high-pressure phases VO2(Mx'') and VO2(M1'). These findings present new insight into the differences in structural transitions and physical properties between the stable and metastable phases of transition-metal oxides under pressure.
View Full Publication
Business Address

5241 Broad Branch Rd. NW

Washington, DC 20015

© Copyright Carnegie Science 2026