


publications:publi:tabseh17ascom [2019/04/25 08:49] (current)
 +<​html><​div id="​bib">​
 +<p> <h1> TaBSEh17ascom</​h1>​
 + <p><span class="BibAuthor">D. Tapiador, A. Berihuete, L.M. Sarro, F. Julbe, E. Huedo</span>. <span class="BibJournalTitle">Enabling data science in the Gaia mission archive: The present-day mass function and age distribution</span>. <span class="BibJournalName">Astronomy and Computing</span>, 19:1-15, 2017.</P><p>
 +<a name="​abstract"></​a><​h2>​ Abstract ​ </h2> <​P> ​
 +Recent advances in large-scale computing architectures enable new opportunities to extract value from the vast amounts of data currently being generated. However, their successful adoption is not straightforward in areas like science, as there are still some barriers that need to be overcome. These comprise (i) the existence of legacy code that needs to be ported, (ii) the lack of high-level, use-case-specific frameworks that facilitate a smoother transition, and (iii) the scarcity of people whose skill sets are balanced between the technological and scientific domains. The European Space Agency’s Gaia mission will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way), providing unprecedented position, parallax and proper motion measurements for about one billion stars. The successful exploitation of this data archive will depend on the ability to offer the proper infrastructure upon which scientists will be able to do exploration and modelling with this huge data set. In this paper, we present and contextualize these challenges by building two probabilistic models using Hierarchical Bayesian Modelling. These models represent a key challenge in astronomy and are of paramount importance for the Gaia mission itself. Moreover, we approach the implementation by leveraging a generic distributed processing engine through an existing software package for Markov chain Monte Carlo sampling. The two computationally intensive models are then validated with simulated data in different scenarios under specific restrictions, and their performance is assessed to prove their scalability. We argue that this approach will serve not only the models at hand but also as an example of how to address similar problems in science, which may need both to scale to bigger data sets and to reuse existing software as much as possible. This will lead to a shorter time to science in massive data archives.<p>
 + <a name="​keyword"></​a>​ <​h2>​Keywords </h2> <p> [ <a href="/​doku.php?​id=publications:​keyword:​tin2015-65469-p">​Tin2015-65469-p</​a>​ ] [ <a href="/​doku.php?​id=publications:​keyword:​cloud">​Cloud</​a>​ ] [ <a href="/​doku.php?​id=publications:​keyword:​grid">​Grid</​a>​ ] 
 +<a name="​contact"></​a><​h2>​ Contact ​ </h2> <​P> ​
 +<a href="​mailto:​ehuedo@fdi.ucm.es">​Eduardo ​ Huedo</​a> ​ <a href="/​ehuedo">​ <img src="/​lib/​exe/​fetch.php?​w=&​h=&​cache=cache&​media=html_icon.png"​ align=top border=0 alt =""></​a><​br> ​
 +<a name="​bib"></​a><​h2>​ BibTex Reference ​ </h2> <​P> ​
 +@article{TaBSEh17ascom, <br>&nbsp;&nbsp;&nbsp;Author = {Tapiador, D. and Berihuete, A. and Sarro, L.M. and Julbe, F. and Huedo, E.},<br>&nbsp;&nbsp;&nbsp;Title = {Enabling data science in the Gaia mission archive: The present-day mass function and age distribution},<br>&nbsp;&nbsp;&nbsp;Journal = {Astronomy and Computing},<br>&nbsp;&nbsp;&nbsp;Volume = {19},<br>&nbsp;&nbsp;&nbsp;Pages = {1--15},<br>&nbsp;&nbsp;&nbsp;Year = {2017}<br>} <br><p>