Phabricator

Hannah_Bast
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 8 2021, 4:13 AM (164 w, 1 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Hannah Bast

Recent Activity

Sep 13 2024

Hannah_Bast added a comment to T330525: Migrate Wikidata off of Blazegraph.

@Sj Getting the updates in batches would be perfectly fine. But how do you want to verify that it works without having a reference endpoint to compare to?

Sep 13 2024, 12:06 AM · Wikidata, Wikidata-Query-Service
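
A minimal sketch of such a verification, assuming both endpoints return results in the W3C SPARQL 1.1 Query Results JSON format and that fetching them happens elsewhere: run the same query on the batch-updated endpoint and on the reference endpoint, then compare the rows as multisets so that row order does not matter. The function names are hypothetical. Blank-node labels are endpoint-specific, so queries returning blank nodes will not compare cleanly this way.

```python
from collections import Counter
from typing import Any

def rows_as_multiset(results: dict[str, Any]) -> Counter:
    """Turn a SPARQL JSON result into a multiset of rows.

    Each row becomes a tuple of (variable, type, value, datatype, lang) entries
    sorted by variable name, so neither row order nor variable order matters.
    """
    rows: Counter = Counter()
    for binding in results["results"]["bindings"]:
        row = tuple(sorted(
            (var, term["type"], term["value"], term.get("datatype"), term.get("xml:lang"))
            for var, term in binding.items()
        ))
        rows[row] += 1
    return rows

def diff_results(candidate: dict[str, Any], reference: dict[str, Any]) -> tuple[Counter, Counter]:
    """Return (missing, extra): rows only in the reference, rows only in the candidate."""
    cand = rows_as_multiset(candidate)
    ref = rows_as_multiset(reference)
    return ref - cand, cand - ref
```

Both results being empty diffs means the two endpoints agree on that query; any surviving rows point to updates that were lost or applied twice.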

Sep 6 2024

Hannah_Bast added a comment to T330525: Migrate Wikidata off of Blazegraph.

@tfmorris It was a showstopper for using it as a drop-in replacement for Blazegraph two years ago. SPARQL 1.1 Update was on QLever's agenda already back then: a first proof of concept was implemented in March 2023, a functional version has been available since May 2024, and we are currently fully integrating it into the main branch. Unfortunately, Wikidata still does not provide a publicly accessible update stream (this is difficult for a variety of reasons). As soon as one is available, we could provide a SPARQL endpoint that is kept in sync with the public Wikidata SPARQL endpoint.

Sep 6 2024, 6:15 PM · Wikidata, Wikidata-Query-Service
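
For illustration, a SPARQL 1.1 Update request in the "update via POST directly" form of the SPARQL 1.1 Protocol can be built with the Python standard library as below. The endpoint URL and the example.org triples are hypothetical, and whether a given QLever deployment accepts updates at that URL is an assumption; the request is only constructed, not sent.

```python
import urllib.request

# Hypothetical update endpoint; not a documented QLever URL.
ENDPOINT = "http://localhost:7001/"

# Operations separated by ";" are executed in order within one request,
# which is one way to apply a batch of changes at once.
UPDATE = """PREFIX ex: <http://example.org/>
DELETE DATA { ex:item ex:label "old" } ;
INSERT DATA { ex:item ex:label "new" }"""

def build_update_request(endpoint: str, update: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Update request.

    The update string is the UTF-8 request body, sent with the media type
    application/sparql-update.
    """
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )

request = build_update_request(ENDPOINT, UPDATE)
# urllib.request.urlopen(request) would send it to a running endpoint.
```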

Feb 21 2024

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@RKemper Is your point that the queries should return a result? Neither DBLP nor Wikidata has the predicate foaf:name, so it's clear that both SERVICE queries return an empty result. Here is an example of a query that gives a result:

Feb 21 2024, 11:48 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata

Feb 11 2024

Hannah_Bast added a comment to T294133: Expose rdf-streaming-updater.mutation content through EventStreams.

@Harej Thanks for the quick reply, James! Are you saying that the script is scraping + parsing https://www.wikidata.org/wiki/Special:RecentChanges to obtain the triples to be added and deleted? Or is there a different way to access that page, which gives you the added and deleted triples in a more machine-friendly format?

Feb 11 2024, 6:22 AM · Epic, Discovery-Search (Current work), Data-Engineering, Event-Platform, Analytics, Wikidata, EventStreams
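
As a sketch of a more machine-friendly route than scraping the HTML page: the MediaWiki Action API exposes the same feed via `action=query&list=recentchanges`. The helper below only builds the request URL, and its parameter choices (namespace 0 for items, the property list) are assumptions for illustration. Note that this tells you which entities changed (title, old and new revision IDs, timestamp), not which triples changed; the added and deleted triples would still have to be derived, for example by diffing the RDF of the two revisions.

```python
from typing import Optional
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def recent_changes_url(limit: int = 50, since: Optional[str] = None) -> str:
    """Build a MediaWiki API URL listing recent edits to Wikidata items.

    since: optional ISO 8601 timestamp such as "2024-02-11T05:42:00Z"; if given,
    changes are listed oldest-first starting at that time.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": "0",               # main namespace (items, Q-ids)
        "rcprop": "title|ids|timestamp",  # page title, page/revision IDs, time
        "rclimit": str(limit),
        "format": "json",
    }
    if since is not None:
        params["rcstart"] = since
        params["rcdir"] = "newer"         # enumerate forward in time from rcstart
    return API + "?" + urlencode(params)
```
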
Hannah_Bast added a comment to T294133: Expose rdf-streaming-updater.mutation content through EventStreams.

I just looked into https://github.com/wikimedia/wikidata-query-rdf, which provides a tool runUpdate.sh. When I run it for a Blazegraph instance with exactly one triple of the form <http://www.wikidata.org> <http://schema.org/dateModified> "2024-02-11T05:42Z"^^xsd:dateTime, it will continuously update the instance with all changes since that date. I have two questions:

Feb 11 2024, 6:01 AM · Epic, Discovery-Search (Current work), Data-Engineering, Event-Platform, Analytics, Wikidata, EventStreams
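
The seed triple described above can be generated for an arbitrary start time. A minimal sketch in N-Triples syntax, with the `xsd:` prefix expanded because N-Triples has no prefixes; since the lexical form of `xsd:dateTime` includes seconds, the sketch writes them out. Whether runUpdate.sh expects exactly this serialization is an assumption, and loading the file into Blazegraph is not shown.

```python
from datetime import datetime, timezone

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

def seed_triple(since: datetime) -> str:
    """Return the dateModified seed triple as one N-Triples line.

    `since` must be timezone-aware; it is converted to UTC and written with
    seconds, as the xsd:dateTime lexical form requires.
    """
    if since.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return ('<http://www.wikidata.org> <http://schema.org/dateModified> '
            f'"{stamp}"^^<{XSD_DATETIME}> .')

print(seed_triple(datetime(2024, 2, 11, 5, 42, tzinfo=timezone.utc)))
```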

Feb 1 2024

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

Yes, https://qlever.cs.uni-freiburg.de/api/dblp is the URL for API calls, whereas https://qlever.cs.uni-freiburg.de/dblp (without the /api) is the URL of the QLever UI. Same for all the other endpoints.

Feb 1 2024, 12:12 AM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata
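
To make the distinction concrete, a small sketch that derives both URLs from the dataset name and builds a SPARQL 1.1 Protocol GET request (`query` parameter) against the API URL. The helper names and the choice of `application/sparql-results+json` in the Accept header are assumptions for illustration; the request is built but not sent.

```python
import urllib.request
from urllib.parse import urlencode

BASE = "https://qlever.cs.uni-freiburg.de"

def api_url(dataset: str) -> str:
    """URL for programmatic SPARQL requests, e.g. .../api/dblp."""
    return f"{BASE}/api/{dataset}"

def ui_url(dataset: str) -> str:
    """URL of the QLever UI for the same dataset, e.g. .../dblp."""
    return f"{BASE}/{dataset}"

def query_request(dataset: str, query: str,
                  accept: str = "application/sparql-results+json") -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request to the API URL."""
    url = api_url(dataset) + "?" + urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": accept})

req = query_request("dblp", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
# urllib.request.urlopen(req) would execute the query.
```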

Sep 27 2023

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@dcausse @Gehel @WolfgangFahl QLever can now also produce application/sparql-results+xml. Here is an example:

Sep 27 2023, 9:24 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata
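
For clients consuming this format, a minimal parser using only the Python standard library. It follows the W3C SPARQL Query Results XML Format (namespace `http://www.w3.org/2005/sparql-results#`), in which each `binding` element holds exactly one `uri`, `literal`, or `bnode` child; datatype and language attributes are dropped for brevity. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

def parse_sparql_xml(text: str) -> list[dict[str, tuple[str, str]]]:
    """Parse application/sparql-results+xml into a list of rows.

    Each row maps a variable name to (term kind, value), where the kind is
    "uri", "literal", or "bnode". Unbound variables are simply absent.
    """
    root = ET.fromstring(text)
    rows = []
    for result in root.iterfind("sr:results/sr:result", NS):
        row = {}
        for binding in result.iterfind("sr:binding", NS):
            term = binding[0]                     # the single child element
            kind = term.tag.rpartition("}")[2]    # strip the "{namespace}" prefix
            row[binding.get("name")] = (kind, term.text or "")
        rows.append(row)
    return rows
```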

Sep 8 2023

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@dcausse I am confused: where does https://data.nlg.gr/sparql come from? I thought the endpoints in question were https://qlever.cs.uni-freiburg.de/api/dblp and https://qlever.cs.uni-freiburg.de/api/wikidata, where the following command lines work just fine:

Sep 8 2023, 2:11 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata
Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@Gehel Thanks for the reply! But to clarify, I am not asking you to do something different for different federation endpoints. It's the same for every federation endpoint, namely sending the header

Sep 8 2023, 8:13 AM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata

Aug 30 2023

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

Is it possible to configure Blazegraph to send the following Accept header:

Aug 30 2023, 1:06 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata

Feb 26 2022

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

To add to this, the two-index approach has another rather beautiful property:

Feb 26 2022, 7:34 PM · Wikidata, Wikidata-Query-Service

Feb 5 2022

Hannah_Bast added a comment to T289621: Evaluate Halyard as alternative to Blazegraph.

@Pulquero AFAIK the two main problems with Blazegraph are:

Feb 5 2022, 4:14 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
Hannah_Bast added a comment to T289621: Evaluate Halyard as alternative to Blazegraph.

@Hannah_Bast maybe this interests you? Do you think this system would perform well considering the load on WDQS and the type of queries we have?

Feb 5 2022, 2:14 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
Hannah_Bast added a comment to T289621: Evaluate Halyard as alternative to Blazegraph.

@Pulquero Thank you for this interesting piece of information. I have a few questions:

Feb 5 2022, 1:53 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service

Dec 11 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

I am taking the liberty of polluting the thread with a reference to "MillenniumDB: A Persistent, Open-Source, Graph Database" https://arxiv.org/pdf/2111.01540.pdf from November 2021. MillenniumDB may have some serious limitations in terms of which requirements it can support, but interestingly they write "However, MillenniumDB was designed with the complete version of Wikidata – including qualifiers, references, etc. – in mind." and their benchmarks look strong. They compare against Blazegraph, Jena, Virtuoso, and Neo4j.

Dec 11 2021, 10:12 AM · Wikidata, Epic, Wikidata-Query-Service

Oct 15 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

I imported the wikidata-DB into neo4j and it works quite well.

Oct 15 2021, 3:07 PM · Wikidata, Epic, Wikidata-Query-Service

Oct 8 2021

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

@DD063520: You can find some details at https://github.com/ad-freiburg/qlever/blob/master/docs/quickstart.md .

Oct 8 2021, 12:50 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

Oh, ok. Could you give an example of a query that has no "highly selective triples" so I can test it on QLever vs. BG?

Oct 8 2021, 6:16 AM · Wikidata, Wikidata-Query-Service

Oct 2 2021

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

@Justin0x2004 Thanks, Justin. QLever already supports something like named subqueries. You can simply have the same subquery in multiple places: it will be evaluated only once, and for the other occurrences, the result will be reused.

Oct 2 2021, 2:46 AM · Wikidata, Wikidata-Query-Service
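
The reuse behavior described above amounts to memoizing subquery results. A generic sketch, keyed on the subquery text; QLever's actual mechanism is not described in the comment, and an engine might instead key on a normalized query plan. The class and its `evaluate` callable are hypothetical.

```python
from typing import Callable

class SubqueryCache:
    """Evaluate each distinct subquery once and reuse the result elsewhere."""

    def __init__(self, evaluate: Callable[[str], list]):
        self._evaluate = evaluate          # the (expensive) evaluation function
        self._results: dict[str, list] = {}
        self.evaluations = 0               # how often evaluate() actually ran

    def get(self, subquery: str) -> list:
        key = subquery.strip()             # surrounding whitespace is irrelevant
        if key not in self._results:
            self.evaluations += 1
            self._results[key] = self._evaluate(subquery)
        return self._results[key]
```

A second occurrence of the same subquery then costs a dictionary lookup instead of a full evaluation.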

Sep 30 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

@So9q I have commented on your comments concerning Rya in the thread "Evaluate Apache Rya as alternative to Blazegraph": https://phabricator.wikimedia.org/T289561#7393732

Sep 30 2021, 11:53 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

I have now revised QLever's Quickstart page: https://github.com/ad-freiburg/qlever

Sep 30 2021, 11:43 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T289561: Evaluate Apache Rya as alternative to Blazegraph.

We looked a bit into Apache Rya. A couple of observations:

Sep 30 2021, 11:25 PM · Wikidata, Wikidata-Query-Service

Sep 28 2021

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

I will provide a detailed reply later today, also to the other thread. Four things for now:

Sep 28 2021, 8:51 AM · Wikidata, Wikidata-Query-Service

Sep 17 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

It's of course up to you (the Wikidata team) to decide this. But I wouldn't dismiss this idea so easily.

Sep 17 2021, 12:08 AM · Wikidata, Wikidata-Query-Service

Sep 15 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Wikibase doesn’t store data in RDF, so dumping the data set means parsing the native representation (JSON) and writing it out again as RDF, including some metadata for each page.

Sep 15 2021, 7:08 PM · Wikidata, Wikidata-Query-Service
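
To illustrate what "writing it out again as RDF" involves, a heavily simplified sketch that turns the labels of one entity in Wikibase JSON into N-Triples using `rdfs:label`. The real RDF export also covers descriptions, aliases, statements with qualifiers and references, and per-page metadata, and emits labels under further predicates; none of that is shown here. The function names are hypothetical.

```python
import json

ENTITY = "http://www.wikidata.org/entity/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def label_triples(entity_json: str) -> list[str]:
    """Emit one rdfs:label triple per language from a Wikibase entity JSON."""
    entity = json.loads(entity_json)
    subject = f"<{ENTITY}{entity['id']}>"
    triples = []
    for lang, label in sorted(entity.get("labels", {}).items()):
        value = escape_literal(label["value"])
        triples.append(f'{subject} <{RDFS_LABEL}> "{value}"@{lang} .')
    return triples
```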
Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Can you or anyone else explain why the data dump takes so long, Lukas? One would expect that it is much easier to dump a (snapshot of a) dataset than to build a complex data structure from it. Also, dumping and compression are easily parallelized. And the pure volume isn't that large (< 100 GB compressed).

Sep 15 2021, 12:49 PM · Wikidata, Wikidata-Query-Service
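
The point that compression parallelizes easily can be sketched as follows: split the output into chunks, compress them independently, and concatenate the results. This works because a concatenation of gzip members is itself a valid gzip stream (RFC 1952). Chunk size and worker count are arbitrary choices here; threads suffice in CPython because the zlib module releases the GIL while compressing, which is an implementation detail rather than a language guarantee.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def compress_parallel(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Gzip-compress `data` by compressing fixed-size chunks concurrently.

    Each chunk becomes its own gzip member; concatenated members form a valid
    gzip stream, so gzip.decompress() or `gunzip` restores the original bytes.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, chunks))  # map preserves chunk order
    return b"".join(members)
```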
Hannah_Bast added a comment to T206561: Evaluate Virtuoso as alternative to Blazegraph.

I agree with Kingsley that you don't need a distributed SPARQL engine when the knowledge graph fits on a single machine and will continue to do so in the future. This is clearly the case for Wikidata, since it is even the case for UniProt, which is ten times larger (at the time of this writing, it already contains over 90 billion triples).

Sep 15 2021, 5:41 AM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

For whoever is interested, I wrote more about the QLever SPARQL engine on this thread: https://phabricator.wikimedia.org/T290839 .

Sep 15 2021, 2:51 AM · Wikidata, Epic, Wikidata-Query-Service
Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

PS: Note that high query throughput is not a problem for a SPARQL engine that runs on a single standard PC or server. Depending on the overall demand, you can just run multiple instances on separate machines and trivially distribute the queries among them. What's more important, I think, is the processing time for individual queries, because you cannot easily distribute the processing of an individual query. And it does make quite a difference for the user experience whether a query takes seconds, minutes, or hours. The current SPARQL endpoint for Wikidata (realized using Blazegraph) frequently times out when the queries are a bit harder.

Sep 15 2021, 2:43 AM · Wikidata, Wikidata-Query-Service
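
Distributing queries across identical instances can be as simple as round-robin routing. A minimal sketch, assuming every replica holds a full copy of the data so that any replica can answer any query; the replica addresses are hypothetical. Note that this raises throughput but does nothing for the latency of a single query, which is exactly the distinction made above.

```python
import itertools
import threading
from typing import Iterable

class RoundRobinRouter:
    """Hand out replica addresses in turn, one per incoming query."""

    def __init__(self, replicas: Iterable[str]):
        replicas = list(replicas)
        if not replicas:
            raise ValueError("need at least one replica")
        self._cycle = itertools.cycle(replicas)
        self._lock = threading.Lock()  # make next_replica() safe across threads

    def next_replica(self) -> str:
        with self._lock:
            return next(self._cycle)
```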
Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Yes, QLever is developed in our group at the University of Freiburg. I presented it to the Wikidata team in March. You can try out a demo on the complete Wikidata at https://qlever.cs.uni-freiburg.de/wikidata . You can also select other interesting large knowledge graphs there, for example, the complete OpenStreetMap data.

Sep 15 2021, 2:35 AM · Wikidata, Wikidata-Query-Service

Sep 14 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

I already talked about Sage with Lukas last November. I don't think that Sage is an option for Wikidata. The focus of Sage is on the ability to pause and resume SPARQL queries (which is a very useful feature), not on efficiency. For example, if you run the people-professions query from https://phabricator.wikimedia.org/T206560 on their demo instance of Wikidata http://sage.univ-nantes.fr/#query (which has only 2.3B triples), it takes forever. Even simple queries are quite slow. For example, the following query (all humans) produces results at a rate of around a thousand rows per second:

Sep 14 2021, 12:40 PM · Wikidata, Wikidata-Query-Service

Sep 13 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Would it be an option that one of these two backends uses a SPARQL engine that does not support the SPARQL Update operation, but instead rebuilds its index periodically, for example, every 24 hours?

Sep 13 2021, 6:00 AM · Wikidata, Wikidata-Query-Service
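
A minimal sketch of such a rebuild-and-swap backend, assuming a hypothetical `build` callable that constructs a complete read-only index from the latest dump. Queries keep being answered from the old index while the new one is built, and only a reference swap happens at the end. With a 24-hour rebuild interval, the served data lags the live data by at most roughly the interval plus the build time (plus the age of the dump itself).

```python
import threading
from typing import Callable, Generic, TypeVar

Index = TypeVar("Index")

class RebuildingBackend(Generic[Index]):
    """Serve from a read-only index that is periodically rebuilt from scratch."""

    def __init__(self, build: Callable[[], Index]):
        self._build = build              # hypothetical: builds an index from the latest dump
        self._lock = threading.Lock()
        self._current = build()

    def current(self) -> Index:
        """Index that queries should use right now."""
        with self._lock:
            return self._current

    def rebuild(self) -> None:
        """Build a fresh index, then swap it in; the old index serves meanwhile."""
        fresh = self._build()            # slow step, runs without holding the lock
        with self._lock:
            self._current = fresh        # queries started earlier keep their old reference
```

In practice, `rebuild()` would be triggered by an external scheduler such as cron; that wiring is omitted.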

Sep 9 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

Thanks, Kingsley, that explains it!

Sep 9 2021, 6:54 PM · Wikidata, Epic, Wikidata-Query-Service

Sep 8 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

[1] Our Live Wikidata SPARQL Query Endpoint

Sep 8 2021, 4:37 AM · Wikidata, Epic, Wikidata-Query-Service