User Details
- User Since: Sep 8 2021, 4:13 AM (164 w, 1 d)
- Availability: Available
- LDAP User: Unknown
- MediaWiki User: Hannah Bast
Sep 13 2024
@Sj Getting the updates in batches would be perfectly fine. But how do you want to verify that it works without having a reference endpoint to compare to?
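To make concrete what "comparing to a reference endpoint" would look like, here is a hedged sketch: run the same spot-check query against both endpoints and diff the results. The second URL is a placeholder for exactly the reference endpoint that is missing.

```
# Hedged sketch: spot-check two endpoints with the same query and compare.
# https://example.org/reference-endpoint is a placeholder; no such endpoint exists yet.
QUERY='SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }'
curl -s https://qlever.cs.uni-freiburg.de/api/wikidata \
     -H "Accept: text/csv" --data-urlencode "query=$QUERY" > ours.csv
curl -s https://example.org/reference-endpoint \
     -H "Accept: text/csv" --data-urlencode "query=$QUERY" > reference.csv
diff ours.csv reference.csv && echo "in sync (for this query)"
```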
Sep 6 2024
@tfmorris The lack of SPARQL 1.1 Update support was indeed a showstopper for using QLever as a drop-in replacement for Blazegraph two years ago. SPARQL 1.1 Update has been on QLever's agenda from the start: a first proof of concept was implemented in March 2023, a functional version has been available since May 2024, and we are currently in the process of fully integrating it into the main branch. Unfortunately, Wikidata still does not provide a publicly accessible update stream (this is difficult for a variety of reasons). As soon as that is available, we could provide a SPARQL endpoint that is in sync with the public Wikidata SPARQL endpoint.
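Just to make concrete what "in sync" means technically: the updater would apply batches of standard SPARQL 1.1 Update operations, roughly like the following sketch. The endpoint URL and the concrete triples are made up for illustration.

```
# Hedged sketch of one update batch as a SPARQL 1.1 Update request.
# The endpoint URL and the triples below are illustrative placeholders.
curl -s https://example.org/wikidata/update \
     -H "Content-Type: application/sparql-update" \
     --data 'PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DELETE DATA { wd:Q42 rdfs:label "Old label"@en } ;
INSERT DATA { wd:Q42 rdfs:label "New label"@en }'
```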
Feb 21 2024
@RKemper Is your point that the queries should return a result? Neither DBLP nor Wikidata has the predicate foaf:name, so it's clear that both SERVICE queries return an empty result. Here is an example of a query that gives a result:
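Without reproducing the original example, here is a hedged sketch of the kind of SERVICE query that returns a non-empty result, since rdfs:label (unlike foaf:name) does occur in the Wikidata RDF data; the concrete entity and limit are my own choices.

```
# Illustrative federated query that returns a non-empty result
# (rdfs:label exists in the Wikidata data, foaf:name does not).
curl -s https://qlever.cs.uni-freiburg.de/api/dblp \
     -H "Accept: application/sparql-results+json" \
     --data-urlencode 'query=
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd:   <http://www.wikidata.org/entity/>
SELECT ?label WHERE {
  SERVICE <https://qlever.cs.uni-freiburg.de/api/wikidata> {
    wd:Q42 rdfs:label ?label
  }
} LIMIT 10'
```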
Feb 11 2024
@Harej Thanks for the quick reply, James! Are you saying that the script is scraping + parsing https://www.wikidata.org/wiki/Special:RecentChanges to obtain the triples to be added and deleted? Or is there a different way to access that page, which gives you the added and deleted triples in a more machine-friendly format?
I just looked into https://github.com/wikimedia/wikidata-query-rdf, which provides a tool runUpdate.sh. When I run it for a Blazegraph instance with exactly one triple of the form <http://www.wikidata.org> <http://schema.org/dateModified> "2024-02-11T05:42Z"^^xsd:dateTime, it will continuously update the instance with all changes since that date. I have two questions:
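For reference, seeding such a minimal instance boils down to a single INSERT DATA. This is a hedged sketch assuming a default local Blazegraph/WDQS setup on port 9999 with the wdq namespace; adjust the URL and timestamp as needed (xsd:dateTime requires seconds, hence the ":00").

```
# Hedged sketch: insert the single dateModified triple that runUpdate.sh starts from.
# The endpoint URL assumes a default local Blazegraph/WDQS setup; adjust as needed.
curl -s http://localhost:9999/bigdata/namespace/wdq/sparql \
     -H "Content-Type: application/sparql-update" \
     --data 'PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
INSERT DATA {
  <http://www.wikidata.org> schema:dateModified
    "2024-02-11T05:42:00Z"^^xsd:dateTime
}'
```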
Feb 1 2024
Yes, https://qlever.cs.uni-freiburg.de/api/dblp is the URL for API calls, whereas https://qlever.cs.uni-freiburg.de/dblp (without the /api) is the URL of the QLever UI. Same for all the other endpoints.
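For example, a hedged sketch of such an API call (the concrete query and output format are just for illustration):

```
# Illustrative API call against the /api URL (the URL without /api serves the UI).
curl -s https://qlever.cs.uni-freiburg.de/api/dblp \
     -H "Accept: application/sparql-results+json" \
     --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 5'
```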
Sep 27 2023
@dcausse @Gehel @WolfgangFahl QLever can now also produce application/sparql-results+xml. Here is an example:
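A sketch of what such a request looks like; the query here is my own minimal example, the point is the Accept header.

```
# Request the XML result format via content negotiation; everything else is a
# standard SPARQL protocol request.
curl -s https://qlever.cs.uni-freiburg.de/api/dblp \
     -H "Accept: application/sparql-results+xml" \
     --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 5'
```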
Sep 8 2023
@dcausse I am confused, where does https://data.nlg.gr/sparql come from? I thought the endpoints in question were https://qlever.cs.uni-freiburg.de/api/dblp and https://qlever.cs.uni-freiburg.de/api/wikidata, where the following command lines work just fine:
@Gehel Thanks for the reply! But to clarify, I am not asking to do something different for different federation endpoints. It's the same for every federation endpoint, namely sending the header
Aug 30 2023
Is it possible to configure Blazegraph to send the following Accept header:
Feb 26 2022
To add to this, the two-index approach has another rather beautiful property:
Feb 5 2022
@Pulquero AFAIK the two main problems with Blazegraph are:
@Hannah_Bast maybe this interests you? Do you think this system would perform well considering the load on WDQS and the type of queries we have?
@Pulquero Thank you for this interesting piece of information. I have a few questions:
Oct 8 2021
@DD063520: You can find some details at https://github.com/ad-freiburg/qlever/blob/master/docs/quickstart.md .
Oh, ok. Could you give an example of a query that has no "highly selective triples" so I can test it on QLever vs. BG?
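To make the question concrete, something like the following sketch is what I would expect: both triple patterns match many millions of triples, so neither is highly selective. (This is my own illustration, not the example requested above.)

```
# Sketch of a query without a highly selective triple pattern:
# all humans together with their occupation.
curl -s https://qlever.cs.uni-freiburg.de/api/wikidata \
     -H "Accept: application/sparql-results+json" \
     --data-urlencode 'query=
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd:  <http://www.wikidata.org/entity/>
SELECT ?person ?occupation WHERE {
  ?person wdt:P31 wd:Q5 .
  ?person wdt:P106 ?occupation
}'
```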
Oct 2 2021
@Justin0x2004 Thanks, Justin. QLever already supports something like named subqueries: you can simply have the same subquery in multiple places; it will be evaluated only once, and the result will be reused for the other occurrences.
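A hedged sketch of what that looks like (my own illustration): the identical "all humans" subquery appears in both branches of the UNION, but only needs to be evaluated once.

```
# Illustrative query with the same subquery in two places; the engine can
# evaluate it once and reuse the result for the second occurrence.
curl -s https://qlever.cs.uni-freiburg.de/api/wikidata \
     -H "Accept: application/sparql-results+json" \
     --data-urlencode 'query=
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd:  <http://www.wikidata.org/entity/>
SELECT ?person ?value WHERE {
  {
    { SELECT ?person WHERE { ?person wdt:P31 wd:Q5 } }
    ?person wdt:P106 ?value .
  } UNION {
    { SELECT ?person WHERE { ?person wdt:P31 wd:Q5 } }
    ?person wdt:P19 ?value .
  }
}'
```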
Sep 30 2021
@So9q I have replied to your comments concerning Rya in the "Evaluate Apache Rya as alternative to Blazegraph" task: https://phabricator.wikimedia.org/T289561#7393732
I have now revised QLever's Quickstart page: https://github.com/ad-freiburg/qlever
We looked a bit into Apache Rya. A couple of observations:
Sep 28 2021
I will provide a detailed reply later today, also to the other thread. Four things for now:
Sep 17 2021
It's of course up to you (the Wikidata team) to decide this. But I wouldn't dismiss this idea so easily.
Sep 15 2021
Wikibase doesn’t store data in RDF, so dumping the data set means parsing the native representation (JSON) and writing it out again as RDF, including some metadata for each page.
Can you or anyone else explain why the data dump takes so long, Lukas? One would expect dumping a (snapshot of a) dataset to be much easier than building a complex data structure from it. Also, dumping and compression are easily parallelized. And the sheer volume isn't that large (< 100 GB compressed).
I agree with Kingsley that you don't need a distributed SPARQL engine when the knowledge graph fits on a single machine and will continue to do so in the future. That is clearly the case for Wikidata, since it is even the case for UniProt, which is ten times larger and at the time of this writing already contains over 90 billion triples.
For whoever is interested, I wrote more about the QLever SPARQL engine on this thread: https://phabricator.wikimedia.org/T290839 .
PS: Note that a high query throughput is not a problem for a SPARQL engine that runs on a single standard PC or server. Depending on the overall demand, you can just run multiple instances on separate machines and trivially distribute the queries among them. What is more important, I think, is the processing time of individual queries, because you cannot easily distribute the processing of an individual query. And it does make quite a difference for the user experience whether a query takes seconds, minutes, or hours. The current SPARQL endpoint for Wikidata (realized using Blazegraph) times out a lot when the queries are a bit harder.
Yes, QLever is developed in our group at the University of Freiburg. I presented it to the Wikidata team in March. You can try out a demo on the complete Wikidata at https://qlever.cs.uni-freiburg.de/wikidata . You can also select other interesting large knowledge graphs there, for example, the complete OpenStreetMap data.
Sep 14 2021
I already talked about Sage with Lukas last November. I don't think that Sage is an option for Wikidata. The focus of Sage is on the ability to pause and resume SPARQL queries (which is a very useful feature), not on efficiency. For example, if you run the people-professions query from https://phabricator.wikimedia.org/T206560 on their Wikidata demo instance at http://sage.univ-nantes.fr/#query (which has only 2.3B triples), it takes forever. Also, simple queries are quite slow. For example, the following query (all humans) produces results at a rate of only around a thousand rows per second:
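For reference (not reproducing the exact query text here), the usual "all humans" pattern on the Wikidata RDF model looks like this; shown as a request against the QLever demo endpoint mentioned in this thread, for comparison.

```
# The usual "all humans" pattern (sketch), sent to the QLever Wikidata endpoint.
curl -s https://qlever.cs.uni-freiburg.de/api/wikidata \
     -H "Accept: application/sparql-results+json" \
     --data-urlencode 'query=
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd:  <http://www.wikidata.org/entity/>
SELECT ?person WHERE { ?person wdt:P31 wd:Q5 }'
```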
Sep 13 2021
Would it be an option that one of these two backends uses a SPARQL engine that does not support the SPARQL Update operation, but instead rebuilds its index periodically, for example, every 24 hours?
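To sketch what that would mean operationally, here is a rough outline of a daily full rebuild; every command name below is a hypothetical placeholder, not actual tooling.

```
#!/bin/bash
# Hypothetical sketch of a daily full rebuild; all script names are placeholders.
set -e
./download-latest-dump.sh dump.ttl.gz      # fetch the most recent RDF dump
./build-index.sh dump.ttl.gz index.new     # rebuild the index from scratch
./stop-server.sh                           # brief downtime for the swap ...
./swap-index.sh index.new index.current    # ... or run a second instance and switch a proxy
./start-server.sh index.current
# Scheduled once a day, e.g. via cron: 0 3 * * * /path/to/rebuild.sh
```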
Sep 9 2021
Thanks, Kingsley, that explains it!