Claude 3.5 Sonnet comes out on top in Galileo's Hallucination Index

The AI company Galileo has just announced its latest Hallucination Index, a framework that evaluates 22 leading generative AI models.

Models are tested using a metric called context adherence, which measures "closed-domain hallucinations: cases where your model said things that were not provided in the context."

The best-performing model overall for RAG, according to the ranking, is Claude 3.5 Sonnet from Anthropic. Galileo said that this model and Anthropic's other model, Claude 3 Opus, had near-perfect scores, beating out OpenAI's models, which won last year.

From a cost perspective, the best-performing model was Google's Gemini 1.5 Flash. Alibaba's Qwen2-72B-Instruct was the best-performing open-source model overall, though in short-context RAG tests Meta's llama-3-70b-instruct came out on top.

Broken down by context length, the best closed-source model in short-context RAG was Claude 3.5 Sonnet; in medium-context RAG it was Google's Gemini-1.5-flash-001 (with cost as the tiebreaker among the other models that also achieved a perfect score); and in large-context RAG it was again Claude 3.5 Sonnet.

"In today's rapidly evolving AI landscape, developers and enterprises face a critical challenge: how to harness the power of generative AI while balancing cost, accuracy, and reliability. Current benchmarks are often based on academic use-cases, rather than real-world applications. Our new Index seeks to address this by testing models in real-world use cases that require the LLMs to retrieve data, a common practice in enterprise AI implementations," says Vikram Chatterji, CEO and co-founder of Galileo. "As hallucinations continue to be a major hurdle, our goal wasn't to just rank models, but rather give AI teams and leaders the real-world data they need to adopt the right model, for the right task, at the right price."

The post Claude 3.5 Sonnet comes out on top in Galileo's Hallucination Index appeared first on SD Times.

Unable to run feature files in parallel with JUnit 4 and the "mvn test" command?

June 23, 2024

I want to run my 5 Cucumber feature files in parallel with the "mvn test" command. Right now "mvn test" or "mvn build" runs fine and Maven builds the project, but the feature files don't run. Could this be due to a version mismatch? I am using JUnit 4, Cucumber 7.4.0 and Maven 6.9.3. I also tried the latest Cucumber, 7.18.0, but nothing works. I am using the Maven Surefire plugin to run the 5 feature files in parallel, but nothing runs: not a single feature file executes. Do I need to create multiple runners? I just want parallel execution at the feature level, not at the scenario level. Please provide some assistance; I am stuck.
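[INFO]
[INFO] --- maven-surefire-plugin:3.3.0:test (default-test) @ SmokeTestAutomation ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 22.154 s
[INFO] Finished at: 2024-06-23T17:36:09+05:30
[INFO] ------------------------------------------------------------------------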

pom.xml:
<dependencies>
  <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-core -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.23.1</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api -->
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>2.23.1</version>
  </dependency>
  <dependency>
    <groupId>net.sourceforge.jtds</groupId>
    <artifactId>jtds</artifactId>
    <version>1.3.1</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.poi/poi-examples -->
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-examples</artifactId>
    <version>5.1.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.1.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
  <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.13.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-api -->
  <dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-api</artifactId>
    <version>4.13.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-testng -->
  <dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-java</artifactId>
    <version>7.4.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-core -->
  <dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-core</artifactId>
    <version>7.4.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-jvm -->
  <dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-jvm</artifactId>
    <version>7.4.0</version>
    <type>pom</type>
  </dependency>
  <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-junit -->
  <dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-junit</artifactId>
    <version>7.4.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-testng -->
  <dependency>
    <groupId>io.cucumber</groupId>
    <artifactId>cucumber-testng</artifactId>
    <version>7.4.0</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.testng/testng -->
  <dependency>
    <groupId>org.testng</groupId>
    <artifactId>testng</artifactId>
    <version>7.4.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>openxml4j</artifactId>
    <version>1.0-beta</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/commons-collections/commons-collections -->
  <dependency>
    <groupId>commons-collections</groupId>
    <artifactId>commons-collections</artifactId>
    <version>3.2.1</version>
  </dependency>
  <dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>4.1</version>
  </dependency>
  <!-- <dependency>
    <groupId>org.testng</groupId>
    <artifactId>testng</artifactId>
    <version>7.4.0</version>
  </dependency> -->
  <!-- https://mvnrepository.com/artifact/org.junit.jupiter/junit-jupiter-api -->
  <!-- https://mvnrepository.com/artifact/junit/junit -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.13.2</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/com.jcabi/jcabi-log -->
  <dependency>
    <groupId>com.aventstack</groupId>
    <artifactId>extentreports</artifactId>
    <version>5.1.1</version>
  </dependency>
  <dependency>
    <groupId>tech.grasshopper</groupId>
    <artifactId>extentreports-cucumber7-adapter</artifactId>
    <version>1.14.0</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.8.1</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>3.3.0</version>
      <configuration>
        <includes>
          <include>**/TestRunner.java</include>
        </includes>
        <parallel>methods</parallel>
        <useUnlimitedThreads>true</useUnlimitedThreads>
        <testFailureIgnore>true</testFailureIgnore>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-failsafe-plugin</artifactId>
      <version>3.3.0</version>
      <configuration>
        <includes>
          <include>**/TestRunner.java</include>
        </includes>
        <parallel>methods</parallel>
        <useUnlimitedThreads>true</useUnlimitedThreads>
      </configuration>
    </plugin>
  </plugins>
</build>
</project>

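TestRunner.java:
import org.junit.runner.RunWith;

import io.cucumber.junit.Cucumber;
import io.cucumber.junit.CucumberOptions;

// JUnit 4 entry point: Cucumber loads the feature files and the step definitions (glue) listed below.
// The tag expression "@UserLevel and @AdminLevel" selects only scenarios that carry BOTH tags.
@RunWith(Cucumber.class)
@CucumberOptions(glue = { "stepDefinitions" }, features = { "src/test/resources/features" },
        monochrome = false, tags = "@UserLevel and @AdminLevel")
public class TestRunner {

}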
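One likely explanation, offered as an assumption rather than a confirmed diagnosis: Surefire's automatic provider detection checks for TestNG before JUnit 4, so with testng and cucumber-testng on the test classpath the JUnit 4 @RunWith(Cucumber.class) runner is probably never picked up, which would fit the build output above showing no tests at all. The Cucumber documentation describes feature-level parallelism for JUnit 4 (all scenarios of one feature file run on the same thread) using a Surefire configuration along the lines of the sketch below. It assumes JUnit 4 is kept as the only test framework, i.e. both TestNG dependencies are removed; the threadCount of 5 (one thread per feature file) is an illustrative choice, not a requirement.

```xml
<!-- Sketch only, assuming JUnit 4 is the sole test framework: the testng and
     cucumber-testng dependencies are removed so Surefire does not auto-select
     its TestNG provider. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.3.0</version>
  <configuration>
    <includes>
      <include>**/TestRunner.java</include>
    </includes>
    <!-- With JUnit 4.7+ on the classpath, any parallel value makes Surefire
         use its junit47 provider; per the Cucumber docs, the JUnit 4
         integration runs feature files (not single scenarios) in parallel. -->
    <parallel>both</parallel>
    <!-- Illustrative value: one thread per feature file for the 5 features. -->
    <threadCount>5</threadCount>
    <testFailureIgnore>true</testFailureIgnore>
  </configuration>
</plugin>
```

This keeps a single runner class; with JUnit 4 the parallelism comes from Cucumber treating each feature file as a separate child of that runner, so multiple runners should not be needed for feature-level parallel execution.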
